Exploring protein conformational space using an auto-encoder with a supervised encoding layer

Chen Guang-Lin, Zhang Zhi-Yong

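  • The function of a protein is often closely related to its structure and dynamics. Molecular dynamics (MD) simulation is an effective method for studying structural changes in proteins, but sampling the conformational space of a protein with MD takes a long time. Recent studies have shown that a simple machine learning model, the auto-encoder, and its variants can rapidly explore the conformational space of a protein from limited sampling. The model trains a neural network to extract latent variables and generates conformations from them; however, because the extracted latent variables have no intuitive meaning, it is difficult to control the direction in which the conformational space is explored. In this work, we introduce reaction coordinates (such as the distance between centers of mass) and build an auto-encoder whose intermediate (encoding) layer is supervised, to address this problem. Applied to two proteins, bacteriophage T4 lysozyme and adenylate kinase, the model explores multiple typical conformations of both proteins using only short MD simulations as training data. Supervised auto-encoder models (guided by reasonable reaction coordinates, experimental data, etc.) are promising as efficient tools for exploring protein conformational space.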
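The abstract gives the distance between centers of mass as an example of a reaction coordinate. As a minimal sketch, assuming each domain is supplied as a list of (x, y, z) atomic coordinates with optional atomic masses, such a coordinate could be computed as below. The function names and the choice of atoms in each domain are illustrative assumptions, not the authors' code.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def center_of_mass(coords: Sequence[Vec3], masses: Optional[Sequence[float]] = None) -> Vec3:
    """Mass-weighted center of a group of atoms; unweighted (geometric) if masses is None."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    x, y, z = (sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3))
    return (x, y, z)


def com_distance(
    domain_a: Sequence[Vec3],
    domain_b: Sequence[Vec3],
    masses_a: Optional[Sequence[float]] = None,
    masses_b: Optional[Sequence[float]] = None,
) -> float:
    """Distance between the centers of mass of two domains (same length unit as the input)."""
    return math.dist(center_of_mass(domain_a, masses_a), center_of_mass(domain_b, masses_b))


# Toy usage with made-up coordinates (not protein data).
domain_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
domain_b = [(1.0, 3.0, 0.0), (1.0, 5.0, 0.0)]
print(com_distance(domain_a, domain_b))  # 4.0
```

In practice the coordinates would come from an MD frame or a decoded conformation, and the atom selection for each domain is a modeling choice.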
    Protein function is closely related to protein structure and its dynamic changes. Molecular dynamics (MD) simulation is an important tool for studying protein dynamics by exploring conformational space. However, conformational sampling is nontrivial, because key details can be missed during sampling. In recent years, deep learning methods such as the auto-encoder have been coupled with MD to explore the conformational space of proteins. After being trained on MD trajectories, an auto-encoder can quickly generate new conformations from random inputs in a low-dimensional latent space. However, several problems remain, such as the dependence on the quality of the training set, the limited explorable area, and the undefined sampling direction. In this work, we build a supervised auto-encoder in which reaction coordinates are used to guide conformational exploration along chosen directions. We also try to expand the explorable area by training on data generated by the model. Two multi-domain proteins, bacteriophage T4 lysozyme and adenylate kinase, are used to illustrate the method. Even when the training set consists only of under-sampled simulation trajectories, the supervised auto-encoder can still explore along the given reaction coordinates. The explored conformational space covers all the experimental structures of both proteins and extends to regions far from the training sets. Checks by molecular dynamics and secondary-structure calculations show that most of the explored conformations are plausible. The supervised auto-encoder thus provides a way to efficiently expand the conformational space of a protein with limited computational resources, although suitable reaction coordinates are required. By integrating appropriate reaction coordinates or experimental data, the supervised auto-encoder may serve as an efficient tool for exploring the conformational space of proteins.
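The abstract describes an auto-encoder whose latent (encoding-layer) variables are additionally tied to chosen reaction coordinates. One common way to express such supervision, assumed here rather than taken from the paper, is to add a penalty between selected latent units and the reaction-coordinate values to the usual reconstruction loss. The sketch uses single linear layers in place of the actual neural networks, and the weight `lam` is a hypothetical hyperparameter.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(w: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def supervised_ae_loss(
    x: Sequence[float],
    w_enc: Matrix,
    w_dec: Matrix,
    rc: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Reconstruction loss plus a supervision term on the encoding layer.

    x     : flattened conformation (e.g. Cartesian coordinates)
    w_enc : latent_dim x input_dim encoder weights (linear, illustrative only)
    w_dec : input_dim x latent_dim decoder weights (linear, illustrative only)
    rc    : target reaction-coordinate values for the first len(rc) latent units
    lam   : weight of the supervision term (hypothetical hyperparameter)
    """
    z = matvec(w_enc, x)                 # latent variables from the encoding layer
    x_hat = matvec(w_dec, z)             # reconstructed conformation
    recon = mse(x, x_hat)
    supervision = mse(z[: len(rc)], rc)  # pull supervised latent units toward the reaction coordinates
    return recon + lam * supervision


print(supervised_ae_loss([1.0, 2.0], [[1.0, 0.0]], [[1.0], [2.0]], rc=[1.5], lam=2.0))  # 0.5
```

With `lam=0` the supervision term vanishes and the objective reduces to a plain reconstruction loss of the kind used by an ordinary auto-encoder; a larger `lam` ties the supervised latent units more tightly to the reaction coordinates, which is what makes the latent axes interpretable.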
      Corresponding author: Zhang Zhi-Yong, zzyzhang@ustc.edu.cn
    • Funds: Project supported by the National Key Research and Development Program of China (Grant No. 2021YFA1301504), the National Natural Science Foundation of China (Grant No. 91953101), and the Strategic Priority Research Program (B) of the Chinese Academy of Sciences (Grant No. XDB37040202).
  • Fig. 1.  Schematic of the auto-encoder with a supervised encoding layer (supervised-AE).

    Fig. 2.  Different structures of the two proteins studied in this work: (a) the closed (opaque) and open (transparent) states of T4L, with α-helices in purple and β-sheets in yellow; (b) the closed (opaque) and open (transparent) states of AdK, with different domains shown in different colors.

    Fig. 3.  Results of conformational space exploration of T4L: (a) with the AMBER99SB force field/OPC water model; (b) with the CHARMM36m force field/TIP3P water model.
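Exploring along a supervised coordinate amounts to decoding latent points whose supervised component is varied systematically, possibly beyond the range seen in training (the abstract reports exploration extending far from the training sets). The following is a hypothetical sketch of such a scan, not the procedure used to produce Fig. 3: a linear decoder with weights `w_dec` and bias `b_dec` stands in for the trained network, the supervised coordinate is assumed to be latent unit 0, and `extend` (the fraction of the training range added on each side) is an assumed parameter.

```python
from typing import List, Sequence


def latent_grid(rc_min: float, rc_max: float, n: int, extend: float = 0.0) -> List[float]:
    """n evenly spaced values spanning [rc_min, rc_max], widened by extend * range on each side."""
    span = rc_max - rc_min
    lo, hi = rc_min - extend * span, rc_max + extend * span
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def decode(w_dec: Sequence[Sequence[float]], b_dec: Sequence[float], z: Sequence[float]) -> List[float]:
    """Toy linear decoder: x = W z + b (stand-in for a trained network)."""
    return [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(w_dec, b_dec)]


def scan_supervised_coordinate(w_dec, b_dec, rc_min, rc_max, n, extend, other_latents):
    """Decode one conformation per grid value of the supervised latent unit (unit 0),
    keeping the remaining latent units fixed at other_latents."""
    return [decode(w_dec, b_dec, [rc, *other_latents]) for rc in latent_grid(rc_min, rc_max, n, extend)]


print(latent_grid(0.0, 1.0, 5, extend=0.5))  # [-0.5, 0.0, 0.5, 1.0, 1.5]
```

Holding the unsupervised latent units fixed isolates the effect of the reaction coordinate; in a real model these units might instead be sampled randomly at each grid value.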

    Fig. 4.  Different T4L conformations explored: (a) the crystal structure PDB 173L (opaque) and a similar explored structure (transparent); (b) two conformations with different degrees of opening; (c) two conformations with different degrees of twisting. α-helices are shown in purple and β-sheets in yellow.
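Fig. 4(c) compares conformations with different degrees of twisting, but the caption does not say how twisting is quantified. One common choice, assumed here purely for illustration, is a dihedral angle through four reference points, for example the centers of mass of four chosen regions. A minimal sketch using the standard atan2 form of the dihedral angle:

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> List[float]:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(a[i] * b[i] for i in range(3))


def _cross(a: Vec3, b: Vec3) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral_deg(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)        # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)      # |b2| * b1 . (b2 x b3)
    return math.degrees(math.atan2(y, x))


print(round(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # 90.0
```

Using atan2 rather than an arccos of the normalized dot product keeps the sign of the rotation, so twisting in opposite directions gives angles of opposite sign.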

    Fig. 5.  Plausibility check of the T4L conformational exploration results: (a) with the AMBER99SB force field/OPC water model; (b) with the CHARMM36m force field/TIP3P water model; (c) secondary-structure content of each representative conformation after repair, with the average over the simulated trajectory as the reference value.
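Fig. 5(c) compares the secondary-structure content of representative conformations with the average over the simulated trajectory. Assuming per-residue secondary-structure strings in DSSP notation, and assuming (this is not stated in the caption) that helices are counted as DSSP codes H, G and I and strands as E and B, the comparison could be sketched as follows. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable

HELIX_CODES = frozenset("HGI")   # alpha, 3-10 and pi helices (assumed grouping)
STRAND_CODES = frozenset("EB")   # extended strand and isolated bridge (assumed grouping)


def ss_counts(dssp: str) -> Dict[str, int]:
    """Number of residues in helix and strand states for one conformation."""
    return {
        "helix": sum(code in HELIX_CODES for code in dssp),
        "strand": sum(code in STRAND_CODES for code in dssp),
    }


def trajectory_reference(frames: Iterable[str]) -> Dict[str, float]:
    """Average helix and strand counts over all frames of a simulated trajectory."""
    counts = [ss_counts(f) for f in frames]
    return {key: mean(c[key] for c in counts) for key in ("helix", "strand")}


def relative_content(dssp: str, reference: Dict[str, float]) -> Dict[str, float]:
    """Counts of one conformation divided by the trajectory reference (nan if the reference is 0)."""
    counts = ss_counts(dssp)
    return {k: counts[k] / reference[k] if reference[k] else float("nan") for k in counts}


# Toy DSSP strings (not real protein data).
frames = ["HHHHEE", "HHHH--", "HHGGEE"]
ref = trajectory_reference(frames)
print(relative_content("HHHHEE", ref))
```

A ratio close to 1 would indicate that a generated conformation keeps roughly as much secondary structure as the simulated ensemble does on average.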

    Fig. 6.  Results of T4L conformational exploration starting from the open state only.
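The abstract also mentions expanding the explorable area by training on data generated by the model. The exact procedure is not described in this section, so the following is only a hypothetical skeleton of such a loop: `train`, `generate` and `is_plausible` are placeholder callables standing for, say, fitting the supervised auto-encoder, decoding new latent points, and a plausibility check on the generated structures.

```python
from typing import Callable, List, TypeVar

C = TypeVar("C")  # a conformation (any representation)
M = TypeVar("M")  # a trained model


def iterative_expansion(
    initial: List[C],
    train: Callable[[List[C]], M],
    generate: Callable[[M], List[C]],
    is_plausible: Callable[[C], bool],
    n_rounds: int,
) -> List[C]:
    """Repeatedly train, generate, filter, and add accepted samples to the training data."""
    data = list(initial)
    for _ in range(n_rounds):
        model = train(data)                  # fit on the current (growing) data set
        for conf in generate(model):         # propose new conformations
            if is_plausible(conf) and conf not in data:
                data.append(conf)            # accepted samples join the next round's training set
    return data


# Toy usage: "conformations" are single reaction-coordinate values, the "model" is the
# range covered so far, and generation proposes one step beyond each end of that range.
result = iterative_expansion(
    initial=[0.0, 1.0],
    train=lambda d: (min(d), max(d)),
    generate=lambda m: [m[1] + 1.0, m[0] - 1.0],
    is_plausible=lambda c: c <= 3.0,
    n_rounds=3,
)
print(result)  # [0.0, 1.0, 2.0, -1.0, 3.0, -2.0, -3.0]
```

The plausibility filter is what keeps such a loop from drifting into unphysical regions: in the toy run, candidates above 3.0 are rejected, so growth continues only on the other side.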

    Fig. 7.  Results of conformational space exploration of AdK: (a) with the AMBER99SB force field/OPC water model; (b) with the CHARMM36m force field/TIP3P water model.

    Fig. 8.  Different AdK conformations explored.

    Fig. 9.  Plausibility check of the AdK conformational exploration results: (a) with the AMBER99SB force field/OPC water model; (b) with the CHARMM36m force field/TIP3P water model; (c) secondary-structure content of each representative conformation after repair, with the average over the simulated trajectory as the reference value.

    Fig. 10.  Exploring the conformational space of AdK with an ordinary (unsupervised) auto-encoder.

Publication history
  • Received:  2023-06-28
  • Revised:  2023-07-29
  • Published online:  2023-09-12
  • Published in issue:  2023-12-20
