Journal of the Chinese Physical Society

Progress in protein pre-training models integrating structural knowledge

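The artificial intelligence revolution triggered by natural language and image processing has brought new ideas and research paradigms to protein computing. A major advance is the pre-trained protein language model, obtained by self-supervised learning on massive numbers of protein sequences. Such pre-trained models encode many kinds of information about proteins, including sequence, evolution, structure, and even function; they transfer conveniently to a variety of downstream tasks and show strong generalization. Building on this, researchers are now developing multimodal pre-trained models that integrate more types of data. Because protein structure is the main factor determining protein function, pre-trained models that incorporate structural information can better support many downstream tasks, and this paper introduces and summarizes research in this direction. It also briefly introduces work on protein pre-trained models that incorporate prior knowledge, RNA language models, and protein design, and discusses the current status of these fields, their difficulties, and possible solutions.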

     

The AI revolution sparked by natural language and image processing has brought new ideas and research paradigms to the field of protein computing. One significant advance is the development of pre-trained protein language models through self-supervised learning on massive collections of protein sequences. These pre-trained models encode information about protein sequence, evolution, structure, and even function; they transfer readily to various downstream tasks and generalize robustly. More recently, researchers have developed multimodal pre-trained models that integrate more diverse types of data. This paper reviews recent studies in this direction from five aspects. First, we review protein pre-training models that integrate protein structure into language models; this is particularly important because structure is the primary determinant of protein function. Second, we introduce pre-trained models that integrate protein dynamics. These models may benefit downstream tasks such as protein-protein interactions, soft docking of ligands, and interactions involving allosteric proteins and intrinsically disordered proteins. Third, we describe pre-trained models that integrate prior knowledge such as Gene Ontology. Fourth, we briefly introduce pre-trained models in the RNA field. Finally, we present the most recent developments in protein design and discuss how these models relate to the structure-aware pre-trained models described above.
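Self-supervised pre-training on protein sequences needs no labels: the training signal comes from the sequences themselves. A common way to do this is a masked-token objective, in which some residues are hidden and the model learns to recover them from context. The sketch below shows only the data-corruption step, assuming the BERT-style recipe (select about 15% of positions; replace 80% of those with a mask token, 10% with a random amino acid, and leave 10% unchanged). The function name `mask_sequence`, the `<mask>` token, and these proportions are illustrative assumptions, not the exact procedure of any particular model reviewed in this paper.

```python
import random

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask token


def mask_sequence(seq, mask_prob=0.15, seed=None):
    """Build one masked-language-modeling example from a protein sequence.

    Returns (tokens, labels):
      tokens: the input residues, with some selected positions corrupted.
      labels: the original residue at each selected position, None elsewhere
              (positions holding None would not contribute to the loss).
    """
    rng = random.Random(seed)
    tokens = list(seq)
    labels = [None] * len(seq)
    for i, residue in enumerate(seq):
        if rng.random() < mask_prob:
            labels[i] = residue  # the model must predict this residue
            r = rng.random()
            if r < 0.8:
                tokens[i] = MASK  # 80%: replace with the mask token
            elif r < 0.9:
                tokens[i] = rng.choice(AMINO_ACIDS)  # 10%: random residue
            # remaining 10%: leave the original residue in place
    return tokens, labels
```

For example, `mask_sequence("MKTAYIAKQR", seed=0)` returns the corrupted tokens together with per-position labels; a language model would then be trained to predict each non-`None` label from the tokens. In the BERT recipe, the random and unchanged replacements are meant to reduce the mismatch between pre-training, where the mask token appears, and downstream use, where it never does.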

     
