Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0480000)
YANG Chunhe
Biodesign Center, Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China; National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
MA Hongwu
Biodesign Center, Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China; National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, China
Proteins are the foundation of cellular life, and understanding how they are expressed is essential both for revealing the principles of cellular organization and for advancing biotechnology. Protein expression is a complex, tightly regulated process spanning transcription, translation, folding, transport, and post-translational modification, and it is affected by cellular components as well as by sequence features of the expressed protein. Building models from expression data is therefore valuable for identifying the factors and regulatory mechanisms that govern protein expression. Here we review recent progress in two directions: mechanistic models that quantitatively simulate the protein expression process, and artificial intelligence (AI) methods that analyze how various factors affect expression. Chemical reaction network models describe the elementary steps of transcription and translation mathematically and can simulate the influence of intracellular components such as RNA polymerase and tRNA. However, these models contain a very large number of parameters that are difficult to determine directly by experiment, so parameter fitting remains an open problem. Data-driven AI models, by contrast, mainly study how the amino acid sequence of the target protein and the nucleotide sequences of its gene and regulatory regions affect expression, and then guide sequence design to increase expression levels. Combining mechanistic and AI models, so that intracellular factors and sequence features are considered together, has the potential to deepen our understanding of protein expression systems and to provide theoretical and technical support for the efficient expression of high-value target proteins and the coordinated expression of different proteins in the cell.
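To make the idea of a chemical reaction network model concrete, the following is a minimal sketch of the simplest such model, the textbook two-stage scheme in which mRNA is transcribed at a constant rate and degraded, and protein is translated from mRNA and degraded. It is an illustration only and is not taken from the reviewed work: the function names, the forward Euler integrator, and all rate constants are assumptions chosen for readability. The models discussed in the review are far richer, with rates that depend on components such as RNA polymerase and tRNA.

```python
# Two-stage gene expression model (illustrative assumption, not the review's model):
#   dm/dt = k_tx - d_m * m        mRNA: transcription minus degradation
#   dp/dt = k_tl * m - d_p * p    protein: translation minus degradation
# All parameter values below are hypothetical and carry arbitrary units.

def simulate_expression(k_tx=2.0, k_tl=5.0, d_m=0.2, d_p=0.05,
                        t_end=400.0, dt=0.01):
    """Integrate the model with forward Euler from m = p = 0.

    Returns the (mRNA, protein) levels at time t_end.
    """
    m, p = 0.0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dm = k_tx - d_m * m          # net mRNA production rate
        dp = k_tl * m - d_p * p      # net protein production rate
        m += dt * dm
        p += dt * dp
    return m, p


def steady_state(k_tx=2.0, k_tl=5.0, d_m=0.2, d_p=0.05):
    """Analytical steady state, obtained by setting both derivatives to zero."""
    m_ss = k_tx / d_m
    p_ss = k_tl * m_ss / d_p
    return m_ss, p_ss


if __name__ == "__main__":
    m, p = simulate_expression()
    m_ss, p_ss = steady_state()
    print(f"simulated:  mRNA = {m:.3f}, protein = {p:.3f}")
    print(f"analytical: mRNA = {m_ss:.3f}, protein = {p_ss:.3f}")
```

Even this toy model has four rate constants for a single gene. Resolving transcription and translation into elementary steps for many genes multiplies that number quickly, which is the parameter-fitting difficulty the abstract points to.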
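YANG Yi, DU Jun, YANG Chunhe, MA Hongwu. Research progress in mechanistic models and artificial intelligence models of protein expression systems[J]. Chinese Journal of Biotechnology, 2025, 41(3): 1079-1097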