An antibacterial peptides recognition method based on BERT and Text-CNN

Fund Project: National Key Research and Development Program of China (2021YFA1301603)

    Abstract:

    Antimicrobial peptides (AMPs) are small peptides, widely found in living organisms, with broad-spectrum antibacterial activity and immunomodulatory function. Because resistance to them develops slowly and they are applicable in a wide range of settings, AMPs have great clinical value and are strong competitors to conventional antibiotics. AMP recognition is an important research direction in the AMP field. Wet-lab experiments are costly, inefficient, and slow when used for large-scale AMP recognition, so computer-aided identification is an important complement to them, and improving its accuracy is a key problem. Protein sequences can be approximated as a language composed of amino acids, so natural language processing (NLP) techniques may extract rich features from them. In this paper, we combine the pre-trained model BERT with the fine-tuning structure Text-CNN, both from the NLP field, to model the protein language, provide an open-source AMP recognition tool, and compare it with five published AMP recognition tools. The results show that optimizing the "pre-training and fine-tuning" strategy yields an overall improvement in accuracy, sensitivity, specificity, and Matthews correlation coefficient, offering a new approach for further research on AMP recognition algorithms.
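
The four evaluation measures named in the abstract have standard definitions for binary classification. The following is a minimal sketch of how they can be computed from predicted and true labels, not the authors' evaluation code. The function names `confusion_counts` and `binary_metrics` are hypothetical, and the convention that label 1 means "AMP" and label 0 means "non-AMP" is an assumption made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn), assuming label 1 = AMP and 0 = non-AMP."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC as a dict.

    Any ratio whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(binary_metrics(*confusion_counts(y_true, y_pred)))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it a useful summary when the AMP and non-AMP classes differ in size.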

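Cite this article

XU Xiaofang, YANG Chunde, SHU Kunxian, YUAN Xinpu, LI Mocheng, ZHU Yunping, CHEN Tao. An antibacterial peptides recognition method based on BERT and Text-CNN[J]. Chinese Journal of Biotechnology, 2023, 39(4): 1815-1824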

History
  • Received: 2022-11-04
  • Accepted: 2023-02-17
  • Published online: 2023-04-14
  • Published: 2023-04-25