An antimicrobial peptide recognition method based on BERT and Text-CNN
Author:
Funding: National Key Research and Development Program of China (2021YFA1301603)

    Abstract:

    Antimicrobial peptides (AMPs) are small-molecule peptides that are widely found in living organisms and exhibit broad-spectrum antibacterial activity and immunomodulatory effects. Because resistance to AMPs emerges slowly and they have excellent clinical potential and a wide range of applications, AMPs are strong alternatives to conventional antibiotics. AMP recognition is an important direction in AMP research. The high cost, low efficiency, and long turnaround of wet-lab methods prevent them from meeting the need for large-scale AMP recognition; computer-aided identification methods are therefore an important complement, and a key issue is how to improve their accuracy. A protein sequence can be viewed approximately as text in a language whose alphabet is the amino acids, so natural language processing (NLP) techniques may extract rich features from it. In this paper, we combine the pre-trained NLP model BERT with a Text-CNN fine-tuning head to model the protein language, develop an open-source antimicrobial peptide recognition tool, and compare it with five previously published tools. The experimental results show that optimizing the two-phase pre-train/fine-tune strategy yields an overall improvement in accuracy, sensitivity, specificity, and the Matthews correlation coefficient, offering a new direction for further research on AMP recognition algorithms.
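The abstract couples a pre-trained BERT encoder with a Text-CNN fine-tuning head but gives no architectural details here. As an illustration only, the following is a minimal NumPy sketch of a Text-CNN forward pass over per-residue embeddings; the random, untrained weights stand in for both BERT's output vectors and the learned filters, and all dimensions and the example sequence are hypothetical:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

rng = np.random.default_rng(0)
EMBED_DIM = 8                 # stand-in for BERT's per-residue vector size
FILTER_WIDTHS = (2, 3, 4)     # n-gram windows over the sequence
N_FILTERS = 4                 # filters per window width

embedding = rng.normal(size=(len(AMINO_ACIDS), EMBED_DIM))
filters = {w: rng.normal(size=(N_FILTERS, w, EMBED_DIM)) for w in FILTER_WIDTHS}
w_out = rng.normal(size=(N_FILTERS * len(FILTER_WIDTHS),))
b_out = 0.0

def text_cnn_score(seq: str) -> float:
    """Forward pass: embed residues, convolve with several window widths,
    apply ReLU and max-over-time pooling, then a sigmoid-activated linear layer."""
    x = embedding[[AA_INDEX[aa] for aa in seq]]            # (L, EMBED_DIM)
    pooled = []
    for w, bank in filters.items():
        # "valid" convolution: one activation per window position per filter
        acts = np.array([[np.sum(bank[f] * x[i:i + w])
                          for i in range(len(seq) - w + 1)]
                         for f in range(N_FILTERS)])       # (N_FILTERS, L-w+1)
        pooled.append(np.maximum(acts, 0).max(axis=1))     # ReLU + max-over-time
    feat = np.concatenate(pooled)                          # fixed-length feature
    return float(1.0 / (1.0 + np.exp(-(feat @ w_out + b_out))))
```

Fine-tuning would learn `filters`, `w_out`, and `b_out` (and typically adjust the BERT weights) from labeled AMP/non-AMP sequences; the sketch only shows the data flow that makes the classifier length-independent: embed, convolve at several widths, max-pool, classify.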

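The four reported evaluation measures are standard functions of the binary confusion matrix. A small helper, assuming AMPs are the positive class (the counts in the usage line below are made up for illustration):

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and the Matthews correlation
    coefficient (MCC) from a binary confusion matrix."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sens = tp / (tp + fn)   # recall on the positive (AMP) class
    spec = tn / (tn + fp)   # recall on the negative (non-AMP) class
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sens, spec, mcc
```

For example, `classification_metrics(tp=40, fp=15, tn=35, fn=10)` gives accuracy 0.75, sensitivity 0.8, specificity 0.7, and MCC ≈ 0.503. Unlike accuracy, MCC stays informative when the positive and negative classes are imbalanced, which is common in AMP benchmark sets.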
Cite this article:

XU Xiaofang, YANG Chunde, SHU Kunxian, YUAN Xinpu, LI Mocheng, ZHU Yunping, CHEN Tao. An antimicrobial peptide recognition method based on BERT and Text-CNN[J]. Chinese Journal of Biotechnology, 2023, 39(4): 1815-1824.

History
  • Received: 2022-11-04
  • Accepted: 2023-02-17
  • Published online: 2023-04-14
  • Published: 2023-04-25