Supported by the National Key Technology R&D Program of China (No. 2017YFD0800204) and the Fundamental Research Funds for the Central Universities (No. KYZ201600175).
To provide a theoretical basis for understanding the function and properties of proteins, we proposed a simple and effective feature extraction method for protein sequences that is used to predict the subcellular localization of proteins. First, we combined sparse coding with amino acid composition information to extract feature values from protein sequences. Then, multilayer pooling was performed to integrate the features obtained with dictionaries of different sizes. Finally, the extracted feature values were fed into a support vector machine to test the effectiveness of the model. The success rates on the ZD98, CH317 and Gram1253 data sets were 95.9%, 93.4% and 94.7%, respectively, as verified by the jackknife test. The experiments showed that our method based on multilayer sparse coding can remarkably improve the accuracy of protein subcellular localization prediction.
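As a rough illustration of two ingredients named in the abstract, the sketch below computes an amino acid composition vector and runs a jackknife (leave-one-out) test. It is not the authors' pipeline. The sparse coding and multilayer pooling steps are omitted, a nearest-centroid rule stands in for the support vector machine, and the sequences, labels, and function names are all made up for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of residue frequencies."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)  # ignore non-standard symbols
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier (assumption, NOT the SVM used in the paper):
    return the label whose mean feature vector is closest to the query."""
    sums, sizes = {}, Counter(train_y)
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value

    def squared_distance(label):
        return sum((s / sizes[label] - q) ** 2
                   for s, q in zip(sums[label], query))

    return min(sorted(sums), key=squared_distance)


def jackknife_accuracy(features, labels, predict=nearest_centroid_predict):
    """Leave-one-out test: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy, invented sequences and labels, used only to exercise the code.
toy = [
    ("AAAAGAAL", "cytoplasm"), ("AALAGAGA", "cytoplasm"), ("AVGAALAA", "cytoplasm"),
    ("KKRKKRKE", "nucleus"), ("RKKRERKK", "nucleus"), ("KRKKKKSR", "nucleus"),
]
features = [amino_acid_composition(seq) for seq, _ in toy]
labels = [label for _, label in toy]
accuracy = jackknife_accuracy(features, labels)
print(f"jackknife success rate: {accuracy:.3f}")
```

Because every sample is held out exactly once, the jackknife estimate is deterministic for a given data set, which is one reason it is commonly used to report success rates on small benchmarks such as those listed above.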
Chen Xingjian, Hu Xuejiao, Xue Wei. Prediction of protein subcellular localization based on multilayer sparse coding[J]. Chinese Journal of Biotechnology, 2019, 35(4): 687-696