Abstract: Immune cell infiltration is highly relevant to the diagnosis and prognosis of cancer. In this study, we collected gene expression data for non-small cell lung cancer (NSCLC) and normal tissues from the TCGA database, estimated the proportions of 22 immune cell types with the CIBERSORT tool, and evaluated immune cell infiltration. Based on these proportions, we constructed a model to classify NSCLC and normal tissues using machine learning methods. The classification model built with the random forest algorithm achieved an AUC of 0.987, a sensitivity of 0.98, and a specificity of 0.84. A random forest model distinguishing lung adenocarcinoma from lung squamous cell carcinoma tissues achieved an AUC of 0.827, a sensitivity of 0.75, and a specificity of 0.77. Finally, we constructed a prognostic model for NSCLC by combining clinical features with an immune cell score composed of 8 strongly correlated features selected from the 22 immune cell features by LASSO regression. In evaluation and validation, the prognostic model reached a C-index of 0.71, and its three-year and five-year calibration curves were well fitted, indicating that the model can accurately predict the degree of prognostic risk. This study aims to provide a new strategy for the diagnosis and prognosis of NSCLC based on classification and prognostic models built on immune cell infiltration.
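
For readers unfamiliar with the three metrics reported for the classification models, the following is a minimal sketch of how AUC, sensitivity, and specificity can be computed from predicted scores. It is not the study's code: the function names, the 0.5 threshold, and the eight toy scores are assumptions made purely for illustration, and the resulting values are unrelated to the results reported above.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a score threshold.

    Label 1 = tumor tissue (positive), label 0 = normal tissue (negative).
    The 0.5 threshold is an assumption, not a value taken from the study.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as one half)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for four tumor and four normal samples.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

toy_sensitivity, toy_specificity = sensitivity_specificity(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_sensitivity, toy_specificity, toy_auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can show a high AUC together with a comparatively lower specificity.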