Journal of University of Chinese Academy of Sciences (中国科学院大学学报) ›› 2024, Vol. 41 ›› Issue (4): 468-476. DOI: 10.7523/j.ucas.2023.046

• Chemistry and Biology •

Multi-scale featured convolution neural network-based soybean phenotypic prediction

LIN Yutong, WANG Hong, CHAI Tuanyao

  1. College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-01-30 Revised: 2023-05-05 Published: 2023-06-12
  • Corresponding authors: WANG Hong, E-mail: hwang@ucas.ac.cn; CHAI Tuanyao, E-mail: tychai@ucas.ac.cn
  • Supported by:
    National Key Research and Development Program of China (2019YFA0903901), Strategic Priority Research Program (Category A) of the Chinese Academy of Sciences (XDA24010402), National Natural Science Foundation of China (61972374), and the Fundamental Research Funds for the Central Universities

Abstract: In breeding, single nucleotide polymorphisms (SNPs) in the genome are often used to predict quantitative phenotypes, which assists selection and improves breeding efficiency. Traditional statistical methods are limited by many factors, including missing data, and perform poorly in some cases. To address this problem, we propose a multi-scale feature convolutional neural network model (MSF-CNN) for predicting plant traits. The model extracts SNP features at three different scales through convolution, predicts trait values by regression, and analyzes the significance of SNP sites through the weights of the SNPs input into the model. Test results show that, on datasets with missing genotype data, MSF-CNN predicts phenotypes more accurately than the other known methods it was compared with, including other deep learning models. The contribution of genotype to traits was also studied through saliency maps, which revealed several significant SNP loci. These results indicate that the proposed deep learning model can predict quantitative phenotypes more accurately and can efficiently identify SNPs relevant to genome-wide association studies.

Key words: gene selection, deep learning, genome-wide association study, soybean
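
The pipeline summarized in the abstract (convolution at three scales, regression on the pooled features, per-SNP saliency) can be illustrated with a minimal pure-Python sketch. It is not the authors' MSF-CNN: the abstract does not give the genotype encoding, kernel sizes, pooling, or saliency computation, so the encoding in `encode`, the kernel sizes 3, 5 and 7, mean pooling, the random untrained weights, and the finite-difference saliency (standing in for a gradient-based saliency map) are all assumptions made for illustration.

```python
import random

def encode(genotypes):
    """Map allele counts 0/1/2 to 1.0/2.0/3.0 and missing calls (None) to 0.0.
    Illustrative assumption only; the paper's encoding is not given in the abstract."""
    return [0.0 if g is None else float(g + 1) for g in genotypes]

def conv1d(x, kernel):
    """Valid 1D cross-correlation of x with kernel, followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)))
            for i in range(len(x) - k + 1)]

def multi_scale_features(x, kernels):
    """One mean-pooled feature per kernel, i.e. per scale."""
    feats = []
    for kernel in kernels:
        response = conv1d(x, kernel)
        feats.append(sum(response) / len(response))
    return feats

def predict(x, kernels, head_w, head_b):
    """Linear regression head on the concatenated multi-scale features."""
    feats = multi_scale_features(x, kernels)
    return sum(w * f for w, f in zip(head_w, feats)) + head_b

def saliency(x, kernels, head_w, head_b, eps=1e-4):
    """Absolute central finite-difference sensitivity of the prediction to each SNP.
    Stands in for a gradient-based saliency map (assumption)."""
    scores = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad = (predict(up, kernels, head_w, head_b)
                - predict(down, kernels, head_w, head_b)) / (2 * eps)
        scores.append(abs(grad))
    return scores

# Untrained toy weights; kernel sizes 3, 5 and 7 are assumed, not from the paper.
rng = random.Random(0)
kernels = [[rng.uniform(-1, 1) for _ in range(k)] for k in (3, 5, 7)]
head_w = [rng.uniform(-1, 1) for _ in kernels]
head_b = 0.0

genotypes = [0, 1, 2, None, 2, 1, 0, 0, None, 2, 1, 1]  # made-up toy sample
x = encode(genotypes)
print("predicted trait value:", predict(x, kernels, head_w, head_b))
print("per-SNP saliency:", [round(s, 3) for s in saliency(x, kernels, head_w, head_b)])
```

In a trained model the weights would be fitted to measured trait values, and SNP positions with consistently large saliency across samples would be the candidates for follow-up in an association analysis.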
