Frontiers of Data and Computing (数据与计算发展前沿), 2021, Vol. 3, Issue (3): 126-135.

doi: 10.11871/jfdc.issn.2096-742X.2021.03.011

• Technology and Application •

Review of Genomic Microsatellite Status Detection Based on Machine Learning

ZHANG Shuying1,2, HAN Xinyin1,2, HE Xiaoyu1,2, YUAN Danyang1,2, LUAN Haijing1,2, LI Ruilin1, HE Jiayin1, NIU Beifang1,2,*

  1. Computer Network Information Center, Chinese Academy of Sciences (CNIC), Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-01-21 Online: 2021-06-20 Published: 2021-07-09
  • Corresponding author: NIU Beifang
  • About the authors:
    ZHANG Shuying is a master's student at CNIC. Her research focuses on cancer genomics.
    In this paper, she was responsible for conceiving and writing the paper.
    E-mail: zhangshuying@cnic.cn
    HAN Xinyin is a Ph.D. student at CNIC. His research focuses on cancer genomics.
    In this paper, he was responsible for revising the paper.
    E-mail: hanxinyin@cnic.cn
    HE Xiaoyu is a Ph.D. student at CNIC. Her research focuses on high performance computing and cancer genomics.
    In this paper, she was responsible for revising the paper.
    E-mail: hexy@sccas.cn
    YUAN Danyang is a master's student at CNIC. Her research focuses on leukemia-related bioinformatics.
    In this paper, she was responsible for revising the paper.
    E-mail: yuandanyang@cnic.cn
    LUAN Haijing is a master's student at CNIC. Her research focuses on high performance computing and cancer genomics.
    In this paper, she was responsible for revising the paper.
    E-mail: luanhaijing@cnic.cn
    LI Ruilin, Ph.D., is an assistant research fellow at CNIC. Her research focuses on high performance computing and cancer genomics.
    In this paper, she was responsible for revising the paper.
    E-mail: lirl@sccas.cn
    HE Jiayin, who holds a master's degree, is an assistant engineer at CNIC. Her research focuses on high performance computing and cancer genomics.
    In this paper, she was responsible for revising the paper.
    E-mail: jiayin.he@cnic.cn
    NIU Beifang, Ph.D., is a research fellow at CNIC. His research focuses on high performance computing and cancer genomics.
    In this paper, he was responsible for research guidance and for planning the overall structure of the paper.
    E-mail: niubf@cnic.cn
  • Funding:
    Strategic Priority Research Program of the Chinese Academy of Sciences (Class B) (XDB38040100)

Abstract:

[Objective] This paper discusses the application of machine learning to genomic microsatellite status detection and its future research directions. [Scope of the literature] We collected the literature on microsatellite status detection methods. [Methods] First, the significance of microsatellite status detection and the common detection approaches are briefly introduced. Second, the current mainstream machine-learning-based detection methods are described in detail. Finally, future research directions for machine learning in microsatellite status detection are discussed. [Results] Machine-learning-based detection methods learn iteratively from large volumes of sequencing data and identify the key features that affect microsatellite instability; methods of this type can achieve good predictive performance. [Limitations] Because the detection methods use different types of data, this review could not compare them experimentally on a single common dataset. [Conclusions] Machine learning has been widely applied to microsatellite status detection. Improving the applicability of detection methods and detecting microsatellite status from peripheral blood samples are the future research directions for machine learning in this field.

Key words: machine learning, genome, microsatellite instability, sequencing data, key features
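To make the abstract's point concrete (machine-learning detectors learn iteratively from sequencing-derived data and single out the features tied to microsatellite instability), here is a minimal sketch. It is not the method of any tool reviewed in this paper. It assumes each sample has already been reduced to two hypothetical numeric features: a scaled instability score in [0, 1] and an uninformative control feature. The toy values are invented purely for illustration. The sketch fits a logistic-regression classifier (label 1 = MSI-H, 0 = MSS) by batch gradient descent. After training, the weight on the informative feature dominates, and that is the sense in which such a model "discerns" a key feature.

```python
import math

# Hypothetical, made-up toy data (illustration only, not real measurements):
#   x[0] = scaled instability score in [0, 1] (informative)
#   x[1] = uninformative control feature
# Label: 1 = MSI-H, 0 = MSS.
SAMPLES = [
    ([0.10, 0.5], 0),
    ([0.05, 0.9], 0),
    ([0.15, 0.1], 0),
    ([0.20, 0.7], 0),
    ([0.80, 0.4], 1),
    ([0.90, 0.8], 1),
    ([0.75, 0.2], 1),
    ([0.85, 0.6], 1),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logistic(samples, lr=1.0, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n_features = len(samples[0][0])
    w = [0.0] * n_features
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in samples:
            err = sigmoid(logit(w, b, x)) - y  # d(log-loss)/d(logit)
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # One iterative update step using the mean gradient.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (MSI-H) if the predicted probability reaches the threshold."""
    return int(sigmoid(logit(w, b, x)) >= threshold)


if __name__ == "__main__":
    w, b = train_logistic(SAMPLES)
    print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
    print(predict(w, b, [0.05, 0.5]), predict(w, b, [0.95, 0.5]))  # prints: 0 1
```

Running the script prints the learned weights and then the labels for two new samples, 0 (MSS) and 1 (MSI-H). The toy data only demonstrate the mechanics of iterative feature weighting; they are not results.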