Frontiers of Data and Computing, 2024, Vol. 6, Issue 5: 139-147.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.05.013

doi: 10.11871/jfdc.issn.2096-742X.2024.05.013

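Genome Variation Detection and Analysis Process

LUAN Haijing1,2, NIU Beifang1,2,*

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China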
  • Received: 2023-07-08 Online: 2024-10-20 Published: 2024-10-21
  • Corresponding author: * NIU Beifang (E-mail: bniu@sccas.cn)
  • About the authors: LUAN Haijing is a Ph.D. student at the Computer Network Information Center of the Chinese Academy of Sciences. Her research focuses on high-performance computing and cancer genomics.
    In this paper, she was responsible for conceiving and writing the paper.
    E-mail: luanhaijing@cnic.cn
    NIU Beifang, Ph.D., is a research fellow at the Computer Network Information Center of the Chinese Academy of Sciences. His research focuses on high-performance computing and cancer genomics.
    In this paper, he was responsible for research guidance and the overall planning of the paper structure.
    E-mail: niubf@cnic.cn
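  • Funding:
    National Natural Science Foundation of China (92259101); Strategic Priority Research Program of the Chinese Academy of Sciences (Category B) (XDB38040100)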

Abstract:

[Objective] This paper discusses the analysis workflow for genomic variation detection and its limitations. [Scope of the literature] We collected and reviewed research literature related to the genomic variation detection workflow. [Methods] First, we give a brief overview of the genomic variation detection analysis workflow and then discuss in depth the three key stages of data quality control: raw data quality control, alignment quality control, and variant calling quality control. Next, data preprocessing is introduced in terms of read alignment, sorting, and duplicate removal. Variant detection is then summarized from four aspects: variant data detection, quality control, filtering, and annotation. Finally, existing problems are summarized and future prospects are discussed. [Results] With the development of next-generation sequencing technology, the genomic variation detection workflow is expected to become more efficient, accurate, and scalable. [Limitations] Challenges remain, including limits on sequencing read length and the need for validation in clinical laboratory applications. [Conclusions] This workflow is of research significance for understanding the current status and development trends of genomic variation detection analysis.

Key words: data preprocessing, quality control, variant detection, whole-genome sequencing, variant annotation
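
As a rough illustration of two of the preprocessing steps named in the abstract (sorting and duplicate removal), the following minimal Python sketch coordinate-sorts a handful of toy aligned reads and keeps one read per alignment position and strand. The `Read` record, its fields, and the rule of keeping the read with the highest mapping quality are assumptions made for this example only; they are not taken from the paper, and production tools apply more elaborate criteria.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified aligned read (not a real SAM/BAM record)."""
    name: str
    chrom: str
    pos: int      # 1-based leftmost aligned position
    strand: str   # "+" or "-"
    mapq: int     # mapping quality


def sort_reads(reads: Iterable[Read]) -> List[Read]:
    """Coordinate-sort reads by (chromosome, position).

    Chromosome names are compared as plain strings here, which is a
    simplification: real tools follow the reference sequence order.
    """
    return sorted(reads, key=lambda r: (r.chrom, r.pos))


def remove_duplicates(sorted_reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (chrom, pos, strand): the one with the highest MAPQ.

    Assumption for illustration: reads sharing position and strand are
    treated as duplicates of one another.
    """
    best = {}
    for read in sorted_reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return sort_reads(best.values())


reads = [
    Read("r3", "chr1", 200, "+", 60),
    Read("r1", "chr1", 100, "+", 30),
    Read("r2", "chr1", 100, "+", 60),  # same position and strand as r1
    Read("r4", "chr1", 100, "-", 20),  # opposite strand, so not a duplicate
]

deduped = remove_duplicates(sort_reads(reads))
print([r.name for r in deduped])
```

In this toy input, `r1` and `r2` share a position and strand, so only the higher-quality `r2` is kept, while `r4` survives because it maps to the opposite strand.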