Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (5): 139-147.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.05.013

doi: 10.11871/jfdc.issn.2096-742X.2024.05.013

Genome Variation Detection and Analysis Process

LUAN Haijing1,2, NIU Beifang1,2,*

  1. China Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. China University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-07-08 Online: 2024-10-20 Published: 2024-10-21

Abstract:

[Objective] This paper reviews the analysis workflow for genomic variation detection and its limitations. [Scope of the literature] We collected and reviewed the literature on genomic variation detection workflows. [Methods] First, a brief overview of the genomic variation detection workflow is provided, followed by an in-depth discussion of three key aspects of data quality control: raw data quality control, alignment quality control, and variant calling quality control. Next, data preprocessing is introduced in terms of read alignment, sorting, and duplicate removal. The variation detection process is then summarized, encompassing variant data monitoring, quality control, filtering, and annotation. Finally, existing challenges are summarized and future prospects are discussed. [Results] With the advancement of next-generation sequencing technology, the genomic variation detection process is becoming more efficient, accurate, and scalable. [Limitations] Remaining challenges include the limited read length of current sequencing platforms and the need for validation before use in clinical laboratory applications. [Conclusions] This review of the workflow provides a reference for understanding the current status and development trends of genomic variation detection analysis.
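
The variant filtering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Variant` record, the function names, the example calls, and the thresholds (call quality of at least 30, read depth of at least 10) are all assumptions chosen for illustration, and real workflows typically tune such cut-offs per dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not a full VCF line)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # Phred-scaled quality of the variant call
    depth: int   # number of reads covering the site


def passes_filters(v: Variant, min_qual: float = 30.0, min_depth: int = 10) -> bool:
    """Return True if the call meets both illustrative hard-filter thresholds."""
    return v.qual >= min_qual and v.depth >= min_depth


def filter_variants(variants, min_qual: float = 30.0, min_depth: int = 10):
    """Keep only the calls that pass the quality and depth thresholds."""
    return [v for v in variants if passes_filters(v, min_qual, min_depth)]


# Made-up example calls: one good, one low quality, one low depth.
calls = [
    Variant("chr1", 10177, "A", "AC", qual=45.2, depth=32),
    Variant("chr1", 10352, "T", "TA", qual=12.0, depth=40),
    Variant("chr2", 20500, "G", "A", qual=60.0, depth=4),
]
kept = filter_variants(calls)
```

Here only the first call is retained; the second fails the quality threshold and the third fails the depth threshold. Hard thresholds of this kind are one simple option, and the sketch does not cover the annotation step that follows filtering.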

Key words: data preprocessing, quality control, mutation detection, whole genome sequencing, mutation annotation