Frontiers of Data and Computing ›› 2022, Vol. 4 ›› Issue (1): 42-52.

doi: 10.11871/jfdc.issn.2096-742X.2022.01.004

• Special Issue: Union of National Scientific Data Center

Study on Omics Data Fusion Method of Heterogeneous Crops from Multiple Sources — Take Sorghum Bicolor as An Example

ZHANG Xianghe1,2, YAN Shen1,2, FAN Jingchao1,2,*

  1. Agricultural Information Institute of CAAS, Beijing 100081, China
  2. National Agriculture Science Data Center, Beijing 100081, China
  • Received:2021-09-30 Online:2022-02-20 Published:2022-03-04
  • Contact: FAN Jingchao E-mail:zhxianghe@163.com;yanshen@caas.cn;fanjingchao@caas.cn

Abstract:

[Objective] Omics research is a growing trend in crop science. In data-intensive scientific research, crop omics data exhibit large volume, multiple sources, and complex structures. Fusing multi-source heterogeneous crop omics data helps mine high-quality crop germplasm resources and supports the development of agricultural science and technology. [Methods] A literature survey and web-based data collection were used to analyze the distribution and organizational structure of crop omics data and to identify the main characteristics of multi-omics data resources. Taking Sorghum bicolor as an example, a new standard metadata schema for multi-omics data was refined through semantic analysis and literature review, and scripts were developed to map and transform the metadata of different databases into the standard metadata. Multi-source data were then fused on the basis of this metadata. The heterogeneous omics data were fused by integrating bioinformatics methods such as mapping, mutation analysis, and differentially expressed gene (DEG) calculation. [Results] A multi-source heterogeneous omics data fusion method for Sorghum bicolor was established. It can fuse genome, transcriptome, metabolome, and phenotypic data from NCBI, EMBL, PlantGDB, the National Data Center for Agricultural Sciences, and other databases. [Limitations] Extending the method to other crops requires targeted development of data sources and standard metadata. [Conclusions] Based on metadata and bioinformatics methods, this paper develops a fusion method for multi-source heterogeneous crop omics data that is general and can be applied to other crops.

Key words: omics data, multi-source heterogeneity, data fusion, Sorghum bicolor
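
The metadata mapping step described in the abstract (transforming the metadata of different databases into one standard metadata schema, then fusing records on that basis) might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the source names, field names, accession values, and the functions `to_standard` and `fuse` are all hypothetical and are not taken from the paper or from any real database schema.

```python
# Hypothetical field maps: source-specific metadata keys -> standard keys.
# These names are illustrative assumptions, not the paper's actual schema.
FIELD_MAPS = {
    "source_a": {"organism": "species", "acc": "accession", "type": "omics_type"},
    "source_b": {"Species": "species", "ID": "accession", "DataType": "omics_type"},
}


def to_standard(record, source):
    """Rename the keys of one source record to the standard metadata keys."""
    mapping = FIELD_MAPS[source]
    # Keep only fields that have a standard counterpart.
    std = {mapping[key]: value for key, value in record.items() if key in mapping}
    std["source"] = source  # remember where the record came from
    return std


def fuse(records):
    """Group standardized records by species so all omics layers sit together."""
    fused = {}
    for rec in records:
        fused.setdefault(rec["species"], []).append(rec)
    return fused


# Made-up example records from two hypothetical sources.
raw = [
    ({"organism": "Sorghum bicolor", "acc": "A001", "type": "genome"}, "source_a"),
    ({"Species": "Sorghum bicolor", "ID": "B042", "DataType": "transcriptome"}, "source_b"),
]

standardized = [to_standard(rec, src) for rec, src in raw]
fused = fuse(standardized)
```

Keeping the field maps as plain data, separate from the transformation code, means that supporting a new source or crop would mostly involve adding a mapping entry. This is in line with the abstract's note that data sources and standard metadata need targeted development for other crops.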