Frontiers of Data & Computing, 2022, Vol. 4, Issue (1): 42-52.

doi: 10.11871/jfdc.issn.2096-742X.2022.01.004

• Special issue: Joint Special Issue of the National Scientific Data Centers

  • About the authors:
    ZHANG Xianghe is currently a postgraduate student at the Agricultural Information Institute of the Chinese Academy of Agricultural Sciences. Her research field includes management systems engineering. In this paper, she is mainly responsible for data collection and paper writing. E-mail: zhxianghe@163.com
    YAN Shen is currently an assistant researcher at the Agricultural Information Institute of the Chinese Academy of Agricultural Sciences. His research fields include agricultural information management and breeding data sharing. In this paper, he is mainly responsible for the overall framework of the article and for guiding the processing methods for crop entity data. E-mail: yanshen@caas.cn
    FAN Jingchao is currently an associate researcher at the Agricultural Information Institute of the Chinese Academy of Agricultural Sciences. His research field includes agricultural science data management. In this paper, he is mainly responsible for guiding the formulation of metadata standards and for providing some of the references. E-mail: fanjingchao@caas.cn
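  • Funding:
    Major Science and Technology Project of the Inner Mongolia Autonomous Region, "Research, Development and Demonstration of Key Technologies for Smart Management and Control of the Potato Industry Chain" (2021SZD0026)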

Study on a Fusion Method for Multi-source Heterogeneous Crop Omics Data: Taking Sorghum bicolor as an Example

ZHANG Xianghe1,2, YAN Shen1,2, FAN Jingchao1,2,*

  1. Agricultural Information Institute of CAAS, Beijing 100081, China
    2. National Agriculture Science Data Center, Beijing 100081, China
  • Received:2021-09-30 Online:2022-02-20 Published:2022-03-04
  • Corresponding author: FAN Jingchao

Abstract:

[Objective] Crop omics is an emerging research trend in agricultural crop science. In the context of data-intensive scientific research, crop omics data are large in volume, come from many sources, and have complex structures. Fusing multi-source heterogeneous crop omics data helps mine high-quality crop germplasm resources and supports the development of agricultural science and technology. [Methods] Literature surveys and web data collection were used to analyze the distribution and organizational structure of crop omics data and to identify the main characteristics of multi-omics data resources. Taking sorghum (Sorghum bicolor) as an example, a new standard metadata set for sorghum multi-omics data was designed and optimized through semantic analysis and literature queries. Scripts were then developed to map and convert metadata from different databases to this standard metadata, so that multi-source data could be fused on the basis of metadata. Heterogeneous omics data were fused by integrating several bioinformatics methods, including mapping, variant analysis, and differentially expressed gene (DEG) analysis. [Results] A fusion method for multi-source heterogeneous sorghum omics data was developed. It can fuse genome, transcriptome, metabolome, and phenome data from NCBI, EMBL, PlantGDB, the National Agriculture Science Data Center, and other databases. [Limitations] Data sources and standard metadata still need targeted development before the method can meet the practical requirements of other crops. [Conclusions] Based on metadata and bioinformatics methods, this paper develops a general-purpose fusion method for multi-source heterogeneous crop omics data that can be extended to other crops.

Key words: omics data, multi-source heterogeneity, data fusion, Sorghum bicolor
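The metadata-based fusion step summarized in the abstract can be sketched in code. It has two parts: each database's metadata is mapped onto one standard metadata set, and records describing the same sample are then merged. This is a minimal illustration under stated assumptions, not the authors' scripts. The standard field names, the per-database field mappings, the helpers `to_standard` and `fuse`, and the sample records are all hypothetical.

```python
from collections import defaultdict

# HYPOTHETICAL standard metadata fields; the paper's actual sorghum
# standard metadata is not reproduced here.
STANDARD_FIELDS = ["sample_id", "species", "variety", "tissue", "omics_type", "source"]

# HYPOTHETICAL per-database mappings: source field name -> standard field name.
FIELD_MAPS = {
    "NCBI": {"biosample": "sample_id", "organism": "species", "cultivar": "variety",
             "tissue": "tissue", "library_strategy": "omics_type"},
    "EMBL": {"sample_accession": "sample_id", "scientific_name": "species",
             "cultivar": "variety", "tissue_type": "tissue",
             "library_strategy": "omics_type"},
}


def to_standard(record: dict, source: str) -> dict:
    """Map one source record onto the standard fields.

    Source fields without a mapping are dropped; standard fields with no
    value in the source record are set to None.
    """
    mapping = FIELD_MAPS[source]
    std = {field: None for field in STANDARD_FIELDS}
    for src_field, value in record.items():
        target = mapping.get(src_field)
        if target is not None:
            std[target] = value
    std["source"] = source
    return std


def fuse(records: list) -> dict:
    """Merge standardized records that share a sample_id.

    For each field the first non-None value wins; all omics types and
    source databases seen for a sample are collected into sorted lists.
    """
    fused = {}
    omics = defaultdict(set)
    sources = defaultdict(set)
    for rec in records:
        key = rec["sample_id"]
        merged = fused.setdefault(key, {f: None for f in STANDARD_FIELDS})
        for field, value in rec.items():
            if merged[field] is None and value is not None:
                merged[field] = value
        if rec["omics_type"] is not None:
            omics[key].add(rec["omics_type"])
        sources[key].add(rec["source"])
    for key, merged in fused.items():
        merged["omics_type"] = sorted(omics[key])
        merged["source"] = sorted(sources[key])
    return fused


if __name__ == "__main__":
    # Made-up records for one sample held in two databases.
    ncbi = {"biosample": "S1", "organism": "Sorghum bicolor",
            "cultivar": "BTx623", "library_strategy": "RNA-Seq"}
    embl = {"sample_accession": "S1", "scientific_name": "Sorghum bicolor",
            "tissue_type": "leaf", "library_strategy": "WGS"}
    print(fuse([to_standard(ncbi, "NCBI"), to_standard(embl, "EMBL")])["S1"])
```

Keeping the mappings as plain dictionaries means that adding another source database, or adapting the sketch to another crop's standard metadata, only requires a new mapping entry rather than new conversion code.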