Frontiers of Data & Computing, 2020, Vol. 2, Issue (2): 91-100. doi: 10.11871/jfdc.issn.2096-742X.2020.02.007

• Special Issue: Data Analysis Technology and Application

Research on Geoscience Big Data Processing Framework and Key Techniques

Zhang Yaonan1,2,3,*, Ai Minghao1,2, Kang Jianfang1,2,3, Min Yufang1,2

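  1. Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China
    2. National Cryosphere Desert Scientific Data Center, Lanzhou, Gansu 730000, China
    3. Gansu Data Engineering and Technology Research Center for Resource and Environment, Lanzhou, Gansu 730000, China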
  • Received: 2020-01-16 Online: 2020-04-20 Published: 2020-06-03
  • Corresponding author: Zhang Yaonan E-mail: yaonan@lzb.ac.cn
  • About the authors:
    Zhang Yaonan, Ph.D., is a professor and doctoral supervisor. He is director of the Big Data Center of the Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, and director of the National Cryosphere Desert Scientific Data Center. His research interests include environmental science data engineering, geoscience model simulation in high-performance computing environments, remote sensing image processing, and multi-source heterogeneous data fusion.
    In this work, he is mainly responsible for the design of the geoscience data processing framework and the overall project implementation.
    Ai Minghao is an engineer at the Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, and a Ph.D. candidate at the University of Chinese Academy of Sciences. His research interests include multi-source data integration, remote sensing data processing, and applications of artificial intelligence.
    In this work, he is responsible for the architecture design of multi-source heterogeneous data integration and for writing the paper.
    E-mail: aimh@lzb.ac.cn
    Kang Jianfang is an engineer at the Big Data Center of the Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences. She works on geoscience big data management and analysis.
    In this work, she is responsible for designing the geoscience data association method and for system integration.
    E-mail: kjf@lzb.ac.cn
    Min Yufang is an engineer at the Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, and a Ph.D. candidate at the University of Chinese Academy of Sciences. Her research focuses on geoscience big data management and analysis and on the integration and coupling of geoscience models.
    In this work, she is responsible for the design and application of the integrated data-model architecture.
    E-mail: myf@lzb.ac.cn
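  • Funding:
    Informatization Program of the Chinese Academy of Sciences (XXH13506); National Science and Technology Infrastructure Platform Construction Program (Y719H71)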

Research on Geoscience Big Data Processing Framework and Key Techniques

Zhang Yaonan1,2,3,*, Ai Minghao1,2, Kang Jianfang1,2,3, Min Yufang1,2

  1. Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, Gansu 730000, China
    2. National Cryosphere Desert Scientific Data Center, Lanzhou, Gansu 730000, China
    3. Gansu Data Engineering and Technology Research Center for Resource and Environment, Lanzhou, Gansu 730000, China
  • Received:2020-01-16 Online:2020-04-20 Published:2020-06-03
  • Contact: Yaonan Zhang E-mail:yaonan@lzb.ac.cn

Abstract:

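[Objective] With its distinctive data-science way of thinking, big data brings major opportunities for knowledge discovery in geoscience research. However, geoscience data are multi-source and heterogeneous, spatio-temporally correlated, multi-scale, and uncertain, and these characteristics pose a series of challenges for geoscience big data processing. [Methods] Building on an analysis of these characteristics, this paper proposes a processing framework for geoscience big data that combines data association, middleware systems, microservices, containers, and related technologies. The framework focuses on aggregating and fusing multi-source data and on the integrated processing of heterogeneous data in the geoscience domain. It also brings geoscience models into the framework to strengthen the domain specificity of data processing. [Results] The framework and its key technologies have been applied in the construction of the National Glacier and Frozen Soil Scientific Data Center, in the High and Cold Environment Joint Observation Research Cloud, and in preparing disaster datasets for the China-Pakistan Corridor. [Conclusions] The processing framework for the geoscience big data platform broadens the dimensions of data processing and can support multi-theme, multi-scale geoscience research, analysis, and knowledge discovery. In the future, the framework will be adapted to process geoscience data from a wider range of sources such as the Internet, social networks, and print media. It will also further incorporate artificial intelligence techniques to deliver smarter and faster geoscience data processing results.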

Keywords: geoscience big data, geoscience data processing methods, aggregation and fusion, heterogeneous integration

Abstract:

[Objective] With its distinctive data-science mindset, big data brings great opportunities for knowledge discovery in geoscience research. Meanwhile, the characteristics of geoscience data, such as multi-source heterogeneity, spatio-temporal correlation, multiple scales, and uncertainty, pose great challenges for data processing. [Methods] Based on a detailed analysis of these characteristics, this study proposes a geoscience big data processing framework. The framework combines big data technologies such as data association, middleware systems, microservices, and containers, and it addresses multi-source data aggregation and fusion and the integrated processing of heterogeneous data in the geoscience field. In addition, geoscience models are embedded in the framework to strengthen the domain specificity of data processing. [Results] The framework and its key technologies have been applied in the construction of the National Glacier and Frozen Soil Scientific Data Center, in preparing disaster datasets for the China-Pakistan Corridor, and in the High and Cold Environment United Observation Cloud. [Conclusions] The framework broadens the dimensions of data processing and is expected to support multi-theme, multi-scale research and knowledge discovery in geoscience. In the future, it will be adapted to process geoscience data from a wider range of sources, such as the Internet, social networks, and print media. The integration of artificial intelligence technologies will then enable the framework to deliver smarter and faster geoscience data processing results.

Key words: geoscience big data, geoscience data processing methods, data convergence, heterogeneous data integration