Frontiers of Data and Computing, 2026, Vol. 8, Issue (3): 68-80.

doi: 10.11871/jfdc.issn.2096-742X.2026.03.007

• Technology and Application •

A High-Quality Ocean Observation Profile Dataset Construction Scheme Based on Multi-Source Data Cleaning and Fusion

YUAN Huifeng1,3, ZHU Yujing2,3, PAN Yuying2, ZHANG Rongwang4,*, JIN Zhong1,3,*

  1 Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2 Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
  3 University of Chinese Academy of Sciences, Beijing 100190, China
  4 South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, Guangdong 510301, China
  • Received: 2025-08-25 Online: 2026-06-20 Published: 2026-06-18
  • Contact: ZHANG Rongwang, JIN Zhong; E-mail: hfyuan@cnic.cn; rwzhang@scsio.ac.cn; zjin@sccas.cn

Abstract:

[Background] With the development of ocean observation technologies, a wide range of marine instruments and observation programs has emerged, propelling marine science research into a “data-intensive” stage characterized by big data. [Objective] To integrate heterogeneous ocean observation data from diverse sources into a comprehensive, unified dataset, and thereby strengthen the ability to address marine research questions holistically, this paper proposes a scheme for standardizing, annotating, and cleaning multi-source heterogeneous in situ ocean observation profile data in order to construct a high-quality ocean observation profile dataset. [Methods] Specifically, the scheme first acquires multi-source in situ ocean observation profile data and the corresponding metadata from several ocean data centers and agencies. It then uses a unique identifier derived from the raw data and their descriptors to apply, in sequence, greylist filtering, multi-version filtering, and high-frequency observation filtering based on spatiotemporal characteristics, yielding a refined set of ocean observation profiles. Finally, the processed data are standardized and then subjected to quality control and bias correction to produce the high-quality profile dataset. [Conclusions] This scheme promotes the application of multi-source heterogeneous profile data and improves data consistency, accuracy, and usability.
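The three cleaning filters named in [Methods] can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Profile` fields, the identifier recipe in `profile_uid` (rounded platform, time and position plus the level count and maximum depth), the greylist format (a platform ID with a date range), and the thresholds in `high_frequency_filter` (1 hour, 1 km) are all hypothetical choices made for illustration. Only the order of the filters follows the abstract.

```python
import hashlib
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Profile:
    # Hypothetical descriptor fields; real archives carry many more.
    platform_id: str
    source: str          # data center this copy was obtained from
    time: datetime
    lat: float
    lon: float
    depths: tuple        # metres, one value per level
    temps: tuple         # degC, same length as depths
    version: int = 1     # higher value = more recent processing


def profile_uid(p: Profile) -> str:
    """Hypothetical identifier: SHA-256 of rounded descriptors plus a coarse
    fingerprint of the raw data (level count and maximum depth)."""
    parts = (
        p.platform_id,
        p.time.strftime("%Y%m%d%H%M"),
        f"{p.lat:.2f}",
        f"{p.lon:.2f}",
        str(len(p.depths)),
        f"{max(p.depths):.0f}" if p.depths else "0",
    )
    return hashlib.sha256("|".join(parts).encode()).hexdigest()


def greylist_filter(profiles, greylist):
    """Drop profiles whose platform is greylisted at the profile time.
    greylist: iterable of (platform_id, start, end) tuples (assumed format)."""
    def flagged(p):
        return any(p.platform_id == pid and start <= p.time <= end
                   for pid, start, end in greylist)
    return [p for p in profiles if not flagged(p)]


def multi_version_filter(profiles):
    """Keep only the highest version per identifier (first seen wins ties)."""
    best = {}
    for p in profiles:
        uid = profile_uid(p)
        if uid not in best or p.version > best[uid].version:
            best[uid] = p
    return list(best.values())


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def high_frequency_filter(profiles, min_gap=timedelta(hours=1), min_km=1.0):
    """Per platform, drop a profile that is both closer in time than min_gap
    and closer in space than min_km to the last profile kept."""
    by_platform = {}
    for p in profiles:
        by_platform.setdefault(p.platform_id, []).append(p)
    kept = []
    for plist in by_platform.values():
        last = None
        for p in sorted(plist, key=lambda q: q.time):
            if (last is not None and p.time - last.time < min_gap
                    and haversine_km(last.lat, last.lon, p.lat, p.lon) < min_km):
                continue  # near-duplicate burst sample: skip it
            kept.append(p)
            last = p
    return kept


def clean(profiles, greylist):
    """Greylist -> multi-version -> high-frequency, in that order."""
    return high_frequency_filter(
        multi_version_filter(greylist_filter(profiles, greylist)))
```

Because the identifier in this sketch ignores the measured values themselves, a reprocessed copy of the same cast distributed by another center shares its identifier and is collapsed by `multi_version_filter`. A production scheme would have to balance this against the risk of merging genuinely distinct casts.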

Key words: ocean big data, data cleaning, ocean observation, datasets