Frontiers of Data and Computing, 2024, Vol. 6, Issue (4): 96-105.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.04.008

doi: 10.11871/jfdc.issn.2096-742X.2024.04.008

Special Issue: Fundamental Software Stack and Systems for National Scientific Data Centers

Study on Integration Method of Algorithm Model Based on Big Data Pipeline: Taking Tree Biomass Inversion Based on Machine Learning Method and LiDAR Data as an Example

GUO Xuebing1,2,*, ZHU Xiaojie3, TANG Xinzhai1, YANG Gang3, HOU Yanfei1,2, HE Honglin1,2

1. Key Laboratory of Ecosystem Network Observation and Modeling, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
2. National Ecosystem Science Data Center, Beijing 100101, China
3. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China

Received: 2024-02-29; Online: 2024-08-20; Published: 2024-08-20

Abstract:

[Background] Light Detection and Ranging (LiDAR) data are widely used in the analysis and utilization of forest resources, and researchers have developed many specialized algorithm models involving big data management and artificial intelligence. Currently, however, most of these models remain scattered among individual researchers, and information platforms that integrate them are still lacking. [Methods] Big data pipeline systems such as πFlow can manage big data and integrate big data algorithms, and they let users build and schedule pipelines in a WYSIWYG (what you see is what you get) manner. Such systems are well suited to integrating complex algorithm models for LiDAR data, and the resulting pipelines can be customized and reused. [Contents] This paper introduces the characteristics and functions of πFlow, using as an example tree crown segmentation and machine-learning estimation of tree biomass from LiDAR-derived canopy height model (CHM) data. It presents the method and technology for integrating algorithms into πFlow, constructs a LiDAR data analysis and processing pipeline, and runs test operations on the pipeline. [Results] The reproducible information platform built with πFlow can support rapid biomass inversion from LiDAR data for multiple networked observation field sites, and it also offers an innovative technical approach to integrating data-intensive processing algorithm models.
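The abstract describes LiDAR processing steps (crown segmentation from a CHM, then machine-learning biomass estimation) packaged as reusable, chainable pipeline components. The Python sketch below illustrates only that general chaining idea; it does not use πFlow's API, and `Pipeline`, `Stage`, `detect_tree_tops`, `segment_crowns`, `crown_features` and `make_biomass_stage` are hypothetical names. Tree tops are assumed to be strict local maxima of the CHM and crowns are assigned to the nearest tree top, a simplification chosen here rather than the paper's segmentation method. The biomass "model" is a placeholder standing in for a trained model such as a random forest regressor.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Data = Dict[str, Any]  # shared record passed from stage to stage

@dataclass
class Stage:
    """One reusable step; a hypothetical stand-in for a pipeline component."""
    name: str
    func: Callable[[Data], Data]

@dataclass
class Pipeline:
    """Runs stages in order, feeding each stage's output to the next."""
    stages: List[Stage] = field(default_factory=list)

    def add(self, name: str, func: Callable[[Data], Data]) -> "Pipeline":
        self.stages.append(Stage(name, func))
        return self  # returning self allows chained construction

    def run(self, data: Data) -> Data:
        for stage in self.stages:
            data = stage.func(data)
        return data

def detect_tree_tops(data: Data, min_height: float = 2.0) -> Data:
    """Tree tops = CHM cells strictly higher than all neighbours (flat peaks are missed)."""
    chm = data["chm"]
    rows, cols = len(chm), len(chm[0])
    tops: List[Tuple[int, int]] = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            neighbours = [
                chm[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(h > n for n in neighbours):
                tops.append((r, c))
    return {**data, "tops": tops}

def segment_crowns(data: Data, min_height: float = 2.0) -> Data:
    """Label each canopy cell with the index of its nearest tree top (-1 = ground).
    A simplification; watershed-style segmentation is more common in practice."""
    chm, tops = data["chm"], data["tops"]
    labels = []
    for r, row in enumerate(chm):
        label_row = []
        for c, h in enumerate(row):
            if h < min_height or not tops:
                label_row.append(-1)
            else:
                dists = [(r - tr) ** 2 + (c - tc) ** 2 for tr, tc in tops]
                label_row.append(dists.index(min(dists)))
        labels.append(label_row)
    return {**data, "labels": labels}

def crown_features(data: Data, cell_area: float = 1.0) -> Data:
    """Per-crown predictors: maximum CHM height and crown area."""
    chm, labels = data["chm"], data["labels"]
    feats = [{"max_height": 0.0, "area": 0.0} for _ in data["tops"]]
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab >= 0:
                feats[lab]["area"] += cell_area
                feats[lab]["max_height"] = max(feats[lab]["max_height"], chm[r][c])
    return {**data, "features": feats}

def make_biomass_stage(model: Callable[[Dict[str, float]], float]) -> Callable[[Data], Data]:
    """Wrap any per-tree model (e.g. a trained regressor) as a pipeline stage."""
    def estimate(data: Data) -> Data:
        return {**data, "biomass": [model(f) for f in data["features"]]}
    return estimate

# Toy 5 x 6 CHM in metres; the model below is a placeholder, not a biomass equation.
chm = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 2.5, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.0, 2.5, 8.0, 3.0],
    [0.0, 0.0, 0.0, 0.0, 3.0, 0.0],
]
pipeline = (
    Pipeline()
    .add("tree_tops", detect_tree_tops)
    .add("crowns", segment_crowns)
    .add("features", crown_features)
    .add("biomass", make_biomass_stage(lambda f: f["max_height"] * f["area"]))
)
result = pipeline.run({"chm": chm})
print(result["tops"], result["biomass"])  # [(1, 1), (3, 4)] [20.0, 40.0]
```

Because each stage only reads and extends a shared record, one stage can be swapped (for example, replacing the placeholder lambda with a wrapper around a trained regressor's `predict`) without touching the others, which is the kind of customization and reuse the abstract attributes to pipeline systems.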

Key words: big data pipeline, algorithm model integration, LiDAR, machine learning, random forest classification, πFlow