Frontiers of Data and Computing, 2026, Vol. 8, Issue (1): 119-128.

CSTR: 32002.14.jfdc.CN10-1649/TP.2026.01.010

doi: 10.11871/jfdc.issn.2096-742X.2026.01.010

• Technology and Application •

Design and Practice of an Automated Mining Framework for Agricultural Science Data

LAN Chenyang, LU Changfa, ZHU Xiaojie*, DUAN Junlei, REN Hao

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2025-04-09 Online: 2026-02-20 Published: 2026-02-02
  • Contact: ZHU Xiaojie E-mail: cylan@cnic.cn; xjzhu@cnic.cn

Abstract:

[Background] The digital transformation of agriculture has accelerated the adoption of big data technologies, yet conventional data processing approaches still struggle with complex workflows and with integrating legacy tools. [Objective] This study proposes an automated mining framework for agricultural science data, built on the PiFlow intelligent pipeline architecture, to address the dynamic adaptation between data processing and application scenarios. [Methods] The architecture combines streaming engines with Directed Acyclic Graph (DAG) task orchestration to construct heterogeneous data pipelines that support unified stream-batch computation. Using modular service design and containerized elastic scaling, it establishes a standardized operator abstraction layer with unified interfaces that incorporates both general-purpose processing operators and specialized mining tools. By integrating visual interactive engines with predefined operator templates, the framework enables low-code development of complex analytical workflows. A prototype system was then implemented with a six-layer subsystem architecture comprising processing pipelines, execution engines, scheduling, monitoring, logging, and visualization. [Results] Validation on two application scenarios, agricultural genomic selection and arable land resource assessment, demonstrates significant improvements in multi-dimensional data analysis efficiency and cross-scenario reusability, providing an extensible technical foundation for precision agricultural decision-making systems.
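
The pairing of a unified operator interface with DAG task orchestration described in the Methods can be illustrated with a minimal Python sketch. This is not PiFlow's API or the authors' implementation: the `Operator` and `Pipeline` classes, the `after` parameter, and the toy operators are hypothetical names introduced only for illustration, and the scheduling relies on Python's standard-library `graphlib` rather than any engine named in the abstract.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Sequence


class Operator:
    """Hypothetical unified operator interface: upstream results in, one value out."""

    def __init__(self, name: str, fn: Callable[..., Any]) -> None:
        self.name = name
        self.fn = fn

    def run(self, *inputs: Any) -> Any:
        return self.fn(*inputs)


class Pipeline:
    """Registers operators with their upstream dependencies and runs them in DAG order."""

    def __init__(self) -> None:
        self.operators: Dict[str, Operator] = {}
        self.upstream: Dict[str, List[str]] = {}

    def add(self, op: Operator, after: Sequence[str] = ()) -> "Pipeline":
        self.operators[op.name] = op
        self.upstream[op.name] = list(after)
        return self

    def execute(self) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        # static_order() yields every node after all of its predecessors and
        # raises graphlib.CycleError if the dependency graph contains a cycle.
        for name in TopologicalSorter(self.upstream).static_order():
            args = [results[dep] for dep in self.upstream[name]]
            results[name] = self.operators[name].run(*args)
        return results


# Toy workflow (assumed data): load -> clean -> (mean, count) -> report
pipeline = (
    Pipeline()
    .add(Operator("load", lambda: [4.2, None, 5.1, 3.9]))
    .add(Operator("clean", lambda rows: [r for r in rows if r is not None]), after=["load"])
    .add(Operator("mean", lambda rows: sum(rows) / len(rows)), after=["clean"])
    .add(Operator("count", lambda rows: len(rows)), after=["clean"])
    .add(Operator("report", lambda m, c: f"n={c}, mean={m:.2f}"), after=["mean", "count"])
)
results = pipeline.execute()
print(results["report"])  # n=3, mean=4.40
```

Because every operator exposes the same `run` method, a general-purpose step such as `clean` and a specialized analysis step can be wired into the same graph without the scheduler knowing what either one does, which is the property the operator abstraction layer is meant to provide.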

Key words: agricultural big data, heterogeneous data pipeline processing, elastic operator scaling, visualized task orchestration, preset template library