Frontiers of Data and Computing ›› 2026, Vol. 8 ›› Issue (1): 119-128.

CSTR: 32002.14.jfdc.CN10-1649/TP.2026.01.010

doi: 10.11871/jfdc.issn.2096-742X.2026.01.010

• Technology and Application •

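Design and Practice of an Automated Mining Framework for Agricultural Science Data

LAN Chenyang, LU Changfa, ZHU Xiaojie*, DUAN Junlei, REN Hao

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2025-04-09  Online: 2026-02-20  Published: 2026-02-02
  • Corresponding author: ZHU Xiaojie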
  • About the authors: LAN Chenyang holds a master’s degree and is an engineer at the Scientific Data Software System Laboratory, Department of Big Data Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences. His research focuses on platform design and key technologies for scientific data sharing and big data processing. He has led or participated in the design, development, and engineering implementation of several projects, including “Big Data Middle Platform,” “Scientific Data Center Software Stack,” “Big Data Governance and Integration Service Technologies for Space Science,” and “Development of an Integrated One-Stop Online Platform for Vegetation Surveys.”
    In this paper, he is responsible for key technology research and for writing parts of the paper.
    E-mail: cylan@cnic.cn
    ZHU Xiaojie holds a master’s degree and is a senior engineer at the Department of Big Data Technology and Application Development, Computer Network Information Center, Chinese Academy of Sciences. Her research focuses on big data management and processing technologies. She is currently undertaking a sub-project of the National Key Research and Development Program of China and a project under the Chinese Academy of Sciences special program on cybersecurity and informatization.
    In this paper, she is responsible for academic supervision, key technological innovations, and final approval of the paper.
    E-mail: xjzhu@cnic.cn
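  • Funding:
    National Key Research and Development Program of China, “Scenario-Driven Agricultural Science Data Mining and Analysis Technologies and Applications” (2022YFF0711800)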

Abstract:

[Background] The digital transformation of agriculture has driven the widespread adoption of big data technologies, yet conventional data processing approaches still face challenges such as complex processing workflows and difficulty integrating legacy tools. [Objective] This study proposes an automated mining framework for agricultural science data that uses the intelligent pipeline architecture PiFlow to address the dynamic adaptation between data processing and application scenarios. [Methods] The framework combines a streaming engine with directed acyclic graph (DAG) task orchestration to build heterogeneous data pipelines that support unified stream and batch computation. Using modular service design and a containerized dynamic extension mechanism, it establishes a component abstraction layer with a unified interface specification that integrates both general-purpose data processing modules and domain-specific mining tools. By combining a visual interaction engine with a library of preset algorithm templates, the framework enables low-code construction of complex analysis workflows. Finally, a prototype system was built on an architecture of six subsystems: processing pipeline, execution engine, scheduling, monitoring, logging, and visualization engine. [Results] Validation in scenarios such as crop genomic selection and arable land resource evaluation shows that the framework significantly improves the efficiency of multi-dimensional agricultural data analysis and cross-scenario reusability, providing extensible technical support for precision agriculture decision-making.

Key words: agricultural big data, heterogeneous data pipeline processing, dynamic component extension, visual task orchestration, preset template library
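
The abstract describes components behind a unified interface that are orchestrated as a DAG. The following is a minimal Python sketch of that general idea, not PiFlow's actual API: the `Pipeline` class, the `Component` signature, and the demo steps (`load`, `clean`, `mean`, `count`, `report`) are all hypothetical names introduced here for illustration, and the sketch covers only batch-style execution in a single process.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence

# Hypothetical unified interface: a component receives the outputs of its
# upstream components (in the order they were declared) and returns one value.
Component = Callable[[List[object]], object]


class Pipeline:
    """Toy DAG pipeline for illustration only; not PiFlow's real API."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}
        self.upstream: Dict[str, List[str]] = {}

    def add(self, name: str, component: Component,
            after: Sequence[str] = ()) -> None:
        # Require upstream components to be registered first, so every
        # dependency name used later is known to exist.
        for dep in after:
            if dep not in self.components:
                raise ValueError(f"unknown upstream component: {dep}")
        self.components[name] = component
        self.upstream[name] = list(after)

    def run(self) -> Dict[str, object]:
        # static_order() yields each node after all of its predecessors and
        # raises graphlib.CycleError if the graph contains a cycle.
        results: Dict[str, object] = {}
        for name in TopologicalSorter(self.upstream).static_order():
            inputs = [results[dep] for dep in self.upstream[name]]
            results[name] = self.components[name](inputs)
        return results


def build_demo() -> Pipeline:
    """Assemble a small five-step pipeline from made-up components."""
    p = Pipeline()
    p.add("load", lambda _: [3.0, 1.0, 2.0])
    p.add("clean", lambda ins: [x for x in ins[0] if x > 1.0], after=["load"])
    p.add("mean", lambda ins: sum(ins[0]) / len(ins[0]), after=["clean"])
    p.add("count", lambda ins: len(ins[0]), after=["clean"])
    p.add("report", lambda ins: {"mean": ins[0], "count": ins[1]},
          after=["mean", "count"])
    return p


if __name__ == "__main__":
    print(build_demo().run()["report"])
```

Because every component shares one call signature, a domain-specific tool can be swapped in for a generic step without changing the orchestration code; a production framework would additionally need streaming inputs, distributed execution, scheduling, and monitoring, which this sketch leaves out.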