Frontiers of Data and Computing (数据与计算发展前沿), 2021, Vol. 3, Issue 3: 95-110.

doi: 10.11871/jfdc.issn.2096-742X.2021.03.009

• Technology and Application •

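A Lightweight Remote Sensing Data Distributed Scheduling Framework DataboxMR

MENG Xianghai1,2, WANG Xuezhi1,*, ZHAO Jianghua1,2, ZHOU Xiaohua1,2

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-01-04 Online: 2021-06-20 Published: 2021-07-09
  • Corresponding author: WANG Xuezhi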
  • About the authors:
    MENG Xianghai is a master's student at the Computer Network Information Center, Chinese Academy of Sciences. His main research areas are massive data processing technology and applications, and distributed processing of remote sensing data. In this paper, he is responsible for the experimental design and the writing of the paper. E-mail: mengxianghai@cnic.cn
    WANG Xuezhi is a senior researcher and master's supervisor at the Computer Network Information Center, Chinese Academy of Sciences. His main research directions are scientific big data management and application technology, spatio-temporal big data analysis and processing, and scientific big data mining and analysis. He has led or participated in joint major projects of the Ministry of Environmental Protection and the Chinese Academy of Sciences, Torch Program projects, sub-projects of the National Key R&D Program, special scientific research projects for public welfare in the environmental protection sector, sub-projects of the Academy's 12th Five-Year informatization program, and sub-projects of the Academy's Strategic Priority Research Program. In this paper, he is responsible for research guidance and overall revision of the manuscript. E-mail: wxz@cnic.cn
    ZHAO Jianghua is a PhD candidate at the Computer Network Information Center, Chinese Academy of Sciences. Her main research direction is remote sensing data processing and analysis. In this paper, she is responsible for guidance on the paper. E-mail: zjh@cnic.cn
    ZHOU Xiaohua is a PhD candidate at the Computer Network Information Center, Chinese Academy of Sciences. His main research direction is remote sensing data processing and analysis. In this paper, he is responsible for guidance on the experiments. E-mail: zhouxiaohua@cnic.cn
  • Supported by:
    Key Research Program of Frontier Sciences, Chinese Academy of Sciences (ZDBS-LY-DQC016)

Abstract:

[Objective] Processing remote sensing data on general-purpose platforms improves processing efficiency, but it also brings low efficiency on lightweight tasks, a high barrier to entry for users, and high migration costs. This paper aims to reduce the complexity users face when handling lightweight tasks, lower the migration cost incurred by general-purpose platforms, and give users more control over task scheduling. [Methods] We propose DataboxMR, a lightweight distributed scheduling framework for efficient processing of remote sensing data. Using UDF (User-Defined Function) technology, the framework designs and implements a remote sensing data processing service component (Remote Sensing User-Defined Function, RS-UDF). RS-UDF lets users wrap existing programs, wrap custom functions, and reference existing mature processing techniques, and it exposes them as interface services that support both synchronous and asynchronous calls. In addition, the framework includes a remote sensing data scheduling engine (DataboxMR-Engine) built on a two-tier scheduling model, which supports assigning tasks to designated nodes, task partitioning and distribution, and fault recovery. [Results] An experimental comparison with GeoTrellis, an in-memory remote sensing data processing tool, shows that DataboxMR is more efficient and incurs less system overhead when executing lightweight remote sensing data processing tasks. [Conclusions] DataboxMR is a lightweight and efficient distributed scheduling framework for remote sensing data.

Key words: remote sensing data, distributed scheduling framework, UDF technology, scheduling engine
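
The abstract names two mechanisms without showing them: a UDF service layer with synchronous and asynchronous calls, and a two-tier engine that partitions a task, distributes the pieces (optionally to a designated node), and recovers from node failure. The following is a minimal Python sketch of how those ideas might fit together. It is not DataboxMR's actual API: `register_udf`, `Node`, `Engine`, the `pinned` parameter, and the toy `mean` function are all hypothetical names introduced for illustration, and threads stand in for remote nodes.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Hypothetical registry standing in for RS-UDF: service name -> function.
_UDFS: Dict[str, Callable[..., object]] = {}


def register_udf(name: str):
    """Decorator that exposes a user function as a named service (illustrative)."""
    def wrap(fn):
        _UDFS[name] = fn
        return fn
    return wrap


@register_udf("mean")
def tile_mean(tile: List[float]) -> float:
    """Toy stand-in for a per-tile remote sensing operation."""
    return sum(tile) / len(tile)


class Node:
    """Second tier: a worker node that runs the UDFs it is handed."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy  # simulated node state, an assumption of this sketch
        self._pool = ThreadPoolExecutor(max_workers=2)

    def call(self, udf: str, *args):
        """Synchronous call: block until the result is available."""
        if not self.healthy:
            raise RuntimeError(f"node {self.name} is down")
        return _UDFS[udf](*args)

    def call_async(self, udf: str, *args) -> Future:
        """Asynchronous call: return a Future immediately."""
        return self._pool.submit(self.call, udf, *args)


class Engine:
    """First tier: splits a job into tiles and distributes them to nodes."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes

    def run(self, udf: str, data: list, tile_size: int,
            pinned: Optional[str] = None) -> list:
        # Task partitioning: cut the input into fixed-size tiles.
        tiles = [data[i:i + tile_size] for i in range(0, len(data), tile_size)]
        # Use the user-designated node if one is named, otherwise all nodes.
        targets = [n for n in self.nodes if pinned in (None, n.name)]
        # Task distribution: round-robin over the target nodes.
        futures = [targets[i % len(targets)].call_async(udf, t)
                   for i, t in enumerate(tiles)]
        results = []
        for tile, fut in zip(tiles, futures):
            try:
                results.append(fut.result())
            except RuntimeError:
                # Fault recovery: rerun the failed tile on any healthy node.
                backup = next(n for n in self.nodes if n.healthy)
                results.append(backup.call(udf, tile))
        return results


if __name__ == "__main__":
    engine = Engine([Node("a"), Node("b", healthy=False)])
    print(engine.run("mean", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], tile_size=2))
    # prints [1.5, 3.5, 5.5]; the tile sent to the down node "b" is rerun on "a"
```

In this sketch the engine only decides where each tile goes, while each node decides how to run it, which is one simple reading of a two-tier scheduling model. A real deployment would replace the in-process `Node` with a network service and would need a policy for the case where no healthy node remains.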