Frontiers of Data and Computing ›› 2021, Vol. 3 ›› Issue (3): 95-110.

doi: 10.11871/jfdc.issn.2096-742X.2021.03.009

• Technology and Application •

A Lightweight Remote Sensing Data Distributed Scheduling Framework DataboxMR

MENG Xianghai1,2, WANG Xuezhi1,*, ZHAO Jianghua1,2, ZHOU Xiaohua1,2

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of the Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-01-04 Online: 2021-06-20 Published: 2021-07-09
  • Contact: WANG Xuezhi E-mail:mengxianghai@cnic.cn;wxz@cnic.cn;zjh@cnic.cn;zhouxiaohua@cnic.cn

Abstract:

[Objective] Processing remote sensing data on general-purpose platforms improves processing efficiency, but it also introduces problems: lightweight tasks execute inefficiently, the barrier to entry for end users is high, and migration costs are high. This paper aims to reduce the complexity users face when handling lightweight tasks, lower the migration cost incurred by general-purpose platform processing, and give users more control over task scheduling. [Methods] This paper proposes DataboxMR, a lightweight distributed scheduling framework for efficient processing of remote sensing data. Using UDF (User-Defined Function) technology, the framework designs and implements a remote sensing data processing service component, RS-UDF (Remote Sensing User-Defined Function). RS-UDF lets users package existing programs, custom functions, and references to existing mature processing technologies, and it supports synchronous and asynchronous calls through service interfaces. In addition, the framework implements a remote sensing data scheduling engine (DataboxMR-Engine) based on a two-tier scheduling model, which supports assigning tasks to designated nodes, task division and distribution, and fault recovery. [Results] An experimental comparison with GeoTrellis, a remote sensing data processing tool based on in-memory computing, shows that DataboxMR is more efficient and incurs less system overhead when performing lightweight remote sensing data processing tasks. [Conclusions] DataboxMR is a lightweight and efficient distributed scheduling framework for remote sensing data.
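
To make the ideas in the abstract concrete, the following is a minimal Python sketch of a UDF registry that can be called synchronously or asynchronously, together with simple task division, distribution, and retry. It is an illustration under stated assumptions and not DataboxMR's actual API: every name here (register_udf, call_sync, call_async, split_tiles, run_with_retry, scheduled_sum, band_sum) is hypothetical, a local thread pool stands in for cluster nodes, and retrying a failed chunk stands in for fault recovery. The two-tier scheduling model and designated-node placement are not modeled.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical in-process registry of user-defined functions (UDFs).
_UDFS: Dict[str, Callable] = {}

# A local thread pool stands in for the worker nodes of a cluster.
_POOL = ThreadPoolExecutor(max_workers=4)


def register_udf(name: str):
    """Decorator that packages an existing function under a service name."""
    def decorator(fn: Callable) -> Callable:
        _UDFS[name] = fn
        return fn
    return decorator


def call_sync(name: str, *args):
    """Synchronous call: block until the UDF returns."""
    return _UDFS[name](*args)


def call_async(name: str, *args) -> Future:
    """Asynchronous call: return a Future immediately."""
    return _POOL.submit(_UDFS[name], *args)


def split_tiles(pixels: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Task division: cut the input into at most n_parts contiguous chunks."""
    size = max(1, -(-len(pixels) // n_parts))  # ceiling division
    return [pixels[i:i + size] for i in range(0, len(pixels), size)]


def run_with_retry(name: str, chunk, max_retries: int = 2):
    """Stand-in for fault recovery: re-run a failed chunk a few times."""
    for attempt in range(max_retries + 1):
        try:
            return call_sync(name, chunk)
        except Exception:
            if attempt == max_retries:
                raise


@register_udf("band_sum")
def band_sum(chunk: Sequence[float]) -> float:
    """Example UDF: sum the pixel values of one chunk."""
    return sum(chunk)


def scheduled_sum(pixels: Sequence[float], n_parts: int) -> float:
    """Divide the input, distribute chunks to the pool, merge the results."""
    chunks = split_tiles(pixels, n_parts)
    futures = [_POOL.submit(run_with_retry, "band_sum", c) for c in chunks]
    return sum(f.result() for f in futures)
```

In this sketch, call_sync("band_sum", data) blocks until the result is ready, whereas call_async("band_sum", data) returns a Future whose result() can be collected later; scheduled_sum shows how the same registered function could be applied chunk by chunk and the partial results merged.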

Key words: remote sensing data, distributed scheduling framework, UDF technology, scheduling engine