Frontiers of Data and Computing ›› 2020, Vol. 2 ›› Issue (2): 31-39.

doi: 10.11871/jfdc.issn.2096-742X.2020.02.003

Special Issue: "Data Analysis Technology and Application" Special Issue

• Special Issue: Data Analysis Technology & Application •

IA: An Interactive Analysis Service Management Engine in Scientific Data Cloud

Meng Zhen1,2, Wang Xuezhi1,2, Xie Zhimin3, Hu Lianglin1,2, Chen Zhiduan2,4, Ma Juncai2,5, Tong Jizhou2,6, Zhang Yanling7,*, Zhou Yuanchun1,2,*

    1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
    3. Naval Military Marine Environment Construction Office, Beijing 100081, China
    4. Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
    5. Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
    6. National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
    7. Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, Henan 450001, China
  • Received: 2020-01-07 Online: 2020-04-20 Published: 2020-06-03
  • Contact: Zhang Yanling, Zhou Yuanchun E-mail: zhangyanling@ztri.com.cn; zyc@cnic.cn

Abstract:

[Objective] With the development of scientific big data technology, problem-oriented analysis has become the norm. Given the high cost of data migration and the dependence of data analysis on scientific big data, a scientific data analysis service engine is needed inside the data cloud. Such an engine should provide efficiently scalable computing and storage resources, optional algorithm resource libraries, and efficient access interfaces, together with convenient user interaction tools and a secure user access policy. Scientists are then freed from problems such as large-scale data migration and adaptation to programming languages, algorithm environments, version issues, and resource calls.
[Methods] An interactive analysis service management engine for the scientific data cloud is presented. In our solution, resource nodes, which can be physical or virtual hosts, are scaled out through automatic registration. When the utilization rate of computing resources reaches a threshold, the management node starts resource registration: a resource host is registered and its available container instances are added to the pool. The optional algorithm resource libraries and the efficient access interfaces for data and computing resources are versioned as container images, from which the computing resource pools are constructed. The health of the container instance pool is maintained within each host. Instance lifecycle management is performed according to the maximum usage time and the maximum silent time of each instance. The resource pool is always kept at a fixed size, and each container instance in it is in one of four states: preparing, ready, in use, or disappearing.
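The instance lifecycle described above can be illustrated with a minimal Python sketch. It is not the IA implementation: the class names, the `tick` maintenance pass, the time units, and the reading of "fixed size" as "number of instances not yet disappearing" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PREPARING = "preparing"
    READY = "ready"
    IN_USE = "in use"
    DISAPPEARING = "disappearing"


@dataclass
class Instance:
    name: str
    state: State = State.PREPARING
    started_at: float = 0.0      # time at which a user claimed the instance
    last_active_at: float = 0.0  # time of the last observed user activity


@dataclass
class Pool:
    size: int          # number of live instances to maintain (assumed meaning)
    max_usage: float   # maximum usage time, in seconds (hypothetical unit)
    max_silent: float  # maximum silent time, in seconds (hypothetical unit)
    instances: list = field(default_factory=list)
    counter: int = 0

    def acquire(self, now):
        """Hand the first ready instance to a user, or return None."""
        for inst in self.instances:
            if inst.state is State.READY:
                inst.state = State.IN_USE
                inst.started_at = inst.last_active_at = now
                return inst
        return None

    def tick(self, now):
        """One maintenance pass: reap, promote, expire, then top up."""
        # Drop instances that were marked as disappearing on an earlier pass.
        self.instances = [i for i in self.instances
                          if i.state is not State.DISAPPEARING]
        for inst in self.instances:
            if inst.state is State.PREPARING:
                inst.state = State.READY
            elif inst.state is State.IN_USE:
                too_long = now - inst.started_at >= self.max_usage
                too_quiet = now - inst.last_active_at >= self.max_silent
                if too_long or too_quiet:
                    inst.state = State.DISAPPEARING
        # Replace expired instances so the pool keeps its fixed size.
        live = [i for i in self.instances if i.state is not State.DISAPPEARING]
        for _ in range(self.size - len(live)):
            self.counter += 1
            self.instances.append(Instance(f"inst-{self.counter}"))
```

In this sketch an expired instance stays visible in the disappearing state for one pass while its replacement is already preparing, so the number of usable or soon-usable instances never drops below `size`.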
The scientific analysis service system comprises several components: a proxy component, an orchestration component, a user authentication component, a monitoring and management component, a buffer component, and a cache database. When a user accesses the system, resources are allocated according to the selected algorithm library and the utilization rate of the resource pools, and a unique identity port (PID) is assigned for the user's access through proxy configuration. Access takes place over a secure, encrypted network, through which the user interacts with programming components or interactive application components that can use data and computing resources on the cloud. Each interactive component runs in a separate container instance for effective resource isolation.
[Results] Based on this engine, iAnalysis (IA for short), an interactive analysis cloud service system V1.0, provides a unified cloud resource management service for scientific data analysis. It can be used directly by end-user scientists through the IA service portal, and it can also be called by other existing data systems in the form of Docker containers. To date, IA has provided scientific cloud analysis services in fields including life and health, ecological environment, meteorology, and hydrology. It has been applied to major projects such as the Strategic Priority Research Program of the Chinese Academy of Sciences (both A and B) and the Major Project of the State Tobacco Monopoly Administration. It has also been applied to several National Scientific Data Centers, such as the National Microbial Science Data Center and the National Space Science Data Center, and to public platforms such as GSCloud (www.gscloud.cn) and DarwinTree (www.darwintree.cn). It also provides common coding tools for “R”, “TensorFlow”, “Data Science”, “All Spark”, and so on.
Users can access the interactive programming component (iJupyter) or the interactive application component (iWorkflow) over HTTPS to use the data and computing resources of the data cloud.
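The access flow described in the Methods (allocation by selected algorithm library and pool utilization, a unique port assigned through the proxy, and registration triggered at a utilization threshold) might look roughly like the following Python sketch. The `Host`, `Proxy`, and `allocate` names, the port range, the "least utilized host" rule, and the 0.8 threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    library: str   # algorithm library image served by this host's pool
    capacity: int  # container instances the host can run
    in_use: int = 0

    @property
    def utilization(self):
        return self.in_use / self.capacity


@dataclass
class Proxy:
    first_port: int = 20000  # hypothetical start of the port range
    routes: dict = field(default_factory=dict)  # port -> (user, host name)

    def assign(self, user, host):
        """Reserve the lowest free port and route it to the chosen host."""
        port = self.first_port
        while port in self.routes:
            port += 1
        self.routes[port] = (user, host.name)
        return port


def allocate(user, library, hosts, proxy, threshold=0.8):
    """Pick a host for the requested library and assign a unique port.

    Returns (port, host name, scale_out), where scale_out reports whether
    the pool's utilization has reached the threshold, i.e. whether the
    management node should start registering another resource host.
    """
    pool = [h for h in hosts if h.library == library]
    candidates = [h for h in pool if h.in_use < h.capacity]
    if not candidates:
        raise RuntimeError(f"no free instance for library {library!r}")
    host = min(candidates, key=lambda h: h.utilization)  # assumed policy
    host.in_use += 1
    port = proxy.assign(user, host)
    rate = sum(h.in_use for h in pool) / sum(h.capacity for h in pool)
    return port, host.name, rate >= threshold
```

Choosing the least utilized host is only one plausible policy; the abstract says that allocation depends on the algorithm library selection and the pool utilization rate but does not specify the rule.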

Key words: analysis in data cloud, big data, container technology, IA, cloud services and management