Frontiers of Data and Computing, 2019, Vol. 1, Issue (1): 94-104. doi: 10.11871/jfdc.issn.2096.742X.2019.01.010

Special Issue: “Data and Computing Platforms”

Big Data 3.0—The Key Technologies of Big Data in Post-Hadoop Era

Wanggen Liu, Yuanhao Sun

  1. Transwarp Technology (Shanghai) Co., Ltd., Shanghai 200233, China
  Received: 2019-08-15; Online: 2019-01-20; Published: 2019-10-09

Abstract:

[Objective] As cloud computing and new hardware technologies have been rapidly adopted by industry, more and more users complain about the Hadoop architecture: it is highly complex, neither mature nor stable, and poorly suited to cloud computing. Transwarp redesigned the big data software stack so that users can apply big data technology more effectively and more easily. [Methods] The new stack includes a new resource management and scheduling layer that can manage tasks with different kinds of life cycles; a new storage management layer that acts as a new distributed storage system and allows storage plugins for different data types to be added or removed; and a unified DAG-based computing engine that serves data warehousing, stream computing, graph computing, and other workloads. A development interface supporting SQL and Python is provided to reduce coding complexity for developers. [Results] By using Kubernetes for resource management, big data technology finally works well with cloud computing. In addition, applications and big data system software work well together on one unified platform built on these technologies. [Conclusions] By refining the big data system stack, we not only solved the technical issues associated with Hadoop but also made big data system software work well with cloud computing and new hardware, which indicates the future research direction of big data technology.

Keywords: big data, cloud computing, DAG, stream computing, Kubernetes, multi-tenancy, unified storage engine
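The abstract's "unified DAG-based computing engine" rests on one idea: a job is a directed acyclic graph of operators, and the engine runs each operator only after all of its inputs are ready. The sketch below illustrates that idea with a topological ordering (Kahn's algorithm). It is a toy, single-process illustration under stated assumptions, not Transwarp's engine or API: the class `Dag`, its methods `add`, `topo_order`, and `run`, and the operator names are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


class Dag:
    """Hypothetical toy DAG: each named operator consumes its parents' outputs."""

    def __init__(self) -> None:
        self.ops: Dict[str, Callable[..., object]] = {}
        self.parents: Dict[str, List[str]] = {}

    def add(self, name: str, fn: Callable[..., object],
            parents: Optional[List[str]] = None) -> None:
        # Parents must be added before (or alongside) their children.
        self.ops[name] = fn
        self.parents[name] = list(parents or [])

    def topo_order(self) -> List[str]:
        # Kahn's algorithm: repeatedly emit operators whose parents are all emitted.
        indegree = {n: len(ps) for n, ps in self.parents.items()}
        children: Dict[str, List[str]] = {n: [] for n in self.ops}
        for n, ps in self.parents.items():
            for p in ps:
                children[p].append(n)
        ready = deque(n for n, d in indegree.items() if d == 0)
        order: List[str] = []
        while ready:
            n = ready.popleft()
            order.append(n)
            for c in children[n]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        if len(order) != len(self.ops):
            raise ValueError("cycle detected: graph is not a DAG")
        return order

    def run(self) -> Dict[str, object]:
        # Execute operators in dependency order, feeding parent results as arguments.
        results: Dict[str, object] = {}
        for n in self.topo_order():
            args = [results[p] for p in self.parents[n]]
            results[n] = self.ops[n](*args)
        return results


# Usage: a tiny warehouse-style query (scan -> filter -> aggregate).
dag = Dag()
dag.add("scan", lambda: [3, 1, 4, 1, 5])
dag.add("filter", lambda rows: [r for r in rows if r > 1], ["scan"])
dag.add("sum", lambda rows: sum(rows), ["filter"])
dag.add("count", lambda rows: len(rows), ["filter"])
dag.add("avg", lambda s, c: s / c, ["sum", "count"])
print(dag.run()["avg"])  # 4.0
```

The same graph abstraction can describe a batch query, a streaming pipeline, or an iterative graph job, which is the motivation for a single engine; a real distributed engine would additionally partition data, schedule operators across nodes, and recover from failures, none of which this sketch attempts.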