Frontiers of Data and Computing, 2019, Vol. 1, Issue (1): 94-104.
Wanggen Liu, Yuanhao Sun
[Objective] As cloud computing and new hardware technologies are rapidly adopted by industry, more and more users complain about the Hadoop architecture because it is highly complex, neither mature nor stable, and not flexible enough for cloud computing. Transwarp redesigned the big data software stack so that users can use big data technology better and more easily. [Methods] The new stack includes a new resource management and scheduling layer, which can manage tasks with different kinds of life cycles; a new storage management layer, which can add or remove storage plugins for different data types and acts as a new distributed storage system; and a unified DAG-based computing engine, which can be used for data warehousing, stream computing, graph computing, and other workloads. A development interface supporting SQL and Python is designed to reduce coding complexity for developers. [Results] By using Kubernetes for resource management, big data technology can finally work well with cloud computing. In addition, applications can work well with big data system software by using these technologies on one unified platform. [Conclusions] By refining the big data system stack, we not only solved the technical issues related to Hadoop but also made big data system software work well with cloud computing and new hardware, which points to a research direction for big data technology in the future.
unified storage engine
Wanggen Liu,Yuanhao Sun. Big Data 3.0—The Key Technologies of Big Data in Post-Hadoop Era[J]. Frontiers of Data and Computing, 2019, 1(1): 94-104.
The approach to make value from big data
The new software stack for big data technology
The logic view of development layer
The architecture of SQL compiler
The architecture comparison of MPP and DAG
The architecture of stream computing engine
A Sample of Transwarp StreamSQL
The logic view of the storage management layer
The architecture of the scheduling system
How the application becomes aware of the data topology