Frontiers of Data and Computing ›› 2019, Vol. 1 ›› Issue (1): 94-104. doi: 10.11871/jfdc.issn.2096.742X.2019.01.010

Special topic: "Data and Computing Platforms" special issue


Big Data 3.0: Core Technologies of Big Data in the Post-Hadoop Era

Liu Wanggen, Sun Yuanhao

  1. Transwarp Technology (Shanghai) Co., Ltd., Shanghai 200233
  • Received: 2019-08-15 Online: 2019-01-20 Published: 2019-10-09
  • About the authors: Liu Wanggen, born in 1985, is R&D director and chief architect of Transwarp Technology (Shanghai) Co., Ltd., where he is responsible for forward-looking planning and R&D management of the company's big data technology. He led the team that developed the first product worldwide to pass the TPC-DS benchmark with official certification. His research interests include new-generation big data architecture, distributed database technology, and container clouds, along with engineering practices such as delivering big data as PaaS and as a service. This paper summarizes his team's big data R&D work over the past few years. He was responsible for the design and verification of the overall architecture and for carrying out the R&D plan, and was the principal writer of the article.
    E-mail: wayne.liu@transwarp.io
    Sun Yuanhao, born in 1976, is the founder of Transwarp Technology (Shanghai) Co., Ltd. He has been evangelizing big data technology since 2009 and created the first Hadoop distribution in China. In 2013 he founded Transwarp and began R&D on a new generation of big data technology, and he is responsible for the company's overall technology strategy. He was the first to propose the new-generation big data architecture described in this paper and actively drove its commercial validation and deployment.
    E-mail: yuanhao.sun@transwarp.io

Big Data 3.0: The Key Technologies of Big Data in the Post-Hadoop Era

Wanggen Liu, Yuanhao Sun

  1. Transwarp Technology (Shanghai) Co., Ltd., Shanghai 200233, China
  • Received: 2019-08-15 Online: 2019-01-20 Published: 2019-10-09

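Abstract (translated from the Chinese):

[Objective] The first-generation big data architecture, represented by Hadoop, is overly complex, underperforms, and integrates poorly with cloud computing, so Transwarp redesigned the big data technology stack. [Methods] We designed a resource scheduling layer that manages services and tasks with different life cycles; abstracted a unified storage management layer into which different storage engines can be plugged to meet the needs of different types of data; used a unified DAG-based computing engine to support multiple kinds of computing workloads; and provided standard SQL and Python interfaces at the development layer. [Results] Using Kubernetes to manage data services uniformly, together with container technology for better multi-tenancy, connects big data with business applications, making it easier both to turn data into business services and to capture business activity as data. The design has also been validated in large-scale commercial use. [Conclusions] The redesigned architecture not only resolves the technical problems of first-generation big data implementations but also integrates better with cloud computing and new hardware technology, and may represent the direction in which the next generation of the basic big data technology stack will develop.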

Keywords (translated from the Chinese): big data, cloud computing, DAG, real-time computing, Kubernetes, multi-tenancy, unified storage management

Abstract:

[Objective] As cloud computing and new hardware technologies have been rapidly adopted by industry, more and more users have complained about the Hadoop architecture: it is highly complex, neither mature nor stable, and not flexible enough for cloud computing. Transwarp redesigned the big data software stack so that users can use big data technology better and more easily. [Methods] The new stack includes a new resource management and scheduling layer that can manage tasks with different kinds of life cycles; a new storage management layer that can add or remove storage plug-ins for different data types and acts as a new distributed storage system; and a unified DAG-based computing engine that can be used for data warehousing, stream computing, graph computing, and other workloads. A development interface supporting SQL and Python is provided to reduce coding complexity for developers. [Results] By using Kubernetes for resource management, big data technology can work well with cloud computing. In addition, applications and big data system software can work well together on one unified platform built on these technologies. [Conclusions] By refining the big data system stack, we not only solved the technical issues related to Hadoop but also made big data system software work well with cloud computing and new hardware, which indicates a research direction for big data technology in the future.

Key words: big data, cloud computing, DAG, stream computing, Kubernetes, multi-tenancy, unified storage engine
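
The Methods part of the abstract names two ideas that a small example may make more concrete: a storage management layer with pluggable engines, and a computing engine that runs work as a DAG. The Python sketch below is only an illustration under stated assumptions. `InMemoryEngine`, `StorageManager`, `run_dag`, and the task names are hypothetical and are not taken from the paper or from any Transwarp product; a real engine would also handle distribution, scheduling, and fault tolerance, which are omitted here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class InMemoryEngine:
    """Toy storage engine (hypothetical); a real one might wrap files or a KV store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class StorageManager:
    """Hypothetical unified storage layer: one engine is plugged in per data type."""

    def __init__(self):
        self._engines = {}

    def plug(self, data_type, engine):
        self._engines[data_type] = engine

    def unplug(self, data_type):
        self._engines.pop(data_type, None)

    def engine_for(self, data_type):
        if data_type not in self._engines:
            raise KeyError(f"no storage engine plugged in for {data_type!r}")
        return self._engines[data_type]


def run_dag(tasks, deps):
    """Run every task after its predecessors.

    tasks maps a task name to a function that takes a dict of predecessor results.
    deps maps a task name to the names it depends on (assumed to be keys of tasks).
    """
    graph = {name: deps.get(name, ()) for name in tasks}
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](inputs)
    return results


# Usage: store a small dataset, then run a four-step DAG over it.
storage = StorageManager()
storage.plug("table", InMemoryEngine())
storage.engine_for("table").put("orders", [3, 1, 2])

tasks = {
    "load": lambda inputs: storage.engine_for("table").get("orders"),
    "sort": lambda inputs: sorted(inputs["load"]),
    "total": lambda inputs: sum(inputs["load"]),
    "report": lambda inputs: (inputs["sort"], inputs["total"]),
}
deps = {"sort": {"load"}, "total": {"load"}, "report": {"sort", "total"}}
results = run_dag(tasks, deps)
```

In this sketch, `sort` and `total` depend only on `load`, so a scheduler would be free to run them in parallel; the example runs them sequentially for simplicity. Swapping the engine plugged in for `"table"` changes where data lives without touching the DAG definition, which is the separation the abstract describes.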