Frontiers of Data and Computing, 2019, Vol. 1, Issue (1): 73-81. doi: 10.11871/jfdc.issn.2096.742X.2019.01.008

Special issue: "Data and Computing Platforms"

Survey on Federated RDF Data Management Systems

Peng Peng¹, Lei Zou²

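  1. College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082
  2. Wangxuan Institute of Computer Technology, Peking University, Beijing 100080
  • Received: 2019-08-30  Online: 2019-01-20  Published: 2019-10-09
  • About the authors: Peng Peng, born in 1987, is an assistant professor at the College of Information Science and Engineering, Hunan University, and holds a Ph.D. His main research interest is graph-based management of distributed knowledge graph data.
    Main contribution to this paper: literature survey.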
    Peng Peng, born in 1988, is currently an Assistant Professor in the College of Computer Science and Electronic Engineering at Hunan University. He received the Ph.D. degree in Computer Science from Peking University in 2016. His research interests include graph-based management of distributed knowledge graph data.
    In this paper he is mainly responsible for the literature survey.
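    E-mail: hnu16pp@hnu.edu.cn
    Lei Zou, born in 1981, is a professor at the Wangxuan Institute of Computer Technology, Peking University, and holds a Ph.D. His main research interests are graph databases and semantic data management.
    Main contribution to this paper: background introduction.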
    Lei Zou, born in 1981, is currently a Professor at the Wangxuan Institute of Computer Technology at Peking University. He received the Ph.D. degree in Computer Science from Huazhong University of Science and Technology in 2009. His research interests include graph databases and semantic data management.
    In this paper he is mainly responsible for the background introduction.
    E-mail:zoulei@pku.edu.cn
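  • Funding:
    National Key R&D Program of China, "Scientific Big Data Management System (Domain-Specific Big Data Management Systems)" (2016YFB1000603); National Natural Science Foundation of China (61702171); National Natural Science Foundation of China (61622201); National Natural Science Foundation of China (61532010)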

Survey on Federated RDF Systems

Peng Peng¹, Lei Zou²

  1. College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
  2. Wangxuan Institute of Computer Technology, Peking University, Beijing 100080, China
  • Received: 2019-08-30  Online: 2019-01-20  Published: 2019-10-09

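Abstract (Chinese version):

[Objective] The Resource Description Framework (RDF), a model for knowledge representation, has been widely used in scientific data management applications to represent knowledge graphs. Meanwhile, SPARQL (SPARQL Protocol and RDF Query Language), a structured query language, is used to query and retrieve RDF knowledge graph data. As more and more data providers represent their data as RDF knowledge graphs, integrating the autonomous RDF knowledge graphs of different providers into a federated RDF data management system has become a challenge. [Coverage] This paper surveys existing federated RDF data management systems. [Methods] Federated RDF data management systems differ mainly in their strategies for query decomposition and source selection and in their strategies for query processing and optimization. [Results] Current query decomposition and source selection strategies fall into two categories: metadata-based and ASK-query-based. Query processing and optimization strategies build on System-R style dynamic programming and add several optimized join strategies. [Limitations] Existing federated RDF data management systems have not yet studied how to support SPARQL 1.1. [Conclusions] Federated RDF data management systems can integrate knowledge graph data distributed across multiple sources, and are an important research direction for future knowledge graph data management.

Keywords: federated RDF data management systems, SPARQL query processing, query optimization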

Abstract:

[Objective] The Resource Description Framework (RDF), a standard model for knowledge representation, has been widely used in scientific data management applications to represent scientific data as knowledge graphs. Meanwhile, the SPARQL Protocol and RDF Query Language (SPARQL) is a structured query language for accessing RDF repositories. As more and more data publishers release their datasets in RDF, integrating the RDF datasets of different publishers into a federated RDF system has become a challenge. [Coverage] This paper provides an overview of studies on federated RDF systems. [Methods] Federated RDF systems differ mainly in their strategies for query decomposition and source selection and in their strategies for query processing and optimization. [Results] Existing query decomposition and source selection strategies in federated RDF systems fall into two categories: metadata-based and ASK-based. Query optimization strategies in existing federated RDF systems are join optimizations built on System-R style dynamic programming. [Limitations] Existing federated RDF systems have not yet addressed how to support SPARQL 1.1. [Conclusions] Federated RDF systems can integrate RDF graphs distributed across different sources, which makes them an important direction for future research.

Keywords: federated RDF systems, SPARQL query evaluation, query optimization
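
As a rough illustration of the ASK-based source selection strategy named in the abstract, the sketch below checks, for each triple pattern of a query, which sources hold at least one matching triple. It is a minimal sketch under stated assumptions, not the method of any surveyed system: the endpoint names, the toy triples, and the helpers `matches`, `ask`, and `select_sources` are all hypothetical, and each "endpoint" is an in-memory set of triples. In a real federation, `ask` would instead send a SPARQL `ASK { ... }` query to a remote endpoint.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def matches(pattern: Triple, triple: Triple) -> bool:
    """Return True if a triple pattern matches a triple.

    Terms starting with '?' are treated as variables (an assumption of
    this sketch); all other terms must be equal to the triple's term.
    """
    bindings: Dict[str, str] = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A repeated variable must bind to the same value each time.
            if bindings.setdefault(term, value) != value:
                return False
        elif term != value:
            return False
    return True


def ask(dataset: Set[Triple], pattern: Triple) -> bool:
    """Hypothetical local stand-in for sending `ASK { pattern }` to one endpoint."""
    return any(matches(pattern, triple) for triple in dataset)


def select_sources(
    endpoints: Dict[str, Set[Triple]], patterns: List[Triple]
) -> Dict[Triple, List[str]]:
    """Map each triple pattern to the endpoints whose ASK check succeeds."""
    return {
        pattern: [name for name, data in endpoints.items() if ask(data, pattern)]
        for pattern in patterns
    }


# Toy data: two hypothetical endpoints with a handful of triples each.
ENDPOINTS: Dict[str, Set[Triple]] = {
    "endpoint_a": {("ex:alice", "foaf:knows", "ex:bob")},
    "endpoint_b": {
        ("ex:bob", "ex:worksAt", "ex:pku"),
        ("ex:carol", "foaf:knows", "ex:bob"),
    },
}

# Triple patterns of a hypothetical query.
PATTERNS: List[Triple] = [
    ("?x", "foaf:knows", "?y"),
    ("?y", "ex:worksAt", "?z"),
    ("?x", "ex:livesIn", "?c"),
]

selection = select_sources(ENDPOINTS, PATTERNS)
for pattern, sources in selection.items():
    print(pattern, "->", sources)
```

A pattern that maps to an empty source list can be dropped from further planning for this federation, and a pattern that maps to a single source can be sent only there. The price of this approach is one probe per pattern per endpoint, which is the overhead that metadata-based strategies try to avoid.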