Frontiers of Data and Computing, 2021, Vol. 3, Issue (6): 81-97.

doi: 10.11871/jfdc.10-1649.2021.06.006

• Special Issue: Scientific Big Data Mining and Knowledge Discovery •

Generating a Hematopoietic Stem Cell Knowledge Graph for Scientific Knowledge Discovery

HU Zhengyin1,2,*, LIU Leilei2, CHEN Wenjie1, LIU Chunjiang1, QIAN Li2,3, SONG Yibing4

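  1. Chengdu Library and Information Centre, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China
  2. Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
  3. National Science Library, Chinese Academy of Sciences, Beijing 100190, China
  4. Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, Guangdong 510530, China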
  • Received: 2021-11-10 Online: 2021-12-20 Published: 2022-01-26
  • Contact: HU Zhengyin
  • About the authors:
    HU Zhengyin, Ph.D., is a professor and director of the Knowledge Systems Department at Chengdu Library and Information Center, Chinese Academy of Sciences, a master's supervisor in information science at the University of Chinese Academy of Sciences, and a selected candidate of the West Light Talent Program of the Chinese Academy of Sciences. He has co-published three monographs, published more than 80 papers, and applied for 8 computer software copyrights. His main research interests include S&T big data analysis, S&T intelligence mining, and knowledge discovery. In this paper, he is responsible for designing the research framework, constructing the HSC KG, analyzing the SKD scenarios, and writing the first draft. E-mail: huzy@clas.ac.cn
    LIU Leilei is a master's student in Information and Archives Management at the University of Chinese Academy of Sciences. Her research interests include knowledge graphs, S&T intelligence mining, and scientific knowledge discovery. In this paper, she participates in designing the research framework and constructing the HSC KG. E-mail: liuleilei@mail.las.ac.cn
    CHEN Wenjie is a librarian at Chengdu Library and Information Center, Chinese Academy of Sciences. He has long been engaged in research on knowledge mining and knowledge discovery and in building knowledge service platforms. His research interests include machine learning, representation learning, and knowledge graphs. In this paper, he participates in the SKD scenario analysis and develops the HSC KG website. E-mail: chenwj@clas.ac.cn
    LIU Chunjiang is an associate professor at Chengdu Library and Information Center, Chinese Academy of Sciences. His research interests include patent mining and knowledge discovery. In this paper, he assists with constructing the HSC KG and analyzing the SKD scenarios. E-mail: liucj@clas.ac.cn
    QIAN Li, Ph.D., is a professor and director of the Knowledge Systems Department at the National Science Library, Chinese Academy of Sciences, and a master's supervisor in information science at the University of Chinese Academy of Sciences. His research interests include S&T big data analysis, S&T intelligence mining, and intelligent knowledge services. In this paper, he assists with constructing the HSC KG and analyzing the SKD scenarios. E-mail: qianl@mail.las.ac.cn
    SONG Yibing, senior engineer, is the director of the Information Center of Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences. His research interests include biomedical subject information analysis and biomedical knowledge services. In this paper, he assists with constructing the HSC KG and interpreting the SKD analysis results. E-mail: song_yibing@gibh.ac.cn
  • Funding:
    National Key Research and Development Program “Application demonstration of comprehensive science and technology services for typical industries in Pearl River Delta Urban Agglomeration” (Grant No. 2018YFB1404205); the Ministry of Science and Technology Innovation Methods Special Project (Grant No. 2019IM020100)

Abstract:

[Objective] Hematopoietic stem cells (HSCs) are among the most effective stem cells for clinical treatment. Discovering important knowledge entities, knowledge relations, and knowledge paths through literature mining is of great significance for knowledge discovery in the HSC field. A knowledge graph (KG) organizes and links knowledge units such as entities, relations, and paths at multiple levels, with fine granularity and rich semantics, and is widely used in scientific knowledge discovery (SKD). [Methods] This paper proposes a framework for generating a domain KG from Subject-Predicate-Object (SPO) triples extracted from literature. The framework comprises six processes: literature retrieval, SPO extraction, SPO cleanup, SPO ranking, discovery pattern integration, and graph building. Following the framework, an HSC KG was constructed on the Neo4j graph database. Finally, three kinds of SKD scenarios using the HSC KG are introduced through empirical analysis. [Results] The results show that the HSC KG has the advantages of “using a graph data structure”, “integrating discovery patterns”, “fusing native graph mining algorithms”, and “being easy to use”, and can effectively support deep open discovery, closed discovery, and topic discovery in the HSC field.

Key words: knowledge graph, SPO triple, scientific knowledge discovery, literature mining, hematopoietic stem cell
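
The open and closed discovery scenarios named in the abstract can be illustrated with a minimal sketch. This is not the authors' Neo4j implementation: it uses plain Python dictionaries, and the triples, predicate names, and function names below are hypothetical, chosen only to show how SPO triples can be indexed as a graph and queried along A-B-C paths, assuming the common literature-based discovery reading of "open" (find unknown C from A) and "closed" (find linking B between a given A and C).

```python
from collections import defaultdict

# Hypothetical SPO triples for illustration only; not data from the HSC KG.
TRIPLES = [
    ("drug_A", "INHIBITS", "gene_B"),
    ("drug_A", "STIMULATES", "protein_D"),
    ("gene_B", "ASSOCIATED_WITH", "disease_C"),
    ("protein_D", "AFFECTS", "disease_E"),
    ("drug_A", "TREATS", "disease_E"),
]


def build_graph(triples):
    """Index triples as subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def open_discovery(graph, start):
    """Return A-B-C paths where C is not already directly linked to A."""
    direct = {obj for _, obj in graph.get(start, [])}
    paths = []
    for pred_ab, b in graph.get(start, []):
        for pred_bc, c in graph.get(b, []):
            if c != start and c not in direct:
                paths.append((start, pred_ab, b, pred_bc, c))
    return paths


def closed_discovery(graph, start, end):
    """Return the intermediate B terms that connect a given A and C."""
    return sorted(
        b for _, b in graph.get(start, [])
        if any(c == end for _, c in graph.get(b, []))
    )


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(open_discovery(graph, "drug_A"))
    print(closed_discovery(graph, "drug_A", "disease_E"))
```

In a graph database, the same two-hop patterns would typically be expressed as path queries rather than nested loops; the dictionary version is used here only to keep the example self-contained.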