Scientific and Technology Resource Clustering Based on Domain Ontology

doi:10.11871/jfdc.issn.2096-742X.2020.05.002

Abstract

[Objective] Clustering gathers scattered, heterogeneous, but related science and technology resources into multi-type resource pools, making resource discovery and use more efficient. This paper proposes a domain-ontology-based clustering method for massive, high-dimensional science and technology resources. [Methods] The method builds an ontology tree and a concept semantic relationship matrix for the science and technology resource domain, applies Principal Component Analysis (PCA) to reduce dimensionality and construct the vector space, and then runs K-means clustering on that space to obtain the clusters. Compared with traditional methods, it handles multi-source heterogeneous science and technology resource data more effectively. [Results] In a test on a biological germplasm resource library, the proposed method produced reasonable clustering results. [Conclusions] The proposed method has three characteristics: first, it uses the semantic relations among ontology concepts to reduce dimensionality, which effectively lowers computational complexity; second, it better preserves the important feature information of the resources; and third, it yields a more accurate resource vector space and more accurate clustering results. To a certain extent, the method addresses the difficulty of feature representation and the poor clustering quality encountered with multi-source heterogeneous science and technology resource data.

Keywords: science and technology resources, heterogeneity, domain ontology, semantic relationship, clustering

Ge Yinchi, Zhang Hui, Song Wenyan, Wang Xuan. Scientific and Technology Resource Clustering Based on Domain Ontology[J]. Frontiers of Data and Computing, 2020, 2(5): 13-22.
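
The last two stages named in the abstract, PCA followed by K-means, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy concept-count vectors, the helper names (`center`, `covariance`, `top_components`, `project`, `kmeans`), the power-iteration PCA, and the farthest-point initialisation are all assumptions made for illustration, and the ontology tree and semantic relationship matrix are not modelled here.

```python
import math
import random


def center(rows):
    """Subtract the column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=500):
    """Leading k eigenvectors via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    rng = random.Random(0)
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflate: remove the component just found from the matrix.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
        comps.append(v)
    return comps


def project(rows, comps):
    """Coordinates of each row along each principal component."""
    return [[sum(r[j] * c[j] for j in range(len(c))) for c in comps]
            for r in rows]


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0][:]]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(far[:])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Hypothetical concept-count vectors: 6 resources over 4 ontology concepts.
resources = [
    [3, 2, 0, 0], [2, 3, 0, 0], [3, 3, 0, 0],
    [0, 0, 2, 3], [0, 0, 3, 2], [0, 0, 3, 3],
]
centered = center(resources)
components = top_components(covariance(centered), k=2)
reduced = project(centered, components)  # 4 dimensions -> 2
labels = kmeans(reduced, k=2)
print(labels)
```

On real data a library implementation of PCA and K-means would normally replace these hand-written helpers; the sketch only shows the order of operations.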


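Concept	Main concept 1	Main concept 2	Main concept 3
#0	Antibody	Mutant zebrafish	Characteristic aquatic animals
#1	Plasmid	Wild-type zebrafish	Algae and protozoa
#2	Cell line	Transgenic zebrafish	Yangtze River fishes
#3	Molecular and cellular tools	Zebrafish	Rare aquatic animals
#4	Mutant zebrafish	Aquatic bacteria	Aquatic plants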

Vector index	Concept	Vector value
9	Plasmid	3.0
17	Zebrafish	2.0
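
The index/value pairs in the table above can be read as the non-zero entries of a resource's concept vector. The sketch below expands such a sparse representation into a dense vector. The vector length of 20, zero-based indexing, and the helper name `to_dense` are assumptions for illustration; the source does not state the actual dimensionality.

```python
def to_dense(sparse, dim):
    """Expand {index: value} pairs into a dense list; unlisted entries are 0.0."""
    dense = [0.0] * dim
    for idx, val in sparse.items():
        dense[idx] = val
    return dense


# Non-zero entries from the table: index 9 is 质粒 (plasmid),
# index 17 is 斑马鱼 (zebrafish).
resource = {9: 3.0, 17: 2.0}

# dim=20 is a hypothetical length, chosen only so that index 17 fits.
dense = to_dense(resource, dim=20)
print(dense)
```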

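Cluster	1	2	3	4	5	6
Count	1772	1260	424	102	26	22
Interpretation	Algae and protozoa	Zebrafish	Model organism Tetrahymena	Aquatic bacteria	Yangtze River fishes	Plasmid vectors, fish cell lines
Errors	8	1	1	1	1	2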
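
The count and error rows of the table above can be tallied into an overall error rate. The short sketch below does this arithmetic, under the assumption that the error row gives the number of resources judged to be wrongly assigned within each cluster; the variable names are illustrative.

```python
counts = [1772, 1260, 424, 102, 26, 22]  # resources per cluster
errors = [8, 1, 1, 1, 1, 2]              # assumed: wrongly assigned resources per cluster

total = sum(counts)          # 3606
total_errors = sum(errors)   # 14
error_rate = total_errors / total
per_cluster_rate = [e / c for e, c in zip(errors, counts)]

print(f"overall error rate: {error_rate:.2%}")
for cluster, rate in enumerate(per_cluster_rate, start=1):
    print(f"cluster {cluster}: {rate:.2%}")
```

Under that reading, 14 of 3606 resources (about 0.39%) are misassigned overall, with the two smallest clusters showing the highest per-cluster rates.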