Frontiers of Data and Computing ›› 2020, Vol. 2 ›› Issue (5): 13-22.

doi: 10.11871/jfdc.issn.2096-742X.2020.05.002

• Key Technology & Application of Modern Service Industry •

Scientific and Technology Resource Clustering Based on Domain Ontology

Ge Yinchi1, Zhang Hui1, Song Wenyan2,*, Wang Xuan1

  1. Beihang University, School of Computer Science and Engineering, Beijing 100191, China
  2. Beihang University, School of Economics and Management, Beijing 100191, China
  • Received:2020-07-17 Online:2020-10-20 Published:2020-10-30
  • Contact: Song Wenyan E-mail: geyinchi@buaa.edu.cn; hzhang@buaa.edu.cn; songwenyan@buaa.edu.cn; 837909408@qq.com

Abstract:

[Objective] Clustering gathers scattered, heterogeneous but related scientific and technology resources into a multi-type resource pool, making resource discovery and utilization more efficient. This paper proposes a domain-ontology-based clustering method for massive, high-dimensional scientific and technology resources. [Methods] The method constructs an ontology tree and a concept semantic relationship matrix for the science and technology resource domain, uses Principal Component Analysis (PCA) to reduce dimensionality and build the vector space, and then applies K-means clustering to that space to obtain the clustering result. Compared with traditional methods, it handles multi-source heterogeneous scientific and technology resource data more effectively. [Results] In a test on a biological germplasm resource library, the proposed method produces reasonable clustering results. [Conclusions] The proposed clustering method has three characteristics: first, it uses ontology concept semantic relations to reduce dimensionality, which effectively lowers computational complexity; second, it better preserves important feature information of scientific and technology resources; and third, it yields a more accurate resource vector space and more accurate clustering results. To a certain extent, the method addresses the difficulty of feature representation and the poor clustering performance associated with multi-source heterogeneous scientific and technology resource data.

Key words: scientific and technology resources, heterogeneity, domain ontology, semantic relationship, clustering
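
The pipeline outlined in the abstract (ontology tree, concept semantic relationship matrix, PCA, K-means) can be illustrated with a minimal sketch. It is not the authors' implementation. The toy ontology, the resource tags, the use of Wu-Palmer similarity as the semantic measure, and the way resource vectors are built from the matrix are all assumptions made for illustration; the abstract does not specify these details. PCA is done by power iteration so that the example needs only the standard library.

```python
import math

# Hypothetical toy ontology (child -> parent); not taken from the paper.
PARENT = {
    "germplasm": None,
    "plant": "germplasm", "animal": "germplasm",
    "rice": "plant", "wheat": "plant",
    "cattle": "animal", "sheep": "animal",
}
CONCEPTS = sorted(PARENT)

def ancestors(c):
    """Path from concept c up to the root, c first."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path

def wu_palmer(a, b):
    """Assumed measure: 2*depth(lca) / (depth(a) + depth(b)), root depth = 1."""
    pa, pb = ancestors(a), ancestors(b)
    lca = next(c for c in pa if c in pb)  # first shared ancestor on a's path
    return 2 * len(ancestors(lca)) / (len(pa) + len(pb))

# Concept semantic relationship matrix over all ontology concepts.
SIM = {a: {b: wu_palmer(a, b) for b in CONCEPTS} for a in CONCEPTS}

def embed(tags):
    """Assumed encoding: for each concept, its best similarity to the resource's tags."""
    return [max(SIM[c][t] for t in tags) for c in CONCEPTS]

def pca(rows, k, iters=200):
    """Project centered rows onto the top-k principal components."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    X = [[x - m for x, m in zip(r, mean)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)] for i in range(d)]
    comps = []
    for _comp in range(k):
        v = [float(j + 1) for j in range(d)]  # asymmetric start vector
        for _step in range(iters):  # power iteration
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflate so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        comps.append(v)
    return [[sum(x * c for x, c in zip(r, comp)) for comp in comps] for r in X]

def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _step in range(iters):
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical resources, each tagged with ontology concepts.
RESOURCES = {
    "r1": ["rice"], "r2": ["wheat"], "r3": ["rice", "wheat"],
    "r4": ["cattle"], "r5": ["sheep"], "r6": ["cattle", "sheep"],
}
vectors = [embed(tags) for tags in RESOURCES.values()]
reduced = pca(vectors, k=2)
labels = kmeans(reduced, k=2)
for name, label in zip(RESOURCES, labels):
    print(name, label)
```

On this toy data the plant-tagged and animal-tagged resources fall into separate clusters. For real data, library routines such as NumPy's eigendecomposition or scikit-learn's `PCA` and `KMeans` would likely be the more practical choice.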