Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (2): 156-164.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.02.014

doi: 10.11871/jfdc.issn.2096-742X.2024.02.014

• Technology and Application •

Evaluation and Analysis of Data Volume Index Coupled with Heterogeneity in Scientific Data Centers

GAO Mengxu1, WANG Yueyue2,*, WU Xinqian2, CHEN Zugang3,4, SHI Lei1, WANG Ruidan1

    1. National Science and Technology Infrastructure, Beijing 100038, China
    2. School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, Henan 471000, China
    3. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
    4. National Earth Observation Data Center, National Science and Technology Infrastructure, Beijing 100094, China
  • Received: 2023-08-23 Online: 2024-04-20 Published: 2024-04-26


[Objective] Data volume is an important index for measuring the resource integration and service capability of scientific data centers. However, because scientific data centers differ in discipline, level, administrative affiliation, time of establishment, and other background factors, directly comparing data volume across centers is unfair. [Methods] Using data on the volume of data resources held by scientific data centers, together with data on their heterogeneity factors, collected from public service platforms such as “China Science and Technology Resource Sharing Net” and “Science Data Center of CAS”, this article quantifies the heterogeneity factors with dummy variables and analyzes their impact on data volume through correlation analysis. A panel data model of data volume coupled with the heterogeneity of scientific data centers is then constructed using hypothesis testing, the least squares dummy variable (LSDV) method, and other statistical methods. [Results] The proposed model removes the differences in the data volume index caused by the heterogeneity of scientific data centers and enables horizontal comparison of data volume across different types of scientific data centers. [Conclusions] This study proposes, for the first time, a heterogeneity adjustment method for the data volume of scientific data centers, which provides an important reference for the systematic and scientific evaluation of scientific data centers.

Key words: scientific data centers, heterogeneity, data volume, panel data model, least squares dummy variables
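
The least squares dummy variable approach named in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the abstract gives no specification, so the single regressor (years since establishment), the two discipline labels, the helper names `lsdv_fit` and `adjusted_volume`, and all numbers below are hypothetical and synthetic, chosen only to show how group dummies absorb a heterogeneity effect before centers are compared.

```python
import numpy as np

def lsdv_fit(y, X, groups):
    """Fit y = X @ slopes + group intercept by ordinary least squares,
    using one dummy column per group and no global intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    labels, codes = np.unique(groups, return_inverse=True)
    dummies = np.eye(len(labels))[codes]          # one-hot group membership
    design = np.hstack([X, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = X.shape[1]
    slopes = coef[:k]
    effects = dict(zip(labels.tolist(), coef[k:]))
    return slopes, effects

def adjusted_volume(y, groups, effects):
    """Subtract each group's estimated effect and re-centre on the mean
    effect, so values from different groups share a common baseline."""
    mean_effect = np.mean(list(effects.values()))
    return np.array([yi - effects[g] + mean_effect for yi, g in zip(y, groups)])

# Synthetic example (assumed values, not data from the article):
# log data volume = 0.5 * years + discipline effect (A: 2.0, B: 5.0).
groups = ["A", "A", "A", "B", "B", "B"]
years = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
true_effect = {"A": 2.0, "B": 5.0}
log_volume = 0.5 * years + np.array([true_effect[g] for g in groups])

slopes, effects = lsdv_fit(log_volume, years, groups)
adjusted = adjusted_volume(log_volume, groups, effects)
```

In this toy setting, centers of the same age end up with the same adjusted value regardless of discipline, which is the kind of comparison the abstract describes. A real analysis would need more heterogeneity factors and the hypothesis tests the abstract mentions for choosing the panel model form.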