Frontiers of Data and Computing, 2024, Vol. 6, Issue (2): 156-164.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.02.014

doi: 10.11871/jfdc.issn.2096-742X.2024.02.014

• Technology and Application •

Evaluation and Analysis of the Data Volume Index of Scientific Data Centers Coupled with Heterogeneity

GAO Mengxu1, WANG Yueyue2,*, WU Xinqian2, CHEN Zugang3,4, SHI Lei1, WANG Ruidan1

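  1. National Science and Technology Infrastructure Center, Beijing 100038
    2. School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, Henan 471000
    3. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094
    4. National Earth Observation Data Center, Beijing 100094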
  • Received: 2023-08-23 Online: 2024-04-20 Published: 2024-04-26
  • Corresponding author: *WANG Yueyue (E-mail: yuelnt@163.com)
  • About the authors: GAO Mengxu is a research professor at the National Science and Technology Infrastructure Center. His main research interests include science and technology resource management and sharing.
    In this paper, he is responsible for proposing and analyzing the research program.
    E-mail: gaomx@most.cn
    WANG Yueyue is a master's student at the School of Mathematics and Statistics, Henan University of Science and Technology. Her main research interests include science and technology resource management and platform evaluation.
    In this paper, she is responsible for the experiments and manuscript drafting.
    E-mail: yuelnt@163.com
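  • Funding:
    General Program of the National Natural Science Foundation of China, "Research on Evaluation Methods for Scientific Data Centers Based on Dynamic and Heterogeneous Scenarios" (72074017)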

Evaluation and Analysis of Data Volume Index Coupled with Heterogeneity in Scientific Data Centers

GAO Mengxu1, WANG Yueyue2,*, WU Xinqian2, CHEN Zugang3,4, SHI Lei1, WANG Ruidan1

  1. National Science and Technology Infrastructure Center, Beijing 100038, China
    2. School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, Henan 471000, China
    3. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
    4. National Earth Observation Data Center, National Science and Technology Infrastructure, Beijing 100094, China
  • Received: 2023-08-23 Online: 2024-04-20 Published: 2024-04-26

Abstract:

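[Objective] The data volume index is an important indicator of the resource integration and service capability of scientific data centers. However, scientific data centers differ in discipline, level, affiliated system, length of construction, and other background factors, so judging their data resource integration capability by directly comparing data volumes across centers is clearly unfair. [Methods] Starting from the data resource volumes reported by scientific data centers, this study collected data on the heterogeneity factors of the relevant centers from public service platforms such as the "China Science and Technology Resource Sharing Net" and the "Science Data Center of the Chinese Academy of Sciences". It quantified the heterogeneity factors and analyzed their influence on data volume using dummy variables, correlation analysis, and related methods, and then built a panel data model of data volume coupled with the heterogeneity of scientific data centers using statistical methods such as hypothesis testing and the least squares dummy variable (LSDV) method. [Results] The model removes the differences in the data volume index caused by the heterogeneity of scientific data centers and enables a horizontal, differentiated comparison of data volume across multiple types of scientific data centers. [Conclusions] This study proposes, for the first time, a heterogeneity adjustment method for the data resource volume of scientific data centers, which offers an important reference for the systematic and scientific evaluation of scientific data centers.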

Key words: scientific data centers, heterogeneity, data volume, panel data model, least squares dummy variable method

Abstract:

[Objective] Data volume is an important index for measuring the resource integration and service capability of scientific data centers. However, because scientific data centers differ in discipline, level, subordinate system, construction time, and other background factors, directly comparing data volume among data centers is clearly unfair. [Methods] Based on the data resource volumes reported by scientific data centers, and on heterogeneity-factor data for the relevant centers collected from public service platforms such as the "China Science and Technology Resource Sharing Net" and the "Science Data Center of CAS", this study quantifies the heterogeneity factors and analyzes their impact on data volume using dummy variables and correlation analysis. A panel data model of data volume coupled with the heterogeneity of scientific data centers is then constructed using hypothesis testing, the least squares dummy variable (LSDV) method, and other statistical methods. [Results] The proposed model eliminates the differences in the data volume index caused by the heterogeneity of scientific data centers and enables a horizontal, differentiated comparison of data volume among various types of scientific data centers. [Conclusions] This study proposes, for the first time, a heterogeneity adjustment method for the data volume of scientific data centers, which provides an important reference for the systematic and scientific evaluation of scientific data centers.

Key words: scientific data centers, heterogeneity, data volume, panel data model, least squares dummy variables
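To make the modeling step described in the abstract more concrete, here is a minimal sketch of a generic least squares dummy variable (LSDV) fit for a panel of data centers observed over several years, followed by one simple way to strip out center-specific effects. It is an illustration under stated assumptions, not the authors' model. The specification y_it = alpha_i + beta * x_it + e_it, the single time-varying covariate x (for example, a year index), and the helper names `lsdv_fit` and `adjust_for_heterogeneity` are all hypothetical. In practice, y might be the (log) data volume of center i in year t.

```python
import numpy as np


def lsdv_fit(y, x, center_ids):
    """LSDV fit of the hypothetical model y_it = alpha_i + beta * x_it + e_it.

    y, x       : 1-D arrays of stacked panel observations (one row per center-year).
    center_ids : integer labels 0..N-1 giving the data center of each row.
    Returns (alpha, beta): one intercept per center and the common slope.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    center_ids = np.asarray(center_ids, dtype=int)
    n_rows = y.shape[0]
    n_centers = int(center_ids.max()) + 1

    # One dummy column per center and no global intercept,
    # so the dummies do not fall into the dummy variable trap.
    dummies = np.zeros((n_rows, n_centers))
    dummies[np.arange(n_rows), center_ids] = 1.0

    # Ordinary least squares on [center dummies | covariate].
    design = np.column_stack([dummies, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:n_centers], coef[n_centers]


def adjust_for_heterogeneity(y, center_ids, alpha):
    """Hypothetical adjustment: subtract each center's deviation from the mean effect."""
    alpha = np.asarray(alpha, dtype=float)
    center_ids = np.asarray(center_ids, dtype=int)
    return np.asarray(y, dtype=float) - (alpha[center_ids] - alpha.mean())


if __name__ == "__main__":
    # Synthetic, noise-free example: 3 centers observed over 4 years.
    centers = np.repeat([0, 1, 2], 4)
    years = np.tile(np.arange(4), 3).astype(float)
    y = np.array([2.0, 5.0, 3.5])[centers] + 0.8 * years
    alpha, beta = lsdv_fit(y, years, centers)
    print("center effects:", alpha, "slope:", beta)
    print("adjusted:", adjust_for_heterogeneity(y, centers, alpha))
```

Each center gets its own dummy column and the global intercept is dropped, so the design matrix stays full rank. The estimated alpha_i absorb time-invariant differences between centers (discipline, level, affiliation, and so on). Recentering them around their mean is only one possible form of heterogeneity adjustment. In the noise-free example, the adjusted series of all three centers coincide year by year.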