Frontiers of Data and Computing, 2021, Vol. 3, Issue (2): 93-102.

doi: 10.11871/jfdc.issn.2096-742X.2021.02.011

• Special Issue on Management Decision-Making and Intelligent Applications •

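The Mechanism by Which Open-Source Code Influences Paper Citations and an Empirical Analysis: A Case Study of the Computer Science Field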

WANG Shuwen1,2, XU Yuanjie1,2, CHEN Yuanping3, LI Jianping4, WU Dengsheng1,2,*

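  1. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190
    2. School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing 100049
    3. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190
    4. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190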
  • Received: 2021-04-01  Publication date: 2021-04-20  Release date: 2021-05-18
  • Corresponding author: WU Dengsheng
  • About the authors:
    WANG Shuwen is a master's student at the University of Chinese Academy of Sciences and the Institutes of Science and Development, Chinese Academy of Sciences. Her research interests are scientific text data mining and bibliometrics. In this paper, she is responsible for the model analysis and the writing of the paper.
    E-mail: wangshuwen20@mails.ucas.ac.cn
    XU Yuanjie is a PhD student at the University of Chinese Academy of Sciences and the Institutes of Science and Development, Chinese Academy of Sciences. Her research interests include scientific text data mining and knowledge discovery. In this paper, she is responsible for data acquisition, cleaning, and preprocessing, and for writing the corresponding parts of the paper.
    E-mail: xuyuanjie18@mails.ucas.ac.cn
    CHEN Yuanping is a senior engineer at the Computer Network Information Center, Chinese Academy of Sciences. His main research interests are data analysis, decision analysis models, and data mining applications. In this paper, he is mainly responsible for the data analysis.
    E-mail: ypchen@cnic.cn
    LI Jianping is a professor at the School of Economics and Management, University of Chinese Academy of Sciences. His main research interests are risk management and big-data-driven management decision-making. In this paper, he is responsible for the conceptual framework of paper citation.
    E-mail: ljp@ucas.ac.cn
    WU Dengsheng is an associate professor at the Institutes of Science and Development, Chinese Academy of Sciences. His research interests include data-driven science and technology management and decision-making, and risk management. He has led more than 10 projects, including one funded by the NSFC Excellent Young Scientists Fund, and has published more than 60 papers in leading journals such as Risk Analysis, European Journal of Operational Research (EJOR), and Chinese Journal of Management Science. In this paper, he is responsible for the overall revision of the manuscript and the research on the citation model.
    E-mail: wds@casipm.ac.cn
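  • Funding:
    National Natural Science Foundation of China (72022021); National Natural Science Foundation of China (71874180); Key Research Program of Frontier Sciences, Chinese Academy of Sciences (QYZDB-SSW-SYS036)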

Influence Mechanism of Code-Sharing on Paper Citations: An Empirical Analysis in the Computer Science Field

WANG Shuwen1,2, XU Yuanjie1,2, CHEN Yuanping3, LI Jianping4, WU Dengsheng1,2,*

  1. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing 100049, China
    3. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
    4. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-04-01  Online: 2021-04-20  Published: 2021-05-18
  • Contact: WU Dengsheng

Abstract:

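[Objective] Open-source code is an important basis for making research results in computer science verifiable and reproducible. This paper investigates whether releasing code, and which type of released code, affects the citation counts of computer science papers. [Methods] Taking 2,043 computer science journal papers listed on Papers with Code as the sample, we conduct the analysis with a multiple regression model using robust standard errors. [Results] Releasing code is significantly and positively correlated with citation counts, and papers with different types of released code enjoy different citation advantages. [Conclusions] Releasing code for computer science papers not only provides a means of reproducing research results but also helps increase citation counts; moreover, mentioning the original paper's information in the README file of the GitHub code repository helps promote citations of the paper.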

Key words: citation advantage, code open-sourcing, multiple regression, computer science

Abstract:

[Objective] Open-source code is an important basis for verifiable and reproducible research results in computer science. This paper examines whether making a paper's code open source affects its citation count, and how different types of code-sharing affect citations. [Methods] Using 2,043 computer science papers from Papers with Code as the sample, this paper performs a least-squares regression analysis with robust standard errors. [Results] Papers that share code are expected to receive significantly more citations than closed-source papers, and different types of code-sharing have different effects on citation counts. [Conclusions] For computer science papers, code-sharing not only provides a means of reproducing research results but also helps increase citation counts. In addition, referring to the original paper's information in the README file of the GitHub code repository facilitates citation of the paper.

Key words: citation advantage, code-sharing, multiple regression, computer science
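The method named in the abstract, least-squares regression with robust standard errors, can be illustrated with a minimal sketch. It assumes the HC1 heteroskedasticity-consistent (Eicker-Huber-White) covariance estimator with the n/(n-k) small-sample correction. The abstract does not state which robust variant, covariates, or citation transformation the authors used, so the function name `ols_hc1`, the single "code released" dummy, and the synthetic data below are hypothetical illustrations, not the paper's model or data.

```python
import random


def transpose(a):
    return [list(r) for r in zip(*a)]


def matmul(a, b):
    # Plain nested-list matrix product: (m x n) @ (n x p) -> (m x p)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    # Gauss-Jordan elimination with partial pivoting on [M | I]
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_hc1(X, y):
    """OLS coefficients plus HC1 robust standard errors (assumed variant).

    X: list of rows, each starting with 1.0 for the intercept.
    y: list of outcomes (e.g. a citation measure; hypothetical here).
    """
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    beta = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]
    # "Meat" of the sandwich: sum_i e_i^2 * x_i x_i'
    meat = [[sum(e * e * xi[a] * xi[b] for xi, e in zip(X, resid)) for b in range(k)]
            for a in range(k)]
    cov = matmul(matmul(xtx_inv, meat), xtx_inv)
    scale = n / (n - k)  # HC1 small-sample correction
    # Clamp tiny negative rounding errors before taking the square root
    se = [max(scale * cov[j][j], 0.0) ** 0.5 for j in range(k)]
    return beta, se


if __name__ == "__main__":
    # Synthetic, hypothetical data: NOT the paper's sample
    rng = random.Random(42)
    X, y = [], []
    for _ in range(500):
        code = 1 if rng.random() < 0.5 else 0      # hypothetical "code released" dummy
        noise = rng.gauss(0, 1.0 + 0.5 * code)     # heteroskedastic: noisier when code == 1
        X.append([1.0, float(code)])
        y.append(1.0 + 0.5 * code + noise)         # assumed true effect of 0.5
    beta, se = ols_hc1(X, y)
    for name, b, s in zip(["const", "code"], beta, se):
        print(f"{name}: coef={b:.3f}, robust SE={s:.3f}")
```

Robust standard errors are a natural fit here because citation counts are typically far more dispersed for some groups of papers than others, which violates the constant-variance assumption behind ordinary OLS standard errors; the sandwich estimator keeps the same coefficients but adjusts their uncertainty.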