Frontiers of Data and Computing, 2021, Vol. 3, Issue 6: 60-80.
doi: 10.11871/jfdc.10-1649.2021.06.005
BAI Rujiang, ZHAO Mengmeng, ZHANG Yujie, DONG Kun
Received: 2021-11-16
Online: 2021-12-20
Published: 2022-01-26
Contact: BAI Rujiang
E-mail: brj@sdut.edu.cn; zhaomeng199701@163.com; zyj1725@163.com; dongkun@sdut.edu.cn
BAI Rujiang, ZHAO Mengmeng, ZHANG Yujie, DONG Kun. Review on Scientific Literature Mining: Tools and Technologies[J]. Frontiers of Data and Computing, 2021, 3(6): 60-80.
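Table 1 Network mining tools for S&T literature
Tool | Developer | Supported S&T literature data sources | Supported mining and analysis dimensions | Visual analysis functions |
---|---|---|---|---|
CiteSpace | Chaomei Chen | WoS, PubMed, Derwent, CNKI, CSSCI, Scopus, Google Scholar, NSF Awards, etc. | Collaboration: authors, institutions, countries; co-occurrence: terms, keywords, subject categories; co-citation: documents, authors, journals; coupling: documents, funding | Cluster view, timeline view (fisheye view), time-zone view |
VOSviewer | Van Eck, Waltman | WoS, Scopus, Dimensions, PubMed, etc. | Documents, authors, countries, key terms of a discipline | Label view, density view, cluster density view, scatter view |
SCI2 | Katy Börner and her team | WoS, etc. | Documents, authors | Topic map analysis, directed network analysis |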
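The co-occurrence dimension that Table 1 lists for CiteSpace and VOSviewer starts from a simple count: how many records mention both items of a pair. The sketch below is a minimal Python illustration under stated assumptions: each record has already been reduced to a list of author keywords, keywords are matched after lower-casing, and the function name `cooccurrence_counts` and the sample records are hypothetical. It does not reproduce either tool's internal pipeline.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how many records contain each unordered pair of keywords (hypothetical helper)."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates within a record, then sort so
        # that ("a", "b") and ("b", "a") collapse to the same edge key.
        unique = sorted({k.strip().lower() for k in keywords})
        pairs.update(combinations(unique, 2))
    return pairs


# Illustrative sample records, not real data.
records = [
    ["Text mining", "citation analysis", "knowledge graph"],
    ["text mining", "topic model"],
    ["citation analysis", "text mining"],
]
edges = cooccurrence_counts(records)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Each record adds at most 1 to a pair, so edge weights are record counts. A mapping tool would then normalize these raw counts before clustering and layout (VOSviewer's default normalization is association strength); that step is not shown here.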
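Table 2 S&T literature data and mining analysis platforms
Platform | Data types | Literature services | Literature mining services |
---|---|---|---|
Dimensions | Publications, datasets, grants, patents, clinical trials, policy documents, etc. | 1. Links between different kinds of research data 2. Multiple APIs 3. A comprehensive database open to all users, with multiple filter options 4. Project funding data | 1. Semantic and ontology-based search 2. Metadata-based linking of publications and citations to grants, patents, clinical trials, datasets and policy papers 3. Citation network analysis |
LENS | Patent records, scholarly data, biological sequences and document links | 1. Powerful search and analysis of patents and scholarly data 2. Dynamic tracking of user collections 3. Influence mapping for academic institutions and companies | 1. Metadata aggregation, with linking and mapping of relationships between metadata 2. Diverse patent search and analysis services 3. Entity disambiguation |
Europe PMC | Abstracts and full texts of life-science literature | 1. Links between literature and external research resources 2. Rich literature data related to life-science research, including preprints 3. Showcases of users' work 4. Annotation search 5. The research data behind publications 6. Access to publications and related information through the Articles API | 1. Annotation search 2. Links from publications to research data and to external related data such as smart citations and peer-review materials 3. API access to open content and metadata for building diverse literature analysis applications |
ArrowSmith knowledge discovery system | Medical S&T literature | Generates valuable scientific hypotheses from literature datasets | Knowledge mining across unrelated or weakly related literature sets |
AMiner | Publications, datasets, grants, patents, clinical trials, policy documents, etc. | 1. Search for academic information such as S&T literature and authors 2. Semantic search and semantic analysis of patents, S&T literature, etc. 3. Researcher profile management and mining | 1. Linked search across researchers, S&T literature and academic activity data 2. Analysis of development trends in disciplines and fields |
Brain science knowledge engine (Linked Brain Data) | Neuron data | 1. Extraction, representation and visualization of neuron data and knowledge 2. Services for reasoning over related brain data | Semantic search of neuron data and knowledge |
Stem cell knowledge discovery platform | Core stem-cell patents, funded projects, scientific experiments, hot papers, etc. | 1. S&T big-data integration 2. Knowledge computing 3. Knowledge discovery based on big-data knowledge computing | 1. Mining of the knowledge content of S&T literature, semantic knowledge association, etc. 2. Detection of hot topics and research fronts 3. Researcher profiling |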
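Table 3 Key technologies of S&T literature mining
Mining object | Method category | Mining methods |
---|---|---|
Bibliometric analysis |  | Impact factor analysis, citation count analysis, h-index, Bradford's law, Lotka's law, citation analysis, knowledge discovery from non-related literature, social network analysis |
Text content mining of S&T literature | Simple rules | Word frequency analysis, one-hot encoding, bag-of-words (BOW), N-grams, co-word network analysis, author collaboration co-occurrence analysis, institution collaboration co-occurrence analysis |
Text content mining of S&T literature | Statistical machine learning | Keyword extraction: TF-IDF, TextRank, CRF, HMM, SVM; language models: Word2Vec, Item2Vec, DeepWalk, Node2Vec; topic models: LDA, LSA, STM, LDA2Vec |
Text content mining of S&T literature | Deep learning | Classic models: CNN, RNN, LSTM, Attention, Transformer; pre-trained models: BERT, SciBERT, RoBERTa, SpanBERT, ERNIE, GPT-3; graph neural networks: GCN, GAT, VGAE, GraphSAGE |
Intelligent reasoning mining | Knowledge graphs | Google Knowledge Graph, Microsoft Academic Graph, Open Academic Graph, AceKG, SciKGraph |
Intelligent reasoning mining | Causal intelligence | Causal emergence; causal inference: randomized controlled trials, quasi-experimental designs, propensity score matching, regression discontinuity, structural causal models, causal machine learning, counterfactual reasoning |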
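Two of the simplest methods in Table 3 fit in a few lines each: the h-index from bibliometric analysis and TF-IDF keyword extraction from statistical machine learning. The Python sketch below is illustrative only. It assumes whitespace tokenization with no stop-word removal or stemming, and it uses the unsmoothed weight tf(t, d) · log(N / df(t)), which is one of several TF-IDF variants in use. The function names `h_index` and `tfidf_keywords` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break  # every later paper has even fewer citations
        h = rank
    return h


def tfidf_keywords(docs, top_k=3):
    """Return up to top_k terms per document, ranked by tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    keywords = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        # Terms that occur in every document score 0 and are dropped.
        keywords.append([term for term, score in ranked[:top_k] if score > 0])
    return keywords


print(h_index([10, 8, 5, 4, 3]))  # 4
docs = [
    "literature mining citation",
    "literature mining topic",
    "literature graph embedding",
]
print(tfidf_keywords(docs))  # [['citation', 'mining'], ['topic', 'mining'], ['embedding', 'graph']]
```

"literature" appears in all three sample documents, so its IDF is log(1) = 0 and it never surfaces as a keyword. Production libraries usually smooth the IDF term and add stop-word handling, so their rankings can differ from this sketch.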
[1] Bush V. As we may think[J]. The Atlantic Monthly, 1945, 176(1): 101-108.
[2] Sanderson M, Croft W B. The history of information retrieval research[J]. Proceedings of the IEEE, 2012, 100: 1444-1451.
[3] Luhn H P. Key word-in-context index for technical literature (KWIC index)[J]. American Documentation, 1960, 11(4): 288-295.
[4] Baker D B, Horiszny J W, Metanomski W V. History of abstracting at Chemical Abstracts Service[J]. Journal of Chemical Information and Computer Sciences, 1980, 20(4): 193-201.
[5] Garfield E. "Science Citation Index": a new dimension in indexing[J]. Science, 1964, 144(3619): 649-654.
[6] Price D J S. Networks of scientific papers: the pattern of bibliographic references indicates the nature of the scientific research front[J]. Science, 1965, 149(3683): 510-515.
[7] 郝丽云, 郭启煜. Research progress on knowledge discovery from non-related literature[J]. Journal of the China Society for Scientific and Technical Information, 2006, 25(3): 342-348. (in Chinese)
[8] Kostoff R N. Science and technology text mining: origins of database tomography and multi-word phrase clustering[R]. Arlington, VA: Office of Naval Research, 2003.
[9] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[10] Tan M, Le Q. EfficientNet: rethinking model scaling for convolutional neural networks[C]// International Conference on Machine Learning. PMLR, 2019: 6105-6114.
[11] Porter A L, Cunningham S W. Tech mining: exploiting new technologies for competitive advantage[M]. John Wiley & Sons, 2004: 59-62.
[12] 陈悦, 陈超美, 刘则渊, 胡志刚, 王贤文. The methodological functions of CiteSpace knowledge maps[J]. Studies in Science of Science, 2015, 33(2): 242-253. (in Chinese)
[13] van Eck N J, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping[J]. Scientometrics, 2010, 84(2): 523-538.
[14] 张智雄, 刘欢, 于改红. Building an artificial intelligence engine based on knowledge from S&T literature[J]. Journal of Library and Information Science in Agriculture, 2021, 33(1): 17-31. (in Chinese)
[15] Tang J, Zhang J, Yao L, et al. ArnetMiner: extraction and mining of academic social networks[C]// Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008: 990-998.
[16] "Research Fronts 2016" report released in Beijing[EB/OL]. [2016-11-01]. http://www.gov.cn/xinwen/2016-11/01/content_5126805.htm.
[17] Release of "Technology Focus 2021" and accompanying symposium[EB/OL]. [2021-06-09]. http://www.bsc.cas.cn/sjdt/202106/t20210610_4792825.html.
[18] Engineering development fronts in information and electronic engineering, 2018[J]. 科技中国, 2019(2): 9-20. (in Chinese)
[19] Global Engineering Fronts Project Group. Key interpretations of global engineering research fronts[J]. 科技中国, 2020(2): 4-14. (in Chinese)
[20] Lazer D, Pentland A, Adamic L, et al. Computational social science[J]. Science, 2009, 323(5915): 721-723.
[21] Fortunato S, Bergstrom C T, Börner K, et al. Science of science[J]. Science, 2018, 359(6379): 1007.
[22] Bornmann L, Marx W. HistCite analysis of papers constituting the h index research front[J]. Journal of Informetrics, 2012, 6(2): 285-288.
[23] van Eck N J, Waltman L. CitNetExplorer: a new software tool for analyzing and visualizing citation networks[J]. Journal of Informetrics, 2014, 8(4): 802-823.
[24] van Eck N J, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping[J]. Scientometrics, 2010, 84(2): 523-538.
[25] Berthold M R, Cebron N, Dill F, et al. KNIME: the Konstanz Information Miner[J]. ACM SIGKDD Explorations Newsletter, 2009, 11(1): 26-31.
[26] Amer M, Goldstein M. Nearest-neighbor and clustering based anomaly detection algorithms for RapidMiner[C]// RapidMiner Community Meeting and Conference, 2012: 1-12.
[27] Demšar J, Curk T, Erjavec A, et al. Orange: data mining toolbox in Python[J]. Journal of Machine Learning Research, 2013, 14(1): 2349-2353.
[28] Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks[C]// Proceedings of the Third International Conference on Weblogs and Social Media (ICWSM 2009), San Jose, California, USA, May 17-20, 2009: 33-36.
[29] Batagelj V, Mrvar A. Pajek: analysis and visualization of large networks[C]// Jünger M, Mutzel P. Graph Drawing Software. Mathematics and Visualization. Berlin, Heidelberg: Springer, 2004: 77-103.
[30] Team D, Adams J, Jones P, et al. Dimensions: a collaborative approach to enhancing research discovery[R]. Digital Science, 2018.
[31] About The Lens[EB/OL]. [2021-10-19]. https://about.lens.org/the-lens-metarecord/.
[32] Ferguson C, Araújo D, Faulk L, et al. Europe PMC in 2020[J]. Nucleic Acids Research, 2021, 49(D1): D1507-D1514.
[33] Garfield E. Citation analysis as a tool in journal evaluation[J]. Science, 1972, 178(4060): 471-479.
[34] Bornmann L, Daniel H D. What do we know about the h index?[J]. Journal of the American Society for Information Science and Technology, 2007, 58(9): 1381-1385.
[35] 邱均平, 段宇锋, 陈敬全, 宋恩梅, 嵇丽. Retrospect and prospect of the development of bibliometrics in China[J]. Studies in Science of Science, 2003(2): 143-148. (in Chinese)
[36] Pritchard A. Statistical bibliography or bibliometrics[J]. Journal of Documentation, 1969, 25(4): 348-349.
[37] Hulme E W. The history of the patent system under the prerogative and at common law[J]. Law Quarterly Review, 1896, 12: 141.
[38] Pao M L. Lotka's law: a testing procedure[J]. Information Processing & Management, 1985, 21(4): 305-320.
[39] Newman M E J. Power laws, Pareto distributions and Zipf's law[J]. Contemporary Physics, 2005, 46(5): 323-351.
[40] Small H. Co-citation in the scientific literature: a new measure of the relationship between two documents[J]. Journal of the American Society for Information Science, 1973, 24(4): 265-269.
[41] Costas R, Zahedi Z, Wouters P. Do "altmetrics" correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective[J]. Journal of the Association for Information Science and Technology, 2015, 66(10): 2003-2019.
[42] Mei Q G, Wong E K, Memon N D. Data hiding in binary text documents[C]// Security and Watermarking of Multimedia Contents III. International Society for Optics and Photonics, 2001, 4314: 369-375.
[43] Nadkarni P M, Ohno-Machado L, Chapman W W. Natural language processing: an introduction[J]. Journal of the American Medical Informatics Association, 2011, 18(5): 544-551.
[44] Harris Z S. Distributional structure[J]. Word, 1954, 10(2-3): 146-162.
[45] Shannon C E. A mathematical theory of communication[J]. ACM SIGMOBILE Mobile Computing and Communications Review, 2001, 5(1): 3-55.
[46] Mustafa S H, Al-Radaideh Q A. Using N-grams for Arabic text searching[J]. Journal of the American Society for Information Science and Technology, 2004, 55(11): 1002-1007.
[47] Chen C. Visualising semantic spaces and author co-citation networks in digital libraries[J]. Information Processing & Management, 1999, 35(3): 401-420.
[48] Zhang Q R, Li Y, Liu J S, et al. A dynamic co-word network-related approach on the evolution of China's urbanization research[J]. Scientometrics, 2017, 111(3): 1623-1642.
[49] 陈云伟. Research on the application of social network analysis methods in intelligence analysis[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(1): 21-28. (in Chinese)
[50] Su H N, Lee P C. Mapping knowledge structure by keyword co-occurrence: a first look at journal papers in Technology Foresight[J]. Scientometrics, 2010, 85(1): 65-79.
[51] 陈悦, 宋超, 周京生, 等. Factors influencing paper citation counts from a bibliometric perspective, with a discussion of the relationship between usage and citation[J]. Journal of Intelligence, 2019, 38(4): 100-108. (in Chinese)
[52] 王贤文, 刘趁, 毛文莉. Technology clustering analysis based on patent co-citation: a case study of Apple Inc. patents[J]. Science and Management, 2014, 34(5): 31-37. (in Chinese)
[53] Liu X, Zhang J, Guo C. Full-text citation analysis: a new method to enhance scholarly networks[J]. Journal of the American Society for Information Science and Technology, 2013, 64(9): 1852-1863.
[54] 卢超, 章成志, 王玉琢, Ding Ying. Deepening semantic feature analysis: a review of full-text bibliometric analysis of academic literature[J]. Journal of Library Science in China, 2021, 47(2): 110-131. (in Chinese)
[55] 胡志刚, 章成志. The quiet rise of full-text bibliometric analysis[J]. Library Tribune, 2021, 41(3): 1-11. (in Chinese)
[56] 白如江, 杨京, 王效岳. Current status and development trends of research on the evaluation of individual academic papers[J]. Information Studies: Theory & Application, 2015, 38(11): 11-17. (in Chinese)
[57] Wu L, Wang D, Evans J A. Large teams develop and small teams disrupt science and technology[J]. Nature, 2019, 566(7744): 378-382.
[58] Wang D, Song C, Barabási A L. Quantifying long-term scientific impact[J]. Science, 2013, 342(6154): 127-132.
[59] 邱均平, 董克. Deep aggregation of documents in citation networks: method and empirical study, taking XML research papers in the WoS database as an example[J]. Journal of Library Science in China, 2013, 39(2): 111-120. (in Chinese)
[60] Wanjantuk P, Keane J A. Finding related documents via communities in the citation graph[C]// IEEE International Symposium on Communications and Information Technology, 2004 (ISCIT 2004). IEEE, 2004, 1: 445-450.
[61] Moliner L A, Gallardo-Gallardo E, de Puelles P G. Understanding scientific communities: a social network approach to collaborations in Talent Management research[J]. Scientometrics, 2017, 113(3): 1439-1462.
[62] Zheng J, Gong J, Li R, et al. Community evolution analysis based on co-author network: a case study of academic communities of the journal "Annals of the Association of American Geographers"[J]. Scientometrics, 2017, 113(2): 845-865.
[63] Aizawa A. An information-theoretic perspective of tf-idf measures[J]. Information Processing & Management, 2003, 39(1): 45-65.
[64] Mihalcea R, Tarau P. TextRank: bringing order into text[C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004: 404-411.
[65] Gambhir M, Gupta V. Recent automatic text summarization techniques: a survey[J]. Artificial Intelligence Review, 2017, 47(1): 1-66.
[66] Zhang C. Automatic keyword extraction from documents using conditional random fields[J]. Journal of Computational Information Systems, 2008, 4(3): 1169-1180.
[67] 方龙, 李信, 黄永, 陆伟. Structure-function recognition of academic texts: application to automatic keyword extraction[J]. Journal of the China Society for Scientific and Technical Information, 2017, 36(6): 599-605. (in Chinese)
[68] Chen P H, Lin C J, Schölkopf B. A tutorial on ν-support vector machines[J]. Applied Stochastic Models in Business and Industry, 2005, 21(2): 111-136.
[69] Tshitoyan V, Dagdelen J, Weston L, et al. Unsupervised word embeddings capture latent knowledge from materials science literature[J]. Nature, 2019, 571(7763): 95-98.
[70] Caliskan A, Bryson J J, Narayanan A. Semantics derived automatically from language corpora contain human-like biases[J]. Science, 2017, 356(6334): 183-186.
[71] Garg N, Schiebinger L, Jurafsky D, et al. Word embeddings quantify 100 years of gender and ethnic stereotypes[J]. Proceedings of the National Academy of Sciences, 2018, 115(16): E3635-E3644.
[72] Barkan O, Koenigstein N. Item2vec: neural item embedding for collaborative filtering[J]. arXiv preprint arXiv:1603.04259, 2016.
[73] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014: 701-710.
[74] Lawler G F, Limic V. Random walk: a modern introduction[M]. Cambridge University Press, 2010: 225-241.
[75] Grover A, Leskovec J. node2vec: scalable feature learning for networks[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016: 855-864.
[76] Peng H, Ke Q, Budak C, et al. Neural embeddings of scholarly periodicals reveal complex disciplinary organizations[J]. Science Advances, 2021, 7(17): eabb9004.
[77] Shen Z, Chen F, Yang L, et al. Node2vec representation for clustering journals and as a possible measure of diversity[J]. Journal of Data and Information Science, 2019, 4(2): 79.
[78] Boyack K W, Klavans R, Börner K. Mapping the backbone of science[J]. Scientometrics, 2005, 64(3): 351-374.
[79] 范馨月, 崔雷. Knowledge discovery of drug side effects based on text mining[J]. Data Analysis and Knowledge Discovery, 2018, 2(3): 79-86. (in Chinese)
[80] 钱庆, 李军莲. Knowledge management of the Chinese Biomedical Literature Database[J]. 医学情报工作, 2004(5): 347-349. (in Chinese)
[81] 钱庆, 洪娜, 李姣. Exploring large-scale semantic data integration and mining models for drug research and development[J]. Digital Library Forum, 2014(3): 19-25. (in Chinese)
[82] Dumais S T. Latent semantic analysis[J]. Annual Review of Information Science and Technology, 2004, 38(1): 188-230.
[83] Roberts M E, Stewart B M, Tingley D, et al. Structural topic models for open-ended survey responses[J]. American Journal of Political Science, 2014, 58(4): 1064-1082.
[84] Moody C E. Mixing Dirichlet topic models and word embeddings to make lda2vec[J]. arXiv preprint arXiv:1605.02019, 2016.
[85] Chen H, Yang C, Zhang X, et al. From symbols to embeddings: a tale of two representations in computational social science[J]. Journal of Social Computing, 2021, 2(2): 103-156.
[86] Small H, Greenlee E. Citation context analysis of a co-citation cluster: recombinant-DNA[J]. Scientometrics, 1980, 2(4): 277-301.
[87] Schneider J W. Concept symbols revisited: naming clusters by parsing and filtering of noun phrases from citation contexts of concept symbols[J]. Scientometrics, 2006, 68(3): 573-593.
[88] Topic detection and tracking: event-based information organization[M]. Springer Science & Business Media, 2012: 42-45.
[89] Sun Y, Zhai Y. Mapping the knowledge domain and the theme evolution of appropriability research between 1986 and 2016: a scientometric review[J]. Scientometrics, 2018, 116(1): 203-230.
[90] 郭颖, 朱东华, 汪雪峰, 张嶷, 陈建领. Science and technology visualization[J]. Science of Science and Management of S.&T., 2011, 32(12): 36-44. (in Chinese)
[91] Sun X, Ding K. Identifying and tracking scientific and technological knowledge memes from citation networks of publications and patents[J]. Scientometrics, 2018, 116(3): 1735-1748.
[92] Kuhn T, Perc M, Helbing D. Inheritance patterns in citation networks reveal scientific memes[J]. Physical Review X, 2014, 4(4): 041036.
[93] Zhou H, Yu H, Hu R. Topic discovery and evolution in scientific literature based on content and citations[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(10): 1511-1524.
[94] 胡正银, 方曙. A review of research progress on technology mining from patent texts[J]. New Technology of Library and Information Service, 2014(6): 62-70. (in Chinese)
[95] 胡正银, 方曙, 张娴, 文奕, 梁田. Research on constructing a personalized semantic TRIZ[J]. Library and Information Service, 2015, 59(7): 123-131. (in Chinese)
[96] 隗玲, 许海云, 胡正银, 董坤, 王超, 庞弘燊. Multi-pattern recognition and prediction of discipline topic evolution paths: a case study of topic evolution in information science[J]. Library and Information Service, 2016, 60(13): 71-81. (in Chinese)
[97] Wang X, Qiu P, Zhu D, et al. Identification of technology development trends based on subject-action-object analysis: the case of dye-sensitized solar cells[J]. Technological Forecasting and Social Change, 2015, 98: 24-46.
[98] Guo J, Wang X, Li Q, et al. Subject-action-object-based morphology analysis for determining the direction of technological change[J]. Technological Forecasting and Social Change, 2016, 105: 27-40.
[99] 刘玉琴, 汪雪锋, 雷孝平. Patent quality evaluation based on text mining: an empirical study[J]. Computer Engineering and Applications, 2007(33): 12-14. (in Chinese)
[100] 李欣, 王静静, 杨梓, 黄鲁成. Identifying emerging technologies based on semantic analysis of SAO structures[J]. Journal of Intelligence, 2016, 35(3): 80-84. (in Chinese)
[101] 张运良, 徐硕, 朱礼军, 乔晓东. The Chinese scientific and technical term system: a semantic resource for in-depth content analysis of S&T information resources[J]. Library and Information Service, 2011, 55(4): 100-105. (in Chinese)
[102] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[103] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]// Advances in Neural Information Processing Systems, 2017: 5998-6008.
[104] Zhang L, Huang Y, Yang J, et al. Aggregating large-scale databases for PubMed author name disambiguation[J]. Journal of the American Medical Informatics Association, 2021, 28(9): 1919-1927.
[105] Zhang C. Automatic keyword extraction from documents using conditional random fields[J]. Journal of Computational Information Systems, 2008, 4(3): 1169-1180.
[106] Zhou Q, Zhang C, Zhao S X, et al. Measuring book impact based on the multi-granularity online review mining[J]. Scientometrics, 2016, 107(3): 1435-1455.
[107] Weston L, Tshitoyan V, Dagdelen J, et al. Named entity recognition and normalization applied to large-scale information extraction from the materials science literature[J]. Journal of Chemical Information and Modeling, 2019, 59(9): 3692-3702.
[108] 陆伟, 黄永, 程齐凯. Structure-function recognition of academic texts: functional framework and recognition based on section headings[J]. Journal of the China Society for Scientific and Technical Information, 2014, 33(9): 979-985. (in Chinese)
[109] 黄永, 陆伟, 程齐凯. Structure-function recognition of academic texts: recognition based on section content[J]. Journal of the China Society for Scientific and Technical Information, 2016, 35(3): 293-300. (in Chinese)
[110] 黄永, 陆伟, 程齐凯, 桂思思. Structure-function recognition of academic texts: paragraph-based recognition[J]. Journal of the China Society for Scientific and Technical Information, 2016, 35(5): 530-538. (in Chinese)
[111] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]// Advances in Neural Information Processing Systems, 2017: 5998-6008.
[112] Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[113] 于丰畅, 程齐凯, 陆伟. Locating figures and tables in academic documents based on geometric object clustering[J]. Data Analysis and Knowledge Discovery, 2021, 5(1): 140-149. (in Chinese)
[114] Lu W, Liu Z, Huang Y, et al. How do authors select keywords? A preliminary study of author keyword selection behavior[J]. Journal of Informetrics, 2020, 14(4): 101066.
[115] 王瑞雪, 方婧, 桂思思, 陆伟, 张显. Building an academic query intent classifier based on deep learning algorithms[J]. Library and Information Service, 2021, 65(3): 93-99. (in Chinese)
[116] Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[117] Xia P, Wu S, Van Durme B. Which *BERT? A survey organizing contextualized encoders[J]. arXiv preprint arXiv:2010.00854, 2020.
[118] Sun Y, Wang S, Li Y, et al. ERNIE: enhanced representation through knowledge integration[J]. arXiv preprint arXiv:1904.09223, 2019.
[119] Sun Y, Wang S, Li Y, et al. ERNIE 2.0: a continual pre-training framework for language understanding[J]. arXiv preprint arXiv:1907.12412, 2019.
[120] Floridi L, Chiriatti M. GPT-3: its nature, scope, limits, and consequences[J]. Minds and Machines, 2020, 30(4): 681-694.
[121] Lin J, Yu Y, Zhou Y, et al. How many preprints have actually been printed and why: a case study of computer science preprints on arXiv[J]. Scientometrics, 2020, 124(1): 555-574.
[122] McKiernan G. arXiv.org: the Los Alamos National Laboratory e-print server[J]. International Journal on Grey Literature, 2000, 1(3): 127-138.
[123] Wu Z, Pan S, Chen F, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(1): 4-24.
[124] Kozlowski D, Dusdal J, Pang J, et al. Semantic and relational spaces in science of science: deep learning models for article vectorisation[J]. Scientometrics, 2021, 126(7): 5881.
[125] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.
[126] Jeong C, Jang S, Park E, et al. A context-aware citation recommendation model with BERT and graph convolutional networks[J]. Scientometrics, 2020, 124(3): 1907-1922.
[127] Zou X. A survey on application of knowledge graph[C]// Journal of Physics: Conference Series. IOP Publishing, 2020, 1487(1): 012016.
[128] Wang K, Shen Z, Huang C, et al. Microsoft Academic Graph: when experts are not enough[J]. Quantitative Science Studies, 2020, 1(1): 396-413.
[129] Open Academic Graph[EB/OL]. https://www.microsoft.com/en-us/research/project/open-academic-graph/.
[130] Wang R, Yan Y, Wang J, et al. AceKG: a large-scale knowledge graph for academic data mining[J]. arXiv preprint arXiv:1807.08484, 2018.
[131] Tosi M D L, dos Reis J C. SciKGraph: a knowledge graph approach to structure a scientific field[J]. Journal of Informetrics, 2021, 15(1): 101109.
[132] Pearl J. Causal inference in statistics: an overview[J]. Statistics Surveys, 2009, 3: 96-146.
[133] Pearl J, Mackenzie D. The book of why: the new science of cause and effect[M]. Basic Books, 2018: 349-360.
[134] Wang L, Richardson T S, Zhou X H. Causal analysis of ordinal treatments and binary outcomes under truncation by death[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2017, 79(3): 719.
[135] Varian H R. Causal inference in economics and marketing[J]. Proceedings of the National Academy of Sciences, 2016, 113(27): 7310-7315.
[136] Schölkopf B. Causality for machine learning[J]. arXiv preprint arXiv:1911.10500, 2019.
[137] Hoel E P, Albantakis L, Tononi G. Quantifying causal emergence shows that macro can beat micro[J]. Proceedings of the National Academy of Sciences, 2013, 110(49): 19790-19795.
[138] Schölkopf B, Locatello F, Bauer S, et al. Toward causal representation learning[J]. Proceedings of the IEEE, 2021, 109(5): 612-634.
[139] Zhao Z, Bu Y, Kang L, et al. An investigation of the relationship between scientists' mobility to/from China and their research performance[J]. Journal of Informetrics, 2020, 14(2): 101037.
[140] Wang Y, Jones B F, Wang D. Early-career setback and future career impact[J]. Nature Communications, 2019, 10(1): 1-10.
[141] Huo D, Dang J, Motohashi K. Empirical analysis of license policy for declared standard-essential patents in setting technology standards[M]. RIETI, 2019: 14-16.
[142] 张艳丰, 彭丽徽, 刘金承, 洪闯. An empirical study of personas of fatigued mobile social media users in the new media environment: a causal perspective based on SSO theory[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(10): 1092-1101. (in Chinese)