Frontiers of Data and Computing ›› 2021, Vol. 3 ›› Issue (6): 60-80.
doi: 10.11871/jfdc.10-1649.2021.06.005
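BAI Rujiang*, ZHAO Mengmeng, ZHANG Yujie, DONG Kun
Received:
2021-11-16
Online:
2021-12-20
Published:
2022-01-26
Corresponding author:
BAI Rujiang*
About the author:
BAI Rujiang is a professor at the Institute of Information Management, Shandong University of Technology. He holds a Ph.D. and supervises master's students in information science. He leads the "Science and Technology Big Data Research Innovation Team" under the Shandong Provincial Youth Innovation Talent Introduction and Cultivation Program for Higher Education Institutions, and was selected for the third tier of the university's "Double Hundred Project" for high-level talent. He has led three National Social Science Fund of China projects, a China Postdoctoral Science Foundation Special Grant, and a China Postdoctoral Science Foundation First-Class Grant. He has co-authored 2 monographs, published more than 70 papers, and registered 1 computer software copyright. His main research areas are scientific and technical big data mining, scientific and technical intelligence analysis, and smart intelligence sensing.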
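Abstract:
[Objective] This paper systematically reviews the main tools, system platforms, and key technologies of scientific literature mining and identifies future trends, providing a reference for related research. [Methods] Through web and literature surveys, we trace the historical development of scientific literature mining, summarize its main tools and system platforms and their characteristics, compare them along dimensions such as platform functions, data types, and visualization functions, and focus on the key technologies and their research frontiers. [Results] The paper addresses in detail three questions about scientific literature mining: where to mine, which tools to mine with, and how to mine. It finds that data sources are moving toward multi-source data fusion and fine-grained knowledge organization; that building semantic knowledge graphs of scientific literature is a current research hotspot; that deep learning models such as graph neural networks, pre-trained models, and adversarial learning networks are the current frontier technologies; and that causal inference methods are gradually becoming a frontier direction. [Conclusions] As big data and artificial intelligence continue to advance, scientific literature mining will draw on the dividends of data and technology to deliver greater value in concrete application scenarios such as scientific and technical intelligence decision-making.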
BAI Rujiang*, ZHAO Mengmeng, ZHANG Yujie, DONG Kun. Review on Scientific Literature Mining: Tools and Technologies[J]. Frontiers of Data and Computing, 2021, 3(6): 60-80.
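Table 1
Network analysis and mining tools for scientific literature
Tool | Developer | Supported scientific literature data sources | Supported mining and analysis dimensions | Visualization functions |
---|---|---|---|---|
CiteSpace | Chaomei Chen | WoS, PubMed, Derwent, CNKI, CSSCI, Scopus, Google Scholar, NSF Awards, and other databases | Collaboration: authors, institutions, countries; co-occurrence: terms, keywords, subject categories; co-citation: documents, authors, journals; coupling: documents, grants | Cluster view, timeline view (fisheye view), time-zone view |
VOSviewer | Van Eck, Waltman | WoS, Scopus, Dimensions, PubMed, and other databases | Documents, authors, countries, key disciplinary terms | Label view, density view, cluster density view, scatter view |
SCI2 | Katy Börner and her team | WoS and other databases | Documents, authors | Topic map analysis, directed network analysis |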
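
The co-occurrence dimension listed for these tools can be illustrated with a small sketch. It is not how CiteSpace or VOSviewer are implemented; it only counts how often two keywords appear in the same record, which is the raw input to a co-word network. The function name `keyword_cooccurrence` and the toy records are hypothetical, and real input would come from a database export.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how often each unordered keyword pair appears in the same record."""
    pair_counts = Counter()
    for keywords in records:
        # Normalize, deduplicate, and sort so (a, b) and (b, a) share one key.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical toy records standing in for exported keyword fields.
records = [
    ["text mining", "knowledge graph", "BERT"],
    ["knowledge graph", "text mining"],
    ["BERT", "citation analysis"],
]
cooc = keyword_cooccurrence(records)
for pair, count in cooc.most_common(3):
    print(pair, count)
```

The resulting pair counts can be treated as edge weights of an undirected graph, which is the kind of structure network visualization tools typically lay out and cluster.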
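Table 2
Scientific literature data and mining analysis platforms
Data service platform | Data types | Literature service functions | Literature mining services |
---|---|---|---|
Dimensions | Publications, datasets, grants, patents, clinical trials, policy documents, etc. | 1. Provides links between different kinds of research data 2. Supports multiple APIs 3. Provides a comprehensive database with no user restrictions and many filtering options 4. Provides project funding data | 1. Provides semantic search and ontology-based search 2. Links publications and citations to grants, patents, clinical trials, datasets, and policy papers based on metadata 3. Provides citation network analysis of publications |
LENS | Patent records, scholarly data, biological sequences, and document links | 1. Provides powerful search and analysis of patent and scholarly data 2. Provides dynamic tracking of users' saved collections 3. Provides influence mapping of academic institutions and companies | 1. Aggregates metadata and provides links and mappings between metadata records 2. Provides diverse patent search and analysis services 3. Entity disambiguation |
Europe PMC | Abstracts and full text of life science literature | 1. Provides links from publications to external research resources 2. Provides rich literature data for life science research, including preprints 3. Provides showcasing of users' works 4. Provides annotation search 5. Provides the research data behind publications 6. Supports access to publications and related information through the article API | 1. Supports annotation search 2. Supports linking publications to research data and to external related data such as smart citations and peer review materials 3. Provides API access to open content and metadata for building diverse literature analysis applications |
ArrowSmith knowledge discovery system | Scientific literature in medicine | Generates potentially valuable scientific hypotheses from literature datasets | Supports knowledge mining across literature sets that are unrelated or only weakly related |
AMiner | Publications, datasets, grants, patents, clinical trials, policy documents, etc. | 1. Provides search over academic information such as scientific literature and authors 2. Provides semantic search and semantic analysis services for patents, scientific literature, etc. 3. Provides researcher profile management and mining services | 1. Supports linked search across researchers, scientific literature, and academic activity data 2. Supports trend analysis of disciplinary fields |
Brain Science Knowledge Engine (Linked Brain Data) | Neuron data | 1. Provides extraction, representation, and visualization services for neuron data and knowledge 2. Supports reasoning over related brain data | Supports semantic search of neuron data and knowledge |
Stem cell knowledge discovery platform | Core stem cell patents, funded projects, scientific experiments, hot papers, and other data | 1. Provides integrated scientific big data services 2. Provides knowledge computing services 3. Provides knowledge discovery services based on big data knowledge computing | 1. Supports mining the knowledge content of scientific literature, semantic linking of knowledge, etc. 2. Supports detection of hot topics and research fronts in the field 3. Supports researcher profiling services |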
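Table 3
Key technologies for scientific literature mining
Mining target | Method category | Methods |
---|---|---|
Bibliometric analysis | | Impact factor analysis, citation count analysis, h-index, Bradford's law, Lotka's law, citation analysis, knowledge discovery from non-interactive literature, social network analysis |
Text content mining of scientific literature | Simple rules | Word frequency analysis, one-hot encoding, bag-of-words (BOW), N-grams, co-word network analysis, author collaboration co-occurrence analysis, institution collaboration co-occurrence analysis |
Text content mining of scientific literature | Statistical machine learning | Keyword extraction: TF-IDF, TextRank, CRF, HMM, SVM |
Text content mining of scientific literature | Statistical machine learning | Language models: Word2Vec, Item2Vec, DeepWalk, Node2Vec |
Text content mining of scientific literature | Statistical machine learning | Topic models: LDA, LSA, STM, LDA2Vec |
Text content mining of scientific literature | Deep learning | Classic models: CNN, RNN, LSTM, Attention, Transformer |
Text content mining of scientific literature | Deep learning | Pre-trained models: BERT, SciBERT, RoBERTa, SpanBERT, ERNIE, GPT-3 |
Text content mining of scientific literature | Deep learning | Graph neural networks: GCN, GAT, VGAE, GraphSAGE |
Intelligent reasoning and mining | Knowledge graphs | Google Knowledge Graph, Microsoft Academic Graph, Open Academic Graph, AceKG, SciKGraph |
Intelligent reasoning and mining | Causal intelligence | Causal emergence |
Intelligent reasoning and mining | Causal intelligence | Causal inference: randomized controlled trials, quasi-experimental design, propensity score matching, regression discontinuity, structural causal models, causal machine learning, counterfactual reasoning |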
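
As a concrete illustration of the keyword extraction row above, the following minimal sketch ranks the terms of each document by TF-IDF. It assumes one common weighting variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); other variants exist, and the function name `tfidf_keywords` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank the terms of each tokenized document by TF-IDF.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    ranked = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked.append(sorted(scores, key=lambda t: (-scores[t], t))[:top_k])
    return ranked


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "citation network analysis of citation data".split(),
    "topic model analysis of abstracts".split(),
    "graph neural network citation recommendation".split(),
]
top = tfidf_keywords(docs, top_k=2)
print(top)
```

Terms that occur in several documents, such as "analysis" and "of" in this toy corpus, receive a low inverse document frequency and drop out of the top ranks, which is the behaviour that makes TF-IDF a useful baseline for keyword extraction.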