Frontiers of Data and Computing, 2023, Vol. 5, Issue (1): 41-54.
CSTR: 32002.14.jfdc.CN10-1649/TP.2023.01.004
doi: 10.11871/jfdc.issn.2096-742X.2023.01.004
Special Issue: Resources, Technology and Policy of Scientific Data
					
FAN Shaoping1, ZHANG Zhiqiang2,3,*

Received: 2022-03-14; Online: 2023-02-20; Published: 2023-02-20

Contact: ZHANG Zhiqiang; E-mail: fan.shaoping@imicams.ac.cn; zhangzq@clas.ac.cn
FAN Shaoping, ZHANG Zhiqiang. The Development and Prospect of Biomedical Informatics Driven by Data and Technology[J]. Frontiers of Data and Computing, 2023, 5(1): 41-54, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2023.01.004.
Table 1
Definition of discipline scope, theory and method of Biomedical Informatics[14]
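| Aspect | Content |
|---|---|
| Disciplinary scope | Studies and supports reasoning, modeling, simulation, experimentation and translation across the spectrum from molecules to individuals to populations, and from biological to social systems, linking basic and clinical research and practice with the healthcare enterprise. |
| Theory and methods | Develops, studies and applies theories, methods and processes for the generation, storage, retrieval, use, management and sharing of biomedical data, information and knowledge. |
| Technical approach | Builds on and contributes to computer, telecommunication and information sciences and technologies, emphasizing their application in biomedicine. |
| Human and social context | Recognizing that people are the ultimate users of biomedical informatics, draws on the social and behavioral sciences to inform technical solutions, policy design and evaluation, and the evolution of economic, ethical, social, educational and organizational systems. |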
Table 2
Keywords of highly cited literature in Biomedical Informatics
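| No. | Cluster | Main keywords |
|---|---|---|
| 1 | Biomedical informatics for cancer research | expression, tumor, microRNA, breast cancer, long non-coding RNA, metastasis, mutation, cell, biomarker, activation, hepatocellular carcinoma, messenger RNA, RNA, specificity, methylation, etc. |
| 2 | Omics databases and other resources | database, identification, discovery, network, resource, genomics, genome-wide association, proteomics, metabolomics, transcriptomics, etc. |
| 3 | Algorithms and models | protein, prediction, machine learning, algorithm, classification, model, COVID-19, multiple sequence alignment, SARS-CoV-2, big data, neural network, etc. |
| 4 | Analysis tools/software | sequence, tool, genome, alignment, metagenomics, software, web server, genetics, search, etc. |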
Table 3
Descriptions of new databases related to COVID-19[18]
| No. | Database | URL |
|---|---|---|
| 1 | COVID19db | http://www.biomedical-web.com/covid19db or |
| 2 | Ensembl COVID-19 resource | https://covid-19.ensembl.org | 
| 3 | ESC | http://clingen.igib.res.in/esc | 
| 4 | SCoV2-MD | http://www.scov2-md.org | 
| 5 | SCovid | http://bio-annotation.cn/scovid | 
| 6 | T-cell COVID-19 Atlas | https://t-cov.hse.ru | 
| 7 | VarEPS | https://nmdc.cn/ncovn | 
Table 4
Progress in biomedical text mining
| Task | Model/method | Corpus | Best F1 (unless otherwise noted) |
|---|---|---|---|
| Named entity recognition | BioBERT[24] | NCBI Disease | 89.71% |
| | | 2010 i2b2/VA | 86.73% |
| | | BC5CDR (Disease) | 87.15% |
| | | BC5CDR (Drug/Chem) | 93.47% |
| | | BC4CHEMD | 92.36% |
| | | BC2GM | 84.72% |
| | | JNLPBA | 77.49% |
| | | LINNAEUS | 88.24% |
| | | Species-800 | 74.06% |
| | BioBERT + attention-based BiLSTM-CRF[25] | BC5CDR (Disease) | 88.34% |
| | | NCBI Disease | 91.23% |
| | | BC5CDR (Drug/Chem) | 94.23% |
| | | BC4CHEMD | 92.28% |
| | | JNLPBA | 79.97% |
| | | BC2GM | 86.05% |
| | BIOKMNER[26] | BC2GM | 85.29% |
| | | JNLPBA | 77.83% |
| | | BC5CDR (Drug/Chem) | 94.22% |
| | | NCBI Disease | 89.63% |
| | | LINNAEUS | 89.24% |
| | | Species-800 | 76.33% |
| Text classification | MeSHProbeNet[27] | 2018 BioASQ | 68.80% |
| | FullMeSH[28] | PMC | 66.76% |
| | BERTMeSH[29] | PMC | 69.20% |
| Relation extraction | KCN[30] | BioCreative V CDR | 71.28% |
| | KGAGN[31] | BioCreative V CDR | 73.30% |
| | PACNN+RL[32] | Prevent | 55.92% |
| | | Treat | 66.66% |
| | | DDI corpus | 38.38% |
| | | AIMed | 44.72% |
| | | BioInfer | 52.09% |
| | | HPRD50 | 65.40% |
| | | IEPA | 65.54% |
| | | LLL | 67.09% |
| Pathway extraction | Distant supervision method[33] | PubMed | Precision: 25% |
| Prediction | DPDDI[34] | DB2 | 84.00% |
| | NNPS[35] | TWOSIDES | 93.60% |
| | Node similarity-based NN[36] | DrugBank DDI | AUC: 0.933 ± 0.003 |
| | | CTD DDA | AUC: 0.950 ± 0.004 |
| | | NDFRT DDA | AUC: 0.943 ± 0.004 |
| | | STRING PPI | AUC: 0.952 ± 0.004 |
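Most rows in Table 4 report an F1 score, the harmonic mean of precision and recall; the pathway-extraction row reports precision alone and the link-prediction rows report AUC. As a minimal sketch for readers comparing these figures (not code from any of the cited studies), F1 can be computed from true-positive, false-positive and false-negative counts as below. The function name and the counts in the usage line are hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    Each ratio is defined as 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for an entity-recognition run (not taken from Table 4).
    p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)
    print(f"precision={p:.2%} recall={r:.2%} F1={f:.2%}")
```

Because F1 penalizes an imbalance between precision and recall, a model with high precision but low recall scores lower than its precision alone would suggest, which is why the precision-only and AUC rows are not directly comparable with the F1 rows.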
Table 5
New biomedical data analysis tools
| Task | Tool | URL |
|---|---|---|
| Protein structure/function prediction | GalaxyHeteromer | http://galaxy.seoklab.org/heteromer |
| | DeepGOWeb | https://deepgo.cbrc.kaust.edu.sa/deepgo/ |
| Multi-omics analysis | Mergeomics | http://mergeomics.research.idre.ucla.edu/ |
| | iNetModels | https://inetmodels.com |
| Drug analysis | DrugComb | https://drugcomb.org/ |
| | LigAdvisor | https://ligadvisor.unimore.it/ |
| [1] | Embi P J, Kaufman S E, Payne P R. Biomedical Informatics and Outcomes Research: Enabling Knowledge-driven Healthcare[J]. Circulation, 2009, 120(23): 2393-2399. doi: 10.1161/CIRCULATIONAHA.108.795526 |
| [2] | Sarkar I N. Biomedical Informatics and Translational Medicine[J]. Journal of Translational Medicine, 2010, 8(1): 22. doi: 10.1186/1479-5876-8-22 |
| [3] | Leinonen R, Sugawara H, Shumway M; International Nucleotide Sequence Database Collaboration. The sequence read archive[J]. Nucleic Acids Research, 2011, 39: 19-21. doi: 10.1093/nar/gkq1019 pmid: 21062823 |
| [4] | Brown G R, Hem V, Katz K S, et al. Gene: a gene-centered information resource at NCBI[J]. Nucleic Acids Research, 2015, 43: 36-42. |
| [5] | Kim S, Thiessen P A, Cheng T, et al. Literature information in PubChem: associations between PubChem records and scientific articles[J]. Journal of Cheminformatics, 2016, 8: 32. |
| [6] | Wishart D S, Feunang Y D, Guo A C, et al. DrugBank 5.0: a major update to the DrugBank database for 2018[J]. Nucleic Acids Research, 2018, 46(D1): 1074-1082. doi: 10.1093/nar/gkx1037 pmid: 29126136 |
| [7] | Wu A, Peng Y, Huang B, et al. Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China[J]. Cell Host & Microbe, 2020, 27(3): 325-328. |
| [8] | Lu I N, Muller C P, He F Q. Applying next-generation sequencing to unravel the mutational landscape in viral quasispecies[J]. Virus Research, 2020, 283: 197963. doi: 10.1016/j.virusres.2020.197963 |
| [9] | Ramírez J D, Muñoz M, Hernández C, et al. Genetic Diversity Among SARS-CoV2 Strains in South America may Impact Performance of Molecular Detection[J]. Pathogens, 2020, 9(7): 580. |
| [10] | Prasanth D S N B K, Murahari M, Chandramohan V, et al. In silico identification of potential inhibitors from Cinnamon against main protease and spike glycoprotein of SARS CoV-2[J]. Journal of Biomolecular Structure & Dynamics, 2021, 39(13): 4618-4632. |
| [11] | Ray M, Sable M N, Sarkar S, et al. Essential interpretations of bioinformatics in COVID-19 pandemic[J]. Meta Gene, 2021, 27: 100844. doi: 10.1016/j.mgene.2020.100844 |
| [12] | Wen A, Fu S, Moon S, et al. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation[J]. npj Digital Medicine, 2019, 2: 130. |
| [13] | Ohno-Machado L. NIH's Big Data to Knowledge initiative and the advancement of biomedical informatics[J]. Journal of the American Medical Informatics Association, 2014, 21(2): 193. doi: 10.1136/amiajnl-2014-002666 |
| [14] | Kulikowski C A, Shortliffe E H, Currie L M, et al. AMIA Board white paper: definition of biomedical informatics and specification of core competencies for graduate education in the discipline[J]. Journal of the American Medical Informatics Association, 2012, 19(6): 931-938. |
| [15] | Shortliffe E H. Biomedical Informatics[M]. Translated by Luo Shuqian. Beijing: Science Press, 2011: 79. |
| [16] | Liu Zhuang, Zhang Yue. Application of statistical methods in bioinformatics analysis[J]. Journal of Medical Informatics, 2020, 41(6): 20-23. |
| [17] | Zhang Zhiqiang, Fan Shaoping, Chen Xiujuan. Development of biomedical informatics for knowledge discovery in precision medicine[J]. Data Analysis and Knowledge Discovery, 2018, 2(1): 1-8. |
| [18] | Rigden D J, Fernández X M. The 2022 Nucleic Acids Research database issue and the online molecular biology database collection[J]. Nucleic Acids Research, 2022, 50(D1): D1-D10. doi: 10.1093/nar/gkab1195 pmid: 34986604 |
| [19] | Sayers E W, Bolton E E, Brister J R, et al. Database resources of the National Center for Biotechnology Information[J]. Nucleic Acids Research, 2022, 50(D1): D20-D26. doi: 10.1093/nar/gkab1112 |
| [20] | Cantelli G, Bateman A, Brooksbank C, et al. The European Bioinformatics Institute (EMBL-EBI) in 2021[J]. Nucleic Acids Research, 2022, 50(D1): D11-D19. doi: 10.1093/nar/gkab1127 |
| [21] | Okido T, Kodama Y, Mashima J, et al. DNA Data Bank of Japan (DDBJ) update report 2021[J]. Nucleic Acids Research, 2022, 50(D1): D102-D105. doi: 10.1093/nar/gkab995 |
| [22] | CNCB-NGDC Members and Partners. Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022[J]. Nucleic Acids Research, 2022, 50(D1): D27-D38. doi: 10.1093/nar/gkab951 |
| [23] | Zhao S, Su C, Lu Z, et al. Recent advances in biomedical literature mining[J]. Briefings in Bioinformatics, 2021, 22(3): bbaa057. doi: 10.1093/bib/bbaa057 |
| [24] | Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining[J]. Bioinformatics, 2020, 36(4): 1234-1240. doi: 10.1093/bioinformatics/btz682 |
| [25] | Naseem U, Musial K, Eklund P, et al. Biomedical named-entity recognition by hierarchically fusing BioBERT representations and deep contextual-level word-embedding[C]. 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, 2020: 1-8. |
| [26] | Tian Y, Shen W, Song Y, et al. Improving biomedical named entity recognition with syntactic information[J]. BMC Bioinformatics, 2020, 21(1): 539. doi: 10.1186/s12859-020-03834-6 pmid: 33238875 |
| [27] | Xun G, Jha K, Yuan Y, et al. MeSHProbeNet: a self-attentive probe net for MeSH indexing[J]. Bioinformatics, 2019, 35(19): 3794-3802. doi: 10.1093/bioinformatics/btz142 pmid: 30851089 |
| [28] | Dai S, You R, Lu Z, et al. FullMeSH: improving large-scale MeSH indexing with full text[J]. Bioinformatics, 2020, 36(5): 1533-1541. doi: 10.1093/bioinformatics/btz756 pmid: 31596475 |
| [29] | You R, Liu Y, Mamitsuka H, et al. BERTMeSH: deep contextual representation learning for large-scale high-performance MeSH indexing with full text[J]. Bioinformatics, 2021, 37(5): 684-692. |
| [30] | Zhou H, Lang C, Liu Z, et al. Knowledge-guided convolutional networks for chemical-disease relation extraction[J]. BMC Bioinformatics, 2019, 20(1): 260. doi: 10.1186/s12859-019-2873-7 |
| [31] | Sun Y, Wang J, Lin H, et al. Knowledge Guided Attention and Graph Convolutional Networks for Chemical-Disease Relation Extraction[J/OL]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021. https://ieeexplore.ieee.org/document/9663570. doi: 10.1109/TCBB.2021.3135844 |
| [32] | Zhu T, Qin Y, Xiang Y, et al. Distantly supervised biomedical relation extraction using piecewise attentive convolutional neural network and reinforcement learning[J]. Journal of the American Medical Informatics Association, 2021, 28(12): 2571-2581. |
| [33] | Poon H, Toutanova K, Quirk C. Distant supervision for cancer pathway extraction from text[C]. Pacific Symposium on Biocomputing 2015, Kohala Coast, Hawaii, USA, 2015: 120-131. |
| [34] | Feng Y H, Zhang S W, Shi J Y. DPDDI: a deep predictor for drug-drug interactions[J]. BMC Bioinformatics, 2020, 21(1): 419. doi: 10.1186/s12859-020-03724-x |
| [35] | Masumshah R, Aghdam R, Eslahchi C. A neural network-based method for polypharmacy side effects prediction[J]. BMC Bioinformatics, 2021, 22(1): 385. doi: 10.1186/s12859-021-04298-y pmid: 34303360 |
| [36] | Coşkun M, Koyutürk M. Node similarity-based graph convolution for link prediction in biological networks[J]. Bioinformatics, 2021, 37(23): 4501-4508. doi: 10.1093/bioinformatics/btab464 |
| [37] | Boratyn G M, Camacho C, Cooper P S, et al. BLAST: a more efficient report with usability improvements[J]. Nucleic Acids Research, 2013, 41(W1): W29-W33. doi: 10.1093/nar/gkt282 |
| [38] | Boratyn G M, Thierry-Mieg J, Thierry-Mieg D, et al. Magic-BLAST, an accurate RNA-seq aligner for long and short reads[J]. BMC Bioinformatics, 2019, 20(1): 405. doi: 10.1186/s12859-019-2996-x pmid: 31345161 |
| [39] | Park T, Won J, Baek M, et al. GalaxyHeteromer: protein heterodimer structure prediction by template-based and ab initio docking[J]. Nucleic Acids Research, 2021, 49(W1): W237-W241. doi: 10.1093/nar/gkab422 |
| [40] | Kulmanov M, Zhapa-Camacho F, Hoehndorf R. DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web[J]. Nucleic Acids Research, 2021, 49(W1): W140-W146. doi: 10.1093/nar/gkab373 |
| [41] | Ding J, Blencowe M, Nghiem T, et al. Mergeomics 2.0: a web server for multi-omics data integration to elucidate disease networks and predict therapeutics[J]. Nucleic Acids Research, 2021, 49(W1): W375-W387. doi: 10.1093/nar/gkab405 pmid: 34048577 |
| [42] | Arif M, Zhang C, Li X Y, et al. iNetModels 2.0: an interactive visualization and database of multi-omics data[J]. Nucleic Acids Research, 2021, 49(W1): W271-W276. doi: 10.1093/nar/gkab254 pmid: 33849075 |
| [43] | Zheng S, Aldahdooh J, Shadbahr T, et al. DrugComb update: a more comprehensive drug sensitivity data repository and analysis portal[J]. Nucleic Acids Research, 2021, 49(W1): W174-W184. doi: 10.1093/nar/gkab438 pmid: 34060634 |
| [44] | Pinzi L, Tinivella A, Gagliardelli L, et al. LigAdvisor: a versatile and user-friendly web-platform for drug design[J]. Nucleic Acids Research, 2021, 49(W1): W326-W335. doi: 10.1093/nar/gkab385 |
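| [45] | Hu Zhengyin, Liu Leilei, Dai Bing, Qin Xiaochu. Exploring knowledge discovery in life sciences and medicine based on domain knowledge graphs[J]. Data Analysis and Knowledge Discovery, 2020, 4(11): 1-14. |
| [46] | Wang Anran, Wu Sizhu. Practice and implications of the US Cancer Data Standards Registry and Repository[J]. Chinese Journal of Medical Library and Information Science, 2020, 29(10): 15-23. |
| [47] | Li Wei, Sun Xuehui, Xu Ping, et al. Analysis of the construction model and characteristics of the US All of Us cohort program[J]. World Sci-Tech R&D, 2022, 44(2): 265-274. |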
| [48] | Wilkinson M, Dumontier M, Aalbersberg I, et al. The FAIR Guiding Principles for scientific data management and stewardship[J]. Scientific Data, 2016, 3: 160018. doi: 10.1038/sdata.2016.18 |
| [49] | Fan Daiming. Biomedical big data is an important strategic resource[J]. Science News, 2019, (6): 34. |
| [50] | Zhang Guoqing, Li Yixue, Wang Zefeng, Zhao Guoping. New challenges and trends in the development of biomedical big data[J]. Bulletin of Chinese Academy of Sciences, 2018, 33(8): 853-860. |
| [51] | Esteva A, Kuprel B, Novoa R A, et al. Corrigendum: Dermatologist-level classification of skin cancer with deep neural networks[J]. Nature, 2017, 546(7660): 686. |
| [52] | Amoroso N, Diacono D, Fanizzi A, et al. Deep learning reveals Alzheimer's disease onset in MCI subjects: results from an international challenge[J]. Journal of Neuroscience Methods, 2018, 302: 3-9. |
| [53] | Li H, Zhu L, Shen M, et al. Blockchain-Based Data Preservation System for Medical Data[J]. Journal of Medical Systems, 2018, 42(8): 141. doi: 10.1007/s10916-018-0997-3 |
| [54] | Qian Qing, Xue Wei. Current status and prospects of medical informatics[J]. China Medical News, 2020, 35(18): 10. |
| [55] | Savova G K, Masanz J J, Ogren P V, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications[J]. Journal of the American Medical Informatics Association, 2010, 17: 507-513. |
| [56] | Announcement on soliciting public comments on the Regulations on the Administration of Clinical Application of New Biomedical Technologies (Draft for Comments)[EB/OL]. [2022-01-20]. http://www.nhc.gov.cn/yzygj/s7659/201902/0f24ddc242c24212abc42aa8b539584d.shtml. |
