Frontiers of Data and Computing ›› 2023, Vol. 5 ›› Issue (1): 41-54.
CSTR: 32002.14.jfdc.CN10-1649/TP.2023.01.004
doi: 10.11871/jfdc.issn.2096-742X.2023.01.004
• Special Issue: Resources, Technology and Policy of Scientific Data •
FAN Shaoping1, ZHANG Zhiqiang2,3,*
Received: 2022-03-14
Online: 2023-02-20
Published: 2023-02-20
Contact: ZHANG Zhiqiang
E-mail: fan.shaoping@imicams.ac.cn; zhangzq@clas.ac.cn
FAN Shaoping, ZHANG Zhiqiang. The Development and Prospect of Biomedical Informatics Driven by Data and Technology[J]. Frontiers of Data and Computing, 2023, 5(1): 41-54, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2023.01.004.
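Table 1
Definition of discipline scope, theory and method of Biomedical Informatics [14]
Aspect | Description |
---|---|
Disciplinary scope | Investigates and supports reasoning, modeling, simulation, experimentation, and translation from molecules to individuals and on to populations, and from biological to social systems, linking basic and clinical research and practice with the healthcare enterprise. |
Theory and methodology | Develops, studies, and applies theories, methods, and processes for generating, storing, retrieving, using, managing, and sharing biomedical data, information, and knowledge. |
Technological approach | Builds on and contributes to computer, telecommunication, and information sciences and technologies, emphasizing their application in biomedicine. |
Social context | Recognizes that people are the ultimate users of biomedical informatics, and draws on the social and behavioral sciences to inform the design and evaluation of technical solutions and policies, as well as the evolution of economic, ethical, social, educational, and organizational systems. |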
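Table 2
Keywords of highly cited literature in Biomedical Informatics
No. | Cluster | Main keywords |
---|---|---|
1 | Biomedical informatics for cancer research | expression, tumor, microRNA, breast cancer, long non-coding RNA, metastasis, mutation, cell, biomarker, activation, hepatocellular carcinoma, messenger RNA, RNA, specificity, methylation, etc. |
2 | Research on omics databases and other resources | database, identification, discovery, network, resource, genomics, genome-wide association, proteomics, metabolomics, transcriptomics, etc. |
3 | Research on algorithms and models | protein, prediction, machine learning, algorithm, classification, model, COVID-19, multiple sequence alignment, SARS-CoV-2, big data, neural network, etc. |
4 | Research on analysis tools/software | sequence, tool, genome, alignment, metagenomics, software, web server, genetics, search, etc. |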
Table 3
Descriptions of new databases related to COVID-19 [18]
No. | Database | URL |
---|---|---|
1 | COVID19db | http://www.biomedical-web.com/cov-id19db or |
2 | Ensembl COVID-19 resource | https://covid-19.ensembl.org |
3 | ESC | http://clingen.igib.res.in/esc |
4 | SCoV2-MD | http://www.scov2-md.org |
5 | SCovid | http://bio-annotation.cn/scovid |
6 | T-cell COVID-19 Atlas | https://t-cov.hse.ru |
7 | VarEPS | https://nmdc.cn/ncovn |
Table 4
The progress of biomedical text mining
| Task | Model/method | Corpus | Best F1 |
|---|---|---|---|
| Named entity recognition | BioBERT[ | NCBI Disease | 89.71% |
| | | 2010 i2b2/VA | 86.73% |
| | | BC5CDR(Disease) | 87.15% |
| | | BC5CDR(Drug/Chem) | 93.47% |
| | | BC4CHEMD | 92.36% |
| | | BC2GM | 84.72% |
| | | JNLPBA | 77.49% |
| | | LINNAEUS | 88.24% |
| | | Species-800 | 74.06% |
| | BioBERT+Attention-based BiLSTM-CRF[ | BC5CDR(Disease) | 88.34% |
| | | NCBI Disease | 91.23% |
| | | BC5CDR(Drug/Chem) | 94.23% |
| | | BC4CHEMD | 92.28% |
| | | JNLPBA | 79.97% |
| | | BC2GM | 86.05% |
| | BIOKMNER[ | BC2GM | 85.29% |
| | | JNLPBA | 77.83% |
| | | BC5CDR(Drug/Chem) | 94.22% |
| | | NCBI Disease | 89.63% |
| | | LINNAEUS | 89.24% |
| | | Species-800 | 76.33% |
| Text classification | MeSHProbeNet[ | 2018 BioASQ | 68.80% |
| | FullMeSH [ | PMC | 66.76% |
| | BERTMeSH[ | PMC | 69.20% |
| Relation extraction | KCN[ | BioCreative V CDR | 71.28% |
| | KGAGN[ | BioCreative V CDR | 73.30% |
| | PACNN+RL[ | Prevent | 55.92% |
| | | Treat | 66.66% |
| | | DDI corpus | 38.38% |
| | | AIMed | 44.72% |
| | | BioInfer | 52.09% |
| | | HPRD50 | 65.40% |
| | | IEPA | 65.54% |
| | | LLL | 67.09% |
| Pathway extraction | Distant supervision method[ | PubMed | Precision: 25% |
| Prediction | DPDDI[ | DB2 | 84.00% |
| | NNPS[ | TWOSIDES | 93.60% |
| | Node similarity-based-NN [ | Drug Bank DDI | AUC: 0.933±0.003 |
| | | CTD DDA | AUC: 0.950±0.004 |
| | | NDFRT DDA | AUC: 0.943±0.004 |
| | | STRING PPI | AUC: 0.952±0.004 |
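
Most scores in Table 4 are F1 values, the harmonic mean of precision and recall. As a reminder of how such a score is derived, here is a minimal sketch; the function name `f1_score` and the counts used below are hypothetical illustrations and are not taken from any of the cited papers, whose exact evaluation protocols (for example, exact-match versus relaxed entity matching) may differ.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 = 2PR / (P + R) from true-positive, false-positive
    and false-negative counts (all hypothetical inputs)."""
    if tp == 0:
        # No correct predictions: precision and recall are both 0 (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted items that are correct
    recall = tp / (tp + fn)     # fraction of gold items that were found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only: 90 correct, 10 spurious, 10 missed entities.
    print(f"{f1_score(tp=90, fp=10, fn=10):.2%}")
```

Because F1 penalizes an imbalance between precision and recall, it is not directly comparable with the precision and AUC figures that appear in some rows of the table.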
Table 5
New biomedical data analysis tools
| Task | Tool | URL |
|---|---|---|
| Protein structure/function prediction tools | GalaxyHeteromer | http://galaxy.seoklab.org/heteromer |
| | DeepGOWeb | https://deepgo.cbrc.kaust.edu.sa/deepgo/ |
| Multi-omics analysis tools | Mergeomics | http://mergeomics.research.idre.ucla.edu/ |
| | iNetModels | https://inetmodels.com |
| Drug analysis tools | DrugComb | https://drugcomb.org/ |
| | LigAdvisor | https://ligadvisor.unimore.it/ |
[1] | Embi P J, Kaufman S E, Payne P R. Biomedical Informatics and Outcomes Research: Enabling Knowledge-driven Healthcare[J]. Circulation, 2009, 120(23): 2393-2399. doi: 10.1161/CIRCULATIONAHA.108.795526 |
[2] | Sarkar I N. Biomedical Informatics and Translational Medicine[J]. Journal of Translational Medicine, 2010, 8(1): 22. doi: 10.1186/1479-5876-8-22 |
[3] | Leinonen R, Sugawara H, Shumway M. International Nucleotide Sequence Database Collaboration. The sequence read archive[J]. Nucleic Acids Research, 2011, 39: 19-21. doi: 10.1093/nar/gkq1019 pmid: 21062823 |
[4] | Brown G R, Hem V, Katz K S, et al. Gene: a gene-centered information resource at NCBI[J]. Nucleic Acids Research, 2015, 43: 36-42. |
[5] | Kim S, Thiessen P A, Cheng T, et al. Literature information in PubChem: associations between PubChem records and scientific articles[J]. Journal of Cheminformatics, 2016, 8: 32. |
[6] | Wishart D S, Feunang Y D, Guo A C, et al. DrugBank 5.0: a major update to the DrugBank database for 2018[J]. Nucleic Acids Research, 2018, 46(D1): 1074-1082. doi: 10.1093/nar/gkx1037 pmid: 29126136 |
[7] | Wu A, Peng Y, Huang B, et al. Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China[J]. Cell Host & Microbe, 2020, 27(3): 325-328. |
[8] | Lu I N, Muller C P, He F Q. Applying next-generation sequencing to unravel the mutational landscape in viral quasispecies[J]. Virus Research, 2020, 283: 197963. doi: 10.1016/j.virusres.2020.197963 |
[9] | Ramírez J D, Muñoz M, Hernández C, et al. Genetic Diversity Among SARS-CoV2 Strains in South America may Impact Performance of Molecular Detection[J]. Pathogens, 2020, 9(7): 580. |
[10] | Prasanth D S N B K, Murahari M, Chandramohan V, et al. In silico identification of potential inhibitors from Cinnamon against main protease and spike glycoprotein of SARS CoV-2[J]. Journal of Biomolecular Structure & Dynamics, 2021, 39(13): 4618-4632. |
[11] | Ray M, Sable M N, Sarkar S, et al. Essential interpretations of bioinformatics in covid-19 pandemic[J]. Meta Gene, 2021, 27: 100844. doi: 10.1016/j.mgene.2020.100844 |
[12] | Wen A, Fu S, Moon S, et al. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation[J]. npj Digital Medicine, 2019, 2: 130. |
[13] | Lucila O M. NIH’s Big Data to Knowledge initiative and the advancement of biomedical informatics[J]. Journal of the American Medical Informatics Association, 2014, 21(2): 193. doi: 10.1136/amiajnl-2014-002666 |
[14] | Kulikowski C A, Shortliffe E H, Currie L M, et al. AMIA Board white paper: definition of biomedical informatics and specification of core competencies for graduate education in the discipline[J]. Journal of the American Medical Informatics Association, 2012, 19(6): 931-938. |
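[15] | Shortliffe E H. Biomedical Informatics[M]. Luo Shuqian, trans. Beijing: Science Press, 2011: 79. (in Chinese) |
[16] | Liu Zhuang, Zhang Yue. Application of statistical methods in bioinformatics analysis[J]. Journal of Medical Informatics, 2020, 41(6): 20-23. (in Chinese) |
[17] | Zhang Zhiqiang, Fan Shaoping, Chen Xiujuan. Development of biomedical informatics for knowledge discovery in precision medicine[J]. Data Analysis and Knowledge Discovery, 2018, 2(1): 1-8. (in Chinese) |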
[18] | Rigden D J, Fernández X M. The 2022 Nucleic Acids Research database issue and the online molecular biology database collection[J]. Nucleic Acids Research, 2022, 50(D1): D1-D10. doi: 10.1093/nar/gkab1195 pmid: 34986604 |
[19] | Sayers E W, Bolton E E, Brister J R, et al. Database resources of the national center for biotechnology information[J]. Nucleic Acids Research, 2022, 50(D1): D20-D26. doi: 10.1093/nar/gkab1112 |
[20] | Cantelli G, Bateman A, Brooksbank C, et al. The European Bioinformatics Institute (EMBL-EBI) in 2021[J]. Nucleic Acids Research, 2022, 50(D1): D11-D19. doi: 10.1093/nar/gkab1127 |
[21] | Okido T, Kodama Y, Mashima J, et al. DNA Data Bank of Japan (DDBJ) update report 2021[J]. Nucleic Acids Research, 2022, 50(D1): D102-D105. doi: 10.1093/nar/gkab995 |
[22] | CNCB-NGDC Members and Partners. Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022[J]. Nucleic Acids Research, 2022, 50(D1): D27-D38. doi: 10.1093/nar/gkab951 |
[23] | Zhao S, Su C, Lu Z, et al. Recent advances in biomedical literature mining[J]. Briefings in Bioinformatics, 2021, 22(3): bbaa057. doi: 10.1093/bib/bbaa057 |
[24] | Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining[J]. Bioinformatics, 2020, 36(4): 1234-1240. doi: 10.1093/bioinformatics/btz682 |
[25] | Naseem U, Musial K, Eklund P, et al. Biomedical named-entity recognition by hierarchically fusing biobert representations and deep contextual-level word-embedding[C]. 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, 2020: 1-8. |
[26] | Tian Y, Shen W, Song Y, et al. Improving biomedical named entity recognition with syntactic information[J]. BMC Bioinformatics, 2020, 21(1): 539. doi: 10.1186/s12859-020-03834-6 pmid: 33238875 |
[27] | Xun G, Jha K, Yuan Y, et al. MeSHProbeNet: a self-attentive probe net for MeSH indexing[J]. Bioinformatics, 2019, 35(19): 3794-3802. doi: 10.1093/bioinformatics/btz142 pmid: 30851089 |
[28] | Dai S, You R, Lu Z, et al. FullMeSH: improving large-scale MeSH indexing with full text[J]. Bioinformatics, 2020, 36(5): 1533-1541. doi: 10.1093/bioinformatics/btz756 pmid: 31596475 |
[29] | You R, Liu Y, Mamitsuka H, et al. BERTMeSH: deep contextual representation learning for large-scale high-performance MeSH indexing with full text[J]. Bioinformatics, 2021, 37(5): 684-692. |
[30] | Zhou H, Lang C, Liu Z, et al. Knowledge-guided convolutional networks for chemical-disease relation extraction[J]. BMC Bioinformatics, 2019, 20(1): 260. doi: 10.1186/s12859-019-2873-7 |
[31] | Sun Y, Wang J, Lin H, et al. Knowledge Guided Attention and Graph Convolutional Networks for Chemical-Disease Relation Extraction[J/OL]. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021, https://ieeexplore.ieee.org/document/9663570. doi: 10.1109/TCBB.2021.3135844 |
[32] | Zhu T, Qin Y, Xiang Y, et al. Distantly supervised biomedical relation extraction using piecewise attentive convolutional neural network and reinforcement learning[J]. Journal of the American Medical Informatics Association, 2021, 28(12): 2571-2581. |
[33] | Poon H, Toutanova K, Quirk C. Distant supervision for cancer pathway extraction from text[C]. Pacific Symposium on Biocomputing 2015, Kohala Coast, Hawaii, USA, 2015: 120-131. |
[34] | Feng Y H, Zhang S W, Shi J Y. DPDDI: a deep predictor for drug-drug interactions[J]. BMC Bioinformatics, 2020, 21(1): 419. doi: 10.1186/s12859-020-03724-x |
[35] | Masumshah R, Aghdam R, Eslahchi C. A neural network-based method for polypharmacy side effects prediction[J]. BMC Bioinformatics, 2021, 22(1): 385. doi: 10.1186/s12859-021-04298-y pmid: 34303360 |
[36] | Coşkun M, Koyutürk M. Node similarity-based graph convolution for link prediction in biological networks[J]. Bioinformatics, 2021, 37(23): 4501-4508. doi: 10.1093/bioinformatics/btab464 |
[37] | Boratyn G M, Camacho C, Cooper P S, et al. BLAST: a more efficient report with usability improvements[J]. Nucleic Acids Research, 2013, 41(W1): W29-W33. doi: 10.1093/nar/gkt282 |
[38] | Boratyn G M, Thierry-Mieg J, Thierry-Mieg D, et al. Magic-BLAST, an accurate RNA-seq aligner for long and short reads[J]. BMC Bioinformatics, 2019, 20(1): 405. doi: 10.1186/s12859-019-2996-x pmid: 31345161 |
[39] | Taeyong P, Jonghun W, Minkyung B, et al. GalaxyHeteromer: protein heterodimer structure prediction by template-based and ab initio docking[J]. Nucleic Acids Research, 2021, 49(W1): W237-W241. doi: 10.1093/nar/gkab422 |
[40] | Maxat, K, Fernando, Z C, Robert, H. DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web[J]. Nucleic Acids Research, 2021, 49(W1): W140-W146. doi: 10.1093/nar/gkab373 |
[41] | Ding J, Montgomery B, Thien N, et al. Mergeomics 2.0: a web server for multi-omics data integration to elucidate disease networks and predict therapeutics[J]. Nucleic Acids Research, 2021, 49(W1): W375-W387. doi: 10.1093/nar/gkab405 pmid: 34048577 |
[42] | Arif M, Zhang C, Li X Y, et al. iNetModels 2.0: an interactive visualization and database of multi-omics data[J]. Nucleic Acids Research, 2021, 49(W1): W271-W276. doi: 10.1093/nar/gkab254 pmid: 33849075 |
[43] | Zheng S, Aldahdooh J, Shadbahr T, et al. DrugComb update: a more comprehensive drug sensitivity data repository and analysis portal[J]. Nucleic Acids Research, 2021, 49(W1): W174-W184. doi: 10.1093/nar/gkab438 pmid: 34060634 |
[44] | Luca P, Annachiara T, Luca G, et al. LigAdvisor: a versatile and user-friendly web-platform for drug design[J]. Nucleic Acids Research, 2021, 49(W1): W326-W335. doi: 10.1093/nar/gkab385 |
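[45] | Hu Zhengyin, Liu Leilei, Dai Bing, Qin Xiaochu. Exploring knowledge discovery in the life and medical sciences based on domain knowledge graphs[J]. Data Analysis and Knowledge Discovery, 2020, 4(11): 1-14. (in Chinese) |
[46] | Wang Anran, Wu Sizhu. Practice and implications of the US cancer data standards registry and repository[J]. Chinese Journal of Medical Library and Information Science, 2020, 29(10): 15-23. (in Chinese) |
[47] | Li Wei, Sun Xuehui, Xu Ping, et al. Analysis of the construction model and characteristics of the US All of Us cohort program[J]. World Sci-Tech R&D, 2022, 44(2): 265-274. (in Chinese) |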
[48] | Wilkinson M, Dumontier M, Aalbersberg I, et al. The FAIR Guiding Principles for scientific data management and stewardship[J]. Scientific Data, 2016, 3: 160018. doi: 10.1038/sdata.2016.18 |
[49] | Fan Daiming. Biomedical big data is an important strategic resource[J]. Science News, 2019, (06): 34. (in Chinese) |
[50] | Zhang Guoqing, Li Yixue, Wang Zefeng, Zhao Guoping. New challenges and trends in the development of biomedical big data[J]. Bulletin of Chinese Academy of Sciences, 2018, 33(8): 853-860. (in Chinese) |
[51] | Esteva A, Kuprel B, Novoa R A, et al. Corrigendum: Dermatologist-level classification of skin cancer with deep neural networks[J]. Nature, 2017, 546(7660): 686. |
[52] | Amoroso N, Diacono D, Fanizzi A, et al. Deep learning reveals Alzheimer’s disease onset in MCI subjects: results from an international challenge[J]. Journal of Neuroscience Methods, 2018, 302: 3-9. |
[53] | Li H, Zhu L, Shen M, et al. Blockchain-Based Data Preservation System for Medical Data[J]. Journal of Medical Systems, 2018, 42(8): 141. doi: 10.1007/s10916-018-0997-3 |
[54] | Qian Qing, Xue Wei. Current status and prospects of medical informatics[J]. China Medical News, 2020, 35(18): 10. (in Chinese) |
[55] | Savova G K, Masanz J J, Ogren P V, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications[J]. Journal of the American Medical Informatics Association, 2010, 17: 507-513. |
[56] | Announcement on soliciting public comments on the Regulations on the Administration of Clinical Application of New Biomedical Technologies (Draft for Comment)[EB/OL]. [2022-01-20]. http://www.nhc.gov.cn/yzygj/s7659/201902/0f24ddc242c24212abc42aa8b539584d.shtml. (in Chinese) |
[1] | HU Zhengyin,LIU Leilei,CHEN Wenjie,LIU Chunjiang,QIAN Li,SONG Yibing. Generating a Hematopoietic Stem Cell Knowledge Graph for Scientific Knowledge Discovery [J]. Frontiers of Data and Computing, 2021, 3(6): 81-97. |
[2] | Haitao Zhao, Jiachang Sun, Leisheng Li, Wenhao Yang, Hui Zhao, Huiyuan Li. Research on HPL Parallel Computing Model for a Class of Complex Heterogeneous Supercomputer System [J]. Frontiers of Data and Computing, 2020, 2(1): 85-92. |