Frontiers of Data and Computing, 2020, Vol. 2, Issue (2): 1-19.
doi: 10.11871/jfdc.issn.2096-742X.2020.02.001
Special topic: Special Issue on "Data Analysis Technology and Applications"
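Chen Meili¹, Ma Yingke¹, Li Rujiao¹,*, Bao Yiming¹,²,*
Received: 2020-01-21
Online: 2020-04-20
Published: 2020-06-03
Corresponding authors: Li Rujiao, Bao Yiming
About the author: Chen Meili, Ph.D., is an assistant researcher at the National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences (China National Center for Bioinformation), working mainly on the integration and mining of omics data such as genomic and transcriptomic data.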
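Abstract:
[Objective] This paper comprehensively reviews the current status and future trends of genomics data analysis methods, providing a reference for algorithm research and tool development for omics data analysis in precision medicine, precision breeding, biosafety, biodiversity, molecular evolution, and related fields. [Results] Genomics data analysis mainly comprises genome, transcriptome, and epigenome data analysis, and the main challenges today arise from genomics data being massive, multidimensional, and heterogeneous. This paper describes in detail the current status, applications, existing problems, and challenges of algorithm and tool development for genomics data analysis. [Conclusions] The future direction of algorithm and tool development for genomics data analysis is to make full use of advanced technologies such as artificial intelligence, statistical models, and knowledge graphs, and to continually optimize and develop more advanced algorithms and more robust models that combine high fault tolerance, high accuracy, high efficiency, and low computational resource consumption, so as to meet the demands of analyzing massive, multidimensional, and heterogeneous genomics big data.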
Chen Meili, Ma Yingke, Li Rujiao, Bao Yiming. Current Status and Prospects of Genomics Data Analysis Methods[J]. Frontiers of Data and Computing, 2020, 2(2): 1-19.