[1] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[2] AREL I, ROSE D C, KARNOWSKI T P. Deep machine learning: a new frontier in artificial intelligence research [research frontier][J]. IEEE Computational Intelligence Magazine, 2010, 5(4): 13-18.
[3] DAVIS K H, BIDDULPH R, BALASHEK S. Automatic recognition of spoken digits[J]. Journal of the Acoustical Society of America, 1952, 24(6): 637-642.
[4] VINTSYUK T K. Speech discrimination by dynamic programming[J]. Cybernetics and Systems Analysis, 1968, 4(1): 81-88.
[5] FERGUSON J D. Application of hidden Markov models to text and speech[R]. Princeton: Institute for Defense Analyses, 1980.
[6] RABINER L R. A tutorial on hidden Markov models and selected applications in speech recognition[J]. Proceedings of the IEEE, 1989, 77(2): 257-286.
[7] MOHAMED A, DAHL G E, HINTON G E. Deep belief networks for phone recognition[C]//NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009.
[8] SAINATH T N, KINGSBURY B, RAMABHADRAN B, et al. Making deep belief networks effective for large vocabulary continuous speech recognition[C]//2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011: 30-35.
[9] MOHAMED A, DAHL G E, HINTON G. Acoustic modeling using deep belief networks[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 14-22.
[10] DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 30-42.
[11] HINTON G, DENG L, YU D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups[J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.
[12] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[13] ZHANG Y, CHEN G G, YU D, et al. Highway long short-term memory RNNs for distant speech recognition[C]//2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 20-25, 2016, Shanghai, China. Piscataway: IEEE Press, 2016.
[14] LECUN Y, BENGIO Y. Convolutional networks for images, speech, and time series[M]//The handbook of brain theory and neural networks. Cambridge: MIT Press, 1995.
[15] ABDEL-HAMID O, MOHAMED A R, JIANG H, et al. Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition[C]//2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012, Kyoto, Japan. Piscataway: IEEE Press, 2012: 4277-4280.
[16] ABDEL-HAMID O, MOHAMED A R, JIANG H, et al. Convolutional neural networks for speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(10): 1533-1545.
[17] ABDEL-HAMID O, DENG L, YU D. Exploring convolutional neural network structures and optimization techniques for speech recognition[C]//Interspeech 2013, August 25-29, 2013, Lyon, France. [S.l.: s.n.], 2013.
[18] SAINATH T N, MOHAMED A R, KINGSBURY B, et al. Deep convolutional neural networks for LVCSR[C]//2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 26-30, 2013, Vancouver, BC, Canada. Piscataway: IEEE Press, 2013: 8614-8618.
[19] SAINATH T N, VINYALS O, SENIOR A, et al. Convolutional, long short-term memory, fully connected deep neural networks[C]//2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 19-24, 2015, Brisbane, QLD, Australia. Piscataway: IEEE Press, 2015: 4580-4584.
[20] JELINEK F. The development of an experimental discrete dictation recognizer[J]. Proceedings of the IEEE, 1985, 73(11): 1616-1624.
[21] BENGIO Y, DUCHARME R, VINCENT P. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[22] SCHWENK H, GAUVAIN J L. Training neural network language models on very large corpora[C]//Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, October 6-8, 2005, Vancouver, British Columbia, Canada. Stroudsburg: Association for Computational Linguistics, 2005: 201-208.
[23] ARISOY E, SAINATH T N, KINGSBURY B, et al. Deep neural network language models[C]//Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, June 8, 2012, Montreal, Canada. Stroudsburg: Association for Computational Linguistics, 2012: 20-28.
[24] MIKOLOV T, KARAFIAT M, BURGET L, et al. Recurrent neural network based language model[C]//Interspeech 2010, September 26-30, 2010, Makuhari, Chiba, Japan. [S.l.: s.n.], 2010: 1045-1048.
[25] PUNDAK G, SAINATH T N. Lower frame rate neural network acoustic models[C]//Interspeech 2016.
[26] CHAN W, JAITLY N, LE Q V, et al. Listen, attend and spell[J]. CoRR, 2015, abs/1508.01211.
[27] PRABHAVALKAR R, RAO K, SAINATH T N, et al. A comparison of sequence-to-sequence models for speech recognition[C]//Interspeech 2017.
[28] HANNUN A. Sequence modeling with CTC[J]. Distill, 2017.
[29] CHIU C C, SAINATH T N, WU Y, et al. State-of-the-art speech recognition with sequence-to-sequence models[J]. arXiv preprint, 2017.
[30] SRIRAM A, JUN H, SATHEESH S, et al. Cold fusion: training seq2seq models together with language models[J]. arXiv preprint, 2017.
[31] GULCEHRE C, FIRAT O, XU K, et al. On using monolingual corpora in neural machine translation[J]. arXiv preprint, 2015.
[32] RENDUCHINTALA A, DING S, WIESNER M, et al. Multi-modal data augmentation for end-to-end ASR[C]//Interspeech 2018.
[33] BARKER J, WATANABE S, VINCENT E, et al. The fifth 'CHiME' speech separation and recognition challenge: dataset, task and baselines[C]//Interspeech 2018: 1561-1565.
[34] DU J, GAO T. The USTC-iFlytek systems for CHiME-5 challenge[C]//The 5th International Workshop on Speech Processing in Everyday Environments, 2018.
[35] GAO T. Research on deep learning based speech signal preprocessing methods in complex environments[D]. Hefei: University of Science and Technology of China, 2018.
[36] GUO J, LU S, CAI H, et al. Long text generation via adversarial training with leaked information[J]. arXiv preprint, 2017.
[37] PUNDAK G, SAINATH T N, PRABHAVALKAR R, et al. Deep context: end-to-end contextual speech recognition[J]. arXiv preprint, 2018.
[38] SHAN C, WENG C, et al. Component fusion: learning replaceable language model component for end-to-end speech recognition system[C]//ICASSP 2019: 5631-5635.
[39] LI B, ZHANG Y, SAINATH T, et al. Bytes are all you need: end-to-end multilingual speech recognition and synthesis with bytes[J]. arXiv preprint, 2018.