Frontiers of Data and Computing (数据与计算发展前沿), 2019, Vol. 1, Issue 2: 26-36.

doi: 10.11871/jfdc.issn.2096-742X.2019.02.003

Special topic: "Artificial Intelligence" special issue

Research Progress and Challenges of Speech Recognition Technology

Liu Qingfeng, Gao Jianqing*, Wan Genshun

  1. IFLYTEK Co., Ltd., Hefei, Anhui 230088, China
  • Received: 2019-09-17 Online: 2019-12-20 Published: 2020-01-15
  • Corresponding author: Gao Jianqing
  • About the authors:
    Liu Qingfeng, born in 1973, received his Ph.D. in signal and information processing from the University of Science and Technology of China (USTC). He is chairman of IFLYTEK Co., Ltd., director of the National Engineering Laboratory for Speech and Language Information Processing, and an adjunct professor and doctoral supervisor at USTC. He has served as a deputy to the 10th, 11th, 12th, and 13th National People's Congress, was the first chairman of the national alliance for college students' innovation and entrepreneurship, and is chairman of the Speech Industry Alliance of China. His research interests include signal processing and speech and language information processing.
    Contribution to this paper: overall structural design of the paper and supervision of the research.
    E-mail: qfliu@iflytek.com
    Gao Jianqing, born in 1983, received his D.Eng. degree in electronics and information from the University of Science and Technology of China (USTC). He is vice dean of IFLYTEK AI Research. His research interests include automatic speech recognition, speech and language information processing, and spoken dialogue systems.
    Contribution to this paper: main contributor to Section 1 and Section 2.1; revision of the entire paper.
    Wan Genshun, born in 1989, received his master's degree in communication and information systems from Jiangsu University. He is a director of research at IFLYTEK AI Research. His research interests include automatic speech recognition and speech and language information processing.
    Contribution to this paper: main contributor to Sections 2.2, 2.3, and 2.4.
    E-mail: gswan@iflytek.com

The Research Development and Challenge of Automatic Speech Recognition

Liu Qingfeng, Gao Jianqing*, Wan Genshun

  1. IFLYTEK, Hefei, Anhui 230088, China
  • Received: 2019-09-17 Online: 2019-12-20 Published: 2020-01-15
  • Contact: Gao Jianqing

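Abstract (Chinese version):

[Objective] This paper gives a systematic and comprehensive introduction to the mainstream technical frameworks and main challenges of speech recognition systems, providing a reference for further research in the field. [Methods] First, the mainstream approaches to end-to-end speech recognition are introduced. Then four challenging problems in speech recognition applications are presented: recognition in adverse acoustic scenarios, mixed Chinese-English recognition, recognition of specialized terminology, and recognition of low-resource minority languages. [Results] To address the insufficient stability of end-to-end frameworks, an improved scheme with enhancement and filtering attention mechanisms is proposed. For the challenging problems in speech recognition, mainstream solutions and future directions are discussed. [Conclusions] Large-scale commercial deployment of end-to-end frameworks still faces considerable challenges, and solving the four challenging problems will play a key role in promoting industry applications of speech recognition.

Key words (Chinese version): speech recognition, end-to-end, far-field recognition, mixed Chinese-English, specialized terminology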

Abstract:

[Objective] This paper surveys the state-of-the-art technical frameworks and main challenges of Automatic Speech Recognition (ASR) systems, as a reference for further research in the field. [Methods] First, the latest end-to-end speech recognition frameworks are introduced, including the Connectionist Temporal Classification (CTC) framework and attention-based frameworks. Second, four challenging problems in ASR applications are presented: recognition of noisy and distant-field speech, recognition of code-switching speech, recognition of domain-related terms, and minority-language speech recognition with limited resources. [Results] To address the limited robustness of end-to-end ASR systems, an improved scheme with enhancement and filtering attention mechanisms is proposed. State-of-the-art methods and future development directions are discussed for the challenging problems of ASR systems. [Conclusions] Commercializing end-to-end ASR systems at scale remains a major challenge, and research on the four challenging problems will play a key role in the application of ASR systems.

Key words: automatic speech recognition, end-to-end, distant-field speech, code-switching, domain-related terms