[Short-answer question]

Speech Recognition

Automatic recognition of speech by machine [1] has been a goal of research for more than four decades and has inspired such science fiction wonders as the computer HAL in Stanley Kubrick's famous movie 2001: A Space Odyssey [2] and the robot R2D2 in the George Lucas classic Star Wars [3] series of movies. However, in spite of the glamour of designing an intelligent machine that can recognize the spoken word and comprehend its meaning, and in spite of the enormous research efforts spent in trying to create such a machine, we are far from [4] achieving the desired goal of a machine that can understand spoken discourse on any subject by all speakers in all environments. Thus, an important question is: what do we mean by "speech recognition by machine"? Another important question is: how can we build a series of bridges that will enable us to advance both our knowledge and the capabilities of modern speech-recognition systems so that the "holy grail" [5] of conversational speech recognition and understanding by machine is attained?

Because we do not know how to solve the ultimate challenge of speech recognition, our goal here is to give a series of presentations on the fundamental principles of most modern, successful speech-recognition systems so as to provide a framework from which researchers can expand the frontier. We will attempt to avoid absolute judgments on the relative merits of various approaches to particular speech-recognition problems. Instead, we will provide the theoretical background and justification for each topic discussed so that the reader is able to understand why the techniques have proved valuable and how they can be used in practical situations.

One of the most difficult aspects of performing research in speech recognition by machine is its interdisciplinary nature [6], together with the tendency of most researchers to apply a monolithic approach to individual problems.
Consider the disciplines that have been applied to one or more speech-recognition problems.

1. signal processing: the process of extracting relevant information from the speech signal in an efficient, robust manner. Included in signal processing is the form of spectral analysis used to characterize the time-varying properties of the speech signal, as well as various types of signal preprocessing (and postprocessing) to make the speech signal robust to the recording environment (signal enhancement).
2. physics (acoustics): the science of understanding the relationship between the physical speech signal and the physiological mechanisms (the human vocal tract mechanism) [7] that produced the speech and with which the speech is perceived (the human hearing mechanism).
3. pattern recognition: the set of algorithms used to cluster data to create one or more prototypical patterns of a data ensemble, and to match a pair of patterns on the basis of feature measurements of the patterns.
4. communication and information theory: the procedures for estimating parameters of statistical models; the methods for detecting the presence of particular speech patterns; and the set of modern coding and decoding algorithms used to search a large but finite grid for a best path corresponding to a "best" recognized sequence of words.
5. linguistics: the relationships between sounds, words in a language, meaning of spoken words, and sense derived from meaning. Included within this discipline is the methodology of grammar and language parsing.
6. physiology: understanding of the higher-order mechanisms within the human central nervous system that account for speech production and perception in human beings. Many modern techniques try to embed this type of knowledge within the framework of artificial neural networks (which depend heavily on several of the above disciplines).
7. computer science: the study of efficient algorithms for implementing, in software or hardware, the various methods used in a practical speech-recognition system.
8. psychology: the science of understanding the factors that enable technology to be used by human beings in practical tasks.

Successful speech-recognition systems require knowledge and expertise from a wide range of disciplines, a range far larger than any single person can possess [8]. Therefore, it is especially important for a researcher to have a good understanding of the fundamentals of speech recognition (so that a range of techniques can be applied to a variety of problems), without necessarily having to be an expert in each aspect of the problem. The purpose here is to provide this expertise by giving in-depth discussions of a number of fundamental topics in speech-recognition research.

A general model for speech recognition begins with a user creating a speech signal (speaking) to accomplish a given task. The spoken output is first recognized, in that the speech signal is decoded into a series of words that are meaningful according to the syntax, semantics, and pragmatics [9] of the recognition task. The meaning of the recognized words is then obtained by a higher-level processor that uses a dynamic knowledge representation to modify the syntax, semantics, and pragmatics according to the context of what it has previously recognized. In this manner, things such as non sequiturs are omitted from consideration, at the risk of misunderstanding but at the gain of minimizing errors for sequentially meaningful inputs. The feedback from the higher-level processing box reduces the complexity of the recognition model by limiting the search for valid input sentences (speech) from the user. The recognition system responds to the user in the form of a voice output or, equivalently, in the form of the requested action being performed, with the user being prompted for more input.
A Brief History of Speech-Recognition Research

Research in automatic speech recognition by machine has been done for almost four decades. To gain an appreciation for the amount of progress achieved over this period, it is worthwhile to briefly review some research highlights [10]. The reader is cautioned that such a review is cursory, at best, and must therefore suffer from errors of judgment as well as omission.

The earliest attempts to devise systems for automatic speech recognition by machine were made in the 1950s, at Bell Laboratories. Davis, Biddulph, and Balashek built a system for isolated digit recognition for a single speaker. The system relied heavily on measuring spectral resonances during the vowel region of each digit. Another effort of note in this period was the vowel recognizer of Forgie and Forgie, constructed at MIT Lincoln Laboratories [11] in 1959, in which 10 vowels embedded in a /b/-vowel-/t/ format were recognized in a speaker-independent manner. Again, a filter bank analyzer was used to provide spectral information, and a time-varying estimate of the vocal tract resonances was made to decide which vowel was spoken.

In the 1960s several fundamental ideas in speech recognition surfaced and were published. However, the decade started with several Japanese laboratories entering the recognition arena and building special-purpose hardware as part of their systems.

In the 1960s three key research projects were initiated that have had major implications for the research and development of speech recognition for the past 20 years. The first of these projects was the effort of Martin and his colleagues at RCA [12] Laboratories, beginning in the late 1960s, to develop realistic solutions to the problems associated with nonuniformity of time scales in speech events. At about the same time, in the Soviet Union, Vintsyuk proposed the use of dynamic programming methods for time aligning a pair of speech utterances.
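The dynamic-programming time alignment mentioned above can be illustrated with a minimal sketch. This is not the method of any researcher named in the passage; it is the textbook form of dynamic time warping under simplifying assumptions: each utterance is reduced to a list of scalar frame values (real systems use spectral feature vectors), the local cost is an absolute difference, and the function name `dtw_distance` is hypothetical.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Minimum cumulative |a[i] - b[j]| cost over all monotonic alignments.

    Assumption: frames are scalars; a real recognizer would compare
    feature vectors with a spectral distance instead.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of a
    # with the first j frames of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only (frame of b is stretched)
                cost[i][j - 1],      # advance in b only (frame of a is stretched)
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


if __name__ == "__main__":
    slow = [1.0, 1.0, 2.0, 3.0, 3.0, 2.0]  # same contour, spoken slowly
    fast = [1.0, 2.0, 3.0, 2.0]            # same contour, spoken quickly
    print(dtw_distance(slow, fast))
```

Because the alignment may stretch either sequence, the two example contours match with zero cost even though their lengths differ, which is the nonuniform time scale problem the passage describes.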
Although the essence of the concepts of dynamic time warping, as well as rudimentary versions of the algorithms for connected word recognition, were embodied in Vintsyuk's work, it was largely unknown in the West and did not come to light until the early 1980s; this was long after the more formal methods were proposed and implemented by others. A final achievement of note in the 1960s was the pioneering research of Reddy in the field of continuous speech recognition by dynamic tracking of phonemes. Reddy's research eventually spawned a long and highly successful speech-recognition research program at Carnegie Mellon University, which, to this day, remains a world leader in continuous-speech-recognition systems.

In the 1970s speech-recognition research achieved a number of significant milestones. First was the area of isolated word or discrete utterance recognition. The Japanese research showed how dynamic programming methods could be successfully applied, and the American research showed how the ideas of linear predictive coding (LPC) [13], which had already been successfully used in low-bit-rate speech coding, could be extended to speech-recognition systems through the use of an appropriate distance measure based on LPC spectral parameters. Another milestone of the 1970s was the beginning of a longstanding, highly successful group effort in large vocabulary speech recognition at IBM, in which researchers studied three distinct tasks: database queries, the laser patent text language for transcribing laser patents, and the office correspondence task, called Tangora, for dictation of memos.

Speech research in the 1980s was characterized by a shift in technology from template-based approaches to statistical modeling methods, especially the hidden Markov model approach.
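To make the hidden Markov model approach more concrete, here is a minimal sketch of Viterbi decoding, a standard dynamic-programming search for the most likely hidden state sequence. The passage does not specify any particular algorithm or model; the two states, the observation symbols, and all probabilities below are invented purely for illustration, and the function name `viterbi` is hypothetical.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_state_path, probability_of_that_path) for a discrete HMM."""
    # best[t][s] = probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    # Toy model (all values are assumptions): silence vs. speech,
    # observed through a coarse low/high energy measurement.
    states = ("sil", "sp")
    start_p = {"sil": 0.8, "sp": 0.2}
    trans_p = {"sil": {"sil": 0.7, "sp": 0.3}, "sp": {"sil": 0.2, "sp": 0.8}}
    emit_p = {"sil": {"low": 0.9, "high": 0.1}, "sp": {"low": 0.2, "high": 0.8}}
    print(viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p))
```

In contrast to template matching, the model here is a set of probabilities that could be estimated from data, and recognition becomes a search for the most probable path through a grid of states and time frames.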
Although the methodology of hidden Markov modeling (HMM) [14] was well known and understood in a few laboratories, it was not until widespread publication of the methods and theory of HMMs, in the mid-1980s, that the technique became widely applied in virtually every speech-recognition research laboratory in the world. Another "new" technology that was reintroduced in the late 1980s was the idea of applying neural networks to pr

Question tags: system, recognition, speech recognition

Reference answer:

Reference explanation:

Related questions

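[Multiple-answer question] Which non-natural-person customers require identification of beneficial owners?

A. Companies, partnerships, trust products, and fund products

B. Individual industrial and commercial households, sole proprietorship enterprises, and professional service organizations without legal-person status

C. Government-controlled enterprises and public institutions, which are required to carry out beneficial-owner identification

D. Non-corporate farmers' specialized cooperative organizations engaged in agriculture, forestry, fishery, and animal husbandry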

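[True/false question] The operating days of the Nongxinyin (农信银) integrated business system are statutory working days; the operating hours are set uniformly by the Nongxinyin Center, and each provincial center must keep to them.

A. True

B. False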

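[Multiple-answer question] Regarding voucher entry in the general ledger system, which of the following statements are correct?

A. The latest balance of any account can be viewed

B. Operators' permissions to use accounts can be controlled

C. Vouchers can be imported and exported as text files in any format

D. Vouchers generated by other subsystems cannot be modified or deleted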

[Short-answer question] [Term definition] Recognition

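[Single-choice question] In principle, operation and maintenance records and ledgers should be recorded in the ( ) system; content that cannot be recorded in the system may be supplemented on paper or in other record forms.

A. PMS

B. OMS

C. ERP

D. WMS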

[Short-answer question] What are the main viewpoints of system safety?

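[Single-choice question] If a hardware failure during operation causes partial or total loss of the data stored on external storage, this situation is called ( ).

A. Transaction failure

B. System failure

C. Media failure

D. Operational failure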

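[Multiple-answer question] Which of the following elements are internal activities in corporate behavior identity? ( )

A. Internal training

B. Development goals

C. Research and development

D. Market research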

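[Multiple-answer question] The effects of corona on the system include ( ).

A. Loss of electric power

B. Changes in line parameters

C. Considerable interference with communications

D. No effect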

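[True/false question] In the one-sentence recognition (一句话识别) service provided by Huawei Cloud, if the service call succeeds, end_time indicates the start time.

A. True

B. False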