
Gold B., Morgan N., Ellis D. Speech and Audio Signal Processing. Processing and Perception of Speech and Music

  • PDF file
  • Size: 10.10 MB
John Wiley, 2011. — 679 p.
Technology moves at a dizzying pace; however, progress can actually seem quite slow in any area that we are deeply involved in. Conference proceedings are filled with incremental advances over previous methods, and entirely novel (and successful) approaches to speech and audio processing are rare. But a lot can happen in a decade, and it has. In addition to quite new methods, there are also many ideas that had not really been refined enough to show progress in the 1990s, but which now are in common use. For instance, Maximum Mutual Information methods, which were developed for ASR many years ago and were briefly described in the previous edition of this book, were significantly refined in the last decade, and the newer versions of this approach are now widely used. Consequently, we devoted new sections of this revision to MMI (and related methods like MPE).
These advances might have been sufficient to warrant an update of our textbook, but there were other reasons as well. A decade of teaching with the book has revealed a number of bugs and deficiencies, and a new edition affords us the opportunity to correct them. For instance, the previous version had nothing about sound source separation, an area that has received considerable attention in the last decade. Approaches to the coding, transcription, and retrieval of music are also now significant areas of audio signal processing, and were not originally covered in the book.
Last, and not least, the new edition has the benefit of a fresh look at the overall subject from our new co-author, Professor Dan Ellis from Columbia University. This hand-off is a key step in keeping the text current.
As with the previous edition, we've attempted to keep the overall style consistent, focusing on what we think is essential, and leaving many details for other publications. We hope that this choice has helped to make the text useful for many readers.
Speech and music are the most basic means of adult human communication. As technology advances and increasingly sophisticated tools become available to use with speech and music signals, scientists can study these sounds more effectively and invent new ways of applying them for the benefit of humankind. Such research has led to the development of speech and music synthesizers, speech transmission systems, and automatic speech recognition (ASR) systems. Hand in hand with this progress has come an enhanced understanding of how people produce and perceive speech and music. In fact, the processing of speech and music by devices and the perception of these sounds by humans are areas that inherently interact with and enhance each other.
Despite significant progress in this field, there is still much that is not well understood. Speech and music technology could be greatly improved. For instance, in the presence of unexpected acoustic variability, ASR systems often perform much worse than human listeners (still!). Speech that is synthesized from arbitrary text still sounds artificial. Speech-coding techniques remain far from optimal, and the goal of transparent transmission of speech and music with minimal bandwidth is still distant. All fields associated with the processing and perception of speech and music stand to benefit greatly from continued research efforts. Finally, the growing availability of computer applications incorporating audio (particularly over the Internet and in portable devices) has increased the need for an ever-wider group of engineers and computer scientists to understand audio signal processing. For all of these reasons, as well as our own need to standardize a text for our graduate course at UC Berkeley, we wrote this book; and for the reasons noted in the Preface, we have updated it for the current edition.
The notes on which this book is based proved beneficial to graduate students for close to a decade; during this time, of course, the material evolved, including a problem set for each chapter. The material includes coverage of the physiology and psychoacoustics of hearing as well as the results from research on pitch and speech perception, vocoding methods, and information on many aspects of ASR. To this end, the authors have made use of their own research in these fields, as well as the methods and results of many other contributors. And as noted in the Preface, this edition includes contributions from new authors as well, in order to broaden the coverage and bring it up to date.
In many chapters, the material is written in a historical framework. In some cases, this is done for motivation's sake; the material is part of the historical record, and we hope that the reader will be interested. In other cases, the historical methods provide a convenient introduction to a topic, since they often are simpler versions of more current approaches. Overall, we have tried to take a long-term perspective on technology developments, which in our view requires incorporating a historical context. The fact that otherwise excellent books on this topic have typically
Historical Background
Synthetic Audio: A Brief History
Speech Analysis and Synthesis Overview
Brief History of Automatic Speech Recognition
Speech-Recognition Overview
Mathematical Background
Digital Signal Processing
Digital Filters and Discrete Fourier Transform
Pattern Classification
Statistical Pattern Classification
Wave Basics
Acoustic Tube Modeling of Speech Production
Musical Instrument Acoustics
Room Acoustics
Auditory Perception
Ear Physiology
Models of Pitch Perception
Speech Perception
Human Speech Recognition
Speech Features
The Auditory System as a Filter Bank
The Cepstrum as a Spectral Analyzer
Linear Prediction
Automatic Speech Recognition
Feature Extraction for ASR
Linguistic Categories for Speech Recognition
Deterministic Sequence Recognition for ASR
Statistical Sequence Recognition
Statistical Model Training
Discriminant Acoustic Probability Estimation
Acoustic Model Training: Further Topics
Speech Recognition and Understanding
Synthesis and Coding
Speech Synthesis
Pitch Detection
Low-Rate Vocoders
Medium-Rate and High-Rate Vocoders
Perceptual Audio Coding
Other Applications
Some Aspects of Computer Music Synthesis
Music Signal Analysis
Music Retrieval
Source Separation
Speech Transformations
Speaker Verification
Speaker Diarization