
Speech and Audio Understanding


Professor Michael Mandel


In October 2014, the number of mobile devices surpassed the number of humans in the world. Beyond their data connections, nearly all of these devices have at least one microphone and one speaker. This course gives a thorough foundation in analyzing, understanding, and manipulating the audio signals these devices record. It will cover the traditional algorithms that support human-human and human-machine communication, along with promising solutions to some of their limitations. It will also cover recent applications to musical audio and environmental signals.

Course description

“Machine listening” is a multidisciplinary field at the intersection of signal processing, machine learning, and psychoacoustics. The course will begin by introducing the necessary material from those fields to provide a foundation for the rest of the course. Machine listening is primarily concerned with analyzing and understanding three types of signals: speech, music, and environmental sounds. These three will be the focus of the course. We will also consider applications that require creating or manipulating speech and music.

List of topics

Topics may include, but are not limited to:

  • Fundamentals of

    1. Digital signal processing

    2. Acoustics

    3. Auditory perception

    4. Machine learning

  • Core machine listening topics

    1. Speech models and speech synthesis

    2. Speech recognition features and acoustic modeling, including noise robustness

    3. Speech recognition language modeling, search, and weighted finite-state transducers

    4. Music analysis and modeling

    5. Source separation and spatial sound

    6. Environmental sound analysis

Learning objectives

Students will be able to demonstrate working knowledge of the theory, algorithms, and software involved in:

  • Speech recognition

  • Speech synthesis

  • Music information retrieval

  • Environmental audio analysis

Assessment

Assessment will be based on two main components: weekly paper discussions and a final project.

Each week, the class will read an important paper in the field that is relevant to the current topic. Every student will write a one-paragraph summary of the paper and a one-paragraph response to it, and one or two students will lead the class discussion of that paper.

To encourage final projects that are novel, well thought out, and well executed, project-related activities will be spread throughout the semester. Each weekly writing assignment will also include a paragraph on project progress. At a midterm project proposal presentation, each student will present their current plan and receive feedback on it. Final project presentations at the end of the semester will provide additional feedback, and the final papers, due shortly afterward, should incorporate that feedback. These components will be weighted as follows:

  • Weekly writing: 20%

  • Paper presentation: 10%

  • Participation/attendance: 10%

  • Project proposal presentation: 20%

  • Final project presentation: 10%

  • Final project paper: 30%