Speaker: Karen Livescu (TTI-Chicago)
Title: Multi-view learning of feature representations for speech (and language!)
This talk presents an approach to learning improved features for prediction tasks. The main focus will be on acoustic features for speech recognition, but the talk will also include brief forays into learning features for other types of data (text, images) and for other tasks. It is often possible to improve the performance of classifiers or other predictors by starting with a high-dimensional input feature vector and applying linear or nonlinear dimensionality reduction. The learned transformation may be unsupervised (e.g., principal component analysis, manifold learning) or supervised (e.g., linear discriminant analysis, neural network-based representations).
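As a minimal illustration of the generic recipe described above (start from a high-dimensional feature vector, then learn a reduced representation), the sketch below applies PCA, one of the unsupervised examples named there, using NumPy. The data are random stand-ins, and the function name `pca`, the 39-dimensional input, and the choice of 10 output dimensions are hypothetical choices made for illustration only.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (n samples x d features) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Right singular vectors of the centered data are the principal directions,
    # ordered by decreasing variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T  # d x k projection with orthonormal columns
    return Xc @ W, W

# Hypothetical input: 500 frames of 39-dimensional features (random stand-ins)
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 39))
Z, W = pca(X, k=10)

# Rescale a single input dimension and recompute the leading direction
X_scaled = X.copy()
X_scaled[:, 0] *= 100.0
_, W_scaled = pca(X_scaled, k=1)
```

After rescaling one input dimension by 100, the leading principal direction aligns almost entirely with that dimension, a small instance of PCA's sensitivity to data scaling.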
The talk describes a recent approach that is unsupervised but uses a second "view" of the data as additional information for learning a useful transformation. The views may be audio, images, text, or other modalities. Our approach, based on canonical correlation analysis (CCA) and its nonlinear extensions, finds representations of the two views that are maximally correlated with each other. It avoids some of the disadvantages of other unsupervised methods such as PCA, which are sensitive to noise and data scaling, and possibly also of supervised methods, which tend to be more task-specific. While most of the talk focuses on the unsupervised setting, the approach can also be extended to supervised settings in which an additional view of the data is available.
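The core idea of finding maximally correlated projections of two views can be sketched as linear CCA in NumPy, assuming the two views are stored as row-aligned matrices (row i of each view describes the same sample). This sketch whitens each view's covariance and takes an SVD of the whitened cross-covariance, which is one standard way to compute CCA; it is not presented as the implementation used in the talk. The function names, the ridge parameter `reg`, and the synthetic two-view data are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    # S^{-1/2} for a symmetric positive-definite matrix, via eigendecomposition
    w, V = np.linalg.eigh(S)
    return (V * w ** -0.5) @ V.T

def linear_cca(X, Y, k, reg=1e-4):
    """Linear CCA between paired views X (n x dx) and Y (n x dy).

    Returns projection matrices A (dx x k), B (dy x k) and the top-k
    canonical correlations, in decreasing order.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample covariances; the small ridge term `reg` keeps them invertible
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular vectors of the whitened cross-covariance give the canonical
    # directions; its singular values are the canonical correlations
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt[:k].T
    return A, B, s[:k]

# Synthetic paired views (hypothetical): both are noisy linear functions of a
# shared 2-D latent signal z, standing in for, e.g., an audio view and a second view
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal((n, 2))
X = z @ rng.standard_normal((2, 10)) + 0.5 * rng.standard_normal((n, 10))
Y = z @ rng.standard_normal((2, 8)) + 0.5 * rng.standard_normal((n, 8))
A, B, corrs = linear_cca(X, Y, k=2)

# Project each view onto its canonical directions
U_x = (X - X.mean(axis=0)) @ A
V_y = (Y - Y.mean(axis=0)) @ B
```

The first columns of `U_x` and `V_y` are correlated by approximately `corrs[0]` (exactly, up to the effect of `reg`), and each later pair is the most correlated pair that is uncorrelated with the earlier ones.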
The talk will cover recent work using CCA, its nonlinear extension via kernel CCA, and a newly proposed parametric nonlinear extension based on deep neural networks, dubbed deep CCA. Results to date show good improvements in speech recognition, as well as promising initial results on other tasks.
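Kernel CCA replaces CCA's linear projections with nonlinear functions defined through a kernel. The sketch below follows one common regularized formulation, which penalizes with (K + reg*I)^2; under it, the kernel canonical correlations are the singular values of Rx Ry, where R = K (K + reg*I)^{-1} is computed from each view's centered Gram matrix K. The Gaussian kernel, its width `sigma`, the value of `reg`, and the toy data are all assumptions chosen for illustration, not settings from the talk.

```python
import numpy as np

def rbf_gram(X, sigma):
    # Gaussian (RBF) kernel matrix for the rows of X
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def center_gram(K):
    # Center in feature space: H K H with H = I - (1/n) 11^T
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_cca_corrs(X, Y, sigma=0.5, reg=1e-2, k=1):
    """Top-k regularized kernel canonical correlations between paired views."""
    Kx = center_gram(rbf_gram(X, sigma))
    Ky = center_gram(rbf_gram(Y, sigma))
    n = Kx.shape[0]
    # R = (K + reg I)^{-1} K, which equals K (K + reg I)^{-1} since both are functions of K
    Rx = np.linalg.solve(Kx + reg * np.eye(n), Kx)
    Ry = np.linalg.solve(Ky + reg * np.eye(n), Ky)
    # Regularized kernel canonical correlations: singular values of Rx Ry
    return np.linalg.svd(Rx @ Ry, compute_uv=False)[:k]

# Toy views (hypothetical): Y depends on X only through t**2
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=(200, 1))
X = t
Y = t ** 2 + 0.05 * rng.standard_normal((200, 1))

linear_corr = abs(np.corrcoef(X[:, 0], Y[:, 0])[0, 1])
kernel_corr = kernel_cca_corrs(X, Y)[0]
```

Because the dependence here is purely quadratic, the linear correlation is near zero while the kernel canonical correlation is high. With too small a `reg`, kernel CCA can also report high correlations for unrelated views, so in practice `reg` and `sigma` would need to be tuned, for example on held-out data.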
Karen Livescu is an Assistant Professor at TTI-Chicago. She completed her PhD at MIT in the CSAIL Spoken Language Systems group, and was a post-doctoral lecturer in the MIT EECS department. Karen's main research interests are in speech and language processing, with a slant toward combining machine learning with knowledge about linguistics and speech science. Her recent work has included multi-view learning of speech representations, articulatory models of pronunciation variation, discriminative training with low resources for spoken term detection and pronunciation modeling, and automatic sign language recognition. She is a member of the IEEE Spoken Language Technical Committee, associate editor for IEEE Transactions on Audio, Speech, and Language Processing, and subject editor for Speech Communication. She has organized or co-organized a number of recent workshops, including the ISCA SIGML workshops on Machine Learning in Speech and Language Processing, the Midwest Speech and Language Days, and the Interspeech Workshop on Speech Production in Automatic Speech Recognition.