CUNY NLP: Ryan McDonald (Google)
FEB 15, 2013 | 3:00 PM TO 4:30 PM
The Graduate Center
365 Fifth Avenue
Advances in Cross-Lingual Syntactic Transfer
The idea of using annotated resources from one language to learn models for another has been around for at least a decade. Typically these models have relied on access to parallel data. However, recent approaches have focused on "direct" cross-lingual transfer, and in particular, delexicalized transfer. Delexicalized parsing models are conditioned only on properties of the input that are available across languages, typically induced tags or clusters. Since these properties are universally available, it is possible to directly use a parser trained on English for every other language. This simple method has shown itself to be surprisingly effective and outperforms the best weakly-supervised models by a significant margin. However, the assumptions underlying these models are far too weak to obtain parsing accuracies at the level of monolingual supervised methods. In this talk I will focus on porting ideas from work on selective parameter sharing in multi-source direct transfer to highly accurate latent CRF parsing models. I will then present novel semi-supervised learning algorithms that relexicalize these models on unlabeled target-language data to give significant improvements. The final model brings us one step closer to building robust syntactic parsers for all the world's languages.
Joint work with: Oscar Tackstrom, Slav Petrov, Keith Hall, Joakim Nivre.
Ryan McDonald is a Research Scientist at Google. He received a Ph.D. from the University of Pennsylvania and a Hon. B.Sc. from the University of Toronto. Ryan's thesis focused on the problem of syntactic dependency parsing. His work allowed complex linguistic constructions to be modeled in a direct and tractable way, which enabled parsers that are both efficient and accurate. In 2008 he wrote a book on the subject entitled Dependency Parsing. Since joining Google, Ryan has continued to work on syntactic analysis, in particular on extending statistical models learned on resource-rich languages, like English, to resource-poor languages. Ryan's research also addresses how these systems can be used to improve the quality of a number of important user-facing technologies, such as search, machine translation, and sentiment analysis.