CIRCL: Hussein Ghaly (Computation Linguistics)

OCT 10, 2017 | 6:30 PM TO 8:00 PM



The Graduate Center
365 Fifth Avenue




October 10, 2017: 6:30 PM-8:00 PM




Hussein Ghaly presents on a topic in Computational Linguistics. Title: Debugging Chomsky's hierarchy - Adequacy of Context Free Grammar to represent language.


Chomsky’s hierarchy has been one of the most well-known formalisms for defining the limits of formal grammar. The hierarchy includes multiple types of grammar (regular, context-free, context-sensitive and recursively enumerable). Chomsky has argued that the first two types are inadequate to represent natural language. This entailed the direction towards context sensitivity of the language, and hence transformational grammar (Chomsky, 1956 and 1957). This idea of context-sensitivity with the accompanying processing complexity (NP-hard/Turing-Equivalent (Pullum, 2009)) might have led Computational Linguistics to depart from the models based on formal grammar in favor of more data-driven models. Context sensitivity of the language is also associated with the language being unlearnable from input sentences alone without an informant, according to Gold’s definition of learnability (Gold, 1967).

This talk attempts to develop a simple argument, building on Gazdar (1982) and Pullum and Gazdar (1982) that context-free grammar (CFG) is actually an adequate model for representing natural language, while addressing the inadequacies cited by Chomsky in his early works (“Three models for the description of language” (1956), “Syntactic Structures”, 1957, and “The Algebraic Theory of Context-Free Languages”, 1963), and also by Schieber in Swiss German (Schieber, 1985). The main hypothesis is that with additional labels and rewrite rules, it will be possible to parse or generate any English sentence using CFG. The resulting finite set of labels and rules will be able to satisfy Chomsky’s criteria of generating only grammatical sentences and not ungrammatical ones. As a demonstration, a new CFG parser has been developed to accept only grammatical sentences based on the CFG inadequacy claims above, which also departs from common modern parsers that can accept and parse ungrammatical input. Going forward, parsing using context-free grammar can also help in other Natural Language Processing tasks, such as Named Entity Recognition, Coreference Resolution, and others, without overly relying on data. It can also help formalize a learnability model that would take sentences as input and develop labels and rules as part of forming the grammar. 

All are welcome!