Alumni Dissertations and Theses

 
 


 
 
  • THE INTERPLAY OF SYNTACTIC PARSING STRATEGIES AND PROSODIC PHRASE LENGTHS IN PROCESSING TURKISH SENTENCES

    Author:
    Nazik Dinctopal-Deniz
    Year of Dissertation:
    2014
    Program:
    Linguistics
    Advisor:
    Janet Fodor
    Abstract:

    Many experiments have shown that the prosody (rhythm and melody) with which a sentence is uttered can provide a listener with cues to its syntactic structure (Lehiste, 1973, and since). A few studies have observed in addition that an inappropriate prosodic contour can mislead the syntactic parsing routines, resulting in a prosody-induced garden-path. These include, among others, Speer et al. (1996) and Kjelgaard and Speer (1999) for English. The studies by Speer et al. and Kjelgaard and Speer (SKS) showed that misplaced prosodic cues caused more processing difficulty in sentences with early closure of a clause (EC syntax) than in ones with late closure of a clause (LC syntax). One possible explanation for these results is that when prosody is misleading about the syntactic structure, the parser may ignore it and resort to a syntactic Late Closure strategy, as it does in reading where there is no overt prosodic boundary to inform the parser about the syntactic structure of the sentence. Augurzky's (2006) observation of an LC syntax advantage for prosody-syntax mismatch conditions in her investigation of German relative clause attachment ambiguities provides support for this explanation. An alternative explanation considers the possibility that constituent lengths could have influenced the perceived informativeness of overt prosodic cues in these studies, as proposed in the Rational Speaker Hypothesis of Clifton et al. (2002, 2006). The Rational Speaker Hypothesis (RSH) maintains that prosodic breaks flanking shorter constituents are taken more seriously as indicators of syntactic structure than prosodic breaks flanking longer constituents, because the former cannot be justified as motivated by optimal length considerations. To test these two alternative hypotheses, four listening experiments were conducted. 
There was an additional reading experiment preceding the listening experiments to explore potential effects of the Late Closure strategy and constituent lengths in reading where there is no overt prosody. In all cases the target materials were temporarily ambiguous Turkish sentences which could be morphologically resolved as either LC or EC syntactic constructions. Constituent lengths were systematically manipulated in all target materials, such that the length-optimal prosodic phrasing was associated with LC syntax in one condition, and with EC syntax in the other. Experiment 1 employed a missing morpheme task developed for this study. In the missing morpheme task, underscores (length-averaged) replaced the disambiguating morphemes and participants had to insert them as they read the sentences aloud. Results revealed significant effects of phrase lengths in readers' syntactic interpretations as indicated by the morphemes they inserted and the prosodic breaks they produced. Experiments 2A and 2B employed an end-of-sentence `got it' task (Frazier et al., 1983), in which participants listened to spoken sentences and indicated after each one whether they understood or did not understand it. Sentences in Experiment 2A had phrase length distribution similar to the SKS English materials. Experiment 2B manipulated lengths in reverse. The stimuli had cooperating, conflicting or neutral prosody. Response time data supported an interplay of both syntactic Late Closure and RSH. Thus it was concluded that constituent lengths can indeed have a significant effect on listeners' parsing decisions, in addition to the familiar syntactic parsing biases and prosodic influences. Experiments 3A and 3B used a lexical probe version of the phoneme restoration paradigm employed by Stoyneshka et al. (2010). In the phoneme restoration paradigm, the disambiguating phonemes (in the verb, in these materials) are replaced with noise (in this study, pink noise). 
In the lexical probe version of this paradigm (developed for this study) participants listened to the sentences with LC, EC or neutral prosody, and at the end of the sentence they were presented with a visual probe (one of the two possible disambiguating verbs, complete with all phonemes) that was congruent or incongruent or compatible with the prosody of the sentence they had heard. Their task was to respond to the visual probe either `yes' (i.e., `I heard this word in the sentence I have just listened to') or `no' (i.e., `I didn't hear this word'). Response time to the probe word indirectly taps which of the disambiguating morphemes on the verb the listener mentally supplies when it has been replaced by noise. The materials for Experiments 3A and 3B were identical to those used in Experiments 2A and 2B respectively except that the disambiguating phonemes were noise-replaced. Results of Experiments 3A and 3B showed that listeners were highly sensitive to the sentential prosody as revealed by their phoneme restoration responses and response time data, confirming Stoyneshka et al.'s findings establishing the reliability of the phoneme restoration paradigm in investigating effects of prosody in ambiguity resolution. Response time data showed a pattern similar to what SKS observed for English (except for one condition in Experiment 3A, with incongruent probes): despite the phrase length reversal in Experiment 3B, there was no influence of phrase length distribution on ambiguity resolution. This has a natural explanation in light of the difference between the `got it' task with disambiguating morphology within the sentence stimulus, and the phoneme restoration task in which the listener can project onto the verb whatever morphology is compatible with the heard prosody. LC and EC were processed equally well for congruent probes, and there was an LC advantage in the incongruent and compatible probe conditions. 
Overall results support the hypothesis that syntactic Late Closure becomes evident in listening when prosody is absent or misleading, and also that phrase lengths can play a significant role.

  • Automated Classification of Argument Stance in Student Essays: A Linguistically Motivated Approach with an Application for Supporting Argument Summarization

    Author:
    Adam Faulkner
    Year of Dissertation:
    2014
    Program:
    Linguistics
    Advisor:
    Martin Chodorow
    Abstract:

    This study describes a set of document- and sentence-level classification models designed to automate the task of determining the argument stance (for or against) of a student argumentative essay and the task of identifying any arguments in the essay that provide reasons in support of that stance. A suggested application utilizing these models is presented which involves the automated extraction of a single-sentence summary of an argumentative essay. This summary sentence indicates the overall argument stance of the essay from which the sentence was extracted and provides a representative argument in support of that stance. A novel set of document-level stance classification features motivated by linguistic research involving stancetaking language is described. Several document-level classification models incorporating these features are trained and tested on a corpus of student essays annotated for stance. These models achieve accuracies significantly above those of two baseline models. High-accuracy features used by these models include a dependency subtree feature incorporating information about the targets of any stancetaking language in the essay text and a feature capturing the semantic relationship between the essay prompt text and stancetaking language in the essay text. We also describe the construction of a corpus of essay sentences annotated for supporting argument stance. The resulting corpus is used to train and test two sentence-level classification models. The first model is designed to classify a given sentence as a supporting argument or as not a supporting argument, while the second model is designed to classify a supporting argument as holding a for or against stance. Features motivated by influential linguistic analyses of the lexical, discourse, and rhetorical features of supporting arguments are used to build these two models, both of which achieve accuracies above their respective baseline models. 
An application illustrating an interesting use-case for the models presented in this dissertation is described. This application incorporates all three classification models to extract a single sentence summarizing both the overall stance of a given text and a convincing reason in support of that stance.

  • Demonstratives in Motion: The Grammaticalization of Demonstratives as a Window into Synchronic Phenomena

    Author:
    Lisa Ferrazzano
    Year of Dissertation:
    2013
    Program:
    Linguistics
    Advisor:
    Christina Tortora
    Abstract:

    There is significant variation in the literature on how demonstratives are characterized semantically, leading to divergent syntactic analyses of demonstratives. A major source of this disagreement regards how distance specifications relate to the demonstrative: whether [+/-speaker] is an integral property of the demonstrative or not. I argue that distance-marking divides the class of demonstratives into strong and weak, along the lines of what Cardinaletti and Starke (1999) propose for pronouns. Strong demonstratives possess a [+/-speaker] feature, while weak demonstratives have a neutral [speaker] feature, corresponding to a distance-neutral interpretation, and the pragmatic notion of immediate accessibility of the referent (Lyons 1999). The diachronic component of this work serves as a lens through which to view the demonstrative's synchronic behavior. I argue that the process of grammaticalization (Meillet 1912) allows us to `see' certain aspects of a demonstrative's meaning (and, I argue, corresponding internal syntactic structure) getting peeled away as the demonstrative evolves. Latin ille and spoken Finnish se provide evidence that demonstratives pass through a distance-neutral phase before being analyzed as definite articles, suggesting that strong and weak demonstratives should receive distinct analyses in the synchronic domain. I argue that strong and weak demonstratives can be viewed as synchronic imprints of a diachronic process. In addition to teasing apart different semantic types of demonstratives, this dissertation seeks to identify differences between demonstratives and definite articles. I propose that the demonstrative is specified for (i) [(+/-) speaker], (ii) [+contrastive] (encoding contrast), and (iii) [+identifiability], and that these features are encoded on functional heads in the extended projection of the demonstrative. The complex demonstrative is merged in a dedicated functional projection ([Spec, TrackerAdjP]) within the DP. 
The definite article, in contrast, expresses only [+identifiability], and is merged directly in the DP projection. I argue that the common core of [+identifiability] helps explain the synchronic and diachronic dependency between the demonstrative and the DP projection, and sheds light on the phenomenon of apparent `double definiteness.'

  • HUU-FA THESIS DAT?: A Syntactic Analysis of Jamaican Creole Possessive WH-elements

    Author:
    Toni Foster
    Year of Dissertation:
    2014
    Program:
    Linguistics
    Advisor:
    Marcel den Dikken
    Abstract:

    This thesis discusses the differences between the Jamaican Creole (JC) expressions huu-fa and fi-huu. Jamaican Creole is a language that was born from a combination of the lexifier language English and the substrate language Twi, so it is reasonable to check whether the features of JC were derived from these languages. The distribution of huu-fa and fi-huu resembles the distribution of English whose. Fi-huu and huu-fa are WH-elements that show possession, similar to the English word whose. They are made of a WH-pronoun and a form of the preposition fi "for". The two terms differ in internal structure and distribution. The difference between huu-fa and fi-huu will be dissected in terms of substrate and superstrate influences as well as the elements involved in their formation. Ultimately, this thesis argues that the internal structure of the PP huu-fa prevents it from appearing adnominally.

  • THE DP HYPOTHESIS THROUGH THE LENS OF JAPANESE NOMINAL COLLOCATION CONSTRUCTIONS

    Author:
    Kaori Furuya
    Year of Dissertation:
    2009
    Program:
    Linguistics
    Advisor:
    Marcel den Dikken
    Abstract:

    In Japanese, bare noun phrases can refer to the object that is introduced in a previous context, whereas in English, the definite article is required for a common noun phrase to refer. The research question of this discussion is whether Japanese syntactically projects a determiner phrase (DP) although it does not have an article such as the in English. If, unlike English, Japanese does not project DP, the definiteness of referential arguments needs to be parameterized in syntax and in semantics. On the other hand, if Japanese projects DP, it suggests that DP is part of Universal Grammar (UG) and thus that no parameterization is called for. This thesis presents three pieces of evidence to support the DP hypothesis for Japanese by examining nominal collocation constructions such as watasitati 3-nin `we three' and watasitati sensei `we professors'. In Chapter 2, the first argument stems from specificity effects. In Japanese, numeral classifiers (NCs) cannot float away from personal pronouns. Likewise, NCs cannot get raised outside the associated bare noun phrases when the noun phrases possess definite interpretations. This implies that Japanese projects DP and that the DP blocks NCs from moving outside. In Chapter 3, examination of the internal structure of nominal collocation constructions is conducted. The grouping of personal pronouns and common noun phrases is ungrammatical when the common noun phrases have a plural marker and occur prenominally with the genitive marker. Moreover, NCs also cannot appear prenominally with the genitive marker when the host noun phrases involve personal pronouns, unlike in the case of common noun phrases. Based on the argument of the nominal predication hypothesis due to the former property, the ungrammaticality of the second property is argued in terms of D feature on DP, in favor of the DP hypothesis. In Chapter 4, the left periphery of nominal collocation constructions is investigated. 
The fact that not all noun phrases allow for adjunction is explained in terms of the ban on adjunction to DP. If these arguments are correct, this suggests that DP is part of UG and that in Japanese the lack of a determiner is only due to morpho-phonological reasons.

  • The Acquisition of L2 Reading Comprehension: The Relative Contribution of Linguistic Knowledge and Existing Reading Ability

    Author:
    Leigh Garrison-Fletcher
    Year of Dissertation:
    2012
    Program:
    Linguistics
    Advisor:
    Gita Martohardjono
    Abstract:

    The study presented here examines the development of second language (L2) reading comprehension among adolescents who speak Spanish as their native language (L1) and are just beginning to learn English. The existing research on L2 reading comprehension among adolescents has focused on the transfer of reading skills from the L1 to the L2 and on the role of L2 linguistic knowledge. The research has suggested that reading skills transfer from the L1 to the L2, but that L2 linguistic knowledge plays the strongest role in L2 reading comprehension. However, previous research has not fully investigated the role of the L1 in the L2 reading development of adolescent learners. Crucially, students with low levels of L1 reading have not been included in the research, and such students must be studied in order to get a complete picture of the role of L1 reading in L2 reading. This study further expands on the previous research by including a group of participants not included in the research program on L2 reading comprehension among adolescent learners--namely, adolescent newcomer English language learners (ELLs) who arrive in the United States and enter the school system in middle or high school. Research on these students is lacking and little is known about their development of L2 academic skills. The main finding from the study is that L1 reading comprehension is the strongest contributor to L2 reading comprehension, as compared to the other predictor variables: L2 vocabulary, L2 syntax, and L1 vocabulary. This result is in opposition to previous research findings that L2 language skills play a more important role in L2 reading comprehension than L1 reading comprehension. It is clear that for newcomer adolescent ELLs in U.S. schools, their level of L1 reading is an important contributor to their development of L2 reading comprehension. 
Thus, educators should be aware of their students' L1 reading skills upon entry to school in order to provide them with the best instruction.

  • Canvas: A fast and accurate geometric sentence alignment system using lexical cues within complex misalignment settings

    Author:
    Hussein Ghaly
    Year of Dissertation:
    2014
    Program:
    Linguistics
    Advisor:
    Andrew Rosenberg
    Abstract:

    In this paper, we present a new sentence alignment system (Canvas), which is a Python implementation of a geometric approach to sentence alignment, based on lexical cues. The Canvas system is designed mainly to handle parallel texts exhibiting complex misalignment patterns, namely within English-Arabic pairs for United Nations documents. The system relies heavily on pre-indexing words/tokens in the source and target texts, and it creates correspondences between the token indexes. From this point onward, the alignment problem is reduced to a geometric problem of finding the path that runs through the True Correspondence Points (TCPs). The likelihood of a point being a TCP depends on the clustering of other points nearby; so, we collect the most likely points, and we identify the shortest path containing the maximum number of these points using a modified form of Dijkstra's algorithm. The results of the Canvas system are very promising, as they demonstrate that it can handle intricate misalignment patterns, with much better speed than other alignment approaches using lexical cues, and with good accuracy in general, in a completely automated fashion. The only drawback is that the system does not cover all the alignment segments, and this coverage is generally lower than that of other systems, which can be a subject of future research.
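
The geometric idea in this abstract can be illustrated with a rough Python sketch. This is not the Canvas code: the lexicon, the toy token lists, the `radius` parameter, and every function name are hypothetical. The final step uses a simple longest-monotone-chain dynamic program as a stand-in for the modified Dijkstra search the abstract mentions.

```python
from collections import defaultdict

def index_tokens(tokens):
    """Map each token to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index

def candidate_points(src_tokens, tgt_tokens, lexicon):
    """Pair source and target positions whose tokens are linked by the lexicon."""
    tgt_index = index_tokens(tgt_tokens)
    points = []
    for i, tok in enumerate(src_tokens):
        for translation in lexicon.get(tok, ()):
            for j in tgt_index.get(translation, ()):
                points.append((i, j))
    return points

def cluster_score(point, points, radius=2):
    """Count the other candidate points within `radius` positions on both axes."""
    i, j = point
    return sum(1 for (a, b) in points
               if (a, b) != point and abs(a - i) <= radius and abs(b - j) <= radius)

def monotone_chain(points):
    """Longest chain of points strictly increasing on both axes (O(n^2) DP).

    Assumption: a stand-in for the path search, not the modified Dijkstra step.
    """
    pts = sorted(set(points))
    if not pts:
        return []
    best = [1] * len(pts)    # best[k]: length of the longest chain ending at pts[k]
    prev = [-1] * len(pts)   # back-pointers used to recover that chain
    for k, (i, j) in enumerate(pts):
        for m in range(k):
            a, b = pts[m]
            if a < i and b < j and best[m] + 1 > best[k]:
                best[k] = best[m] + 1
                prev[k] = m
    k = max(range(len(pts)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(pts[k])
        k = prev[k]
    return chain[::-1]

# Toy data: placeholder target tokens stand in for a real target-language text.
src = ["the", "council", "adopted", "the", "resolution"]
tgt = ["t_adopted", "t_council", "t_resolution"]
lexicon = {"council": ["t_council"], "adopted": ["t_adopted"],
           "resolution": ["t_resolution"]}

points = candidate_points(src, tgt, lexicon)
path = monotone_chain(points)
```

In a fuller version, `cluster_score` would be used to discard isolated points before the path search, which is where the abstract's notion of likely TCPs would come in.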

  • The Acquisition of an L2 Vowel System: A Longitudinal Investigation of Change

    Author:
    Fran Gulinello
    Year of Dissertation:
    2009
    Program:
    Linguistics
    Advisor:
    Charles Cairns
    Abstract:

    To what extent do the vowel systems of L2 learners change over time and what types of changes can be expected? The study reported here is a longitudinal investigation of change in the vowel systems of five adult native Spanish speakers learning English. It focuses on eleven vowels of English as uttered in CVC words and in various sentential contexts. Vowel productions from each speaker were measured for the acoustic parameters of F1, F2 and duration. These acoustic parameters were then analyzed via the classification matrices of discriminant analysis and compared over time. Change in the nonnative speakers was analyzed in two ways: independently of the target and in direct comparison to the target. Research in L2 acquisition has suggested that interlanguage is a system unto itself unlike the native language or the target language (Selinker, 1972). Thus, the nonnative speakers' vowels were first examined independently of the native speakers' vowels. This phase of the analysis showed which vowels were differentiated by a speaker on the three acoustic parameters, which were not, and whether there were changes over time in how vowels were differentiated. Research in cross-linguistic production has shown that learners may approximate target norms without necessarily achieving them (Flege, 1980). Therefore, in addition to considering the interlanguage of the nonnative speakers, change over time was also examined with respect to the target language. Nonnative speakers' vowels were compared directly to the two native speaker participants in the study. This second phase of the analysis showed whether changes approximated target norms. Findings indicate that the vowels of nonnative speakers change in ways that reflect dialectal and diachronic changes. Specifically, we see instances of split, merger and shift as described by Labov (1994). It is also the case, however, that changes occur that are unique to L2 acquisition. 
These changes are undoubtedly related to the learning of orthography and sound-spelling correspondences. This study provides evidence that intermediate phonological systems arising during L2 acquisition should be viewed not only in terms of the target but as unique systems of contrasts. It also provides evidence that changes are not necessarily unilateral; movement in one aspect of a system can affect other aspects of the system.

  • THE SYNTAX OF NON-VERBAL CAUSATION: THE CAUSATIVE APOMORPHY OF `FROM' IN GREEK AND GERMANIC LANGUAGES

    Author:
    Alexandra Ioannidou
    Year of Dissertation:
    2012
    Program:
    Linguistics
    Advisor:
    Marcel den Dikken
    Abstract:

    This is a study of the meaning and syntax of non-(lexical)verbal causation. Macroscopically, it examines the preposition `from' as attested in contexts like "X is/comes from Y". Syntactic diagnostics are applied to formally distinguish the causative from the spatial interpretations of `from'-PPs in Greek, English, Dutch, and German. The syntactic landscape of causative `from' will turn out to be very minimal with `from' directly selecting the Cause-DP, in contradistinction to its spatial counterpart, where `from' always selects for another PP layer. More microscopically then I focus on the causative interpretations only, which are particularly revealing because (i) they give an in-depth view of CAUSE, stripped of all verbal layers—traditionally considered the locus of CAUSE—suggesting that the source of causation in non-(lexical)verbal environments has to be the preposition per se and (ii) they single-handedly provide a rudimentary structure for causation, where `from' introduces the Cause in its complement and is predicated of the Causee. Finally, with a basic predicational structure in place, I offer a detailed cross-linguistic account for the syntactic mechanism that forces the use of particle verbs in causative `from'-less environments.

  • CONTRIBUTIONS OF STATISTICAL INDUCTION TO MODELS OF SYNTAX ACQUISITION

    Author:
    Xuan-Nga Kam
    Year of Dissertation:
    2009
    Program:
    Linguistics
    Advisor:
    Janet Dean Fodor
    Abstract:

    Recent challenges to Chomsky's poverty of the stimulus thesis for language acquisition suggest that children's primary data may carry ‘indirect evidence’ about linguistic constructions despite containing no instances of them, with the deeper implication that innate knowledge is not needed for grammar acquisition. Reali & Christiansen (2005) demonstrated that a simple bigram model trained on child-directed speech can induce the correct form of auxiliary inversion in certain complex English questions (e.g., Is the boy who is crying hurt?). The significance of this achievement is called into question, however, by Experiments 1–6 reported here, which show that the success is highly circumscribed, resting on one particular bigram (<who is> or <that is>) in the grammatical test sentences. The model performs poorly on inversion in related constructions in English and Dutch, which do not afford effective cues accessible to a bigram analysis. Performance improved modestly when learning resources were added in Experiments 7–15: the learning algorithm was upgraded to a trigram model, corpus size was increased, part-of-speech information was provided. Even so, there were no circumstances in which auxiliary inversion was well-discriminated across other variants (with do-support, with object-gap relatives). This suggests that the n-gram models were not capturing the linguistic generalization that unites the various instances of auxiliary inversion. This weak performance is unsurprising, since the n-gram learners had no access to information about phrase-structure. Chomsky (1980) emphasized the significance of ‘structure dependence’ for correct application of the auxiliary-inversion rule. Experiments 16–18 provided some partial phrase-structure information relevant to the task. When noun phrases in the corpus and test sentences were surrounded by NP brackets, performance was extremely poor. 
But replacing each (maximal) noun phrase by the symbol NP finally yielded success across all three sub-cases of auxiliary inversion tested. Consequently, based on the results to date, the n-gram challenge to stimulus poverty and UG remains unsubstantiated. However, if it can be shown in future work that an n-gram model is capable of assigning phrase-structure to word-strings, there are grounds for anticipating that it could succeed in extracting the general pattern of auxiliary-inversion.
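
The bigram comparison discussed in this abstract can be sketched in a few lines of Python. This is an illustration only: the three-sentence corpus is invented, add-one smoothing is an assumption rather than the smoothing used in the studies cited, and the function names are hypothetical. The sketch trains bigram counts and checks which of two candidate questions receives the higher log-probability.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Collect context and bigram counts, with sentence-boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for s in sentences:
        toks = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(toks)
        contexts.update(toks[:-1])            # every token that has a successor
        bigrams.update(zip(toks, toks[1:]))
    return contexts, bigrams, len(vocab)

def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability (smoothing choice is an assumption)."""
    contexts, bigrams, v = model
    toks = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
               for a, b in zip(toks, toks[1:]))

# Invented toy corpus: it contains no complex questions with auxiliary inversion.
corpus = [
    "the boy who is crying left",
    "is the boy hurt",
    "the girl who is smiling is happy",
]
model = train_bigrams(corpus)

grammatical = "is the boy who is crying hurt"
ungrammatical = "is the boy who crying is hurt"
prefers_grammatical = log_prob(grammatical, model) > log_prob(ungrammatical, model)
```

On this toy corpus the grammatical string wins, and most of its advantage comes from the attested bigrams `who is` and `is crying`. That is consistent with the abstract's point that success of this kind can rest on one particular bigram rather than on a structural generalization.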