Canvas: A fast and accurate geometric sentence alignment system using lexical cues within complex misalignment settings
In this paper, we present a new sentence alignment system (Canvas), which is a Python implementation of a geometric approach to sentence alignment, based on lexical cues. The Canvas system is designed mainly to handle parallel texts exhibiting complex misalignment patterns, namely in English-Arabic pairs for United Nations documents. The system relies heavily on pre-indexing words/tokens in the source and target texts, and it creates correspondences between the token indexes. From this point onward, the alignment problem is reduced to a geometric problem of finding the path that runs through the True Correspondence Points (TCPs). The likelihood of a point being a TCP depends on the clustering of other points nearby; so, we collect the most likely points, and we identify the shortest path containing the maximum number of these points using a modified form of Dijkstra's algorithm. The results of the Canvas system are very promising: they demonstrate that it can handle intricate misalignment patterns in a completely automated fashion, with much better speed than other alignment approaches using lexical cues and with good accuracy in general. The only drawback is that the system does not cover all the alignment segments, and its coverage is generally lower than that of other systems; improving coverage can be a subject of future research.
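The geometric idea in this abstract can be illustrated with a minimal Python sketch. It is not the Canvas implementation: the function names, the neighbor-count density test, the monotone-move restriction, and the squared-distance edge cost (used here so that ordinary Dijkstra prefers routes passing through many nearby points) are all assumptions made for illustration, standing in for the "modified form of Dijkstra's algorithm" that the abstract mentions but does not specify.

```python
import heapq
from math import hypot


def dense_points(points, radius=3.0, min_neighbors=2):
    """Keep candidate (source_index, target_index) points that have at least
    `min_neighbors` other candidates within `radius` (hypothetical TCP test)."""
    kept = []
    for p in points:
        neighbors = sum(
            1 for q in points
            if q != p and hypot(p[0] - q[0], p[1] - q[1]) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(p)
    return sorted(kept)


def best_path(points):
    """Plain Dijkstra from the first to the last point (in sorted order).

    Moves must advance in both texts. The edge cost is the squared distance,
    an assumption chosen so that many short hops are cheaper than one long
    jump, which pulls the path through as many points as possible.
    """
    if not points:
        return []
    pts = sorted(points)
    start, goal = pts[0], pts[-1]
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == goal:
            break
        for v in pts:
            if v[0] > u[0] and v[1] > u[1]:  # monotone move only
                nd = d + (v[0] - u[0]) ** 2 + (v[1] - u[1]) ** 2
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return []  # goal not reachable by monotone moves
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

Calling `best_path(dense_points(candidates, radius=1.5, min_neighbors=1))` on a set of token-index pairs first discards isolated points and then chains the remaining ones; both thresholds are illustrative and would need tuning on real data.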
The Acquisition of an L2 Vowel System: A Longitudinal Investigation of Change
To what extent do the vowel systems of L2 learners change over time and what types of changes can be expected? The study reported here is a longitudinal investigation of change in the vowel systems of five adult native Spanish speakers learning English. It focuses on eleven vowels of English as uttered in CVC words and in various sentential contexts. Vowel productions from each speaker were measured for the acoustic parameters of F1, F2 and duration. These acoustic parameters were then analyzed via the classification matrices of discriminant analysis and compared over time. Change in the nonnative speakers was analyzed in two ways: independently of the target and in direct comparison to the target. Research in L2 acquisition has suggested that interlanguage is a system unto itself unlike the native language or the target language (Selinker, 1972). Thus, the nonnative speakers' vowels were first examined independently of the native speakers' vowels. This phase of the analysis showed which vowels were differentiated by a speaker on the three acoustic parameters, which were not, and whether there were changes over time in how vowels were differentiated. Research in cross-linguistic production has shown that learners may approximate target norms without necessarily achieving them (Flege, 1980). Therefore, in addition to considering the interlanguage of the nonnative speakers, change over time was also examined with respect to the target language. Nonnative speakers' vowels were compared directly to those of the two native speaker participants in the study. This second phase of the analysis showed whether changes approximated target norms. Findings indicate that the vowels of nonnative speakers change in ways that reflect dialectal and diachronic changes. Specifically, we see instances of split, merger and shift as described by Labov (1994). It is also the case, however, that changes occur that are unique to L2 acquisition.
These changes are undoubtedly related to the learning of orthography and sound-spelling correspondences. This study provides evidence that intermediate phonological systems arising during L2 acquisition should be viewed not only in terms of the target but as unique systems of contrasts. It also provides evidence that changes are not necessarily unilateral; movement in one aspect of a system can affect other aspects of the system.
THE SYNTAX OF NON-VERBAL CAUSATION: THE CAUSATIVE APOMORPHY OF 'FROM' IN GREEK AND GERMANIC LANGUAGES
Marcel den Dikken
This is a study of the meaning and syntax of non-(lexical)verbal causation. Macroscopically, it examines the preposition 'from' as attested in contexts like "X is/comes from Y". Syntactic diagnostics are applied to formally distinguish the causative from the spatial interpretations of 'from'-PPs in Greek, English, Dutch, and German. The syntactic landscape of causative 'from' turns out to be very minimal, with 'from' directly selecting the Cause-DP, in contradistinction to its spatial counterpart, where 'from' always selects for another PP layer. More microscopically, I then focus on the causative interpretations only, which are particularly revealing because (i) they give an in-depth view of CAUSE, stripped of all verbal layers (traditionally considered the locus of CAUSE), suggesting that the source of causation in non-(lexical)verbal environments has to be the preposition per se, and (ii) they single-handedly provide a rudimentary structure for causation, where 'from' introduces the Cause in its complement and is predicated of the Causee. Finally, with a basic predicational structure in place, I offer a detailed cross-linguistic account of the syntactic mechanism that forces the use of particle verbs in causative 'from'-less environments.
CONTRIBUTIONS OF STATISTICAL INDUCTION TO MODELS OF SYNTAX ACQUISITION
Recent challenges to Chomsky's poverty of the stimulus thesis for language acquisition suggest that children's primary data may carry ‘indirect evidence’ about linguistic constructions despite containing no instances of them, with the deeper implication that innate knowledge is not needed for grammar acquisition. Reali & Christiansen (2005) demonstrated that a simple bigram model trained on child-directed speech can induce the correct form of auxiliary inversion in certain complex English questions (e.g., Is the boy who is crying hurt?). The significance of this achievement is called into question, however, by Experiments 1–6 reported here, which show that the success is highly circumscribed, resting on one particular bigram (<who is> or <that is>) in the grammatical test sentences. The model performs poorly on inversion in related constructions in English and Dutch, which do not afford effective cues accessible to a bigram analysis. Performance improved modestly when learning resources were added in Experiments 7–15: the learning algorithm was upgraded to a trigram model, corpus size was increased, and part-of-speech information was provided. Even so, there were no circumstances in which auxiliary inversion was well discriminated across other variants (with do-support, with object-gap relatives). This suggests that the n-gram models were not capturing the linguistic generalization that unites the various instances of auxiliary inversion. This weak performance is unsurprising, since the n-gram learners had no access to information about phrase-structure. Chomsky (1980) emphasized the significance of ‘structure dependence’ for correct application of the auxiliary-inversion rule. Experiments 16–18 provided some partial phrase-structure information relevant to the task. When noun phrases in the corpus and test sentences were surrounded by NP brackets, performance was extremely poor.
But replacing each (maximal) noun phrase by the symbol NP finally yielded success across all three sub-cases of auxiliary inversion tested. Consequently, based on the results to date, the n-gram challenge to stimulus poverty and UG remains unsubstantiated. However, if it can be shown in future work that an n-gram model is capable of assigning phrase-structure to word-strings, there are grounds for anticipating that it could succeed in extracting the general pattern of auxiliary-inversion.
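The bigram comparison discussed in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-sentence training corpus is invented, add-one smoothing is chosen for simplicity, and the function names are hypothetical; it reproduces neither the models nor the corpora used in the cited experiments.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count bigrams and their left-context words over a tokenized corpus."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a sentence.

    The vocabulary size comes from the training data only, so this is a
    rough score for comparing sentences, not a properly normalized model.
    """
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    size = len(vocab)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + size))
        for a, b in zip(tokens, tokens[1:])
    )


# Invented toy corpus: it contains no complex questions of the test type.
toy_corpus = [
    "the boy who is crying is hurt",
    "is the boy hurt",
    "the girl who is happy is here",
    "is the girl here",
]
model = train_bigrams(toy_corpus)
grammatical = "is the boy who is crying hurt"
ungrammatical = "is the boy who crying is hurt"
prefers_grammatical = log_prob(grammatical, model) > log_prob(ungrammatical, model)
```

On this toy corpus the grammatical question scores higher, and the advantage comes mainly from the frequent bigram <who is>, which is the kind of narrow cue the abstract describes.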
Vocabulary Through Affixes and Word Families - A Computer-Assisted Language Learning Program for Adult ELL Students
Vocabulary plays an important role in the language learning of ELL (English Language Learner) students. This work discusses the importance of metalinguistic awareness in teaching vocabulary to adult English Language Learners at an intermediate or advanced level of English language proficiency, with an emphasis on learning vocabulary through word families and increased morphological awareness. The main contribution is a computer-based program that guides users through a series of interactive reading and vocabulary practice exercises, which allow them to explore and learn how certain words are connected through word families and how some of the most common affixes in English can affect the meaning and grammatical function of words. Unlike most existing Computer-Assisted Language Learning systems, the program can produce an unlimited number of vocabulary practice exercises and can analyze an unlimited range of reading materials, including text supplied by the users.
Overt versus null subject pronoun variation in the Turkish spoken in Turkey and in New York City
The purpose of this dissertation is to examine the use of subject personal pronouns in the Turkish spoken in Turkey and in New York City from a variationist perspective. Whereas the variable use of subject personal pronouns in Turkish has been extensively analyzed in many studies conducted in Europe, it has received much less attention in the U.S. One aim of this study is to replicate the study conducted by Otheguy, Zentella and Livert (2007), in which the influence of different social and linguistic variables on the expression of Spanish subject pronouns was examined across Latin American and Caribbean immigrant generations in New York. The present study examines several linguistic and social variables that condition the presence and absence of subject personal pronouns in the speech of 20 adult speakers living in Turkey (TT) and 20 living in New York (TNY). The study compares the rate of subject pronoun use in Turkey with that in NYC and asks whether contact with English has an influence on the overt pronoun rate. In both the TT and TNY samples, there were equal numbers of males and females and of speakers from the working and professional classes. The speakers ranged in age from 20 to 80. Data analysis involved ANOVAs, correlations, cross-tabulations and multivariate regression analyses of linguistic and social variables. The linguistic variables, which were also examined in Otheguy et al. (2007) and in other previous studies, are person and number of the pronoun and of the verb, continuity of reference, and tense-mood-aspect (TMA) of the verb. The social variables analyzed include gender, social class, age of the informant, education, age of arrival in NYC, and length of residence in NYC. The results of the study indicate that TT and TNY resemble each other regarding the linguistic variables that condition the distribution of subject personal pronouns and regarding the order of the variables that account for the most variance in the use of the pronouns.
However, the two samples differ from one another with respect to the order and strength of the constraints within the person and number of the verb variable. In addition, we find a significantly higher rate of overt pronoun use for TNY than for TT. These findings are consistent with those obtained in the Spanish study and provide clear support for an English contact hypothesis when the increased use of overt subject pronouns among TNY and differences in constraint hierarchies between TT and TNY are taken into consideration.
Processing the not-because ambiguity in English: the role of pragmatics and prosody.
This dissertation investigates the processing of not-because sentences in English (e.g. Jane didn't purchase the blouse because it was silk), which are scopally ambiguous between BEC>NOT (Jane did not buy it) and NOT>BEC (Jane bought it for some other reason) readings. Frazier and Clifton (1996) had found a strong dispreference for NOT>BEC, which could be attributed to high attachment of the because-clause outside the scope of negation, in conflict with an otherwise very general processing tendency to attach incoming constituents low. The present study was designed to evaluate the possibility that no adjustment of the parsing model is necessitated, because the NOT>BEC reading has marked prosodic and pragmatic properties which would not be anticipated by the parser without substantial contextual support. In two self-paced reading experiments, disambiguated target constructions were presented either as main clauses or embedded in if-clauses. If-subordination was hypothesized to neutralize the marked prosodic and pragmatic properties of NOT>BEC by (a) suppressing a prosodic boundary before because and (b) reducing perceived 'incompleteness' by guaranteeing that another clause would follow. In Experiment 1, significantly slower processing occurred for NOT>BEC than BEC>NOT targets in main clauses, replicating previous results, but no processing time difference was evident when the not-because construction was embedded within an if-clause. Experiment 2 followed to separate the two factors, assessing the contribution of prosody alone. All details of Experiment 1 were maintained except that the not-because construction displayed on a single line in Experiment 1 was now distributed over two lines. The line-break inserted before because was expected to encourage a prosodic break there, due to readers' tendency to interpret visual display segmentations as prosodic breaks, thus favoring BEC>NOT.
The reading time data confirmed this, showing no sign of the if-subordination amelioration observed in Experiment 1. Thus, Experiment 2 confirms that prosody is a crucial contributor to the usual difficulty of NOT>BEC. A general conclusion is that standard parsing strategies are not falsified by not-because, but may be overridden by its unusual linguistic properties.
The Rise of Disyllables in Old Chinese: the Role of Lianmian Words
The history of the Chinese language is characterized by a clear shift from monosyllabic to disyllabic words (Wang, 1980). This dissertation aims to provide a new diachronic explanation for the rise of disyllables in the history of Chinese and to demonstrate its significance for Modern Chinese prosody and lexicalization. A corpus of 300 Lianmian words in Old Chinese was compiled, including 96 Shuangsheng words, 172 Dieyun words and 32 Splitting-sound words. This study builds on previous morphological and phonological research on disyllables in Chinese and looks closely at detailed aspects of Old Chinese sound patterns and their evolution. Based on the analysis of sound patterns of Splitting-sound words and Dieyun words in Old Chinese, evidence from neighboring languages, statistical analysis of the development of Old Chinese, and reconstructed syllable structure, I argue that the simplification of complex onsets in Old Chinese was a central motivating factor for the rise of the earliest disyllabic forms - Splitting-sound words. Monosyllabic words with historic initial CL clusters (L a liquid) undergo fission, surfacing as disyllables where the first syllable has the simple C onset and the second the L onset. The occurrence of the liquid in the second syllable onset preserves consonant identity, which would otherwise be lost in the onset simplification process. Generalization of this process soon gave rise to another type of mono-morphemic disyllable - Dieyun. Once onset simplification was complete, around the Late Old Chinese to Early Middle Chinese period, the phonological motivation for syllabic fission disappeared. Mono-morphemic disyllables lost their productivity at this point. The disyllabic template they defined was preserved, giving rise to productive formation of disyllabic compounds. This word-formation process appears to be responsible for the dominance of disyllables in many modern Chinese languages spoken today.
This diachronic phonological research accounts for issues that previous studies fail to address. It reveals the relation between the rise of disyllables and the creation of Lianmian words, and the relation between the creation of Lianmian words and the simplification of Old Chinese phonology. It enriches our understanding of the role of Lianmian words and of Old Chinese phonological development in Chinese historical disyllabicity.
How to ask questions in Mandarin Chinese
This thesis re-examines the four main question-types in Mandarin Chinese, namely, particle questions, háishì questions, A-not-A questions and wh-questions, whose previous accounts are argued to be unsatisfactory due to various faulty assumptions about questions, particularly the stipulation of 'Q'. Each of the four Mandarin Chinese question-types is given a new account based on the view that questions are speech-acts, which speakers perform by subconsciously choosing sentence-types that mirror their ignorance-types, as proposed in Fiengo (2007). It is further demonstrated that viewing questions as speech-acts instead of as a structurally marked sentence-type allows a simpler and more intuitive account of expressions that occur in them. Two expressions are re-evaluated in this light: the sentential adverb dàodi in Mandarin Chinese and wh-the-hell in English.
The Encoding of Temporality in Second Language Acquisition: A Study of Mandarin Chinese-speaking ESL Learners
This dissertation investigates the influences of pragmatic factors, lexical devices, and the lexical aspectual properties of verbs on second language learners' encoding of temporality in their target language. The pragmatic factors being investigated include a recency effect and the number of occurrences of a tense in the previous context, and the lexical devices include past-time temporal adverbials and frequency adverbs. The role of the lexical aspectual properties of verbs is checked against the Aspect Hypothesis, which states that learners will initially restrict past or perfective marking to achievement and accomplishment verbs and later gradually extend this usage to activity and stative verbs. Unlike many previous studies, which collect data from learners of various native language backgrounds, the present study analyzes empirical data gathered solely from Mandarin Chinese-speaking ESL learners, whose native language temporality system differs dramatically from that of their target language. That is, Mandarin Chinese is a tenseless language, while English uses tense and verbal morphology to indicate temporal locations and relations. The findings in the present study indicate that (i) a recency effect in a passage does not affect English native speakers' or Chinese native speakers' tense choice, (ii) both English native speakers and Chinese native speakers show a tendency to use the duplicated tense in the previous context to mark a test item in the following discourse, (iii) past-time temporal adverbials show an obvious tense reminding effect when there is no matrix agreement, (iv) the introduction of a frequency adverb is associated with a higher usage rate of the present tense for a test item in a past-time context, but not in a present-time context, and (v) no supporting evidence for the Aspect Hypothesis is found and the inherent lexical aspectual properties of verbs do not seem to influence learners' tense choice.
The present study contributes to our understanding of the development of second language learners' expression of temporal locations and relations in their target language. It also raises the question of how English native speakers and second language learners are similar to each other in language processing.