Alumni Dissertations

 

  • SYSTEMATIC COMPARISON OF CROSS-LINGUAL PROJECTION TECHNIQUES FOR LOW-DENSITY NLP UNDER STRICT RESOURCE CONSTRAINTS

    Author:
    Joshua Waxman
    Year of Dissertation:
    2014
    Program:
    Computer Science
    Advisor:
    Matt Huenerfauth
    Abstract:

    The field of low-density NLP is often approached from an engineering perspective, and evaluations are typically haphazard, considering different architectures, different languages, and different available resources, without a systematic comparison. The resulting architectures are then tested on the unique corpus and language for which each approach was designed. This makes it difficult to evaluate which approach is truly the "best," or which approaches are best for a given language. In this dissertation, several state-of-the-art architectures and approaches to low-density language Part-Of-Speech Tagging are reimplemented; all of these techniques exploit a relationship between a high-density (HD) language and a low-density (LD) language. As a novel contribution, a testbed is created using a representative sample of seven (HD, LD) language pairs, all drawn from the same massively parallel corpus, Europarl, and selected for their particular linguistic features. With this testbed in place, never-before-possible comparisons are conducted to evaluate which broad approach performs best for particular language pairs, and to investigate whether particular language features should suggest a particular NLP approach. A survey of the field suggested some unexplored approaches with the potential to yield better performance, be quicker to implement, and require less intensive linguistic resources. Under strict resource limitations, which are typical of low-density NLP environments, these characteristics are important. The approaches investigated in this dissertation are each a form of language-ifier, which modifies an LD corpus to be more like an HD corpus, or alternatively, modifies an HD corpus to be more like an LD corpus, prior to supervised training. Four variations of language-ifier designs, each relying on relatively few linguistic resources, have been implemented and evaluated in this dissertation: lexical replacement, affix replacement, cognate replacement, and exemplar replacement. Based on linguistic properties of the languages drawn from the Europarl corpus, predictions were made as to which prior and novel approaches would be most effective for languages with specific linguistic properties, and these predictions were tested through systematic evaluations on the testbed of languages. The results of this dissertation serve as guidance for future researchers who must select an appropriate cross-lingual projection approach (and a high-density language from which to project) for a given low-density language. Finally, all the languages drawn from the Europarl corpus are actually HD, but for the sake of the evaluation testbed in this dissertation, certain languages are treated as if they were LD (ignoring any available HD resources). To evaluate how the various approaches perform on an actual LD language, a case study was conducted in which part-of-speech taggers were implemented for Tajiki, harnessing linguistic resources from a related HD language, Farsi, using all of the prior and novel approaches investigated in this dissertation. Insights from this case study were documented so that future researchers can anticipate what their experience might be in implementing NLP tools for an LD language under the strict resource limitations considered in this dissertation.
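
As an illustration of the lexical-replacement idea, the sketch below is our own minimal example, not the dissertation's implementation; the lexicon, tagger, and words are hypothetical. It rewrites an LD sentence into HD-like text with a bilingual lexicon, tags it with an HD-trained tagger, and projects the tags back to the original tokens.

```python
# Minimal sketch of lexical replacement for cross-lingual POS projection.
# Assumptions: a small LD->HD lexicon and an HD-trained tagger are available;
# replacement is one-to-one, so tag positions line up with the LD tokens.

from typing import Callable, Dict, List, Tuple


def lexical_replace(ld_tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Swap each LD token for its HD translation; unknown tokens pass through."""
    return [lexicon.get(tok.lower(), tok) for tok in ld_tokens]


def project_tags(ld_tokens: List[str],
                 lexicon: Dict[str, str],
                 hd_tagger: Callable[[List[str]], List[Tuple[str, str]]]
                 ) -> List[Tuple[str, str]]:
    """Tag the HD-ified sentence and attach each tag to the original LD token."""
    hd_like = lexical_replace(ld_tokens, lexicon)
    tags = [tag for _, tag in hd_tagger(hd_like)]
    return list(zip(ld_tokens, tags))


if __name__ == "__main__":
    toy_lexicon = {"khona": "house", "kalon": "big"}   # hypothetical entries
    toy_tagger = lambda toks: [(t, "NOUN" if t == "house" else "ADJ") for t in toks]
    print(project_tags(["khona", "kalon"], toy_lexicon, toy_tagger))
    # [('khona', 'NOUN'), ('kalon', 'ADJ')]
```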

  • A Parametrization-Based Surface Reconstruction System for Triangular Mesh Simplification with Application to Large Scale Scenes

    Author:
    Adriana Wise
    Year of Dissertation:
    2014
    Program:
    Computer Science
    Advisor:
    Ioannis Stamos
    Abstract:

    The laser scanner is now widely used to capture the geometry of art, animation maquettes, and large architectural, industrial, and landform models; the resulting data poses specific problems that depend on the model scale. This thesis provides a solution for simplification of triangulated data and for surface reconstruction of large data sets, where feature edges provide an obvious segmentation structure. It also explores a new method for model segmentation, with the goal of applying multiresolution techniques to data sets characterized by curvy areas and the lack of clear demarcation features. The preliminary stage of surface segmentation, which takes as input single or multiple scan data files, generates surface patches which are processed independently. The surface components are mapped onto a two-dimensional domain with boundary constraints, using a novel parametrization weight coefficient. This stage generates valid parameter domain points, which can be fed as arguments to parametric modeling functions or surface approximation schemes. On this domain, our approach explores two types of remeshing. First, we generate points in a regular grid pattern, achieving multiresolution through a flexible grid step, which nevertheless is designed to produce a globally uniform resampling aspect. In this case, for reconstruction, we attempt to solve the open problem of border reconciliation across adjacent domains by retriangulating the border gap between the grid and the fixed irregular border. Alternatively, we straighten the domain borders in the parameter domain and coarsely triangulate the resulting simplified polygons, resampling the base domain triangles in a 1-4 subdivision pattern, achieving multiresolution from the number of subdivision steps. For mesh reconstruction, we use a linear interpolation method based on the original mesh triangles as control points on local planes, using a saved triangle correspondence between the original mesh and the parametric domain. We also use a region-wide approximation method, applied to the parameter grid points, which first generates data-trained control points and then uses them to obtain the reconstruction values at the resamples. In the grid resampling scheme, due to the border constraints, the reassembly of the segmented, sequentially processed data sets is seamless. In the subdivision scheme, we align adjacent border fragments in the parameter space and use a region-to-fragment map to achieve the same border reconstruction across two neighboring components. We successfully process data sets of up to 1,000,000 points in one pass of our program, and are capable of assembling larger scenes from sequential runs. Our program consists of a single run, without intermediate storage. Where we process large input data files, we fragment the input using a nested application of our segmentation algorithm to reduce the size of the input scenes, and our pipeline reassembles the reconstruction output from multiple data files into a unique view.
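
A rough sketch of the grid-resampling step is given below; it is our own simplified illustration, not the thesis code, and the input arrays are assumptions. Grid points in the 2D parameter domain are located in a parameter-space triangle and their 3D positions are obtained by barycentric interpolation, using the saved correspondence between the parametrized mesh and the original vertices.

```python
# Simplified sketch of regular-grid resampling in a parametrized patch.
# Assumptions: uv holds the 2D parameter coordinates of the original vertices,
# xyz their 3D positions, and triangles the mesh connectivity.

import numpy as np


def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c), or None if degenerate."""
    m = np.array([[b[0] - a[0], c[0] - a[0]],
                  [b[1] - a[1], c[1] - a[1]]], dtype=float)
    try:
        v, w = np.linalg.solve(m, np.asarray(p, dtype=float) - np.asarray(a, dtype=float))
    except np.linalg.LinAlgError:
        return None
    return 1.0 - v - w, v, w


def resample_grid(uv, xyz, triangles, step=0.1):
    """Sample the patch on a regular (u, v) grid; returns an array of 3D points."""
    samples = []
    for u in np.arange(0.0, 1.0 + 1e-9, step):
        for v in np.arange(0.0, 1.0 + 1e-9, step):
            for tri in triangles:
                bc = barycentric((u, v), *(uv[i] for i in tri))
                if bc is not None and all(c >= -1e-9 for c in bc):
                    # Interpolate the 3D position from the triangle's vertices.
                    samples.append(sum(w * xyz[i] for w, i in zip(bc, tri)))
                    break
    return np.array(samples)
```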

  • OPTIMIZATION ALGORITHMS FOR PROXY PLACEMENT IN CONTENT DISTRIBUTION NETWORKS

    Author:
    Jun Wu
    Year of Dissertation:
    2011
    Program:
    Computer Science
    Advisor:
    Kaliappa Ravindran
    Abstract:

    Popular web sites receive an enormous share of Internet traffic. These sites have a competitive motivation to offer better service to their clients at lower cost. One solution is to use a content distribution network (CDN). When optimizing proxy placement in a CDN, we want to maximize the user experience while minimizing resource consumption. This thesis therefore addresses a multi-objective optimization problem for CDNs: the content latency experienced by users and the system resources expended to deliver content. After presenting a topology model for the content distribution network, we list metrics for evaluating a CDN and define the objective function for optimizing proxy placement. After reviewing existing proxy placement algorithms, we use a genetic algorithm to solve the proxy placement problem. We first apply the genetic algorithm to a simple network and find that the results are better than those of the greedy algorithm and identical to the global optimum. We then apply the genetic algorithm to a realistically larger network topology. The genetic algorithm can find good solutions to the proxy placement problem, but compared to the greedy algorithm it is less efficient, so when response time is important the genetic algorithm is not the best choice. To optimize the genetic algorithm, we found that choosing a higher mutation rate and step size at the beginning of the genetic operation and decreasing them as the operation proceeds yields much better results. For the initial populations, if we want a quick response to a sudden change in client access we can choose a smaller initial population; if we have sufficient response time, especially when initially setting up the network, we should choose a larger initial population. Using the optimized genetic algorithm, we experiment with three different network examples whose sizes lie between the simple network and the large network. Since these network sizes are smaller, we are able to run an exhaustive search to find the global optimum. In all cases the results from the genetic algorithm outperform the greedy algorithm and are close to the global optimum. Although the genetic algorithm is not guaranteed to find the globally optimal solution, as it did for the simple network, a robust good solution is enough to solve real-world problems.
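
The sketch below gives a toy version of this approach; it is our own illustration under assumed inputs, not the dissertation's code. Candidate placements of k proxies are evolved with selection, crossover, and a mutation rate that decays over the generations, scoring each placement by a weighted sum of average client latency and proxy resource cost.

```python
# Toy genetic algorithm for proxy placement.
# Assumptions: latency[c][n] is the latency from client c to node n, and
# cost[n] is the resource cost of hosting a proxy at node n.

import random


def fitness(placement, latency, cost, alpha=0.5):
    """Lower is better: average latency to the nearest proxy plus weighted proxy cost."""
    avg_latency = sum(min(latency[c][p] for p in placement)
                      for c in range(len(latency))) / len(latency)
    return alpha * avg_latency + (1 - alpha) * sum(cost[p] for p in placement)


def evolve(nodes, k, latency, cost, pop_size=30, generations=100):
    """Evolve k-proxy placements; mutation rate decays as generations proceed."""
    pop = [random.sample(nodes, k) for _ in range(pop_size)]
    for g in range(generations):
        mutation_rate = 0.05 + 0.5 * (1.0 - g / generations)   # high early, low late
        pop.sort(key=lambda p: fitness(p, latency, cost))
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = list(dict.fromkeys(a[:k // 2] + b))[:k]     # order-preserving crossover
            if random.random() < mutation_rate:
                outside = [n for n in nodes if n not in child]
                if outside:
                    child[random.randrange(k)] = random.choice(outside)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda p: fitness(p, latency, cost))
```

Here the placement size k is held fixed for simplicity; the weighted sum above is only one way to fold latency and resource cost into a single score.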

  • A SIMULATION STUDY ON USING THE VIRTUAL NODE LAYER TO IMPLEMENT EFFICIENT AND RELIABLE MANET PROTOCOLS

    Author:
    Jiang Wu
    Year of Dissertation:
    2011
    Program:
    Computer Science
    Advisor:
    Nancy Griffeth
    Abstract:

    The Virtual Node Layer (VNLayer) is a cluster-based programming abstraction for a Mobile Ad-Hoc Network (MANET). The VNLayer defines fixed or predictably mobile geographical regions. In each region, a number of mobile nodes collectively emulate a virtual node, which provides services and relays packets for client processes. As a clustering scheme with state replication, the VNLayer approach can theoretically improve the efficiency and reliability of MANET protocols. As a general programming abstraction, the VNLayer hides underlying complexities from protocol developers and can be shared by multiple applications. However, the VNLayer also introduces extra control overhead and prolongs data forwarding delay, which could be prohibitively expensive in terms of performance. Based on an existing VNLayer implementation [1], we developed an ns-2-based software package, VNSim. VNSim can be used to simulate VNLayer-based applications in a MANET of up to a few hundred mobile nodes, in order to better understand the impact of the VNLayer approach on efficiency and reliability. With VNSim, we did our first case study on a VNLayer-based MANET address allocation protocol, VNDHCP. Simulation results showed that the VNLayer approach can be used to adapt a wireline protocol to a MANET. We also did an extensive simulation study on VNLayer-based MANET routing. A wireline routing protocol, RIP, was adapted to run over the VNLayer. With the support provided by the VNLayer, the adapted protocol, VNRIP, was implemented very quickly and can provide reasonable performance. During the study of VNLayer-based MANET routing, we identified a number of major performance limitations in the existing VNLayer implementation and the models it is based on. To tackle these limitations, we created a more realistic link layer model, extended the VNLayer model, and optimized our VNLayer implementation. With the optimized VNLayer implementation, we implemented VNAODV, an adapted version of AODV, over the new link and VNLayer models. Simulation results indicate that VNAODV delivers more packets and creates more stable routes than standard AODV in a dense MANET with a high node motion rate and moderate data traffic. This research validated the intuition that the VNLayer approach can be used to adapt wireline protocols quickly to MANETs and to improve the performance of MANET protocols. It also provided us with insights into how to implement and optimize cluster-based MANET protocols.
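
The sketch below illustrates, in our own simplified and hypothetical form (not VNSim itself), the basic VNLayer bookkeeping: the plane is divided into fixed square regions, and the mobile nodes currently inside a region choose one of themselves (here simply the lowest node id) to emulate that region's virtual node.

```python
# Minimal sketch of fixed geographic regions and per-region emulator election.
# The region size and the lowest-id election rule are assumptions for illustration.

from collections import defaultdict

REGION_SIZE = 250.0   # meters; hypothetical region edge length


def region_of(x, y):
    """Fixed geographic region containing position (x, y)."""
    return (int(x // REGION_SIZE), int(y // REGION_SIZE))


def elect_emulators(positions):
    """positions: {node_id: (x, y)}. Returns {region: leader_node_id}."""
    members = defaultdict(list)
    for node_id, (x, y) in positions.items():
        members[region_of(x, y)].append(node_id)
    return {region: min(ids) for region, ids in members.items()}


if __name__ == "__main__":
    print(elect_emulators({1: (10, 20), 2: (30, 40), 7: (300, 80)}))
    # {(0, 0): 1, (1, 0): 7}
```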

  • TEXT EXTRACTION FROM NATURAL SCENE: METHODOLOGY AND APPLICATION

    Author:
    Chucai Yi
    Year of Dissertation:
    2014
    Program:
    Computer Science
    Advisor:
    Yingli Tian
    Abstract:

    With the popularity of the Internet and smart mobile devices, there is an increasing demand for techniques and applications for image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scenes. However, scene text extraction is a challenging problem, due to the cluttered backgrounds of natural scenes and the multiple patterns of scene text itself. To address these problems, this dissertation proposes a framework for scene text extraction. Scene text extraction in our framework is divided into two components, detection and recognition. Scene text detection finds the regions containing text in camera-captured images/videos. Text layout analysis based on gradient and color analysis is performed to extract candidate text strings from the cluttered background of a natural scene. Text structural analysis is then performed to design effective text structural features for distinguishing text from non-text outliers among the candidate text strings. Scene text recognition transforms image-based text in the detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification over a set of text character categories. We design robust and discriminative feature representations of STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of our proposed framework, which obtains better performance than previously published methods. Our proposed scene text extraction framework is applied to four scenarios: 1) reading print labels on grocery packages for hand-held object recognition; 2) combining with car detection to localize license plates in camera-captured natural scene images; 3) reading indicative signage for assistive navigation in indoor environments; and 4) combining with object tracking to perform scene text extraction in video-based natural scenes. The proposed prototype systems and associated evaluation results show that our framework is able to meet the challenges of real applications.
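
As a rough illustration of the STC prediction step, the sketch below is our own minimal pipeline with assumed data, not the dissertation's feature design: each character crop is described with a HOG descriptor and a multi-class linear classifier is trained over the character categories.

```python
# Minimal sketch of scene text character (STC) classification.
# Assumptions: `images` is a list of 2D grayscale character crops and `labels`
# holds their character categories; HOG + a linear SVM stand in for the richer
# descriptor/coding/learning combinations described in the abstract.

import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC


def stc_features(char_image):
    """HOG descriptor of a normalized 32x32 grayscale character crop."""
    norm = resize(char_image, (32, 32), anti_aliasing=True)
    return hog(norm, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))


def train_stc_classifier(images, labels):
    """Train a multi-class linear classifier over the character categories."""
    X = np.array([stc_features(img) for img in images])
    clf = LinearSVC()
    clf.fit(X, labels)
    return clf


def predict_characters(clf, crops):
    """Predict the character category of each detected crop."""
    return clf.predict(np.array([stc_features(c) for c in crops]))
```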

  • Piecewise Surface Reconstruction from Range Data

    Author:
    Gene Yu
    Year of Dissertation:
    2010
    Program:
    Computer Science
    Advisor:
    George Wolberg
    Abstract:

    Geometric modeling from range data is a long-standing problem in the field of computer graphics. It remains challenging even when the data is replete with simple surfaces such as planes and polynomials, as is common in urban scenes. The need to represent such scenes requires us to segment the data into its components. The collection of these components constitutes the final geometric model. This thesis introduces a method of piecewise surface reconstruction that fits a scene with a model composed of disjoint surfaces. The contribution of this work is the introduction of a surface evaluation method based on quantitative entropy measurements for balancing the tradeoff between accuracy and efficiency. Integrated surface evaluation enables us to produce output models that are accurate to within user-specified tolerances. Since our algorithm minimizes global criteria, it is robust to holes, occlusions, nonplanar surfaces, and missing data. Compared to methods that operate on unorganized point clouds and utilize no segmentation, our approach provides the user with greater control over the final appearance and error characteristics of the output model. A range of shape approximations such as plane, polynomial, and spline mesh surfaces can be used interchangeably. This flexibility is applicable to all scenes involving piecewise models.
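
One ingredient of such a pipeline can be sketched as follows; this is a minimal illustration under our own assumptions, not the thesis implementation: fit a least-squares plane to a segmented set of range points and accept the planar approximation only if it stays within a user-specified tolerance.

```python
# Minimal sketch of planar fitting with a user-specified error tolerance.
# The tolerance value and the max-distance acceptance rule are assumptions.

import numpy as np


def fit_plane(points):
    """Least-squares plane through points (n, 3): returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]          # normal = direction of least variance


def within_tolerance(points, centroid, normal, tol):
    """Accept the planar fit if the max point-to-plane distance is below tol."""
    dist = np.abs((points - centroid) @ normal)
    return dist.max() <= tol


if __name__ == "__main__":
    pts = np.random.rand(100, 3)
    pts[:, 2] *= 0.01                 # nearly flat synthetic segment
    c, n = fit_plane(pts)
    print(within_tolerance(pts, c, n, tol=0.02))
```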

  • Self-referentiality in Constructive Semantics of Intuitionistic and Modal Logics

    Author:
    Junhua Yu
    Year of Dissertation:
    2014
    Program:
    Computer Science
    Advisor:
    Sergei Artemov
    Abstract:

    This thesis explores self-referentiality in the framework of justification logic. In this framework, initiated by Artemov, the language has formulas of the form t:F, which means "the term t is a justification of the formula F." Moreover, terms can occur inside formulas, and hence it is legal to have t:F(t), which means "the term t is a justification of the formula F about t itself." Expressions like this are not only interesting in the semantics of justification logic, but also, as we will see, necessary in applications of justification logic to formalizing the constructive content implicitly carried by modal and intuitionistic logics. Work initiated by Artemov and continued by Brezhnev and others has successfully extracted the constructive content packaged by modality in many modal logics. Roughly speaking, they offer methods of substituting modalities by terms in various justification logics, and then computing the exact structure of each term. After performing these methods, each formula prefixed by a modality becomes a formula prefixed by a term, which intuitively stands for the justification of the formula being prefixed. In the terminology of this framework, we say that modal logics are "realized" in justification logics. Within the family of justification logics, the Logic of Proofs LP is perhaps the most important member. As Artemov showed, this logic is not only complete w.r.t. arithmetical semantics about proofs, but also accommodates the modal logic S4 via realization. Combined with Gödel's modal embedding from intuitionistic propositional logic IPC to S4, the Logic of Proofs LP serves as an intermediary via which IPC receives its provability semantics, also known as Brouwer-Heyting-Kolmogorov semantics, or BHK semantics. This thesis presents the candidate's work in two directions. (1) Following Kuznets' result that self-referentiality is necessary for the realization of several modal logics including S4, we show that it is also necessary for BHK semantics. (2) We find a necessary condition for a modal theorem to require self-referentiality in its realization, and use this condition to derive many interesting properties about self-referentiality.
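
To make the notation concrete, the block below shows a standard illustration of realization (ours, not drawn from the thesis): the S4 distribution axiom is realized by the application axiom of LP, and self-referentiality concerns formulas of the shape t:F(t).

```latex
% A standard example (not taken from the thesis): the S4 axiom
%   \Box(A \to B) \to (\Box A \to \Box B)
% is realized in the Logic of Proofs LP by its application axiom, with proof
% terms x, y replacing the boxes:
\[
  x\colon(A \to B) \;\to\; \bigl(y\colon A \;\to\; (x \cdot y)\colon B\bigr)
\]
% A realization is self-referential when it needs a formula of the shape
% t\colon F(t), i.e. a term occurring inside the very formula it justifies.
```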

  • LEARNING AND PARALLELIZATION BOOST CONSTRAINT SEARCH

    Author:
    Xi Yun
    Year of Dissertation:
    2013
    Program:
    Computer Science
    Advisor:
    Susan Epstein
    Abstract:

    Constraint satisfaction problems are a powerful way to abstract and represent academic and real-world problems from both artificial intelligence and operations research. A constraint satisfaction problem is typically addressed by a sequential constraint solver running on a single processor. Rather than construct a new, parallel solver, this work develops convenient parallelization methods for pre-existing solvers, and thereby benefits from both state-of-the-art research and increasingly available computational resources. This work proposes and evaluates several approaches that exploit scheduling and partitioning. It formulates parallel algorithm portfolio construction as an integer-programming problem, and solves the problem with algorithms that combine case-based reasoning with either a greedy algorithm or a complete integer-programming solver. This work also develops a hybrid adaptive paradigm that learns critical information to support search and workload splitting while it solves the problem. Extensive experiments show that these approaches improve the performance of the underlying sequential solvers. The hybrid paradigm solves many difficult problems left open after recent solver competitions. Although the empirical results are mainly on constraint satisfaction problems, this work also generalizes the reasoning component of the parallel scheduler to a new, case-based paradigm, and successfully applies it to protein-ligand docking.
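
A toy version of greedy portfolio construction is sketched below; it is our own illustration with hypothetical runtime data, not the dissertation's integer-programming formulation. Given observed runtimes of candidate solvers on training instances, one solver is assigned per processor so that each pick maximizes the number of newly covered instances within the time limit.

```python
# Minimal sketch of greedy parallel-portfolio construction.
# Assumptions: runtimes[solver][instance] holds observed solve times (seconds)
# on a training set; one solver is assigned to each processor.

def greedy_portfolio(runtimes, processors, time_limit):
    """Return a list of solvers, one per processor, chosen greedily by coverage."""
    portfolio, solved = [], set()
    for _ in range(processors):
        best_solver, best_gain = None, -1
        for solver, times in runtimes.items():
            gain = len({i for i, t in times.items()
                        if t <= time_limit and i not in solved})
            if gain > best_gain:
                best_solver, best_gain = solver, gain
        portfolio.append(best_solver)
        solved |= {i for i, t in runtimes[best_solver].items() if t <= time_limit}
    return portfolio


if __name__ == "__main__":
    data = {"A": {"p1": 3, "p2": 50},
            "B": {"p1": 40, "p2": 4},
            "C": {"p1": 5, "p2": 6}}
    print(greedy_portfolio(data, processors=2, time_limit=10))   # e.g. ['C', 'A']
```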