Alumni Dissertations and Theses

  • PROTRU: Leveraging Provenance to Enhance Network Trust in a Wireless Sensor Network

    Author:
    Gulustan Dogan
    Year of Dissertation:
    2013
    Program:
    Computer Science
    Advisor:
    Ted Brown
    Abstract:

    Trust can be an important component of wireless sensor networks, since it underpins the believability of the produced data, and historical information is a crucial asset in deciding the trustworthiness of that data. A node's trust can change over time after its initial deployment for various reasons, such as energy loss, environmental conditions, or resource exhaustion. Provenance can play a significant role in supporting the calculation of information trust by recording the data flow and snapshots of the network. Furthermore, provenance can be used to register previous trust records and other information such as node type, data type, node location, and averages of historical data. We introduce a node-level trust-enhancing architecture for sensor networks based on provenance. Our network is cognitive in the sense that the system reacts automatically upon detecting anomalies. Through simulations we verify our model and show that our approach can provide substantial enhancements in information trust compared to traditional network approaches.
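
    To make the provenance-based trust idea concrete, here is a minimal Python sketch of one plausible node-level update rule. The record structure, thresholds, and decay constants are illustrative assumptions of this summary, not the architecture defined in the dissertation.

    ```python
    # Hypothetical illustration (names and update rule are assumptions,
    # not the dissertation's): a provenance record per reading lets trust
    # be recomputed from history when an anomaly is detected.
    from dataclasses import dataclass, field

    @dataclass
    class ProvenanceRecord:
        node_id: str
        node_type: str
        location: tuple
        value: float

    @dataclass
    class NodeTrust:
        score: float = 1.0                      # start fully trusted
        history: list = field(default_factory=list)

        def record(self, rec: ProvenanceRecord, tolerance: float = 2.0,
                   decay: float = 0.8, recovery: float = 0.05):
            """Lower trust when a reading deviates from the provenance
            history; slowly restore it otherwise."""
            if self.history:
                avg = sum(r.value for r in self.history) / len(self.history)
                if abs(rec.value - avg) > tolerance:   # anomaly detected
                    self.score *= decay                # react automatically
                else:
                    self.score = min(1.0, self.score + recovery)
            self.history.append(rec)

    node = NodeTrust()
    for v in (20.1, 19.8, 20.3, 35.0):          # last reading is anomalous
        node.record(ProvenanceRecord("n7", "temp", (0, 0), v))
    print(round(node.score, 2))                  # trust drops after the outlier
    ```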

  • Multiscale Feature Extraction and Matching with Applications to 3D Face Recognition and 2D Shape Warping

    Author:
    Hadi Fadaifard
    Year of Dissertation:
    2011
    Program:
    Computer Science
    Advisor:
    George Wolberg
    Abstract:

    Shape matching is the process of computing a dissimilarity measure between shapes. Partial 3D shape matching refers to a more difficult subproblem: measuring the dissimilarity between partial regions of 3D objects. Despite the considerable attention that 3D shape matching has drawn in computer vision and computer graphics, partial shape matching applied to objects of arbitrary scale remains a difficult problem. This work addresses partial 3D shape matching with no assumptions about the scale factors of the input objects. We introduce a multiscale feature extraction and matching technique that employs a new scale-space based representation for 3D surfaces. The representation is shown to be insensitive to noise, computationally efficient, and capable of automatic scale selection. Applications of the proposed representation are presented for automatic 3D surface registration, face detection, and face recognition. Test results on two well-known 3D face datasets consisting of several thousand scanned human faces demonstrate that our recognition system outperforms competing methods. Estimating differential surface attributes, such as normals and curvatures, plays an important role in the performance of 3D matching systems; noise in the data, however, poses the main challenge in estimating these attributes. Surface reconstruction methods, such as Moving Least Squares (MLS), help minimize the effects of noise. In this work, we also review the MLS approach to surface reconstruction and show how input noise affects the estimated differential attributes of the surface. We demonstrate how these results, together with statistical hypothesis testing, may be used to determine the smallest neighborhood size needed to estimate surface attributes. MLS reconstruction and the discrete Laplace-Beltrami operator are well-known geometric tools with a wide range of applications. In addition to their prominent use in our 3D work, we describe a novel use of these tools in a 2D shape deformation system for retargeting garments among arbitrary poses.
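
    As a rough illustration of the scale-space idea behind such a representation, the sketch below builds a Gaussian scale space over a 1-D stand-in for a surface curvature signal and selects a characteristic scale per sample via a scale-normalized response. This is a generic textbook-style analogue, not the thesis's 3D surface representation.

    ```python
    # Toy 1-D analogue of scale-space feature extraction with automatic
    # scale selection (my construction, not the thesis's algorithm).
    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def characteristic_scales(signal, sigmas=(1, 2, 4, 8, 16)):
        """For each sample, return the sigma with the strongest
        scale-normalized second-derivative response."""
        responses = []
        for s in sigmas:
            smoothed = gaussian_filter1d(signal, sigma=s)
            lap = np.gradient(np.gradient(smoothed))
            responses.append((s ** 2) * np.abs(lap))  # scale normalization
        responses = np.stack(responses)               # (scales, samples)
        return np.array(sigmas)[responses.argmax(axis=0)]

    # A bump of width ~8 should select a correspondingly coarse scale.
    x = np.linspace(-30, 30, 200)
    bump = np.exp(-(x / 8.0) ** 2)
    print(characteristic_scales(bump)[100])  # scale picked at mid-bump
    ```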

  • Automatic Readability Assessment

    Author:
    Lijun Feng
    Year of Dissertation:
    2010
    Program:
    Computer Science
    Advisor:
    Matt Huenerfauth
    Abstract:

    We describe the development of an automatic tool to assess the readability of text documents. Our readability assessment tool predicts elementary school grade levels of texts with high accuracy. The tool is developed using supervised machine learning techniques on text corpora annotated with grade levels and other indicators of reading difficulty. Various independent variables or features are extracted from texts and used for automatic classification. We systematically explore different feature inventories and evaluate the grade-level prediction of the resulting classifiers. Our evaluation comprises well-known features at various linguistic levels from the existing literature, such as those based on language modeling, part-of-speech, syntactic parse trees, and shallow text properties, including classic readability formulas like the Flesch-Kincaid Grade Level formula. We focus in particular on discourse features, including three novel feature sets based on the density of entities, lexical chains, and coreferential inference, as well as features derived from entity grids. We evaluate and compare these different feature sets in terms of accuracy and mean squared error by cross-validation. Generalization to different corpora or domains is assessed in two ways. First, using two corpora of texts and their manually simplified versions, we evaluate how well our readability assessment tool can discriminate between original and simplified texts. Second, we measure the correlation between grade levels predicted by our tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities. The applications of this work include selection of reading material tailored to varying proficiency levels, ranking of documents by reading difficulty, and automatic document summarization and text simplification.
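
    One of the shallow features the abstract cites, the Flesch-Kincaid Grade Level, can be computed directly. The coefficients below are the standard published ones; the syllable counter is a crude vowel-group heuristic used only for illustration.

    ```python
    # Flesch-Kincaid Grade Level: a classic shallow readability feature.
    import re

    def count_syllables(word):
        # Rough heuristic: count groups of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return (0.39 * len(words) / sentences
                + 11.8 * syllables / len(words) - 15.59)

    # Very simple text can legitimately score below grade 0.
    print(round(flesch_kincaid_grade("The cat sat on the mat. It was warm."), 1))
    ```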

  • Searching for Mobile Data with A Priori Statistical Knowledge of Their Whereabouts under Delay Constraints

    Author:
    Yi Feng
    Year of Dissertation:
    2011
    Program:
    Computer Science
    Advisor:
    Amotz Bar-Noy
    Abstract:

    One or more tokens are hidden in several boxes, and then the boxes are locked. The probability of each token being found in each box is known, and all the probabilities are independent. A searcher looks for one, some, or all of the tokens by unlocking boxes over a predetermined number of rounds. In each round, any subset of the boxes can be unlocked, and the searcher collects all tokens in them. Each box is associated with a positive unlocking cost. The goal is to minimize the expected cost of unlocking boxes until the desired tokens are found. The original motivation is paging mobile users in cellular network systems: mobile users are tokens and cells are boxes. The probabilities of the users being in particular cells can be extracted from historical data, and the unlocking costs of boxes reflect the resources consumed to page a cell. The predetermined number of rounds ensures that the users will be found within a certain period of time (the delay constraint), and the goal is to minimize the resources consumed to find the users under that constraint. Beyond paging mobile users, this scheduling problem has broad application in finding information in sensor networks, searching for information in distributed data centers, medical decision making, and other settings. The special case in which a single token is sought and all the boxes have the same unlocking cost has been studied, and polynomial-time optimal algorithms exist: optimal search strategies can be found in time quadratic in the number of boxes and linear in the number of rounds. We improve this time complexity to linear in both the number of boxes and the number of rounds, and provide a hierarchy of algorithms that trades off optimality for complexity. In the general case of searching for a single token when the boxes can have different unlocking costs, we prove that the problem is strongly NP-hard and provide various approximation algorithms. We also demonstrate a tradeoff between the time complexity and implementation complexity of our approximation algorithms. For the case in which we search for multiple tokens and all boxes have the same unlocking cost, we explore the conference call problem and the yellow page problem: in the former we want to find all tokens, and in the latter we want to find (any) one of the tokens. The conference call problem has been studied; it is NP-hard and approximation algorithms exist. We show a duality between the two problems and provide efficient polynomial-time and exponential-time optimal algorithms for specific cases, as well as a tradeoff between the time and space complexity of optimal algorithms. We implement all of our algorithms and some algorithms by other researchers, and conduct a comprehensive experimental study in the context of the mobile-user paging application. The study provides further insight into the behavior of the algorithms and demonstrates their performance in a real system.
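
    For the studied special case (one token, equal unlocking costs), the classical quadratic-time result can be sketched as a prefix-partition dynamic program, since an optimal schedule pages cells in decreasing order of probability. This is a reconstruction under standard assumptions, not the dissertation's improved linear-time algorithm.

    ```python
    # Classical DP for d-round paging of one user, unit cost per cell:
    # O(n^2) per round, O(n^2 * d) overall -- quadratic in boxes,
    # linear in rounds, matching the previously known complexity.
    from itertools import accumulate

    def optimal_paging_cost(probs, rounds):
        """Minimum expected number of cells paged to find one user
        within `rounds` rounds."""
        p = sorted(probs, reverse=True)        # query likelier cells first
        n = len(p)
        prefix = [0.0] + list(accumulate(p))   # prefix[k] = p[0]+...+p[k-1]

        # cost[k]: best expected cost of scheduling the k likeliest cells
        cost = [k * prefix[k] for k in range(n + 1)]  # one-round baseline
        for _ in range(rounds - 1):                   # one extra round per pass
            # If the last round pages cells j..k-1, then k cells have been
            # paged in total, with probability prefix[k]-prefix[j].
            cost = [min(cost[j] + k * (prefix[k] - prefix[j])
                        for j in range(k + 1))
                    for k in range(n + 1)]
        return cost[n]

    # Two rounds over four cells: page the two likeliest cells first,
    # then the rest; expected cost ~2.6.
    print(optimal_paging_cost([0.4, 0.3, 0.2, 0.1], rounds=2))
    ```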

  • Phylogenetic Trees and Their Analysis

    Author:
    Eric Ford
    Year of Dissertation:
    2014
    Program:
    Computer Science
    Advisor:
    Katherine St. John
    Abstract:

    Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research due to its underlying importance in understanding biological processes. Since several steps in this process are NP-hard under popular, biologically motivated optimality criteria, significant resources are dedicated both to heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space in order to suggest methods that reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with "perfect" data, while others do not. We further characterize conditions under which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, we compare two approaches to the generalized tree alignment problem (GTAP), sequence alignment followed by tree search versus Direct Optimization, on both biological and simulated data.
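
    As a concrete anchor for the parsimony scoring that such tree searches repeat enormously many times, here is Fitch's classical small-parsimony algorithm for one character on a fixed binary tree. This is a standard textbook routine, not code from the dissertation.

    ```python
    # Fitch's algorithm: minimum number of character changes on a fixed
    # binary tree.  Tree search (the hard part the abstract discusses)
    # would wrap a search strategy around a scorer like this.

    def fitch_score(tree, states):
        """tree: nested 2-tuples of leaf names; states: leaf -> state."""
        def visit(node):
            if isinstance(node, str):                # leaf
                return {states[node]}, 0
            (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
            common = ls & rs
            if common:                               # agreement: no change
                return common, lc + rc
            return ls | rs, lc + rc + 1              # one mutation needed

        return visit(tree)[1]

    # ((A,B),(C,D)) with A=T, B=T, C=G, D=G needs exactly one change.
    print(fitch_score((("A", "B"), ("C", "D")),
                      {"A": "T", "B": "T", "C": "G", "D": "G"}))  # -> 1
    ```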

  • Discovering Regularity in Point Clouds of Urban Scenes

    Author:
    Sam Friedman
    Year of Dissertation:
    2014
    Program:
    Computer Science
    Advisor:
    Ioannis Stamos
    Abstract:

    Despite the apparent chaos of the urban environment, cities are actually replete with regularity. From the grid of streets laid out over the earth, to the lattice of windows thrown up into the sky, periodic regularity abounds in the urban scene. Just as salient, though less uniform, are the self-similar branching patterns of trees and vegetation that line streets and fill parks. We propose novel methods for discovering these regularities in 3D range scans acquired by a time-of-flight laser sensor. The applications of this regularity information are broad, and we present two original algorithms. The first exploits the efficiency of the Fourier transform for the real-time detection of periodicity in building facades. Periodic regularity is discovered online by doing a plane sweep across the scene and analyzing the frequency space of each column in the sweep. The simplicity and online nature of this algorithm allow it to be embedded in scanner hardware, making periodicity detection a built-in feature of future 3D cameras. We demonstrate the usefulness of periodicity in view registration, compression, segmentation, and facade reconstruction. The second algorithm leverages the hierarchical decomposition and locality in space of the wavelet transform to find stochastic parameters for procedural models that succinctly describe vegetation. These procedural models facilitate the generation of virtual worlds for architecture, gaming, and augmented reality. The self-similarity of vegetation can be inferred using multi-resolution analysis to discover the underlying branching patterns. We present a unified framework of these tools, enabling the modeling, transmission, and compression of high-resolution, accurate, and immersive 3D images.
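
    The frequency-space test at the heart of the first algorithm can be sketched in a few lines: each sweep column is treated as a 1-D occupancy signal whose dominant non-DC Fourier peak, if salient, gives the period of the facade. The thresholding rule below is an assumption of this sketch, not the parameters used in the dissertation.

    ```python
    # FFT-based periodicity detection for one sweep column (sketch).
    import numpy as np

    def dominant_period(column, min_peak_ratio=4.0):
        """column: 1-D occupancy signal sampled along a facade sweep line.
        Returns the period in samples, or None if no strong periodicity."""
        x = column - column.mean()            # remove the DC component
        spectrum = np.abs(np.fft.rfft(x))
        k = spectrum[1:].argmax() + 1         # strongest non-DC bin
        if spectrum[k] < min_peak_ratio * spectrum[1:].mean():
            return None                       # peak not salient enough
        return len(column) / k                # period in samples

    # A signal with "windows" every 8 samples reports a period of 8.
    signal = np.tile(np.array([1, 1, 0, 0, 0, 0, 0, 0], float), 16)
    print(dominant_period(signal))            # -> 8.0
    ```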

  • Finsler Optimal Control Theory

    Author:
    Srikanth Gottipati
    Year of Dissertation:
    2009
    Program:
    Computer Science
    Advisor:
    Sergei Artemov
    Abstract:

    This thesis is a contribution to solving problems of extracting optimal controls for complex systems. Its novelty consists of a detailed examination of Finsler geometry based control by connections, as proposed by Kohn and Nerode, and its relation to Pontryagin's maximum principle. The long-term hope is that these methods will form the underpinning of applications in the control of hybrid systems. Any advances in mathematical and algorithmic techniques for solving such problems would have wide application in business, industry, and science, as the widespread use of Bellman's dynamic programming, Dantzig's linear programming, and Kalman's optimization with linear quadratic cost functions demonstrates. But symbolic and numerical techniques have historically fallen well short of yielding efficient computational procedures for obtaining near-optimal controls for complex systems. Most investigations have been based on Pontryagin's geometric representation of optimal control and his maximum principle. Kohn and Nerode have proposed a different problem formulation aimed at extracting robust controls as a function of state (the synthesis problem with a robustness requirement) in the context of a Finsler geometry corresponding to the optimal control problem. This leads to the study of geometric, symbolic, and numerical methods for solving geodesic equations for connections in Finsler geometry. A principal result of this thesis is the determination of the relations between the Finsler and Pontryagin formulations of optimal control, and the transformation from one to the other. A second principal result establishes the relation between robustness and curvature: curvature is used to quantify the spread of geodesics due to disturbances. Finally, the thesis concludes with numerical integration schemes for computing controls and local connections.
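
    For orientation, the two formulations the thesis connects can be stated compactly in their standard textbook forms (the notation below is generic, not the thesis's):

    ```latex
    % Pontryagin: with dynamics $\dot{x} = f(x,u)$ and cost
    % $J = \int_0^T L(x,u)\,dt$, define the Hamiltonian
    \[
      H(x, u, \lambda) = L(x, u) + \lambda^{\top} f(x, u),
    \]
    % then an optimal pair $(x^*, u^*)$ admits a costate $\lambda$ with
    \[
      \dot{x}^* = \frac{\partial H}{\partial \lambda}, \qquad
      \dot{\lambda} = -\frac{\partial H}{\partial x}, \qquad
      u^*(t) = \arg\min_{u} H(x^*(t), u, \lambda(t)).
    \]
    % The Finsler-side object is the geodesic equation of a connection
    % $\Gamma$, whose coefficients may depend on direction $\dot{x}$:
    \[
      \ddot{x}^i + \Gamma^i_{jk}(x, \dot{x})\, \dot{x}^j \dot{x}^k = 0 .
    \]
    ```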