# Alumni Dissertations

#### Filter Dissertations By:

### Phylogenetic Trees and Their Analysis

Author:Eric FordYear of Dissertation:2014Program:Computer ScienceAdvisor:Katherine St. JohnAbstract:Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research due to its underlying importance in understanding biological processes. As several steps in this process are NP-Hard when using popular, biologically-motivated optimality criteria, significant amounts of resources are dedicated to both both heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space in order to suggest methods to reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with "perfect" data, while some others do not. We further characterize conditions for which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually-incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, we compare two approaches to the generalized tree alignment problem, or GTAP: Sequence alignment followed by tree search vs. Direct Optimization, on both biological and simulated data.

### Finsler Optimal Control Theory

Author:Srikanth GottipatiYear of Dissertation:2009Program:Computer ScienceAdvisor:Sergei ArtemovAbstract:This thesis is a contribution to solving problems of extracting optimal controls for complex systems. Its novelty consists of a detailed examination of Finsler geometry based control by connections as proposed by Kohn and Nerode and its relation to Pontryagin's maximum principle. The long term hope is that these methods will form the underpinning of the applications of control of hybrid systems. Any advances in mathematical and algorithmic techniques for solving such problems would have wide application in business, industry, and science. The widespread use of Bellman's dynamic programming, Dantzig's linear programming, Kalman's optimization with linear quadratic cost functions, demonstrate this. But symbolic and numerical techniques historically have fallen well short of yielding efficient computation procedures to obtain near optimal controls for complex systems. Most investigations have been based on Pontryagin's geometric representation of optimal control and his maximum principle. Kohn and Nerode have proposed a different problem formulation aimed at extracting robust controls as a function of state (synthesis problem with a robustness requirement) in the context of a Finsler geometry corresponding to the optimal control problem. This leads to the study of geometric, symbolic, and numerical methods for solving geodesic equations for connections in Finsler Geometry. A principal result of this thesis is the determination of the relations between the Finsler and Pontryagin formulations of optimal control, and the transformation from one to the other. A second principal result is establishing the relations between robustness and curvature. Curvature is used to quantify the spread of geodesics due to disturbances. Finally, the thesis concludes with numerical integration schemes for computing controls and local connections.

### Developing Probabilistic Graphical Models for Identifying Text Patterns and Semantics

Author:Minhua HuangYear of Dissertation:2013Program:Computer ScienceAdvisor:Robert M. HaralickAbstract:In this dissertation, we discuss a new probabilistic graphical model called the data sequence dependence model (DSDM). This model is derived from the joint probability function for a sequence of class assignments given a sequence of input symbols (words) under fewer and different conditional independence assumptions than the commonly used probabilistic graphical models, such as hidden Markov models (HMMs), maximum entropy Markov models (MEMMs), and conditional random fields (CRFs). Our model accounts for the data sequence dependency rather than the class sequence dependency. In order to find a sequence of optimal class assignments for a sequence of input symbols, the HMM and CRF models have to employ dynamic programming. In contrast to these models, our method does not need to employ dynamic programming. Although dynamic programming is an efficient optimization technique, our model leads to an algorithm whose computational complexity is less than dynamic programming and whose performance is just as good or better. Based on DSDM, we develop algorithms for identifying semantics in texts. In this research, semantics consists of three types of text patterns. They are the semantic arguments of a verb, the sense of a polysemous word, and the noun phrases of a sentence. In addition, two other probabilistic graphical models are described. They are called the context independence model (CIM) and the class sequence dependence model (CSDM). These models have the same economic gain function as DSDM. However, they are derived under the different conditional independence assumptions. Finally, statistical testing methodologies are employed to validate these models. For our task of identifying semantic patterns, we compare each pair of the models by testing the null hypothesis that the two models are equally good at identifying semantic patterns against the alternative hypothesis that one model is better able to identify semantic patterns. The resulting p-value shows that DSDM is better able to identify semantic patterns than the other models.

### MULTI-OBJECTIVE GENETIC PROGRAMMING FOR DATA VISUALIZATION AND CLASSIFICATION

Author:Ilknur IckeYear of Dissertation:2011Program:Computer ScienceAdvisor:Andrew RosenbergAbstract:The process of knowledge discovery lies on a continuum ranging between the human driven (manual exploration) approaches to fully automatic data mining methods. As a hybrid approach, the emerging field of visual analytics aims to facilitate human-machine collaborative decision making by providing automated analysis of data via interactive visualizations. One area of interest in visual analytics is to develop data transformation methods that support visualization and analysis. In this thesis, we develop an evolutionary computing based multi-objective dimensionality reduction method for visual data classification. The algorithm is called Genetic Programming Projection Pursuit (G3P) where genetic programming is utilized in order to automatically create visualizations of higher dimensional labeled datasets which are assessed in terms of discriminative power and interpretability. We consider two forms of interpretability of the visualizations: clearly separated and compact class structures along with easily interpretable data transformation expressions relating the original data attributes to the visualization axes. The G3P algorithm incorporates a number of automated measures of interpretability that were found to be in alignment with human judgement through a user study we conducted. On a number of data mining problems, we show that G3P generates a large number of data transformations that are better than those generated by a number of dimensionality reduction methods such as the principal components analysis (PCA), multiple discriminants analysis (MDA) and targeted projection pursuit (TPP) in terms of discriminative power and interpretability.

### Optimizing Resource Allocation Under Constraints

Author:Matthew JohnsonYear of Dissertation:2010Program:Computer ScienceAdvisor:Amotz Bar-NoyAbstract:The fundamental concept of resource allocation--assigning resources to tasks, and so on--gives rise to very broad families of problems, with instances in many domains, applications, disciplines. In the proposed thesis, we will study a series of assignment-oriented resource allocation problems modeling real-world situations, from the point of view of online and approximation algorithms. The complexities and character of these problems vary greatly. Separate from the problem definitions, we will also examine how the difficulty changes as we vary modalities such as online v. offline, distributed v. centralized, linear v. poly-time, thus mapping out a lattice of problem settings. For each problem and setting, we seek efficient algorithms of the appropriate kind (e.g., exact, approximation, competitive, distributed, ...). Three classes of problems examined in particular will be sensor-mission matching, battery charge scheduling, and geometric sensor coverage.

### 3D Map Building of Structured Indoor Environments with Heterogeneous Robots

Author:Ravi KaushikYear of Dissertation:2011Program:Computer ScienceAdvisor:Jizhong XiaoAbstract:We explore a methodology to construct 3D maps of indoor environments using multiple heterogeneous robots. The robots acquire overlapping 3D range images at contiguous locations and fuse them together to form a complete range map. Our methodology involves three major steps in map construction: the first step is to estimate an initial pose between the multiple robots using a real-time camera pose estimation algorithm, the second step involves extracting polygon features from overlapping range images and the third step involves registration of the overlapping range images using extracted polygonal features. We evaluate the performance of several approaches to camera pose estimation, which is used to determine the initial estimate of pose between the multiple robots. We introduce a fast and robust segmentation algorithm, which extracts the polygonal features from the 3D range images. A polygon-based registration algorithm accurately fuses the overlapping range images. The algorithm further refines the initial pose estimate by minimizing the geometric distance between corresponding polygonal features extracted from overlapping range images. Our polygon-based registration combined with camera pose estimation executes in real-time and much faster compared to point-based scan registration techniques. We present experimental results obtained while building indoor maps and quantitative analysis of the algorithms used in map construction of indoor environments.

### THEORETICAL METHODS FOR BLUR-CORRECTION IN ELECTRON AND SOFT X-RAY MICROSCOPY

Author:Joanna KlukowskaYear of Dissertation:2013Program:Computer ScienceAdvisor:Gabor HermanAbstract:The process of object reconstruction from projections is widely used in many fields. One of the applications is the reconstruction of biological specimens from two-dimensional projections in transmission electron microscopy and transmission x-ray microscopy. Various methods have been developed for correcting the blurring that occurs when the projections are obtained by a real instrument. As the attainable resolution increases, new issues become apparent and need to be taken into account in the imaging model. In this dissertation we concentrate on the point spread function and its impact on the quality and usefulness of the reconstructions from images obtained according to image formation models for the two types of microscopy. Specifically, we consider spatial variance of the point spread function in the direction of the electron or x-ray propagation. The correction methods for this type of blurring in both cases are related, but not identical due to nature of the respective imaging processes. We propose correction methods and demonstrate their efficacy using computer simulations.

### Comparing AI Search Algorithms and Their Efficiency When Applied to Path Finding Problems

Author:Erdal KoseYear of Dissertation:2012Program:Computer ScienceAdvisor:Danny KopecAbstract:Various Artificial Intelligence (AI) search algorithms have been investigated and classified as unidirectional or bidirectional. We start our discussion by presenting certain unidirectional and bidirectional search algorithms (BDS). We continued our study by presenting contributions to the field of search algorithms in AI. The focus of this research is the study of the bidirectional search and certain classes of AI problems and some new approaches to the domain of AI search algorithms have been explored. The second contribution of this research is to compare problem representations and to exploit built-in features of diverse programming paradigms. The programing paradigms have been classified by the problem domains which they might be more suitable for. This has been justified by the fact that the evaluation of search algorithms is a well-studied area of Artificial Intelligence (AI). However evaluation of the performance of programming paradigms when applied to search algorithms has not been studied well. We conclude our discussion with experimental results and more detailed information about our implementations.

### Game-theoretic and Bio-inspired Techniques for Self-positioning Autonomous Mobile Nodes

Author:Janusz KusykYear of Dissertation:2012Program:Computer ScienceAdvisor:Umit UyarAbstract:Autonomous mobile nodes that position themselves over an unknown terrain can ameliorate many of the problems which mobile ad hoc networks (MANETs) face. Achieving good spatial placement of mobile agents leads to superior network topology with improved area coverage, reduced power consumption, enhanced spectrum utilization, and simplified routing procedures. However, autonomous decision-making by nodes may also increase uncooperative and selfish behavior by these independent agents. Since it is impractical in MANETs to sustain complete and accurate information at each node about the entire network layout and decision of an individual node about its position should only be based on local information with limited coordination among agents. These characteristics recommend game theory (GT) as a tool for modeling, analyzing, and designing many MANET applications. At the same time, biologically inspired computation techniques such as genetic algorithms (GAs) can be used for finding desirable solutions in a prohibitively large search space and, in the case of MANETs, reduce the computational complexity needed for a node to determine its next location. We introduce several novel approaches for autonomous MANET nodes to distribute themselves uniformly over a dynamically changing environment without a centralized controller or a priori information about the deployment terrain or the state of other mobile agents. Our methods combine concepts from traditional GT, evolutionary game theory, and bio-inspired algorithms to effectively and efficiently guide autonomous MANET nodes in finding the best positions. We show that myopic actions of each individual node lead the entire network towards a stable and uniform distribution. We present formal analysis of our methods, and prove their important properties including convergence, area coverage, and uniformity characteristics. We developed a Java-based modeling platform to simulate performances of mobile networks, which allows for real-time visualization of ongoing network dynamics and collection of all critical data for evaluation purposes. Our analysis and experimental results demonstrate that GT and GA can be successfully combined for autonomous node placement to provide useful and resilient methods for optimizing network topology.

### Structural parameterizations of hard graph problems

Author:Michail LampisYear of Dissertation:2011Program:Computer ScienceAdvisor:Amotz Bar-NoyAbstract:In traditional computational complexity we measure algorithm running times as functions of one variable, the size of the input. Though in this setting our goal is usually to design polynomial-time algorithms, most interesting graph problems are unfortunately believed to require exponential time. Parameterized complexity theory refines this by introducing a second variable, called the parameter, which is supposed to quantify the "hardness" of each specific instance. The goal now becomes to confine the combinatorial explosion to the parameter, by designing an algorithm that runs in time polynomial in the size of the input, though inevitably exponential in the parameter. This will allow us to tackle instances where the parameter value is much more modest than the input size, which will happen often if the parameter is chosen well. In this work we deal with parameterized versions of intractable graph problems. Our focus will be structural parameterizations, meaning that the parameters we choose will be measures which attempt to quantify the complexity of a graph. The most important such measure in the literature is treewidth, a notion which quantifies how "tree-like" a graph is. Here, we will consider parameterizations by treewidth, but also by alternative measures of graph complexity such as vertex cover, or maximum degree. Our results move in two directions: First, we study the computational complexity of structural parameterizations of several specific problems. In this vein we show hardness results for Maximum Path Coloring parameterized by measures such as treewidth and maximum degree; we present algorithms for two parameterizations of Vertex Cover; and we present algorithms and hardness results for three different parameterizations of the modal satisfiability problem. Second, we study the algorithmic properties of structural parameters not with respect to a specific problem, but with respect to whole classes of problems. For undirected graphs we present algorithmic meta-theorems which show that large classes of problems are tractable when parameterized by vertex cover or max-leaf number. Our results strengthen a famous theorem due to Courcelle, for these special cases. For directed graphs we give hardness results for Hamiltonian Circuit and Max Di Cut which apply to essentially all the definitions of directed treewidth which have appeared in the literature, showing that none of them is as algorithmically potent as (undirected) treewidth.