# Alumni Dissertations and Theses

#### Filter Dissertations and Theses By:

### Developing Probabilistic Graphical Models for Identifying Text Patterns and Semantics

Author:Minhua HuangYear of Dissertation:2013Program:Computer ScienceAdvisor:Robert M. HaralickAbstract:In this dissertation, we discuss a new probabilistic graphical model called the data sequence dependence model (DSDM). This model is derived from the joint probability function for a sequence of class assignments given a sequence of input symbols (words) under fewer and different conditional independence assumptions than the commonly used probabilistic graphical models, such as hidden Markov models (HMMs), maximum entropy Markov models (MEMMs), and conditional random fields (CRFs). Our model accounts for the data sequence dependency rather than the class sequence dependency. In order to find a sequence of optimal class assignments for a sequence of input symbols, the HMM and CRF models have to employ dynamic programming. In contrast to these models, our method does not need to employ dynamic programming. Although dynamic programming is an efficient optimization technique, our model leads to an algorithm whose computational complexity is less than dynamic programming and whose performance is just as good or better. Based on DSDM, we develop algorithms for identifying semantics in texts. In this research, semantics consists of three types of text patterns. They are the semantic arguments of a verb, the sense of a polysemous word, and the noun phrases of a sentence. In addition, two other probabilistic graphical models are described. They are called the context independence model (CIM) and the class sequence dependence model (CSDM). These models have the same economic gain function as DSDM. However, they are derived under the different conditional independence assumptions. Finally, statistical testing methodologies are employed to validate these models. For our task of identifying semantic patterns, we compare each pair of the models by testing the null hypothesis that the two models are equally good at identifying semantic patterns against the alternative hypothesis that one model is better able to identify semantic patterns. The resulting p-value shows that DSDM is better able to identify semantic patterns than the other models.

### MULTI-OBJECTIVE GENETIC PROGRAMMING FOR DATA VISUALIZATION AND CLASSIFICATION

Author:Ilknur IckeYear of Dissertation:2011Program:Computer ScienceAdvisor:Andrew RosenbergAbstract:The process of knowledge discovery lies on a continuum ranging between the human driven (manual exploration) approaches to fully automatic data mining methods. As a hybrid approach, the emerging field of visual analytics aims to facilitate human-machine collaborative decision making by providing automated analysis of data via interactive visualizations. One area of interest in visual analytics is to develop data transformation methods that support visualization and analysis. In this thesis, we develop an evolutionary computing based multi-objective dimensionality reduction method for visual data classification. The algorithm is called Genetic Programming Projection Pursuit (G3P) where genetic programming is utilized in order to automatically create visualizations of higher dimensional labeled datasets which are assessed in terms of discriminative power and interpretability. We consider two forms of interpretability of the visualizations: clearly separated and compact class structures along with easily interpretable data transformation expressions relating the original data attributes to the visualization axes. The G3P algorithm incorporates a number of automated measures of interpretability that were found to be in alignment with human judgement through a user study we conducted. On a number of data mining problems, we show that G3P generates a large number of data transformations that are better than those generated by a number of dimensionality reduction methods such as the principal components analysis (PCA), multiple discriminants analysis (MDA) and targeted projection pursuit (TPP) in terms of discriminative power and interpretability.

### Optimizing Resource Allocation Under Constraints

Author:Matthew JohnsonYear of Dissertation:2010Program:Computer ScienceAdvisor:Amotz Bar-NoyAbstract:The fundamental concept of resource allocation--assigning resources to tasks, and so on--gives rise to very broad families of problems, with instances in many domains, applications, disciplines. In the proposed thesis, we will study a series of assignment-oriented resource allocation problems modeling real-world situations, from the point of view of online and approximation algorithms. The complexities and character of these problems vary greatly. Separate from the problem definitions, we will also examine how the difficulty changes as we vary modalities such as online v. offline, distributed v. centralized, linear v. poly-time, thus mapping out a lattice of problem settings. For each problem and setting, we seek efficient algorithms of the appropriate kind (e.g., exact, approximation, competitive, distributed, ...). Three classes of problems examined in particular will be sensor-mission matching, battery charge scheduling, and geometric sensor coverage.

### Optimizing Resource Allocation Under Constraints

Author:Matthew JohnsonYear of Dissertation:2010Program:Computer ScienceAdvisor:Amotz Bar-NoyAbstract:The fundamental concept of resource allocation--assigning resources to tasks, and so on--gives rise to very broad families of problems, with instances in many domains, applications, disciplines. In the proposed thesis, we will study a series of assignment-oriented resource allocation problems modeling real-world situations, from the point of view of online and approximation algorithms. The complexities and character of these problems vary greatly. Separate from the problem definitions, we will also examine how the difficulty changes as we vary modalities such as online v. offline, distributed v. centralized, linear v. poly-time, thus mapping out a lattice of problem settings. For each problem and setting, we seek efficient algorithms of the appropriate kind (e.g., exact, approximation, competitive, distributed, ...). Three classes of problems examined in particular will be sensor-mission matching, battery charge scheduling, and geometric sensor coverage.

### 3D Map Building of Structured Indoor Environments with Heterogeneous Robots

Author:Ravi KaushikYear of Dissertation:2011Program:Computer ScienceAdvisor:Jizhong XiaoAbstract:We explore a methodology to construct 3D maps of indoor environments using multiple heterogeneous robots. The robots acquire overlapping 3D range images at contiguous locations and fuse them together to form a complete range map. Our methodology involves three major steps in map construction: the first step is to estimate an initial pose between the multiple robots using a real-time camera pose estimation algorithm, the second step involves extracting polygon features from overlapping range images and the third step involves registration of the overlapping range images using extracted polygonal features. We evaluate the performance of several approaches to camera pose estimation, which is used to determine the initial estimate of pose between the multiple robots. We introduce a fast and robust segmentation algorithm, which extracts the polygonal features from the 3D range images. A polygon-based registration algorithm accurately fuses the overlapping range images. The algorithm further refines the initial pose estimate by minimizing the geometric distance between corresponding polygonal features extracted from overlapping range images. Our polygon-based registration combined with camera pose estimation executes in real-time and much faster compared to point-based scan registration techniques. We present experimental results obtained while building indoor maps and quantitative analysis of the algorithms used in map construction of indoor environments.

### THEORETICAL METHODS FOR BLUR-CORRECTION IN ELECTRON AND SOFT X-RAY MICROSCOPY

Author:Joanna KlukowskaYear of Dissertation:2013Program:Computer ScienceAdvisor:Gabor HermanAbstract:The process of object reconstruction from projections is widely used in many fields. One of the applications is the reconstruction of biological specimens from two-dimensional projections in transmission electron microscopy and transmission x-ray microscopy. Various methods have been developed for correcting the blurring that occurs when the projections are obtained by a real instrument. As the attainable resolution increases, new issues become apparent and need to be taken into account in the imaging model. In this dissertation we concentrate on the point spread function and its impact on the quality and usefulness of the reconstructions from images obtained according to image formation models for the two types of microscopy. Specifically, we consider spatial variance of the point spread function in the direction of the electron or x-ray propagation. The correction methods for this type of blurring in both cases are related, but not identical due to nature of the respective imaging processes. We propose correction methods and demonstrate their efficacy using computer simulations.

### Comparing AI Search Algorithms and Their Efficiency When Applied to Path Finding Problems

Author:Erdal KoseYear of Dissertation:2012Program:Computer ScienceAdvisor:Danny KopecAbstract:Various Artificial Intelligence (AI) search algorithms have been investigated and classified as unidirectional or bidirectional. We start our discussion by presenting certain unidirectional and bidirectional search algorithms (BDS). We continued our study by presenting contributions to the field of search algorithms in AI. The focus of this research is the study of the bidirectional search and certain classes of AI problems and some new approaches to the domain of AI search algorithms have been explored. The second contribution of this research is to compare problem representations and to exploit built-in features of diverse programming paradigms. The programing paradigms have been classified by the problem domains which they might be more suitable for. This has been justified by the fact that the evaluation of search algorithms is a well-studied area of Artificial Intelligence (AI). However evaluation of the performance of programming paradigms when applied to search algorithms has not been studied well. We conclude our discussion with experimental results and more detailed information about our implementations.

### Game-theoretic and Bio-inspired Techniques for Self-positioning Autonomous Mobile Nodes

Author:Janusz KusykYear of Dissertation:2012Program:Computer ScienceAdvisor:Umit UyarAbstract:Autonomous mobile nodes that position themselves over an unknown terrain can ameliorate many of the problems which mobile ad hoc networks (MANETs) face. Achieving good spatial placement of mobile agents leads to superior network topology with improved area coverage, reduced power consumption, enhanced spectrum utilization, and simplified routing procedures. However, autonomous decision-making by nodes may also increase uncooperative and selfish behavior by these independent agents. Since it is impractical in MANETs to sustain complete and accurate information at each node about the entire network layout and decision of an individual node about its position should only be based on local information with limited coordination among agents. These characteristics recommend game theory (GT) as a tool for modeling, analyzing, and designing many MANET applications. At the same time, biologically inspired computation techniques such as genetic algorithms (GAs) can be used for finding desirable solutions in a prohibitively large search space and, in the case of MANETs, reduce the computational complexity needed for a node to determine its next location. We introduce several novel approaches for autonomous MANET nodes to distribute themselves uniformly over a dynamically changing environment without a centralized controller or a priori information about the deployment terrain or the state of other mobile agents. Our methods combine concepts from traditional GT, evolutionary game theory, and bio-inspired algorithms to effectively and efficiently guide autonomous MANET nodes in finding the best positions. We show that myopic actions of each individual node lead the entire network towards a stable and uniform distribution. We present formal analysis of our methods, and prove their important properties including convergence, area coverage, and uniformity characteristics. We developed a Java-based modeling platform to simulate performances of mobile networks, which allows for real-time visualization of ongoing network dynamics and collection of all critical data for evaluation purposes. Our analysis and experimental results demonstrate that GT and GA can be successfully combined for autonomous node placement to provide useful and resilient methods for optimizing network topology.

### Structural parameterizations of hard graph problems

Author:Michail LampisYear of Dissertation:2011Program:Computer ScienceAdvisor:Amotz Bar-NoyAbstract:In traditional computational complexity we measure algorithm running times as functions of one variable, the size of the input. Though in this setting our goal is usually to design polynomial-time algorithms, most interesting graph problems are unfortunately believed to require exponential time. Parameterized complexity theory refines this by introducing a second variable, called the parameter, which is supposed to quantify the "hardness" of each specific instance. The goal now becomes to confine the combinatorial explosion to the parameter, by designing an algorithm that runs in time polynomial in the size of the input, though inevitably exponential in the parameter. This will allow us to tackle instances where the parameter value is much more modest than the input size, which will happen often if the parameter is chosen well. In this work we deal with parameterized versions of intractable graph problems. Our focus will be structural parameterizations, meaning that the parameters we choose will be measures which attempt to quantify the complexity of a graph. The most important such measure in the literature is treewidth, a notion which quantifies how "tree-like" a graph is. Here, we will consider parameterizations by treewidth, but also by alternative measures of graph complexity such as vertex cover, or maximum degree. Our results move in two directions: First, we study the computational complexity of structural parameterizations of several specific problems. In this vein we show hardness results for Maximum Path Coloring parameterized by measures such as treewidth and maximum degree; we present algorithms for two parameterizations of Vertex Cover; and we present algorithms and hardness results for three different parameterizations of the modal satisfiability problem. Second, we study the algorithmic properties of structural parameters not with respect to a specific problem, but with respect to whole classes of problems. For undirected graphs we present algorithmic meta-theorems which show that large classes of problems are tractable when parameterized by vertex cover or max-leaf number. Our results strengthen a famous theorem due to Courcelle, for these special cases. For directed graphs we give hardness results for Hamiltonian Circuit and Max Di Cut which apply to essentially all the definitions of directed treewidth which have appeared in the literature, showing that none of them is as algorithmically potent as (undirected) treewidth.

### Ultrafast Pseudorandom Number Generation Using Pseudorandom Permutations and Mappings

Author:Jie LiYear of Dissertation:2013Program:Computer ScienceAdvisor:Paula WhitlockAbstract:Pseudorandom numbers have broad applications in science, technology, entertainment, etc. So far many pseudorandom number generators (PRNGs) have been developed, but dedicated high performance high quality PRNGs are still in demand. In light of this, we propose a new design approach which combines byte-oriented pseudorandom permutations and integer-oriented pseudorandom mappings. Pseudorandom permutations are used for state initialization and reseeding. Pseudorandom mappings are used for state transition and pseudorandom number generation. Several PRNGs are designed using this approach. The performance tests show they surpass the existing pseudorandom number generators in both non-cryptographic category and cryptographic category. The proposed non-cryptographic PRNG reaches a generation speed of half clock cycle per byte on an Intel Core i3 processor, and the cryptographically secure PRNG also runs into one clock cycle per byte. They demonstrate excellent randomness properties as attested by the NIST statistical tests, the new Diehard battery of tests, and the TestU01 batteries of tests. The non-cryptographic PRNG is also designed to meet a couple of other requirements, including long period, high-dimensional equidistribution, quick recovery from biased states, and ease of use. For the cryptographically secure PRNG, security has been taken into account throughout the design. Besides the key scheduling algorithm, which has an avalanche effect comparable to that of standard hash functions, a new two-layer design paradigm is adopted, which functionally divides the internal state into two parts, with the first part serving as a source of entropy and periodically reseeding the second part. The generator has a huge internal state and employs a high quality state update function, which renders a very long expected period. The overall security of the generator is carefully analyzed in the context of various known cryptanalytic attacks, state compromise extension attacks, and next-bit test. Besides deterministic pseudorandom number generation, the proposed PRNGs can also work in a non-deterministic mode. In this mode, the generators behave like a true random number generator by periodically querying some non-deterministic random sources and using them as unpredictable sources of entropies. Running in this mode has virtually no impact on the cost, performance, availability, or usability of the generators.