# Alumni Dissertations

#### Filter Dissertations By:

### Optimization Problems in Sensor Network Data Collection

Author:Simon ShamounYear of Dissertation:2011Program:Computer ScienceAdvisor:Amotz Bar-NoyAbstract:Data collection is one of the most important tasks of many sensor networks. The data collected by sensors is used to monitor and analyze various systems, such as volcanoes, forests, and bridges. Large scale wireless sensor networks can provide timely access to a wealth of data, but obtaining this data is challenged by various resource constraints. This thesis proposes and analyzes solutions to three optimization problems that arise from the conflict between data collection and resource constraints: (1) maximize coverage by a set of sensors when the coverage they provide varies with location; (2) select a subset of the sensors, within some budget constraint, that best predict the data streams produced by all the sensors in the network; and (3) minimize the cost needed to find the top ranking sensor readings according to some criteria. The analyses of these problems use three different views of a sensor network: a coverage-centric view, in which each sensor is valued for its coverage ability; a data-centric view, in which each sensor is valued for the data it provides; and an agent-centric view, in which each sensor is viewed as an independent agent with information of value to the application. By choosing an appropriate view of the network, it is possible to separate the analysis from implementation details and apply well-established techniques from other domains to the problem solution. In this case, methodologies from stochastic and computational geometry, graph theory, and search theory are applied to the respective problems. This thesis presents optimal solutions to the coverage and search problems, approximation bounds on the best possible solution to the selection problem, and quantitative comparisons to alternative solutions to each problem in synthetic environments.

### Optimization Problems in Sensor Network Data Collection

Author:Simon ShamounYear of Dissertation:2011Program:Computer ScienceAdvisor:Amotz Bar-NoyAbstract:Data collection is one of the most important tasks of many sensor networks. The data collected by sensors is used to monitor and analyze various systems, such as volcanoes, forests, and bridges. Large scale wireless sensor networks can provide timely access to a wealth of data, but obtaining this data is challenged by various resource constraints. This thesis proposes and analyzes solutions to three optimization problems that arise from the conflict between data collection and resource constraints: (1) maximize coverage by a set of sensors when the coverage they provide varies with location; (2) select a subset of the sensors, within some budget constraint, that best predict the data streams produced by all the sensors in the network; and (3) minimize the cost needed to find the top ranking sensor readings according to some criteria. The analyses of these problems use three different views of a sensor network: a coverage-centric view, in which each sensor is valued for its coverage ability; a data-centric view, in which each sensor is valued for the data it provides; and an agent-centric view, in which each sensor is viewed as an independent agent with information of value to the application. By choosing an appropriate view of the network, it is possible to separate the analysis from implementation details and apply well-established techniques from other domains to the problem solution. In this case, methodologies from stochastic and computational geometry, graph theory, and search theory are applied to the respective problems. This thesis presents optimal solutions to the coverage and search problems, approximation bounds on the best possible solution to the selection problem, and quantitative comparisons to alternative solutions to each problem in synthetic environments.

### ON THE PROBLEM OF PACKING STEINER TREES OF A GRAPH

Author:MOHAMMAD TALAFHAYear of Dissertation:2010Program:Computer ScienceAdvisor:LOUIS PETINGIAbstract:On The Problem of Packing Steiner Trees of a Graph Let G = (V, E) be an undirected graph with vertex-set V, edge-set E, and a subset of distinguished terminal vertices K of V, where |K| ≥ 2 (the vertices in V - K are called the non-terminal vertices). The degree of a vertex v of G is the number of edges incident at v in G. A K-Steiner tree T of G is a tree containing the terminal vertex-set K, and any vertex of degree one in T must belong to K. The K-edge-connectivity, denoted as λ

_{K}(G), of a connected graph G with terminal vertex-set K, is the minimum number of edges whose removal from G disconnects at least two vertices of K. The Steiner Tree Packing problem (STPP for short) is the problem of finding the maximum number of edge-disjoint K-Steiner trees, t_{K}(G), contained in G. Specifically, in this thesis, we are interested in finding a lower bound on t_{K}(G) with respect to the K-edge-connectivity, λ_{K}(G). The Steiner Tree Packing problem has attracted considerable attention from many researchers in different areas because of its wide applicability. It has applications in routing problems arising in VLSI circuit design, where an effective way of sharing different signals among cells in a circuit can be achieved by the use of edge-disjoint Steiner trees. It also has a variety of computer network applications such as multicasting, video-conferencing, and network information flow where simultaneous communications can be facilitated by using edge-disjoint Steiner trees. In 2003, Kriesell conjectured that any graph G = (V, E) with terminal vertex-set with subset of terminal vertices K of V has at least λ_{K}(G) / 2 edge-disjoint K-Steiner trees. In this thesis we give a background on this problem and we present most of the major results obtained so far. Moreover, we show that Kriesell's conjecture can be answered affirmatively if the edges of G can be partitioned into K-Steiner trees. This result yields bounds for the problem of packing K-Steiner trees with certain intersection properties in a graph. In addition we show that for any graph G with terminal vertex-set K, ; t_{K}(G) ≥ λ_{K}(G) - |V - K| / 2 - 1. Moreover, we study a vulnerability index called the edge-toughness, which connects the connectivity of a network with the problem of packing Steiner trees of a graph.### ON THE PROBLEM OF PACKING STEINER TREES OF A GRAPH

Author:MOHAMMAD TALAFHAYear of Dissertation:2010Program:Computer ScienceAdvisor:LOUIS PETINGIAbstract:On The Problem of Packing Steiner Trees of a Graph Let G = (V, E) be an undirected graph with vertex-set V, edge-set E, and a subset of distinguished terminal vertices K of V, where |K| ≥ 2 (the vertices in V - K are called the non-terminal vertices). The degree of a vertex v of G is the number of edges incident at v in G. A K-Steiner tree T of G is a tree containing the terminal vertex-set K, and any vertex of degree one in T must belong to K. The K-edge-connectivity, denoted as λ

_{K}(G), of a connected graph G with terminal vertex-set K, is the minimum number of edges whose removal from G disconnects at least two vertices of K. The Steiner Tree Packing problem (STPP for short) is the problem of finding the maximum number of edge-disjoint K-Steiner trees, t_{K}(G), contained in G. Specifically, in this thesis, we are interested in finding a lower bound on t_{K}(G) with respect to the K-edge-connectivity, λ_{K}(G). The Steiner Tree Packing problem has attracted considerable attention from many researchers in different areas because of its wide applicability. It has applications in routing problems arising in VLSI circuit design, where an effective way of sharing different signals among cells in a circuit can be achieved by the use of edge-disjoint Steiner trees. It also has a variety of computer network applications such as multicasting, video-conferencing, and network information flow where simultaneous communications can be facilitated by using edge-disjoint Steiner trees. In 2003, Kriesell conjectured that any graph G = (V, E) with terminal vertex-set with subset of terminal vertices K of V has at least λ_{K}(G) / 2 edge-disjoint K-Steiner trees. In this thesis we give a background on this problem and we present most of the major results obtained so far. Moreover, we show that Kriesell's conjecture can be answered affirmatively if the edges of G can be partitioned into K-Steiner trees. This result yields bounds for the problem of packing K-Steiner trees with certain intersection properties in a graph. In addition we show that for any graph G with terminal vertex-set K, ; t_{K}(G) ≥ λ_{K}(G) - |V - K| / 2 - 1. Moreover, we study a vulnerability index called the edge-toughness, which connects the connectivity of a network with the problem of packing Steiner trees of a graph.### SEMIPARAMETRIC TEMPORAL CLUSTERING

Author:Suzanne TamangYear of Dissertation:2013Program:Computer ScienceAdvisor:Simon ParsonsAbstract:Although temporal data provides critical context for many real-world reasoning tasks, incorporating the temporal dimension into an analysis can present methodological challenges. Traditional methods from statistics are limited in their ability to process noisy, large-scale secondary data sources. Data mining approaches are better suited for these types of problems, but have primarily focused on static data sets. However, few real-world data sets are static, or measure stationary phenomena; rather, they are dynamic. To facilitate the meaningful use of abundant, unlabeled temporal data, I develop a new temporal clustering method that can assist in the preprocessing, exploration, and discovery of new knowledge from secondary data sources that are subject to arbitrary sampling schemes, and contain observation sequences of different durations. My approach builds on the semiparametric time series clustering framework, which has demonstrated clear benefits over fully parametric, or fully non-parametric methods. The framework combines beneficial parametric assumptions, such as the Markov or hidden-state assumption, to model temporal systems, with a more agnostic, nonparametric approach for clustering the embedded models. Using digital health data as a case study, I broaden the range of scenarios for which semiparametric clustering can be successfully applied. Specifically, I develop a method to use a state-of-the-art continuous-time Bayesian network to more naturally represent temporal information, addressing limitations of discrete- time methods. Also, as an alternative to spectral methods I pair model-based abstraction with a nonparametric Bayesian clustering technique that allows k to be expressed as a function of the size and complexity of the patient population, avoiding the requirement to prespecify the number of clusters using a heuristic. To demonstrate the ability of this approach to produce meaningful results, clusters are evaluated using intrinsic and extrinsic validation. In addition, I compare cluster assignments with that of temporal clustering systems reported in the research literature, showing a 20% relative improvement over the best system's performance and recognizable differences among the patient clusters that are detected.

### A Symbolic Exploration of the Joint State Space and the Underlying Argumentation-based Reasoning Processes for Multiagent Planning

Author:Yuqing TangYear of Dissertation:2012Program:Computer ScienceAdvisor:Simon ParsonsAbstract:Coming up with coherent behaviors for a team of agents in a non-deterministic environment is a complex process. The problem is further complicated by information regarding the environment being defeasible --- new information will disqualify the old information --- while at the same time this information is distributed, uncertain and possibly inconsistent. In this dissertation, I propose an approach based on symbolic model checking techniques to implicitly explore the state space of a system of agents to form a coherent set of joint behaviors for the agents with uncertainty captured as non-deterministism in the state space. The exploration process is based on individual agents' defeasible factored information about the dynamic of the state space towards a set of possibly inconsistent goals. This process can be interpreted using an argumentation-based approach. The exploration of the state space and the argumentation-based reasoning processes are carried out by a sequence of logical operations in the logic of Quantified Boolean Formulae (QBFs) and its model representation --- Binary Decision Diagrams (BDDs). All the algorithms to carry out these processes can be computed with a polynomial number of QBF operations in terms of the number of inputs and the number of boolean variables in the domain. Although general symbolic solving QBFs and BDDs is PSPACE complete, the practices of QBF solvers and BDD implementations allow us to employ various techniques and application-specific heuristics to solve a number of applications. In summary, the approaches proposed in this dissertation provide a formal system and implementation techniques to integrate non-monotonic reasoning and non-deterministic planning into defeasible multiagent decentralized planning so that coherent behaviors can be formed for a system of agents in a very demanding setting where the information regarding the environment and goals regarding what to do are distributed, incomplete, possibly inconsistent and uncertain.

### 3D SCENE MODELING AND UNDERSTANDING FROM IMAGE SEQUENCES

Author:Hao TangYear of Dissertation:2013Program:Computer ScienceAdvisor:Zhigang ZhuAbstract:A new method for 3D modeling is proposed, which generates a content-based 3D mosaic (CB3M) representation for long video sequences of 3D, dynamic urban scenes captured by a camera on a mobile platform. In the first phase, a set of parallel-perspective (pushbroom) mosaics with varying viewing directions is generated to capture both the 3D and dynamic aspects of the scene under the camera coverage. In the second phase, a unified patch-based stereo matching algorithm is applied to extract parametric representations of the color, structure and motion of the dynamic and/or 3D objects in urban scenes, where a lot of planar surfaces exist. Multiple pairs of stereo mosaics are used for facilitating reliable stereo matching, occlusion handling, accurate 3D reconstruction and robust moving target detection. The outcome of this phase is a CB3M representation, which is a highly compressed visual representation for a dynamic 3D scene, and has object contents of both 3D and motion information. In the third phase, a multi-layer based scene understanding algorithm is proposed, resulting in a planar surface model for higher-level object representations. Experimental results are given for both simulated and several different real video sequences of large-scale 3D scenes to show the accuracy and effectiveness of the representation. We also show the patch-based stereo matching algorithm and the CB3M representation can be generalized to 3D modeling with perspective views using either a single camera or a stereovision head on a ground mobile platform or a pedestrian. Applications of the proposed method include airborne or ground video surveillance, 3D urban scene modeling, traffic survey, transportation planning and the visual aid for perception and navigation of blind people.

### Efficient Communication Through Structured Node Labeling in Peer-to-Peer Networks

Author:Andi ToceYear of Dissertation:2013Program:Computer ScienceAdvisor:Abbe MowshowitzAbstract:Peer to Peer (P2P) networks have become increasingly popular in recent years as administrators are choosing to move away from monolithic centralized networks. These networks, while offering significant advantages in a variety of applications, are presented with a new set of challenges, from communication efficiency to network vulnerability and security. Numerous researchers have proposed a variety of solutions to improve the quality of service in P2P networks. A critical factor in maintaining an acceptable level of service quality is the efficiency and reliability of communication among nodes. One way of improving communication is to assign identifiers (labels) to each node and then use these labels to facilitate and improve message routing. This proposed work introduces two labeling schemata for P2P networks. Each participating node is assigned a label set. Labels are then used to determine node positions within an engineered logical overlay and identify routing paths during communication. We prove that the assignment of labels reduces the overall cost of communication thus saving valuable network resources. These theoretical findings are confirmed by experimental results with randomly selected P2P networks of various sizes. Detailed statistics on the performance of each protocol are provided which show clearly the practical utility of the labeling approach.

### AUTOMATED LEARNER CLASSIFICATION THROUGH INTERFACE EVENT STREAM AND SUMMARY STATISTICS ANALYSIS

Author:Edgar TroudtYear of Dissertation:2014Program:Computer ScienceAdvisor:Danny KopecAbstract:Reading comprehension is predominately measured through multiple choice examinations. Yet, as we will discuss in this thesis, such exams are often criticized for their inaccuracies. With the advent of "big data" and the rise of ITS (Intelligent Tutoring Systems), increasing focus will be placed on finding dynamic, automated ways of measuring students' aptitude and progress. This work takes the first step towards automated learner classification based on the application of graphic organizers. We address the following specific problem experimentally: How effectively can we measure task comprehension via human translation of written text into a visual representation on a computer? Can an algorithm employ data from user interface (UI) interaction during the problem solving process, to classify the user's abilities? Specifically, from the data we show machine learning predictions of what a human expert would say about the: 1. integrity of the visual representation produced; 2. level of logical problem solving strategy the user applies to the exercise; 3. level of effort the user gives to the exercise. The core of the experiment is a software system that allows a human subject to read a preselected text and then "draw" a diagram by manipulating icons on a grid-canvas using standard transforms.

### Algorithms and Hypothesis Selection in Dynamic Homology Phylogenetic Analysis

Author:Andres VaronYear of Dissertation:2010Program:Computer ScienceAdvisor:Amotz Bar-NoyAbstract:Phylogeny and alignment estimation are two important, and closely related biological problems. In the typical alignment problem, insertions, deletions, and substitutions need to be inferred, to understand the evolutionary patterns of life. With the technological advances of the last 20 years, phylogenetic analyses will grow to include complete chromosomes and genomes. With these data sets, not only insertions, deletions, and substitutions, but also rearrangements such as duplications, translocations, transpositions, and inversions must be taken into consideration. In this study, phylogenetic analysis is explored at three different levels. At the first level, new heuristic algorithms for the joint estimation of phylogenies and alignments under the Maximum Parsimony optimality criterion are described. Our experimental study showed that the new algorithms perform dramatically better when compared to previous heuristics. These new algorithms will allow biologists to analyze larger data sets in shorter periods of time. At the second level, new and existing algorithms where implemented in the computer program POY version 4. POY has had a significant impact in the biology community, and is used by hundreds of biologists around the world. POY will serve as a platform for long term research both in algorithm engineering, and biology. At the third level, the problem of parameter and model selection in complete chromosome analyses is explored. We propose and describe the use of Kolmogorov Complexity (KC) as optimality criterion, as a unifying criterion for phylogenetic analysis. Our evaluation using simulations showed that KC correctly identifies phylogeny and model with high frequency. These results are evidence that KC is very well suited for the difficulties posed by the phylogenetic analysis of complete genomes.