Efficient Processing of Very Large XML Documents in Small Space
The focus of this research was to develop a highly efficient serialized data structure for use in Document Object Model (DOM) query systems for Extensible Markup Language (XML) documents, which are increasingly the dominant model for data representation across the World Wide Web. The space efficiency of the serialized data structure developed is shown to improve query response time and to eliminate the need to rebuild the DOM representation each time an existing XML document is queried. Moreover, this serialized data structure enables DOM modeling of extremely large XML documents at a low, fixed memory cost. New algorithms and software tools for working with these data structures have been developed, preserving compatibility with standard DOM modeling. The structures, tools, and techniques presented can be readily adapted for use on the Semantic Web as it becomes a more prominent feature of the Internet. The improved performance of the XML database storage and retrieval application developed in this thesis helps close the gap between relational database application performance and native XML database (NXD) application performance. In addition, the novel DOM querying technique developed in this thesis has potential application on handheld devices, autonomous robots, and other devices with limited memory.
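To illustrate the general idea in miniature (a sketch only; the arrays and field names below are illustrative, not the dissertation's actual serialized format), an XML tree can be flattened once into parallel arrays in document order and then navigated repeatedly without re-parsing or rebuilding a DOM:

    import xml.etree.ElementTree as ET

    def serialize(root):
        # Flatten the tree into parallel arrays in document (pre-order) order.
        tags, parent, first_child, next_sib = [], [], [], []
        def visit(node, par):
            idx = len(tags)
            tags.append(node.tag)
            parent.append(par)
            first_child.append(-1)
            next_sib.append(-1)
            prev = -1
            for child in node:
                cidx = visit(child, idx)
                if prev == -1:
                    first_child[idx] = cidx
                else:
                    next_sib[prev] = cidx
                prev = cidx
            return idx
        visit(root, -1)
        return tags, parent, first_child, next_sib

    tags, parent, first_child, next_sib = serialize(ET.fromstring("<a><b/><c><d/></c></a>"))
    child = first_child[0]              # children of the root, no re-parse needed
    while child != -1:
        print(tags[child])              # prints b, then c
        child = next_sib[child]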
The Computational Complexity of Some Games and Puzzles With Theoretical Applications
The subject of this thesis is the algorithmic properties of one- and two-player games people enjoy playing, such as Sudoku or Chess. Questions asked about puzzles and games in this context are of the following type: can we design efficient computer programs that play optimally against any opponent (for a two-player game), or solve any instance of the puzzle in question? We examine four games and puzzles and show algorithmic as well as intractability results. First, we study the wolf-goat-cabbage puzzle, where a man wants to transport a wolf, a goat, and a cabbage across a river using a boat that can carry only one item at a time, making sure that no incompatible items are left alone together. We study generalizations of this puzzle, showing a close connection with the Vertex Cover problem that implies NP-hardness as well as inapproximability results. Second, we study the SET game, a card game where the objective is to form sets of cards that match in a certain sense using cards from a special deck. We study single- and multi-round variations of this game and establish interesting connections with other classical computational problems, such as Perfect Multi-Dimensional Matching, Set Packing, Independent Edge Dominating Set, and Arc Kayles. We prove algorithmic and hardness results in the classical and the parameterized sense. Third, we study the UNO game, a game of colored numbered cards where players take turns discarding cards that match either in color or in number. We extend results by Demaine et al. (2010 and 2014) that connected one- and two-player generalizations of the game to Edge Hamiltonian Path and Generalized Geography, proving that a solitaire version parameterized by the number of colors is fixed-parameter tractable and that a k-player generalization for k ≥ 3 is PSPACE-hard. Finally, we study the Scrabble game, a word game where players try to form words in a crossword fashion by placing letter tiles on a grid board. We prove that a generalized version of Scrabble is PSPACE-hard, answering a question posed by Demaine and Hearn in 2008.
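As a concrete illustration of the starting point (the classical three-item puzzle, not the generalized instances whose hardness the thesis establishes), a breadth-first search over river-bank states recovers the well-known seven-move solution:

    from collections import deque

    ITEMS = frozenset({"wolf", "goat", "cabbage"})
    BAD = [{"wolf", "goat"}, {"goat", "cabbage"}]    # pairs that cannot be left alone

    def solve():
        start = (ITEMS, "left")                      # items on the left bank, boat side
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            (left, boat), path = queue.popleft()
            if not left and boat == "right":
                return path
            here = left if boat == "left" else ITEMS - left
            other = "right" if boat == "left" else "left"
            for cargo in list(here) + [None]:        # carry one item or cross empty
                new_left = set(left)
                if cargo is not None:
                    (new_left.remove if boat == "left" else new_left.add)(cargo)
                unattended = new_left if other == "right" else ITEMS - new_left
                if any(b <= unattended for b in BAD):
                    continue                         # an item would get eaten
                state = (frozenset(new_left), other)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [(cargo, other)]))

    for cargo, to in solve():
        print(f"carry {cargo or 'nothing'} to the {to} bank")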
A Tiling Approach to Producing High Quality Solutions to Real World Timetabling Problems
The problem of constructing timetables in a fast and efficient manner is faced by a wide variety of scheduling practitioners. These timetables may specify the weekly calendar of university courses for a semester or contain the dates and times of games for a sports league. Current research in timetabling has focused on problem instances that are small and often not representative of real-world challenges. Hence, the solution approaches proposed often do not meet the needs of the scheduling practitioner. This thesis describes a robust and efficient technique using a new approach, referred to as "tiling," to provide high-quality solutions for a variety of timetabling problems. The tiling method is an efficient and flexible method capable of managing the size and complexity of real-world timetables. The tiles in this method represent conjectures about the relationships among the events to be scheduled within a timetabling problem. A constructive approach is used to create an initial solution in phase one of the tiling method; a second phase of local search is used to tune the solution. We demonstrate the flexibility of the approach through its application to large and small problems in various timetabling areas. By applying our method to problems from previous studies, we demonstrate that our tiles are found within the best known solutions to date and are comparable in quality with respect to the objective function, yet the tiling approach uses only a small fraction of the computing resources used by the best approaches. Using our tiling approach, we directly address real-world timetables to demonstrate its adaptability to real-world scenarios. For example, the tiling approach has been successfully applied to a youth sports league containing over five hundred teams. In addition to the number of teams involved, this scheduling problem has other layers of complexity not previously addressed in other studies. Importantly, such leagues are far more numerous than the professional leagues that have been the subject of most research to date. We introduce and define a new sports scheduling problem to encourage further study of more common real-world timetabling challenges.
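A toy sketch of the two-phase structure described above (constructive placement of tiles, then local-search tuning); the tiles, conflicts, and penalty function below are invented stand-ins rather than the thesis's actual formulation:

    import random

    # Hypothetical data: each tile groups events conjectured to share a timeslot.
    tiles = {"T1": ["math", "physics"], "T2": ["chem", "bio"], "T3": ["art"]}
    conflicts = {("math", "chem"), ("bio", "art")}   # pairs that clash in one slot
    slots = [0, 1, 2]

    def penalty(assign):
        return sum(assign[a] == assign[b] for a, b in conflicts)

    def construct():
        # Phase 1: place whole tiles into slots greedily.
        assign = {}
        for i, events in enumerate(tiles.values()):
            for e in events:
                assign[e] = slots[i % len(slots)]
        return assign

    def local_search(assign, iters=200):
        # Phase 2: tune by moving single events to other slots, keeping improvements.
        best = penalty(assign)
        for _ in range(iters):
            e = random.choice(list(assign))
            old = assign[e]
            assign[e] = random.choice(slots)
            p = penalty(assign)
            if p <= best:
                best = p
            else:
                assign[e] = old
        return assign

    timetable = local_search(construct())
    print(timetable, "clashes:", penalty(timetable))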
2D Dictionary Matching in Small Space
The dictionary matching problem seeks all locations in a given text that match any of the patterns in a given dictionary. Efficient algorithms for dictionary matching scan the text once, searching for all patterns simultaneously. There are many scenarios in which storage capacity is limited or the data sets are exceedingly large. The added constraint of performing efficient dictionary matching using little or no extra space is a challenging and practical problem. This thesis focuses on the problem of performing dictionary matching on two-dimensional data in small space. We have developed the first efficient algorithms for succinct 2D dictionary matching in both static and dynamically changing data. Although time- and space-optimal dictionary matching algorithms for one-dimensional data have recently been developed, they have not yet been implemented. Since our two-dimensional algorithms employ one-dimensional dictionary matching, we created software to solve one-dimensional dictionary matching in small space. This is a first step towards developing software for succinct dictionary matching in two-dimensional data.
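For reference, the classical baseline for one-dimensional dictionary matching is the Aho-Corasick automaton, sketched below; it scans the text once for all patterns simultaneously, but makes no attempt at the small-space bounds that are the thesis's contribution:

    from collections import deque

    def build(patterns):
        goto, fail, out = [{}], [0], [set()]         # node 0 is the root
        for p in patterns:                           # build the trie
            s = 0
            for ch in p:
                if ch not in goto[s]:
                    goto.append({}); fail.append(0); out.append(set())
                    goto[s][ch] = len(goto) - 1
                s = goto[s][ch]
            out[s].add(p)
        q = deque(goto[0].values())                  # BFS to set failure links
        while q:
            s = q.popleft()
            for ch, t in goto[s].items():
                f = fail[s]
                while f and ch not in goto[f]:
                    f = fail[f]
                fail[t] = goto[f][ch] if ch in goto[f] else 0
                out[t] |= out[fail[t]]               # inherit matches via failure link
                q.append(t)
        return goto, fail, out

    def search(text, patterns):
        goto, fail, out = build(patterns)
        s, hits = 0, []
        for i, ch in enumerate(text):                # single left-to-right scan
            while s and ch not in goto[s]:
                s = fail[s]
            s = goto[s].get(ch, 0)
            hits += [(i - len(p) + 1, p) for p in out[s]]
        return hits

    print(search("ushers", ["he", "she", "hers", "us"]))
    # [(0, 'us'), (1, 'she'), (2, 'he'), (2, 'hers')]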
AUTOMATED AUCTION MECHANISM DESIGN WITH COMPETING MARKETPLACES
Resource allocation is a major issue in multiple areas of computer science. Auctions are commonly used to optimize resource allocation in these areas, since well-designed auctions achieve desirable economic outcomes, including high allocative efficiency and fast response to supply and demand changes. This dissertation presents a grey-box approach to automated auction mechanism design using reinforcement learning and evolutionary computation methods. In contrast to traditional approaches that try to design complete auction mechanisms manually, which is tedious and error-prone, the grey-box approach solves the problem through an automated search in a parameterized space of auction mechanisms. This space is defined by a novel, parameterized structure for auction mechanisms (a big white box) and a set of auction rules (each a small black box) that can fit into the structure. The grey-box approach uses reinforcement learning to explore the composition of the structure, relates the performance of auction mechanisms to that of the auction rules that form them, and uses a Hall of Fame, a technique from evolutionary computation, to maintain viable auction mechanisms. The evaluation of auction mechanisms in the grey-box approach is conducted through a new strategic game, called CAT, which allows multiple marketplaces to run in parallel and compete to attract traders and make a profit. The CAT game helps address the imbalance between prior work in this field, which studied isolated auctions, and the actual competitive situation that marketplaces face. Experiments were carried out to examine the effectiveness of the grey-box approach. A comparison against a genetic algorithm approach showed that the grey-box approach produced mechanisms with significantly better overall performance. The best mechanisms produced by the grey-box experiments outperformed both the standard mechanisms used to evaluate sampled mechanisms during the grey-box search and carefully hand-coded mechanisms that had won tournaments based on the CAT game. These best mechanisms also exhibited better performance than some existing mechanisms from the literature even when the evaluation did not take place in the context of CAT games.
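The following sketch conveys the shape of the search (rule slots, per-rule credit assignment, and a Hall of Fame); the slots, rules, and scoring are invented placeholders, with a noisy stand-in for actual CAT-game tournaments:

    import random

    # Each slot of the parameterized structure (the "white box") takes one rule
    # (a "black box"). Slot and rule names here are made up for illustration.
    SPACE = {
        "matching": ["continuous", "periodic"],
        "pricing":  ["mid", "k=0.3", "k=0.7"],
        "charging": ["flat-fee", "profit-share"],
    }
    TRUE = {("matching", "continuous"): 3, ("matching", "periodic"): 1,
            ("pricing", "mid"): 2, ("pricing", "k=0.3"): 1, ("pricing", "k=0.7"): 3,
            ("charging", "flat-fee"): 1, ("charging", "profit-share"): 2}

    def evaluate(mech):
        # Stand-in for running CAT-game tournaments: hidden rule values plus noise.
        return sum(TRUE[kv] for kv in mech.items()) + random.gauss(0, 1)

    def search(rounds=500, hof_size=3, eps=0.2):
        value = {kv: 0.0 for kv in TRUE}     # learned per-rule performance estimates
        count = {kv: 0 for kv in TRUE}
        hof = []                             # Hall of Fame of (score, mechanism)
        for _ in range(rounds):
            mech = {slot: (random.choice(rules) if random.random() < eps
                           else max(rules, key=lambda r: value[(slot, r)]))
                    for slot, rules in SPACE.items()}
            score = evaluate(mech)
            for kv in mech.items():          # credit the score to the rules used
                count[kv] += 1
                value[kv] += (score - value[kv]) / count[kv]
            hof = sorted(hof + [(score, mech)], key=lambda x: -x[0])[:hof_size]
        return hof

    for score, mech in search():
        print(round(score, 2), mech)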
Computer-Aided Reasoning about Knowledge and Justifications
The dissertation consists of two chapters. In the first chapter we compare two well-known type-theory-based frameworks for computer-aided logical reasoning and verification: MetaPRL and Coq. In particular, we implement in MetaPRL the Calculus of Inductive Constructions, which is the theoretical basis for Coq. This work has shown the common points of MetaPRL and Coq and revealed their principal methodological differences. A possible application of this work is the re-validation in MetaPRL of the existing library of Coq proofs, which could help build more trust in the latter. Chapter 2 is the main contribution of the dissertation. It contains the description and testing results of an implementation of a realization algorithm for epistemic modal logic, which converts cut-free derivations in multi-agent epistemic modal logic into derivations in the corresponding Justification Logic, where witnesses of knowledge (justification terms) are recovered for all instances of common knowledge. We also apply this algorithm to several well-known epistemic puzzles, such as Muddy Children, Wise Men, and Wise Girls.
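To give a flavor of what realization produces (a textbook single-agent example in the style of the Logic of Proofs, not output of the dissertation's implementation), the modal distribution axiom acquires explicit justification terms, with x and y justification variables and x · y the application operation:

    \Box(F \to G) \to (\Box F \to \Box G)
    \quad \rightsquigarrow \quad
    x{:}(F \to G) \to \bigl( y{:}F \to (x \cdot y){:}G \bigr)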
Cancer Progression: Model, Therapy & Extraction
In this thesis we develop Cancer Hybrid Automata (CHAs), a modeling framework based on hybrid automata, to model the progression of cancers through discrete phenotypes. Both the transition timing between states and the effects of drugs and clinical tests are parameters in the framework, thus allowing for the formalization of temporal statements about the progression as well as timed therapies. Using a game-theoretic formulation of the problem, we show how existing controller synthesis algorithms can be generalized to CHA models, so that (timed) therapies can be automatically generated. In the second part of this thesis we connect this formal framework to cancer patient data, focusing on copy number variation (CNV) data. The underlying process generating CNV segments is generally assumed to be memoryless, giving rise to an exponential distribution of segment lengths. We provide evidence from TCGA data suggesting that this generative model is too simplistic and that segment lengths follow a power-law distribution instead. We show how an existing statistical method for detecting genetic regions of relevance for cancer can be improved through more accurate (power-law) null models. Finally, we develop an algorithm to extract CHA-like progression models from cross-sectional patient data. Based on a performance comparison on synthetic data, we show that our algorithm, which is based on a notion of probabilistic causality, outperforms an existing extraction method.
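A simplified sketch of that model comparison (maximum-likelihood fits of an exponential and a continuous power law to segment lengths); the data here are synthetic, and the xmin handling is naive compared to a full Clauset-style analysis of TCGA segments:

    import math, random

    random.seed(0)
    XMIN, ALPHA = 1.0, 2.5
    # Synthetic segment lengths drawn from a Pareto law via inverse-CDF sampling.
    lengths = [XMIN * (1 - random.random()) ** (-1 / (ALPHA - 1)) for _ in range(5000)]
    n = len(lengths)

    # Exponential MLE: rate = 1 / mean.
    rate = n / sum(lengths)
    ll_exp = sum(math.log(rate) - rate * x for x in lengths)

    # Continuous power-law MLE for x >= XMIN (Clauset-Shalizi-Newman estimator).
    alpha_hat = 1 + n / sum(math.log(x / XMIN) for x in lengths)
    ll_pow = sum(math.log((alpha_hat - 1) / XMIN) - alpha_hat * math.log(x / XMIN)
                 for x in lengths)

    print(f"alpha_hat = {alpha_hat:.2f}")
    print(f"log-likelihood: power-law {ll_pow:.1f} vs exponential {ll_exp:.1f}")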
COMPONENT TREES FOR THE EXPLORATION OF MACROMOLECULAR STRUCTURES IN BIOLOGY
Understanding the three-dimensional structure of a macromolecular complex is essential for understanding its function. A component tree is a topological and geometric image descriptor that captures information regarding the structure of an image based on the connected components determined by different grayness thresholds. This dissertation presents a novel interactive framework for visual exploration of component trees of the density maps of macromolecular complexes, with the purpose of improved understanding of their structure. The interactive exploration of component trees, together with a robust simplification methodology, provides new insights in the study of macromolecular structures. An underlying mathematical theory is introduced and then applied to the study of digital pictures that represent objects at different resolutions. Illustrations of how component trees, and their simplifications, can help in the exploration of macromolecular structures include (i) identifying differences between two very similar viruses, (ii) showing how differences between the component trees reflect the fact that structures of mutant virus particles have varying sets of constituent proteins, (iii) utilizing component trees for density map segmentation in order to identify substructures within a macromolecular complex, (iv) showing how an appropriate component tree simplification may reveal the secondary structure in a protein, and (v) providing a potential strategy for docking a high-resolution representation of a substructure into a low-resolution representation of the whole structure.
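A compact sketch of the underlying construction on a one-dimensional toy "image": sweeping thresholds from bright to dark and merging connected components with union-find yields the component tree's merge events (real density maps are three-dimensional, and the simplification methodology is not reproduced here):

    # Toy grayscale signal; real density maps are 3D volumes.
    values = [1, 3, 2, 5, 4, 0, 6, 1]

    parent = list(range(len(values)))      # union-find forest over pixel indices
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    active = set()
    merges = []                            # (threshold, surviving root, absorbed root)
    for v, i in sorted(((v, i) for i, v in enumerate(values)), reverse=True):
        active.add(i)                      # pixel i turns "on" at threshold v
        for j in (i - 1, i + 1):           # 1D neighbors
            if j in active:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri        # two components join at this threshold
                    merges.append((v, ri, rj))

    for thr, a, b in merges:
        print(f"at threshold {thr}: component rooted at {a} absorbs {b}")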
Service Oriented Wireless Sensor Network
Quality of service (QoS) for applications and network lifetime management are key challenges in Wireless Sensor Networks (WSNs). Service-level guarantees of the kind achievable in a Service Oriented Architecture (SOA) can ensure reliability, availability, and overall usability in any application, but how to create the necessary structure to implement SOA in a WSN to achieve the required levels of quality of service is still an open research question. The SOA paradigm provides the methodology to improve these qualities in WSNs. We present a strategy that uses sensors in a cluster and dynamic service selection to achieve higher levels of quality of service for WSNs while preserving network lifetime. These strategies are realized through a SOA middleware layer in the WSN. The concept of SOA dynamic service selection is transformed so that it can be applied to WSNs; the transformation requires that we equate a service in SOA to a function performed in a WSN. In our architecture, a service or function is carried out by a group of sensors called a sensor cluster, and quality metrics are captured at the sensor cluster level each time it executes within the WSN. We present sensor cluster selection algorithms that determine the most effective and efficient sensor clusters to participate in a composite function plan. The sensor cluster selection algorithms guarantee a minimum level of quality by selecting services that meet a client's quality constraints while taking into account the network routing requirements, so that the operation completes in the shortest amount of time using the least amount of energy. Additionally, we present methods for implementing service selection using the multiple-choice multidimensional knapsack problem (MMKP) within a composite function that has a complex workflow. We use an MMKP selection algorithm based on dynamic programming as a proof of concept to demonstrate a significant improvement in energy utilization, network lifetime, and overall quality.
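A simplified sketch of the dynamic-programming selection step, reduced to a multiple-choice knapsack with a single resource dimension (energy) rather than the full multidimensional MMKP; the clusters and numbers are illustrative:

    # For each required function, candidate sensor clusters as (quality, energy).
    groups = [
        [(5, 3), (8, 6), (3, 1)],          # function A
        [(4, 2), (9, 7)],                  # function B
        [(6, 4), (2, 1), (7, 5)],          # function C
    ]
    BUDGET = 12                            # total energy available

    NEG = float("-inf")
    dp = [NEG] * (BUDGET + 1)              # dp[e] = best quality using energy e
    dp[0] = 0.0
    choice = [[None] * (BUDGET + 1) for _ in groups]

    for g, options in enumerate(groups):   # exactly one cluster per function
        new = [NEG] * (BUDGET + 1)
        for e in range(BUDGET + 1):
            if dp[e] == NEG:
                continue
            for k, (q, cost) in enumerate(options):
                if e + cost <= BUDGET and dp[e] + q > new[e + cost]:
                    new[e + cost] = dp[e] + q
                    choice[g][e + cost] = (k, e)
        dp = new

    best_e = max(range(BUDGET + 1), key=lambda i: dp[i])
    picks, e = [], best_e
    for g in reversed(range(len(groups))): # recover the chosen cluster per function
        k, e = choice[g][e]
        picks.append((g, k))
    print("best quality:", dp[best_e], "choices (function, cluster):", picks[::-1])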
SOME ENHANCEMENT METHODS FOR BACKTRACKING-SEARCH IN SOLVING MULTIPLE PERMUTATION PROBLEMS
In this dissertation, we present some enhancement methods for backtracking-search in solving multiple permutation problems. Well-known NP-complete multiple permutation problems include the Quasigroup Completion Problem and Sudoku. Multiple permutation problems have received considerable attention in the literature in recent years because they are highly structured yet challenging combinatorial search problems. Furthermore, it has been shown that many real-world problems in scheduling and experimental design take the form of multiple permutation problems. It has therefore been suggested that they can serve as benchmark problems for testing various enhancement methods for solving constraint satisfaction problems, in the hope that the insight gained from studying them can be applied to other hard structured as well as unstructured problems. Our supplementary and novel enhancement methods for backtracking-search in solving multiple permutation problems can be summarized as follows. We devised a novel way to encode multiple permutation problems and then designed and developed an arc-consistency algorithm tailored to this modeling; we implemented five versions of this arc-consistency algorithm, the last of which eliminates almost all possible propagation redundancy. We then introduced the novel notion of interlinking dynamic variable ordering with dynamic value ordering, where the dynamic value ordering also serves as a second tie-breaker for the dynamic variable ordering. We also proposed integrating dynamic variable ordering and dynamic value ordering into an arc-consistency algorithm by using greedy counting assertions. We developed the concept of enforcing local-consistency between variables from different redundant models of the problem. Finally, we introduced an embarrassingly parallel task distribution process at the beginning of the search. We proved theoretically that a limited form of Hall's theorem is enforced by our modeling of multiple permutation problems. Our empirical results show that the "fail-first" principle is confirmed in terms of minimizing the total number of explored nodes but refuted in terms of minimizing the depth of the search tree when finding a single solution, which correlates with previously published results. We further showed that the performance of a given search heuristic (the total number of instances solved at the phase transition point within a given time limit) is closely related to the underlying pruning algorithm employed to maintain some level of local-consistency during backtracking-search. We also extended the previously established hypothesis that the second peak of hardness for NP-complete problems is algorithm-dependent, showing that it is also search-heuristic-dependent. Finally, our empirical results show that several of our enhancement methods for backtracking-search outperform the constraint solvers MAC-LAH and Minion as well as the SAT solvers Satz and MiniSat on instances of multiple permutation problems previously tested on these solvers.
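A compact sketch of the baseline being enhanced: backtracking-search with dynamic variable ordering (minimum remaining values, i.e., fail-first) on a small quasigroup-completion instance; the thesis's arc-consistency propagation, value ordering, and tie-breaking schemes are not reproduced here:

    N = 4
    grid = [[0, 2, 0, 0],          # 0 = empty; fill so each row and column
            [0, 0, 4, 0],          # is a permutation of 1..N
            [4, 0, 0, 2],
            [0, 0, 0, 0]]

    def candidates(r, c):
        used = set(grid[r]) | {grid[i][c] for i in range(N)}
        return [v for v in range(1, N + 1) if v not in used]

    def solve():
        empties = [(r, c) for r in range(N) for c in range(N) if grid[r][c] == 0]
        if not empties:
            return True
        # Dynamic variable ordering: pick the most constrained cell (fail-first).
        r, c = min(empties, key=lambda rc: len(candidates(*rc)))
        for v in candidates(r, c):
            grid[r][c] = v
            if solve():
                return True
            grid[r][c] = 0         # backtrack
        return False

    solve()
    for row in grid:
        print(row)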