Filter Dissertations and Theses By:
A SCALABLE AGENT-BASED SYSTEM FOR NETWORK FLOW RECONSTRUCTION WITH APPLICATIONS TO DETERMINING THE STRUCTURE AND DYNAMICS OF DISTRIBUTED DENIAL OF SERVICE ATTACKS
Year of Dissertation:
In this thesis we describe a novel agent-based architecture for flow reconstruction, and demonstrate how it can be applied to obtain a description of the structure and dynamics of distributed denial of service (DDoS) attacks. We show that the system can operate in a decentralized manner, effectively providing a description of the structure and dynamics of traffic flows even with very modest levels of agent deployment. By providing structural information, the system facilitates the execution of DDoS mitigation strategies close to the actual sources of attack traffic. Through simulations, we validate the efficacy with which the system is able to discover traffic source locations and the structure of traffic flows. Through packet-level simulations, we show favorable convergence properties for the system. We describe several schemes for selecting the precise links on which agents should be placed, and show that these placement schemes yield marked improvements in system performance and scalability. Finally, we introduce a prototype attacker localization scheme called SLANT, which combines information from a sequence of attacks on different victims, in order to further isolate traffic source locations. SLANT shows promise for using multiple attack data to determine the exact locations of the attackers, even at moderate agent deployment levels.
Simplifying Network Testing: Techniques and Approaches Towards Automating and Simplifying the Testing Process
Year of Dissertation:
The dramatic increase of companies and consumers that heavily depend on networks mandates the creation of reliable network devices. Such reliability can be achieved by testing both the conformance of individual protocols of an implementation to their corresponding specifications and the interaction between different protocols. With the increase of computer power and the advances in network testing research, one would expect that efficient approaches for testing network implementations would be available. However, such approaches are not available due to reasons like the complexity of network protocols, the need for different protocols to interoperate, the limited information on implementation because of proprietary codes, and the potentially unbounded size of the network to be tested. To address these issues, a novel technique is proposed that improves the quality of the test while reducing the time and effort network testing requires. The proposed approach achieves these goals, by automating the process of creating models to be used for validating an implementation. More precisely, it utilizes observations acquired by monitoring the behavior of the implementation for the automatic generation of models. In this way, generated models can accurately represent the actual implementation. Thus, testing is reduced to the problem of verifying that certain properties hold on the generated model. This work presents algorithms that efficiently create models from observations and shows their effectiveness through the presentation of three different examples. In addition, the difficulty of validating models using theorem provers is addressed. To address this issue, techniques available in the literature are utilized and approaches that assist testers with completing proofs are proposed. Results suggest that the complexity of making proofs using theorem proving can be reduced when models are members of the same class, i.e., their structure can be predicted. A final problem this work addresses is that of scale, i.e., the impracticality or even impossibility of testing every possible network configuration. To address this problem, the concept of self-similarity is introduced. A self-similar network has the property that can be sufficiently represented by a smaller network. Thus, proving the correctness of a smaller network is sufficient for proving the correctness of any self-similar network that can be represented by this smaller one.
PROTRU: Leveraging Provenance to Enhance Network Trust in a Wireless Sensor Network
Year of Dissertation:
Trust can be an important component of wireless sensor networks for believability of the produced data and historical value is a crucial asset in deciding trust of the data. A node's trust can change over time after its initial deployment due to various reasons such as energy loss, environmental conditions or exhausting sources. Provenance can play a significant role for supporting the calculation of information trust by recording the data flow and snapshots of the network. Furthermore provenance can be used for registering previous trust records and other information such as node type, data type, node location, average of the historical data. We will introduce a node-level trust-enhancing architecture for sensor networks using provenance. Our network will be cognitive in the sense that our system will react automatically upon detecting anomalies. Through simulations we will verify our model and will show that our approach can provide substantial enhancements in information trust as compared to the traditional network approaches.
Multiscale Feature Extraction and Matching with Applications to 3D Face Recognition and 2D Shape Warping
Year of Dissertation:
Shape matching is defined as the process of computing a dissimilarity measure between shapes. Partial 3D shape matching refers to a more difficult subproblem that deals with measuring the dissimilarity between partial regions of 3D objects. Despite a great deal of attention drawn to 3D shape matching in the fields of computer vision and computer graphics, partial shape matching applied to objects of arbitrary scale remains a difficult problem. This work addresses the problem of partial 3D shape matching with no assumptions about the scale factors of the input objects. We introduce a multiscale feature extraction and matching technique that employs a new scale-space based representation for 3D surfaces. The representation is shown to be insensitive to noise, computationally efficient, and capable of automatic scale selection. Applications of the proposed representation are presented for automatic 3D surface registration, face detection, and face recognition. Test results involving two well-known 3D face datasets consisting of several thousand scanned human faces demonstrate that the performance of our recognition system is superior over competing methods. Estimating differential surface attributes, such as normals and curvatures, plays an important role in the performance of 3D matching systems. Noise in the data, however, poses the main challenge in estimating these attributes. Surface reconstruction methods, such as Moving Least Squares (MLS), help in minimizing the effects of noise. In this work, we also review the MLS approach for surface reconstruction, and show how the input noise affects the estimated differential attributes of the surface. We demonstrate how these results, together with statistical hypothesis testing, may be used to determine the smallest neighborhood size needed to estimate surface attributes. MLS reconstruction and the discrete Laplace-Beltrami operator are well-known geometric tools that have a wide range of applications. In addition to their prominent use in our 3D work, we describe a novel use of these tools in a 2D shape deformation system for retargeting garments among arbitrary poses.
Automatic Readability Assessment
Year of Dissertation:
We describe the development of an automatic tool to assess the readability of text documents. Our readability assessment tool predicts elementary school grade levels of texts with high accuracy. The tool is developed using supervised machine learning techniques on text corpora annotated with grade levels and other indicators of reading difficulty. Various independent variables or features are extracted from texts and used for automatic classification. We systematically explore different feature inventories and evaluate the grade-level prediction of the resulting classifiers. Our evaluation comprises well-known features at various linguistic levels from the existing literature, such as those based on language modeling, part-of-speech, syntactic parse trees, and shallow text properties, including classic readability formulas like the Flesch-Kincaid Grade Level formula. We focus in particular on discourse features, including three novel feature sets based on the density of entities, lexical chains, and coreferential inference, as well as features derived from entity grids. We evaluate and compare these different feature sets in terms of accuracy and mean squared error by cross-validation. Generalization to different corpora or domains is assessed in two ways. First, using two corpora of texts and their manually simplified versions, we evaluate how well our readability assessment tool can discriminate between original and simplified texts. Second, we measure the correlation between grade levels predicted by our tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities. The applications of this work include selection of reading material tailored to varying proficiency levels, ranking of documents by reading difficulty, and automatic document summarization and text simplification.
SEARCHING FOR MOBILE DATA WITH A PRIORI STATISTICAL KNOWLEDGE OF THEIR WHEREABOUTS UNDER DELAY CONSTRAINTS
Year of Dissertation:
One or more tokens are hidden into several boxes, and then the boxes are locked. The probabilities of each token being found in each box are known. All the probabilities are independent. A searcher is looking for one, some, or all of the tokens by unlocking boxes in a predetermined number of rounds. In each round, any subset of the boxes can be unlocked and the searcher collects all tokens in them. Each box is associated with a positive unlocking cost. The goal is to minimize the expected cost of unlocking boxes until the desired tokens are found. The original motivation is to page mobile users in cellular network systems. Mobile users are tokens and cells are boxes. The probabilities of the users in cells can be extracted from historical data. The unlocking costs of boxes reflect the resources that are consumed to page a cell. The predetermined number of rounds ensures that the users will be found within a certain period of time (delay constraint). The goal is to minimize the resources that are consumed to find the users under the pre-determined delay constraint. In addition to the application of paging mobile users, this scheduling problem has broad utilization in finding information in sensor networks, searching for information in distributed data centers, medical decision making, etc. The special case in which a single token is sought and all the boxes have the same unlocking costs has been studied. Polynomial time optimal algorithms exist. Optimal search strategies can be found in a time which is quadratic with respect to the number of boxes and linear with respect to the number of rounds. We improve this time complexity to linear with respect to both the number of boxes and the number of rounds, and provide a hierarchy of algorithms that trades off optimality for complexity. In the general case of searching a single token while the boxes can have different unlocking costs, we prove it being strongly NP-hard, and provide various approximation algorithms. We also demonstrate a tradeoff between the time complexity and implementation complexity of our approximation algorithms. In the case in which we search multiple tokens and all boxes are of the same unlocking costs, we explore the conference call problem and the yellow page problem. In the former we want to find all tokens and in the later we want to find (any) one of the tokens. The conference call problem has been studied. It is NP-hard and approximation algorithms exist. We show a duality between both problems and provide efficient polynomial-time and exponential-time optimal algorithms for specific cases of the problems. We show a tradeoff between the time and space complexity of optimal algorithms. We implement all of our algorithms and some of the algorithms by other researchers. We conduct a comprehensive experimental study in the context of the paging mobile users application. The experimental study provides further insight of the behavior of algorithms and presents the performance of algorithms in real system.
Phylogenetic Trees and Their Analysis
Year of Dissertation:
Katherine St. John
Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research due to its underlying importance in understanding biological processes. As several steps in this process are NP-Hard when using popular, biologically-motivated optimality criteria, significant amounts of resources are dedicated to both both heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space in order to suggest methods to reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with "perfect" data, while some others do not. We further characterize conditions for which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually-incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, we compare two approaches to the generalized tree alignment problem, or GTAP: Sequence alignment followed by tree search vs. Direct Optimization, on both biological and simulated data.
Discovering Regularity in Point Clouds of Urban Scenes
Year of Dissertation:
Despite the apparent chaos of the urban environment, cities are actually replete with regularity. From the grid of streets laid out over the earth, to the lattice of windows thrown up into the sky, periodic regularity abounds in the urban scene. Just as salient, though less uniform, are the self-similar branching patterns of trees and vegetation that line streets and fill parks. We propose novel methods for discovering these regularities in 3D range scans acquired by a time-of-flight laser sensor. The applications of this regularity information are broad, and we present two original algorithms. The first exploits the efficiency of the Fourier transform for the real-time detection of periodicity in building facades. Periodic regularity is discovered online by doing a plane sweep across the scene and analyzing the frequency space of each column in the sweep. The simplicity and online nature of this algorithm allow it to be embedded in scanner hardware, making periodicity detection a built-in feature of future 3D cameras. We demonstrate the usefulness of periodicity in view registration, compression, segmentation, and facade reconstruction. The second algorithm leverages the hierarchical decomposition and locality in space of the wavelet transform to find stochastic parameters for procedural models that succinctly describe vegetation. These procedural models facilitate the generation of virtual worlds for architecture, gaming, and augmented reality. The self-similarity of vegetation can be inferred using multi-resolution analysis to discover the underlying branching patterns. We present a unified framework of these tools, enabling the modeling, transmission, and compression of high-resolution, accurate, and immersive 3D images.
Finsler Optimal Control Theory
Year of Dissertation:
This thesis is a contribution to solving problems of extracting optimal controls for complex systems. Its novelty consists of a detailed examination of Finsler geometry based control by connections as proposed by Kohn and Nerode and its relation to Pontryagin's maximum principle. The long term hope is that these methods will form the underpinning of the applications of control of hybrid systems. Any advances in mathematical and algorithmic techniques for solving such problems would have wide application in business, industry, and science. The widespread use of Bellman's dynamic programming, Dantzig's linear programming, Kalman's optimization with linear quadratic cost functions, demonstrate this. But symbolic and numerical techniques historically have fallen well short of yielding efficient computation procedures to obtain near optimal controls for complex systems. Most investigations have been based on Pontryagin's geometric representation of optimal control and his maximum principle. Kohn and Nerode have proposed a different problem formulation aimed at extracting robust controls as a function of state (synthesis problem with a robustness requirement) in the context of a Finsler geometry corresponding to the optimal control problem. This leads to the study of geometric, symbolic, and numerical methods for solving geodesic equations for connections in Finsler Geometry. A principal result of this thesis is the determination of the relations between the Finsler and Pontryagin formulations of optimal control, and the transformation from one to the other. A second principal result is establishing the relations between robustness and curvature. Curvature is used to quantify the spread of geodesics due to disturbances. Finally, the thesis concludes with numerical integration schemes for computing controls and local connections.
Eye Tracking and Performance Evaluation: Automatic Detection of User Outcomes
Year of Dissertation:
While evaluating user task performance with eye tracking has been examined within the field of Human-Computer Interaction (HCI), previous research has generally utilized eye tracking to understand how users perform a task. This dissertation defines a new paradigm by which eye-tracking research can be used in HCI: to predict whether users will be successful at a task, using a pattern classification model trained on their eye-tracking data. This dissertation describes the experimental framework and it demonstrates (through laboratory experiments and machine-learning modeling) the feasibility and limits of predicting user outcomes based solely on eye movement patterns. Utilizing an online learning scenario as a proof-of-concept application of this technique, several rounds of eye-tracking data collection studies were conducted in which participants viewed multiple windows of simultaneous information content: a video of an instructor, lecture slides, a transcript of the speech, etc. After viewing the lesson while being recorded with an eye tracker, students responded to a test in which their understanding of the information content was measured. A variety of machine-learning approaches and feature selection techniques were used to explore the relationship between the eye movements of students and their success on the final test, and classification models that outperformed a majority-label baseline were successfully trained. To test the limits of this approach, additional studies were conducted with modified user-interfaces that relaxed some of the homogeneity of information content and visual presentation that were rigorously maintained in the initial experiment. In addition, a follow-up study was conducted in which the strict temporal segmentation of the experimental session was randomized, to measure the robustness of the modeling approach to such perturbations of the data-collection and analysis methodology. The results of this study can assist HCI eye-tracking researchers in developing new techniques for evaluating systems, e.g., by predicting users’ task performance based on eye-tracking data. Further, this dissertation lays the conceptual and methodological groundwork for the design of intelligent systems for predicting which users may be struggling with a task, based upon an automatic classification of users into groups of high- or low-performers based upon an examination of their eye movements alone.