A SCALABLE AGENT-BASED SYSTEM FOR NETWORK FLOW RECONSTRUCTION WITH APPLICATIONS TO DETERMINING THE STRUCTURE AND DYNAMICS OF DISTRIBUTED DENIAL OF SERVICE ATTACKS
In this thesis we describe a novel agent-based architecture for flow reconstruction, and demonstrate how it can be applied to obtain a description of the structure and dynamics of distributed denial of service (DDoS) attacks. We show that the system can operate in a decentralized manner, effectively describing the structure and dynamics of traffic flows even at very modest levels of agent deployment. By providing structural information, the system facilitates the execution of DDoS mitigation strategies close to the actual sources of attack traffic. Through simulations, we validate how effectively the system discovers traffic source locations and the structure of traffic flows. Through packet-level simulations, we show that the system has favorable convergence properties. We describe several schemes for selecting the precise links on which agents should be placed, and show that these placement schemes yield marked improvements in system performance and scalability. Finally, we introduce a prototype attacker localization scheme called SLANT, which combines information from a sequence of attacks on different victims in order to further isolate traffic source locations. SLANT shows promise for using data from multiple attacks to determine the exact locations of the attackers, even at moderate agent deployment levels.
Simplifying Network Testing: Techniques and Approaches Towards Automating and Simplifying the Testing Process
The dramatic increase in the number of companies and consumers that heavily depend on networks mandates the creation of reliable network devices. Such reliability can be achieved by testing both the conformance of individual protocols of an implementation to their corresponding specifications and the interaction between different protocols. With the increase in computing power and the advances in network testing research, one would expect that efficient approaches for testing network implementations would be available. However, such approaches are still lacking, for reasons that include the complexity of network protocols, the need for different protocols to interoperate, the limited information about implementations due to proprietary code, and the potentially unbounded size of the network to be tested. To address these issues, a novel technique is proposed that improves the quality of the test while reducing the time and effort network testing requires. The proposed approach achieves these goals by automating the process of creating models to be used for validating an implementation. More precisely, it utilizes observations acquired by monitoring the behavior of the implementation to automatically generate models. In this way, the generated models can accurately represent the actual implementation. Thus, testing is reduced to the problem of verifying that certain properties hold on the generated model. This work presents algorithms that efficiently create models from observations and demonstrates their effectiveness on three different examples. In addition, the difficulty of validating models using theorem provers is examined; techniques available in the literature are utilized, and approaches that assist testers in completing proofs are proposed. Results suggest that the complexity of constructing proofs with a theorem prover can be reduced when models are members of the same class, i.e., their structure can be predicted.
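As a concrete illustration of deriving a model from monitored behavior, the following sketch builds a prefix-tree automaton from observed event traces. This is a minimal stand-in for the thesis's model-generation algorithms; the function name and trace format are illustrative assumptions, not the actual method.

```python
def build_model(traces):
    """Build a prefix-tree automaton from observed event traces.

    Each trace is a sequence of observed events (e.g., protocol messages).
    States are created lazily: state 0 is the initial state, and a new state
    is allocated the first time a (state, event) pair is observed. The result
    maps (state, event) -> next state, a model on which properties could then
    be checked.
    """
    transitions = {}          # (state, event) -> next state
    next_free_state = 1
    for trace in traces:
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                transitions[key] = next_free_state
                next_free_state += 1
            state = transitions[key]
    return transitions
```

Two traces sharing a prefix (e.g. both starting with "SYN") share the corresponding states, so the model grows only where observed behavior actually diverges.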
A final problem this work addresses is that of scale, i.e., the impracticality or even impossibility of testing every possible network configuration. To address this problem, the concept of self-similarity is introduced. A self-similar network has the property that it can be sufficiently represented by a smaller network. Thus, proving the correctness of the smaller network suffices to prove the correctness of any self-similar network that the smaller one represents.
PROTRU: Leveraging Provenance to Enhance Network Trust in a Wireless Sensor Network
Trust can be an important component of wireless sensor networks, since the believability of the produced data is essential and historical information is a crucial asset in deciding how much to trust that data. A node's trust can change over time after its initial deployment for various reasons, such as energy loss, environmental conditions, or exhausted resources. Provenance can play a significant role in supporting the calculation of information trust by recording the data flow and snapshots of the network. Furthermore, provenance can be used to register previous trust records and other information such as node type, data type, node location, and averages of the historical data. We introduce a node-level trust-enhancing architecture for sensor networks using provenance. Our network is cognitive in the sense that the system reacts automatically upon detecting anomalies. Through simulations, we verify our model and show that our approach can provide substantial enhancements in information trust compared to traditional network approaches.
3D Scene Reconstruction with Micro-Aerial Vehicles and Mobile Devices
Scene reconstruction is the process of building an accurate geometric model of one’s environment from sensor data. We explore the problem of real-time, large-scale 3D scene reconstruction in indoor environments using small laser range-finders and low-cost RGB-D (color plus depth) cameras. We focus on computationally constrained platforms such as micro-aerial vehicles (MAVs) and mobile devices. These platforms present a set of fundamental challenges: estimating the state and trajectory of the device as it moves within its environment, and utilizing lightweight, dynamic data structures to hold the representation of the reconstructed scene. The system needs to be computationally and memory-efficient, so that it can run in real time onboard the platform. In this work, we present three scene reconstruction systems. The first system uses a laser range-finder and operates onboard a quadrotor MAV. We address the issues of autonomous control, state estimation, path-planning, and teleoperation. We propose the multi-volume occupancy grid (MVOG), a novel data structure for building 3D maps from laser data, which provides a compact, probabilistic scene representation. The second system uses an RGB-D camera to recover the 6-DoF trajectory of the platform by aligning sparse features observed in the current RGB-D image against a model of previously seen features. We discuss our work on camera calibration and the depth measurement model. We apply the system onboard an MAV to produce occupancy-based 3D maps, which we utilize for path-planning. Finally, we present our contributions to a scene reconstruction system for mobile devices with built-in depth sensing and motion-tracking capabilities. We demonstrate reconstructing and rendering a global mesh on the fly, using only the mobile device’s CPU, in very large (300 square meter) scenes, at a resolution of 2-3 cm. To achieve this, we divide the scene into spatial volumes indexed by a hash map.
Each volume contains the truncated signed distance function for that area of space, as well as the mesh segment derived from the distance function. This approach allows us to focus computational and memory resources only in areas of the scene which are currently observed, as well as leverage parallelization techniques for multi-core processing.
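The hashed-volume idea described above can be sketched as follows. This is a minimal Python illustration of spatially hashing volumes by integer block coordinates; the class and field names are our own, and a real system would store the TSDF voxels and mesh segments in packed arrays rather than Python dictionaries.

```python
import math

class VolumeGrid:
    """Sketch of a spatially hashed grid of reconstruction volumes.

    Each volume covers a cube of side `volume_size` meters and would hold a
    truncated signed distance function (TSDF) for that region of space plus
    the mesh segment derived from it. Volumes are allocated lazily, so memory
    is spent only on regions of the scene that are actually observed.
    """
    def __init__(self, volume_size=0.5):
        self.volume_size = volume_size
        self.volumes = {}                     # (i, j, k) -> per-volume payload

    def key_for(self, x, y, z):
        """Integer block coordinates of the volume containing point (x, y, z)."""
        s = self.volume_size
        return (math.floor(x / s), math.floor(y / s), math.floor(z / s))

    def touch(self, x, y, z):
        """Fetch (or lazily allocate) the volume containing the given point."""
        key = self.key_for(x, y, z)
        if key not in self.volumes:
            self.volumes[key] = {"tsdf": {}, "mesh": None}
        return self.volumes[key]
```

Because each volume is independent, TSDF integration and meshing can be dispatched per volume to different cores, which is one way the parallelization mentioned above becomes natural.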
Novel Approaches for the Performance Enhancement of Cognitive Radio Networks
This research is dedicated to the study of the challenges faced by Cognitive Radio (CR) networks, which include self-coexistence of the networks in the spectral environment, security and performance threats from malicious entities, and fairness in spectrum contention and utilization. We propose novel channel acquisition schemes that allow decentralized CR networks to have multiple channel access with minimal spectrum contentions. The multiple channel acquisition schemes facilitate fast spectrum access especially in cases where networks cannot communicate with each other. These schemes enable CR networks to self-organize and adapt to the dynamically changing spectral environment. We also present a self-coexistence mechanism that allows CR networks to coexist via the implementation of a risk-motivated channel selection based deference structure (DS). By forming DS coalitions, CR networks are able to have better access to preferred channels and can defer transmission to one another, thereby mitigating spectrum conflicts. CR networks are also known to be susceptible to Sybil threats from smart malicious radios with either monopolistic or disruptive intentions. We formulate novel threat and defense mechanisms to combat Sybil threats and minimize their impact on the performance of CR networks. A dynamic reputation system is proposed that considerably minimizes the effectiveness of intelligent Sybil attacks and improves the accuracy of spectrum-based decision-making processes. Finally, we present a distributed and cheat-proof spectrum contention protocol as an enhancement of the adaptive On-Demand Spectrum Contention (ODSC) protocol. The Modified On-Demand Spectrum Contention (MODSC) protocol enhances fairness and efficiency of spectrum access. We also show that there is substantial improvement in spectrum utilization with the incorporation of channel reuse into the MODSC protocol.
Multiscale Feature Extraction and Matching with Applications to 3D Face Recognition and 2D Shape Warping
Shape matching is defined as the process of computing a dissimilarity measure between shapes. Partial 3D shape matching refers to a more difficult subproblem that deals with measuring the dissimilarity between partial regions of 3D objects. Despite a great deal of attention drawn to 3D shape matching in the fields of computer vision and computer graphics, partial shape matching applied to objects of arbitrary scale remains a difficult problem. This work addresses the problem of partial 3D shape matching with no assumptions about the scale factors of the input objects. We introduce a multiscale feature extraction and matching technique that employs a new scale-space based representation for 3D surfaces. The representation is shown to be insensitive to noise, computationally efficient, and capable of automatic scale selection. Applications of the proposed representation are presented for automatic 3D surface registration, face detection, and face recognition. Test results involving two well-known 3D face datasets consisting of several thousand scanned human faces demonstrate that the performance of our recognition system is superior to that of competing methods. Estimating differential surface attributes, such as normals and curvatures, plays an important role in the performance of 3D matching systems. Noise in the data, however, poses the main challenge in estimating these attributes. Surface reconstruction methods, such as Moving Least Squares (MLS), help in minimizing the effects of noise. In this work, we also review the MLS approach for surface reconstruction, and show how the input noise affects the estimated differential attributes of the surface. We demonstrate how these results, together with statistical hypothesis testing, may be used to determine the smallest neighborhood size needed to estimate surface attributes. MLS reconstruction and the discrete Laplace-Beltrami operator are well-known geometric tools that have a wide range of applications.
In addition to their prominent use in our 3D work, we describe a novel use of these tools in a 2D shape deformation system for retargeting garments among arbitrary poses.
Automatic Readability Assessment
We describe the development of an automatic tool to assess the readability of text documents. Our readability assessment tool predicts elementary school grade levels of texts with high accuracy. The tool is developed using supervised machine learning techniques on text corpora annotated with grade levels and other indicators of reading difficulty. Various independent variables or features are extracted from texts and used for automatic classification. We systematically explore different feature inventories and evaluate the grade-level prediction of the resulting classifiers. Our evaluation comprises well-known features at various linguistic levels from the existing literature, such as those based on language modeling, part-of-speech, syntactic parse trees, and shallow text properties, including classic readability formulas like the Flesch-Kincaid Grade Level formula. We focus in particular on discourse features, including three novel feature sets based on the density of entities, lexical chains, and coreferential inference, as well as features derived from entity grids. We evaluate and compare these different feature sets in terms of accuracy and mean squared error by cross-validation. Generalization to different corpora or domains is assessed in two ways. First, using two corpora of texts and their manually simplified versions, we evaluate how well our readability assessment tool can discriminate between original and simplified texts. Second, we measure the correlation between grade levels predicted by our tool, expert ratings of text difficulty, and estimated latent difficulty derived from experiments involving adult participants with mild intellectual disabilities. The applications of this work include selection of reading material tailored to varying proficiency levels, ranking of documents by reading difficulty, and automatic document summarization and text simplification.
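Among the shallow features mentioned above, the Flesch-Kincaid Grade Level has a fixed published form: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch follows; the vowel-group syllable counter is a crude heuristic (real implementations use a pronunciation dictionary), and the tokenization is deliberately simple.

```python
import re

def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def count_syllables(word):
        # crude heuristic: number of vowel groups, with a minimum of one
        groups = re.findall(r"[aeiouy]+", word.lower())
        return max(1, len(groups))

    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Longer sentences and more polysyllabic words both push the predicted grade level up, which is why such formulas serve as baselines for the richer discourse features explored in this work.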
SEARCHING FOR MOBILE DATA WITH A PRIORI STATISTICAL KNOWLEDGE OF THEIR WHEREABOUTS UNDER DELAY CONSTRAINTS
One or more tokens are hidden in several boxes, and the boxes are then locked. The probabilities of each token being found in each box are known. All the probabilities are independent. A searcher is looking for one, some, or all of the tokens by unlocking boxes in a predetermined number of rounds. In each round, any subset of the boxes can be unlocked, and the searcher collects all tokens in them. Each box is associated with a positive unlocking cost. The goal is to minimize the expected cost of unlocking boxes until the desired tokens are found. The original motivation is to page mobile users in cellular network systems. Mobile users are tokens and cells are boxes. The probabilities of the users being in particular cells can be extracted from historical data. The unlocking costs of boxes reflect the resources consumed to page a cell. The predetermined number of rounds ensures that the users will be found within a certain period of time (delay constraint). The goal is to minimize the resources consumed to find the users under the predetermined delay constraint. In addition to the application of paging mobile users, this scheduling problem has broad utilization in finding information in sensor networks, searching for information in distributed data centers, medical decision making, etc. The special case in which a single token is sought and all the boxes have the same unlocking cost has been studied, and polynomial-time optimal algorithms exist. Optimal search strategies can be found in time quadratic in the number of boxes and linear in the number of rounds. We improve this time complexity to linear in both the number of boxes and the number of rounds, and provide a hierarchy of algorithms that trades off optimality for complexity. In the general case of searching for a single token when the boxes can have different unlocking costs, we prove that the problem is strongly NP-hard, and provide various approximation algorithms.
We also demonstrate a tradeoff between the time complexity and implementation complexity of our approximation algorithms. In the case in which we search for multiple tokens and all boxes have the same unlocking cost, we explore the conference call problem and the yellow page problem. In the former we want to find all tokens, and in the latter we want to find (any) one of the tokens. The conference call problem has been studied; it is NP-hard, and approximation algorithms exist. We show a duality between the two problems and provide efficient polynomial-time and exponential-time optimal algorithms for specific cases of the problems. We show a tradeoff between the time and space complexity of optimal algorithms. We implement all of our algorithms, along with some algorithms by other researchers, and conduct a comprehensive experimental study in the context of the mobile-user paging application. The experimental study provides further insight into the behavior of the algorithms and demonstrates their performance in a real system.
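For the single-token, uniform-cost case discussed above, the classical quadratic-time dynamic program (the baseline this work improves to linear time) can be sketched as follows: sort cells by descending probability and partition them into contiguous per-round groups. This is a sketch of the known baseline, not the thesis's improved algorithm.

```python
def optimal_paging(probs, rounds):
    """Minimum expected paging cost for one token, uniform unlocking costs.

    Cells are paged in groups over `rounds` rounds; if the token is found in
    round r, the cost is the total number of cells paged through round r.
    An optimal schedule pages cells in descending-probability order, so the
    problem reduces to choosing contiguous group boundaries. This DP runs in
    O(n^2 * rounds) and assumes rounds <= len(probs) (each round nonempty).
    """
    p = sorted(probs, reverse=True)
    n = len(p)
    prefix = [0.0] * (n + 1)              # prefix[i] = p[0] + ... + p[i-1]
    for i in range(n):
        prefix[i + 1] = prefix[i] + p[i]

    INF = float("inf")
    # C[d][i]: min expected cost using d rounds to cover the first i cells
    C = [[INF] * (n + 1) for _ in range(rounds + 1)]
    C[0][0] = 0.0
    for d in range(1, rounds + 1):
        for i in range(1, n + 1):
            for j in range(d - 1, i):     # j = cells covered before round d
                cand = C[d - 1][j] + i * (prefix[i] - prefix[j])
                if cand < C[d][i]:
                    C[d][i] = cand
    return C[rounds][n]
```

With probabilities [0.5, 0.3, 0.2], one round must page all three cells (cost 3), while three rounds can page one cell at a time in descending order (cost 0.5·1 + 0.3·2 + 0.2·3 = 1.7), illustrating the delay-versus-cost tradeoff the abstract describes.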
Phylogenetic Trees and Their Analysis
Katherine St. John
Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research due to its underlying importance in understanding biological processes. As several steps in this process are NP-hard when using popular, biologically-motivated optimality criteria, significant amounts of resources are dedicated to both heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space in order to suggest methods to reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with "perfect" data, while some others do not. We further characterize conditions for which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually-incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, we compare two approaches to the generalized tree alignment problem, or GTAP: sequence alignment followed by tree search vs. Direct Optimization, on both biological and simulated data.
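The maximum-parsimony criterion underlying this search can be illustrated with Fitch's small-parsimony algorithm, which counts the minimum number of state changes for one character on a fixed rooted binary tree. The tree encoding below (nested tuples with leaf names at the tips) is an assumption made for illustration; scoring a candidate tree means summing this count over all characters.

```python
def fitch_score(tree, leaf_states):
    """Fitch's small-parsimony count for one character on a rooted binary tree.

    `tree` is a nested tuple of two children per internal node, with leaf
    names (strings) at the tips; `leaf_states` maps each leaf name to its
    character state. Bottom-up rule: take the intersection of the children's
    state sets if nonempty, otherwise take the union and count one change.
    """
    changes = 0

    def states(node):
        nonlocal changes
        if isinstance(node, str):             # leaf: singleton state set
            return {leaf_states[node]}
        left, right = node
        a, b = states(left), states(right)
        inter = a & b
        if inter:
            return inter
        changes += 1                          # children disagree: one change
        return a | b

    states(tree)
    return changes
```

For the tree (("A","B"),("C","D")) with states A=B=0 and C=D=1, a single change on the root edge suffices, so the score is 1; interleaved states force two changes. Tree search then seeks the topology minimizing this total over all characters.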
Discovering Regularity in Point Clouds of Urban Scenes
Despite the apparent chaos of the urban environment, cities are actually replete with regularity. From the grid of streets laid out over the earth, to the lattice of windows thrown up into the sky, periodic regularity abounds in the urban scene. Just as salient, though less uniform, are the self-similar branching patterns of trees and vegetation that line streets and fill parks. We propose novel methods for discovering these regularities in 3D range scans acquired by a time-of-flight laser sensor. The applications of this regularity information are broad, and we present two original algorithms. The first exploits the efficiency of the Fourier transform for the real-time detection of periodicity in building facades. Periodic regularity is discovered online by doing a plane sweep across the scene and analyzing the frequency space of each column in the sweep. The simplicity and online nature of this algorithm allow it to be embedded in scanner hardware, making periodicity detection a built-in feature of future 3D cameras. We demonstrate the usefulness of periodicity in view registration, compression, segmentation, and facade reconstruction. The second algorithm leverages the hierarchical decomposition and locality in space of the wavelet transform to find stochastic parameters for procedural models that succinctly describe vegetation. These procedural models facilitate the generation of virtual worlds for architecture, gaming, and augmented reality. The self-similarity of vegetation can be inferred using multi-resolution analysis to discover the underlying branching patterns. We present a unified framework of these tools, enabling the modeling, transmission, and compression of high-resolution, accurate, and immersive 3D images.
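The frequency-space analysis of a sweep column can be sketched in one dimension as follows. This is an illustrative reduction using NumPy's FFT; the actual plane-sweep pipeline operates on full facade scans, and the function name and parameters are our own.

```python
import numpy as np

def dominant_period(column, min_period=2):
    """Estimate the dominant spatial period of a 1-D occupancy column.

    `column` is a sequence of occupancy values along one sweep column (e.g.,
    1.0 where the laser hit a window edge, 0.0 elsewhere). The dominant
    period is read off the peak of the FFT magnitude spectrum, ignoring the
    DC component and periods shorter than `min_period` samples.
    """
    x = np.asarray(column, dtype=float)
    x = x - x.mean()                          # remove DC offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0)    # cycles per sample
    valid = (freqs > 0) & (1.0 / np.maximum(freqs, 1e-12) >= min_period)
    k = np.argmax(np.where(valid, spectrum, 0.0))
    return 1.0 / freqs[k]
```

A column with window edges every 8 samples produces a spectral peak at frequency 1/8, so the estimated period is 8. Running this per column as the sweep advances is what makes the detection online and cheap enough for embedded hardware.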