AUTOMATED LEARNER CLASSIFICATION THROUGH INTERFACE EVENT STREAM AND SUMMARY STATISTICS ANALYSIS
Reading comprehension is predominantly measured through multiple-choice examinations. Yet, as we discuss in this thesis, such exams are often criticized for their inaccuracies. With the advent of "big data" and the rise of Intelligent Tutoring Systems (ITS), increasing focus will be placed on finding dynamic, automated ways of measuring students' aptitude and progress. This work takes a first step towards automated learner classification based on the application of graphic organizers. We address the following specific problem experimentally: How effectively can we measure task comprehension via human translation of written text into a visual representation on a computer? Can an algorithm employ data from user interface (UI) interaction during the problem-solving process to classify the user's abilities? Specifically, from the data we show machine learning predictions of what a human expert would say about: (1) the integrity of the visual representation produced; (2) the level of logical problem-solving strategy the user applies to the exercise; and (3) the level of effort the user gives to the exercise. The core of the experiment is a software system that allows a human subject to read a preselected text and then "draw" a diagram by manipulating icons on a grid-canvas using standard transforms.
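A pipeline of this kind typically reduces the raw UI event stream to summary statistics before any learning takes place. The following minimal sketch shows that reduction step; the event names and the feature set are invented for illustration and are not the dissertation's actual features.

```python
import statistics

def summarize_events(events):
    """Reduce a UI event stream to summary statistics.

    `events` is a list of (timestamp_seconds, action) pairs; the action
    names ("move", "rotate", "delete") are hypothetical examples.
    """
    times = [t for t, _ in events]
    gaps = [b - a for a, b in zip(times, times[1:])]   # inter-event delays
    actions = [a for _, a in events]
    return {
        "n_events": len(events),
        "mean_gap": statistics.mean(gaps) if gaps else 0.0,
        "undo_ratio": actions.count("delete") / max(len(actions), 1),
    }

# Toy stream: three quick edits, then a deletion after a pause.
stream = [(0.0, "move"), (1.0, "rotate"), (2.0, "move"), (5.0, "delete")]
features = summarize_events(stream)
```

A feature vector like this could then be fed to any standard classifier trained against the expert's labels.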
Algorithms and Hypothesis Selection in Dynamic Homology Phylogenetic Analysis
Phylogeny and alignment estimation are two important and closely related biological problems. In the typical alignment problem, insertions, deletions, and substitutions need to be inferred to understand the evolutionary patterns of life. With the technological advances of the last 20 years, phylogenetic analyses will grow to include complete chromosomes and genomes. With these data sets, not only insertions, deletions, and substitutions, but also rearrangements such as duplications, translocations, transpositions, and inversions must be taken into consideration. In this study, phylogenetic analysis is explored at three different levels. At the first level, new heuristic algorithms for the joint estimation of phylogenies and alignments under the Maximum Parsimony optimality criterion are described. Our experimental study showed that the new algorithms perform dramatically better than previous heuristics. These new algorithms will allow biologists to analyze larger data sets in shorter periods of time. At the second level, new and existing algorithms were implemented in the computer program POY version 4. POY has had a significant impact on the biology community and is used by hundreds of biologists around the world. POY will serve as a platform for long-term research in both algorithm engineering and biology. At the third level, the problem of parameter and model selection in complete chromosome analyses is explored. We propose and describe the use of Kolmogorov Complexity (KC) as a unifying optimality criterion for phylogenetic analysis. Our evaluation using simulations showed that KC correctly identifies the phylogeny and model with high frequency. These results are evidence that KC is well suited to the difficulties posed by the phylogenetic analysis of complete genomes.
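For readers unfamiliar with parsimony scoring, the classic Fitch small-parsimony pass conveys the flavor of the Maximum Parsimony criterion on a single character; the dissertation's heuristics go far beyond this, jointly estimating alignments and handling rearrangements. A sketch on a fixed rooted binary tree:

```python
def fitch(tree, leaf_states):
    """Fitch's small-parsimony pass for one character.

    `tree` maps each internal node to its two children; leaves appear as
    keys of `leaf_states`. Returns (candidate state set at root, cost),
    where cost counts the forced substitutions.
    """
    cost = 0

    def visit(node):
        nonlocal cost
        if node in leaf_states:
            return {leaf_states[node]}
        left, right = tree[node]
        a, b = visit(left), visit(right)
        inter = a & b
        if inter:
            return inter
        cost += 1          # disjoint child sets force one substitution
        return a | b

    return visit("root"), cost

# Toy tree: root -> (x, y), x -> (A, B), y -> (C, D), one DNA character.
tree = {"root": ("x", "y"), "x": ("A", "B"), "y": ("C", "D")}
leaf_states = {"A": "T", "B": "C", "C": "T", "D": "T"}
root_states, cost = fitch(tree, leaf_states)
```

Joint phylogeny/alignment estimation repeatedly evaluates scores of this kind over candidate trees, which is why faster heuristics matter for large data sets.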
GEOMETRIC SEPARATION AND PACKING PROBLEMS
The first part of this thesis investigates combinatorial and algorithmic aspects of geometric separation problems in the plane. In such a setting one is given a set of points and a set of separators such as lines, line segments, or disks. The goal is to select a small subset of those separators such that every path between any two points is intersected by at least one separator. We first look at several problems which arise when one is given a set of points and a set of unit disks embedded in the plane, and the goal is to separate the points using a small subset of the given disks. Next, we focus on a separation problem involving only one region: given a region in the plane bounded by a piecewise linear closed curve, such as a fence, place a small number of guards inside the fenced region such that wherever an intruder cuts through the fence, the closest guard is at most a distance of one away. Restricting the separating objects to be lines, we investigate combinatorial aspects which arise when we use them to pairwise separate a set of points in the plane; we then generalize the notion of separability to arbitrary sets and present several enumeration results. Lastly, we investigate a packing problem with a non-convex shape in R^3. We show that R^3 can be packed at a density of 0.222 with tori of major radius one and minor radius going to zero. Furthermore, we show that the same torus arrangement yields the asymptotically optimal number of pairwise linked tori.
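To make the line-separation setting concrete: a line strictly separates two points exactly when its signed evaluations at the two points have opposite signs, and a small greedy set-cover heuristic can then pick lines until all pairs are separated. The sketch below is illustrative only (invented helper names, no approximation guarantee claimed):

```python
from itertools import combinations

def separates(line, p, q):
    """True if the line ax + by + c = 0 strictly separates points p and q."""
    a, b, c = line
    sp = a * p[0] + b * p[1] + c
    sq = a * q[0] + b * q[1] + c
    return sp * sq < 0

def greedy_line_cover(points, lines):
    """Greedily pick lines until every pair of points is separated."""
    pairs = set(combinations(range(len(points)), 2))
    chosen = []
    while pairs:
        # Pick the line separating the most still-unseparated pairs.
        best = max(lines, key=lambda ln: sum(
            separates(ln, points[i], points[j]) for i, j in pairs))
        covered = {(i, j) for i, j in pairs
                   if separates(best, points[i], points[j])}
        if not covered:
            break          # remaining pairs cannot be separated at all
        chosen.append(best)
        pairs -= covered
    return chosen

# Three points; the lines x = 1 and y = 1 together separate every pair.
points = [(0, 0), (2, 0), (0, 2)]
lines = [(1, 0, -1), (0, 1, -1)]
chosen = greedy_line_cover(points, lines)
```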
Timed Modal Epistemic Logic
This thesis has three parts. The first part is a survey of epistemic logic. Epistemic logic was first introduced by philosophers, and later found applications in fields such as Computer Science and Economics. The survey covers both the philosophical debates and the applications of epistemic logic, followed by a discussion of the logical omniscience problem. The second part introduces a new logical framework, timed Modal Epistemic Logic (tMEL). tMEL extends ordinary modal epistemic logic (MEL) by adding numerical labels to knowledge statements to indicate when each statement is known. We argue that a logical framework that reasons about both knowledge and the time of reasoning can help resolve the problem of logical omniscience, and that tMEL serves well as a logically non-omniscient epistemic system. Finally, we discuss the syntactic relations between MEL, tMEL, and Justification Logic, from which the study of MEL originated. Our focus is on the relations between axiomatic proofs in these logical frameworks. We first identify a proper subclass of modal logical proofs, called non-circular proofs, and prove that this class of proofs is complete. We then show that every non-circular MEL proof can be turned into a tMEL proof by finding suitable number labels, and prove that there is a two-way translation between proofs in tMEL and Justification Logic. Combining these results, a formal connection between non-circular proofs and proofs in Justification Logic is established, and the whole procedure gives us an alternative algorithm for the realization between theorems in modal logic and Justification Logic.
AN ADAPTIVE AND INTEGRATED MULTIMODAL SENSING AND PROCESSING FRAMEWORK FOR LONG RANGE MOVING OBJECT DETECTION AND CLASSIFICATION
In applications such as surveillance, inspection, and traffic monitoring, long-range detection and classification of targets (vehicles, humans, etc.) is a highly desired capability of a sensing system. A single modality can no longer provide the required performance, due to the challenges of detection and classification with low resolutions, noisy sensor signals, and the various environmental factors that come with large sensing distances. Multimodal sensing and processing, on the other hand, can provide complementary information from heterogeneous sensor modalities, such as audio, visual, and range sensors. However, there is a lack of effective sensing mechanisms and systematic approaches for multimodal sensing and processing. In this thesis, we describe a systematic framework for Adaptive and Integrated Multimodal Sensing and Processing (hereafter, the AIM-SP framework) that integrates novel multimodal long-range sensors, adaptive feature selection, and learning-based object detection and classification. Based on the AIM-SP framework, we have made three unique contributions. First, we have designed a novel multimodal sensor system called Vision-Aided Automated Vibrometry (VAAV), which is capable of automatically obtaining visual, range, and acoustic signatures for moving object detection at a large distance. Second, multimodal data, acquired from multiple sensing sources, are integrated and represented in a Multimodal Temporal Panorama (MTP) for easy alignment and fast labeling; accuracy of target detection can be improved using multiple modalities. Further, a visual reconstruction method is developed to remove occlusions, motion blurs, and perspective distortions of moving vehicles. With various types of features extracted from aligned multimodal samples, we make our third contribution on feature modality selection using two approaches.
The first approach uses multi-branch sequential-based feature searching (MBSF) and the second one uses boosting-based feature learning (BBFL).
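As a generic illustration of sequential feature searching (MBSF itself explores multiple branches rather than the single greedy path shown here), a plain forward-selection sketch with an invented toy scoring function:

```python
def forward_select(features, score, k):
    """Sequential forward selection: grow a feature subset by repeatedly
    adding the feature that most improves score(subset)."""
    selected = []
    remaining = list(features)
    for _ in range(k):
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

def toy_score(subset):
    # Hypothetical utility: pretend only audio and range carry signal.
    return len(set(subset) & {"audio", "range"})

modalities = ["visual", "audio", "range"]
chosen = forward_select(modalities, toy_score, k=2)
```

In practice `score` would be a cross-validated classifier accuracy over the candidate feature modalities rather than a hand-coded set intersection.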
SYSTEMATIC COMPARISON OF CROSS-LINGUAL PROJECTION TECHNIQUES FOR LOW-DENSITY NLP UNDER STRICT RESOURCE CONSTRAINTS
The field of low-density NLP is often approached from an engineering perspective, and evaluations are typically haphazard, considering different architectures, different languages, and different available resources, without a systematic comparison. The resulting architectures are then tested on the unique corpus and language for which the approach was designed. This makes it difficult to evaluate which approach is truly the "best," or which approaches are best for a given language. In this dissertation, several state-of-the-art architectures and approaches to low-density-language Part-Of-Speech tagging are reimplemented; all of these techniques exploit a relationship between a high-density (HD) language and a low-density (LD) language. As a novel contribution, a testbed is created using a representative sample of seven (HD, LD) language pairs, all drawn from the same massively parallel corpus, Europarl, and selected for their particular linguistic features. With this testbed in place, never-before-possible comparisons are conducted to evaluate which broad approach performs best for particular language pairs, and to investigate whether particular language features should suggest a particular NLP approach. A survey of the field suggested some unexplored approaches with the potential to yield better performance, be quicker to implement, and require less intensive linguistic resources. Under strict resource limitations, which are typical of low-density NLP environments, these characteristics are important. The approaches investigated in this dissertation are each a form of language ifier, which modifies an LD corpus to be more like an HD corpus, or alternatively modifies an HD corpus to be more like an LD corpus, prior to supervised training.
Four variations of the language ifier design, each relying on relatively few linguistic resources, have been implemented and evaluated in this dissertation: lexical replacement, affix replacement, cognate replacement, and exemplar replacement. Based on linguistic properties of the languages drawn from the Europarl corpus, predictions were made about which prior and novel approaches would be most effective for languages with specific linguistic properties, and these predictions were tested through systematic evaluations with the testbed of languages. The results of this dissertation serve as guidance for future researchers who must select an appropriate cross-lingual projection approach (and a high-density language from which to project) for a given low-density language. Finally, all the languages drawn from the Europarl corpus are actually HD, but for the sake of the evaluation testbed in this dissertation, certain languages are treated as if they were LD (ignoring any available HD resources). To evaluate how the various approaches perform on an actual LD language, a case study was conducted in which part-of-speech taggers were implemented for Tajiki, harnessing linguistic resources from a related HD language, Farsi, using all of the prior and novel approaches investigated in this dissertation. Insights from this case study were documented so that future researchers can anticipate what their experience might be when implementing NLP tools for an LD language under the strict resource limitations considered in this dissertation.
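The lexical-replacement ifier can be pictured as a dictionary rewrite of the corpus before supervised training. A minimal sketch, with a tiny invented German-to-English lexicon standing in for a real bilingual lexicon (real systems would also handle affixes, cognates, and ambiguity):

```python
def lexical_ifier(ld_tokens, lexicon):
    """Rewrite low-density-language tokens into their high-density
    counterparts via a bilingual lexicon, leaving unknown tokens as-is."""
    return [lexicon.get(tok, tok) for tok in ld_tokens]

# Toy lexicon; out-of-vocabulary tokens pass through unchanged.
lexicon = {"hund": "dog", "katze": "cat"}
hd_tokens = lexical_ifier(["hund", "sees", "katze"], lexicon)
```

The resulting pseudo-HD corpus can then be tagged by an HD-trained tagger, projecting its annotations back onto the LD text.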
A Parametrization-Based Surface Reconstruction System for Triangular Mesh Simplification with Application to Large Scale Scenes
Laser scanners are nowadays widely used to capture the geometry of art, animation maquettes, and large architectural, industrial, and land-form models, and they pose specific problems depending on the model scale. This thesis provides a solution for the simplification of triangulated data and for surface reconstruction of large data sets in which feature edges provide an obvious segmentation structure. It also explores a new method for model segmentation, with the goal of applying multiresolution techniques to data sets characterized by curvy areas and the lack of clear demarcation features. The preliminary stage of surface segmentation, which takes as input single or multiple scan data files, generates surface patches which are processed independently. The surface components are mapped onto a two-dimensional domain with boundary constraints, using a novel parametrization weight coefficient. This stage generates valid parameter-domain points, which can be fed as arguments to parametric modeling functions or surface approximation schemes. On this domain, our approach explores two types of remeshing. First, we generate points in a regular grid pattern, achieving multiresolution through a flexible grid step, which is nevertheless designed to produce a globally uniform resampling aspect. In this case, for reconstruction, we attempt to solve the open problem of border reconciliation across adjacent domains by retriangulating the border gap between the grid and the fixed irregular border. Alternatively, we straighten the domain borders in the parameter domain and coarsely triangulate the resulting simplified polygons, resampling the base domain triangles in a 1-4 subdivision pattern, achieving multiresolution through the number of subdivision steps. For mesh reconstruction, we use a linear interpolation method based on the original mesh triangles as control points on local planes, using a saved triangle correspondence between the original mesh and the parametric domain.
We also use a region-wide approximation method, applied to the parameter grid points, which first generates data-trained control points and then uses them to obtain the reconstruction values at the resamples. In the grid resampling scheme, the border constraints make the reassembly of the segmented, sequentially processed data sets seamless. In the subdivision scheme, we align adjacent border fragments in the parameter space and use a region-to-fragment map to achieve the same border reconstruction across two neighboring components. We successfully process data sets of up to 1,000,000 points in a single pass of our program, without intermediate storage, and are capable of assembling larger scenes from sequential runs. For large input data files, we fragment the input using a nested application of our segmentation algorithm to reduce the size of the input scenes, and our pipeline reassembles the reconstruction output from multiple data files into a unique view.
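The 1-4 subdivision pattern mentioned above splits each base triangle at its edge midpoints into four smaller triangles: three corner triangles plus a central midpoint triangle. A minimal sketch (the vertex format and function name are ours, not the system's):

```python
def subdivide_1_to_4(tri):
    """Split a triangle (three 3D vertices) into four by connecting
    edge midpoints: three corner triangles plus the central one."""
    a, b, c = tri

    def mid(p, q):
        return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, (p[2] + q[2]) / 2)

    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

# One subdivision step on a right triangle in the z = 0 plane.
quads = subdivide_1_to_4(((0, 0, 0), (2, 0, 0), (0, 2, 0)))
```

Applying the step recursively gives 4^k triangles after k steps, which is why the number of subdivision steps directly controls the resolution level.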
A SIMULATION STUDY ON USING THE VIRTUAL NODE LAYER TO IMPLEMENT EFFICIENT AND RELIABLE MANET PROTOCOLS
The Virtual Node Layer (VNLayer) is a cluster-based programming abstraction for mobile ad hoc networks (MANETs). The VNLayer defines fixed or predictably mobile geographical regions; in each region, a number of mobile nodes collectively emulate a virtual node, which provides services and relays packets for client processes. As a clustering scheme with state replication, the VNLayer approach can in principle improve the efficiency and reliability of MANET protocols. As a general programming abstraction, the VNLayer hides underlying complexities from protocol developers and can be shared by multiple applications. However, the VNLayer also introduces extra control overhead and prolongs data forwarding delay, which could be prohibitively expensive in terms of performance. Based on an existing VNLayer implementation, we developed an ns-2 based software package, VNSim. VNSim can simulate VNLayer-based applications in a MANET of up to a few hundred mobile nodes, in order to better understand the impact of the VNLayer approach on efficiency and reliability. With VNSim, we conducted our first case study on a VNLayer-based MANET address allocation protocol, VNDHCP. Simulation results showed that the VNLayer approach can be used to adapt a wireline protocol to MANETs. We also conducted an extensive simulation study on VNLayer-based MANET routing. A wireline routing protocol, RIP, was adapted to run over the VNLayer. With the support provided by the VNLayer, the adapted protocol, VNRIP, was implemented very quickly and provides reasonable performance. During the study of VNLayer-based MANET routing, we identified a number of major performance limitations in the existing VNLayer implementation and the models it is based on. To tackle these limitations, we created a more realistic link layer model, extended the VNLayer model, and optimized our VNLayer implementation.
With the optimized VNLayer implementation, we implemented VNAODV, an adapted version of AODV, over the new link and VNLayer models. Simulation results indicate that VNAODV delivers more packets and creates more stable routes than standard AODV in a dense MANET with a high node motion rate and moderate data traffic. This research validated the intuition that the VNLayer approach can be used to adapt wireline protocols quickly to MANETs and to improve the performance of MANET protocols. It also provided insights into how to implement and optimize cluster-based MANET protocols.
OPTIMIZATION ALGORITHMS FOR PROXY PLACEMENT IN CONTENT DISTRIBUTION NETWORKS
Popular web sites receive an enormous share of Internet traffic. These sites have a competitive motivation to offer better service to their clients at lower cost. One solution is to use a content distribution network (CDN). When optimizing proxy placement in a CDN, we want to maximize the user experience while minimizing resource consumption. This thesis therefore deals with a multi-objective optimization problem for CDNs: the content latency experienced by users and the system resources expended to deliver content. After presenting a topology model for CDNs, we list metrics for evaluating them and define the objective function for optimizing proxy placement. After reviewing existing proxy placement algorithms, we apply a genetic algorithm to the proxy placement problem. We first apply the genetic algorithm to a simple network, and find that its results are better than those of a greedy algorithm and identical to the global optimum. We then apply the genetic algorithm to a realistically larger network topology, where it finds good solutions to the proxy placement problem. Compared to the greedy algorithm, however, it is less efficient, so when response time is important the genetic algorithm is not the best choice. To optimize the genetic algorithm, we found that choosing a higher mutation rate and step size at the beginning of the genetic operation, and decreasing them as the operation proceeds, gives much better results. For the initial population, a smaller population allows quick response to sudden changes in client access, while a larger population is preferable when sufficient response time is available, especially when initially setting up the network. Using the optimized genetic algorithm, we experiment with three network examples with sizes between the simple network and the large network.
Since these network sizes are smaller, we are able to run an exhaustive search to find the global optimum. In all cases the results from the genetic algorithm outperform the greedy algorithm and are close to the global optimum. Although the genetic algorithm is not guaranteed to find the globally optimal solution as it did for the simple network, a robustly good solution is enough to solve real-world problems.
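The mutation schedule found helpful above (a large mutation step early, decaying as generations proceed) can be sketched with a toy elitist genetic algorithm on a continuous test function. This is illustrative only: real proxy placement selects discrete network nodes, and the schedule constants here are invented.

```python
import random

def evolve(fitness, dim, pop_size=20, generations=60, seed=1):
    """Toy genetic algorithm minimizing `fitness`, with the mutation
    step size decayed linearly over the generations."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for g in range(generations):
        step = 1.0 * (1 - g / generations) + 0.01   # large early, small late
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]               # elitist truncation
        children = []
        for _ in range(pop_size - len(parents)):
            p1, p2 = rng.sample(parents, 2)         # uniform crossover
            child = [(a if rng.random() < 0.5 else b) + rng.gauss(0, step)
                     for a, b in zip(p1, p2)]
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

# Minimize the sphere function; the optimum is the origin.
best = evolve(lambda v: sum(x * x for x in v), dim=2)
```

Freezing the step at its initial value typically stalls convergence near the optimum, which mirrors the benefit of the decaying schedule reported above.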
PRECONDITIONING FOR MATRIX COMPUTATION
Preconditioning is a classical subject in the numerical solution of linear systems of equations. The goal is to turn a linear system into another one that is easier to solve. The two central subjects of numerical matrix computation are LIN-SOLVE, that is, the solution of linear systems of equations, and EIGEN-SOLVE, that is, the approximation of the eigenvalues and eigenvectors of a matrix. We focus on the former subject and show an application to EIGEN-SOLVE. We achieve our goal by applying randomized additive and multiplicative preconditioning. We facilitate the numerical solution by decreasing the condition number of the coefficient matrix of the linear system, which enables reliable numerical solution of LIN-SOLVE. After the introduction in Chapter 1, we recall definitions and auxiliary results in Chapter 2. Then, in Chapter 3, we precondition the linear systems of equations solved at every iteration of the Inverse Power Method applied to EIGEN-SOLVE. These systems are ill conditioned, that is, they have large condition numbers, which we decrease by applying randomized additive preconditioning. This is our first subject. Our second subject is randomized multiplicative preconditioning for LIN-SOLVE. In this way we support the application of GENP, that is, Gaussian elimination with no pivoting, and of block Gaussian elimination. We prove that the proposed preconditioning methods are efficient when we apply Gaussian random matrices as preconditioners. We confirm these results with extensive numerical tests. The tests also show that, on average, the same methods work as efficiently when we instead use random structured (in particular circulant) preconditioners, but we show both formally and experimentally that these preconditioners fail in the case of LIN-SOLVE for the unitary matrix of the discrete Fourier transform, for which Gaussian preconditioners work efficiently.
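The effect of multiplicative preconditioning on GENP can be seen on a tiny example: a coefficient matrix with a zero leading pivot defeats GENP outright, but premultiplying the system by a suitable multiplier restores it. The fixed matrix C below stands in for the Gaussian random preconditioners of the thesis, chosen here for reproducibility.

```python
def genp_solve(A, b):
    """Gaussian elimination with no pivoting (GENP) plus back substitution;
    raises when a zero pivot is encountered."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n):
        if M[k][k] == 0:
            raise ZeroDivisionError("zero pivot: GENP fails")
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def matmul(C, A):
    return [[sum(C[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(C))]

A = [[0.0, 1.0], [1.0, 0.0]]    # zero leading pivot: GENP fails on A x = b
b = [2.0, 3.0]
C = [[1.0, 1.0], [1.0, 2.0]]    # stand-in for a random multiplier

CA = matmul(C, A)               # CA x = C b has the same solution x
Cb = [sum(C[i][k] * b[k] for k in range(2)) for i in range(2)]
x = genp_solve(CA, Cb)
```

Since C is nonsingular, CA x = C b and A x = b have the same solution, so the preconditioned system recovers x = (3, 2) even though GENP cannot start on A itself.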