Alumni Dissertations

Filter Dissertations By:

 
 
  • Clustering Categorical Data Using Data Summaries and Spectral Techniques

    Author:
    Eman Abdu
    Year of Dissertation:
    2009
    Program:
    Computer Science
    Advisor:
    Bilal Khan
    Abstract:

    Cluster analysis is an active area of research with applications in various fields including information retrieval, social sciences, bioinformatics, object recognition, and image segmentation (Jain et al., 1999). However, most algorithms are intended for numerical (continuous) data where proximity among data objects is naturally defined by virtue of their numerical properties. Although these algorithms can be used on categorical data, they are not designed to handle data properties typically found in this data type such as high dimensionality and lack of inherent relationships among attribute values. During the past decade, several algorithms have been designed for categorical data such as K-modes (Huang, 1998), STIRR (Gibson et al., 1998), CACTUS (Ganti et al., 1999), ROCK (Guha et al., 1999), COOLCAT (Barbara et al., 2002), LIMBO (Andritsos et al., 2004), CLICKS (Zaki et al., 2007), and others. Some of these algorithms exploit attribute relationships through data summaries such as attributes occurrence and co-occurrence frequencies while others use information entropy and links among data objects. In this thesis, we focus on using data summaries and spectral analysis to detect clustering structure in categorical data. Spectral techniques provide a relaxed solution to the discrete clustering problem which has been shown to be NP-hard (Drineas et al., 2004). Formulating the clustering problem as a graph partitioning problem and then finding the minimum normalized cut leads to a solution based on eigenvectors of the similarity matrix (i.e. Laplacian matrix). Spectral methods have been used in various algorithms and have been shown to find non-linearly separable clusters. Equally important, spectral analysis encompasses techniques for handling high-dimensional data since input data is projected into a lower-dimensional space where all computation/comparisons can be performed. Our approach is to extend spectral techniques to data summaries which are relatively less expensive to compute than data object similarity matrix for very large data sets. Our goal is to combine the benefits of spectral analysis with the relative low cost of computing data summaries. We have developed three algorithms for clustering categorical data using data summaries. Two of them use spectral techniques. Our test results on standard data sets and synthetic data sets show that our algorithms are competitive with current spectral and non-spectral algorithms for categorical data. Our algorithms provide a solution to the categorical data clustering problem that produces quality clustering and is scalable to large data sets.

  • Performance Effects of Computer-Based Multitasking Behavior

    Author:
    Rachel Adler
    Year of Dissertation:
    2012
    Program:
    Computer Science
    Advisor:
    Raquel Benbunan-Fich
    Abstract:

    This research examines multitasking from the perspective of human-computer interaction (HCI). Multitasking is defined as the performance of multiple tasks concurrently. In a computer-based environment, users generally switch between multiple computer-based tasks either due to a personal decision to break from the current task or due to an external interruption, such as an electronic notification. This dissertation describes an in-depth empirical study, using a laboratory setting with different numeric, verbal, and visual computer-based tasks. Six hundred and thirty six subjects were randomly assigned into three conditions: discretionary multitasking, where participants were allowed to decide when and how often to switch tasks, forced multitasking, where participants were forced to switch tasks at certain allotted times, and non-multitasking, where participants performed the tasks sequentially and were not allowed to multitask. In order to investigate performance effectiveness (accuracy) and performance efficiency (productivity), participants' overall accuracy and productivity scores were compared across conditions. The results suggest that during difficult tasks, subjects who were forced to multitask had the lowest accuracy. In addition, those subjects in the forced multitasking condition who felt the primary task was difficult had lower accuracy than those who felt the task was easy. This was not true in the other two conditions. Receiving interruptions during a difficult task impacted not only their primary task, but their secondary tasks as well. In the discretionary multitasking condition, the more subjects decided to multitask, the lower their accuracy scores. In fact, an additional analysis revealed that high multitaskers not only performed worse than low and medium multitaskers in the discretionary condition, but actually had the worst performance than subjects in any other condition. Medium multitaskers, however, had the highest productivity scores. While multitasking in that case was considered the best in terms of efficiency, it was not true in terms of effectiveness. Therefore, discretionary multitasking gives the illusion of high performance. Furthermore, this study also explored why people chose to multitask and the impact that had on performance. The results of this study can assist HCI researchers in developing a more comprehensive understanding of user multitasking which can lead to better interface designs.

  • Some Non-Classical Methods in Epistemic Logic and Games

    Author:
    Can Baskent
    Year of Dissertation:
    2012
    Program:
    Computer Science
    Advisor:
    Rohit Parikh
    Abstract:

    In this dissertation, we consider some non-classical methods in epistemic logic and games. We first consider, dynamic epistemic logics in topological and geometric semantics, and then extend such ideas to the cases where inconsistencies are allowed. Then, as a case example, we discuss a well known paradox in game theory which is essentially a two-person Russell's paradox. Finally, we conclude with considering an alternative approach to games where strategies are considered as the primitives of the theory, and advancing some results.

  • ENHANCING THE PERFORMANCE OF ACTIVE CONNECTIONS IN MANETS THROUGH DYNAMIC ROUTE AND POWER OPTIMIZATION

    Author:
    Zeki Bilgin
    Year of Dissertation:
    2010
    Program:
    Computer Science
    Advisor:
    Bilal Khan
    Abstract:

    In this thesis, we consider two significant problems that occur within active connections in mobile ad hoc networks (MANETs). These are: (A) degradation of path optimality in terms of hop count, and (B) failures on the constituents links of a path. Both phenomena occur over time because of node movement. Our investigation considers

  • Conflict-free coloring

    Author:
    Panagiotis Cheilaris
    Year of Dissertation:
    2009
    Program:
    Computer Science
    Advisor:
    Stathis Zachos
    Abstract:

    Graph and hypergraph colorings constitute an important subject in

  • Efficient controls for finitely convergent sequential algorithms and their applications

    Author:
    Wei Chen
    Year of Dissertation:
    2010
    Program:
    Computer Science
    Advisor:
    Gabor Herman
    Abstract:

    Finding a feasible point that satisfies a set of constraints or a point that optimizes a function subject to a set of constraints is a common task in scientific computing: examples are the linear/convex feasibility/optimization problems. Projection methods have been used widely in solving many such problems of large size much more efficiently than other alternatives. Finitely convergent sequential algorithms are projection methods that sequentially iterate on individual constraints, one at a time, but overall find a feasible point in a finite number of iterations. ART3 is an example using cyclic control in the sense that it repeatedly cycles through the given constraints. The skipping unnecessary checks on constraints that are likely to be satisfied, lead to the new algorithm ART3+, a variant of ART3 whose control is no longer cyclic, but which is still finitely convergent. Experiments in fitting pixel images by blob images show that ART3+ is statistically significantly faster than ART3. Furthermore, a general methodology is proposed for automatic transformation of any finitely convergent sequential algorithm in such a way that (1) finite convergence is retained and (2) the speed of finite convergence is improved. The first property is proved by mathematical theorems, the second is illustrated by applying the algorithms to practical problems. This transformation is applicable, for example, to the finitely convergent modified cyclic subgradient projection algorithm for solving convex feasibility problems.

  • COLLABORATIVE RANKING AND COLLABORATIVE CLUSTERING

    Author:
    Zheng Chen
    Year of Dissertation:
    2013
    Program:
    Computer Science
    Advisor:
    Heng Ji
    Abstract:

    Ranking and clustering are two important problems in machine learning and have

  • The Bilinear Brain, Bilinear Methods For EEG Analysis And Brain Computer Interfaces.

    Author:
    Christoforos Christoforou
    Year of Dissertation:
    2009
    Program:
    Computer Science
    Advisor:
    Robert Haralick
    Abstract:

    Analysis of electro-encephalographic (EEG) signals has been proven an extremely useful research tool for studying the neural correlates of behavior. Single-trial analysis of these signals is essential for the development of non-invasive Brain Computer Interfaces. Analysis of these signals is often expressed as a single-trial classi¯cation problem. The goal is to infer the underling cognitive state

  • Hash Functions, Latin Squares and Secret Sharing Schemes

    Author:
    Chi Chum
    Year of Dissertation:
    2010
    Program:
    Computer Science
    Advisor:
    Xiaowen Zhang
    Abstract:

    A secret sharing scheme creates an effective method to safeguard a secret by dividing it among several participants. Since the original idea introduced by Shamir and Blakley in 1979, a variety of threshold secret sharing schemes and other types have been suggested by researchers. The first part of this thesis shows how to apply hash functions in secret sharing scheme designs. By using hash functions and the herding hashes technique, we first set up a (t + 1, n) threshold scheme which is perfect and ideal, and then extend it to schemes for any general access structure. The schemes can be further set up as verifiable if necessary. The secret can be quickly recovered due to the fast calculation of the hash function. In particular, secret sharing schemes based on Latin squares will be discussed.

  • Algorithms for Superiorization and their Applications to Image Reconstruction

    Author:
    Ran Davidi
    Year of Dissertation:
    2010
    Program:
    Computer Science
    Advisor:
    Gabor Herman
    Abstract:

    Computational tractability with limited computing resources is a major barrier