Clustering Categorical Data Using Data Summaries and Spectral Techniques
Year of Dissertation:
2009
Cluster analysis is an active area of research with applications in various fields including information retrieval, social sciences, bioinformatics, object recognition, and image segmentation (Jain et al., 1999). However, most algorithms are intended for numerical (continuous) data where proximity among data objects is naturally defined by virtue of their numerical properties. Although these algorithms can be used on categorical data, they are not designed to handle data properties typically found in this data type such as high dimensionality and lack of inherent relationships among attribute values. During the past decade, several algorithms have been designed for categorical data such as K-modes (Huang, 1998), STIRR (Gibson et al., 1998), CACTUS (Ganti et al., 1999), ROCK (Guha et al., 1999), COOLCAT (Barbara et al., 2002), LIMBO (Andritsos et al., 2004), CLICKS (Zaki et al., 2007), and others. Some of these algorithms exploit attribute relationships through data summaries such as attributes occurrence and co-occurrence frequencies while others use information entropy and links among data objects. In this thesis, we focus on using data summaries and spectral analysis to detect clustering structure in categorical data. Spectral techniques provide a relaxed solution to the discrete clustering problem which has been shown to be NP-hard (Drineas et al., 2004). Formulating the clustering problem as a graph partitioning problem and then finding the minimum normalized cut leads to a solution based on eigenvectors of the similarity matrix (i.e. Laplacian matrix). Spectral methods have been used in various algorithms and have been shown to find non-linearly separable clusters. Equally important, spectral analysis encompasses techniques for handling high-dimensional data since input data is projected into a lower-dimensional space where all computation/comparisons can be performed. Our approach is to extend spectral techniques to data summaries which are relatively less expensive to compute than data object similarity matrix for very large data sets. Our goal is to combine the benefits of spectral analysis with the relative low cost of computing data summaries. We have developed three algorithms for clustering categorical data using data summaries. Two of them use spectral techniques. Our test results on standard data sets and synthetic data sets show that our algorithms are competitive with current spectral and non-spectral algorithms for categorical data. Our algorithms provide a solution to the categorical data clustering problem that produces quality clustering and is scalable to large data sets.
Performance Effects of Computer-Based Multitasking Behavior
Year of Dissertation:
2012
Advisor:
Raquel Benbunan-Fich
This research examines multitasking from the perspective of human-computer interaction (HCI). Multitasking is defined as the performance of multiple tasks concurrently. In a computer-based environment, users generally switch between multiple computer-based tasks either due to a personal decision to break from the current task or due to an external interruption, such as an electronic notification. This dissertation describes an in-depth empirical study, using a laboratory setting with different numeric, verbal, and visual computer-based tasks. Six hundred and thirty six subjects were randomly assigned into three conditions: discretionary multitasking, where participants were allowed to decide when and how often to switch tasks, forced multitasking, where participants were forced to switch tasks at certain allotted times, and non-multitasking, where participants performed the tasks sequentially and were not allowed to multitask. In order to investigate performance effectiveness (accuracy) and performance efficiency (productivity), participants' overall accuracy and productivity scores were compared across conditions. The results suggest that during difficult tasks, subjects who were forced to multitask had the lowest accuracy. In addition, those subjects in the forced multitasking condition who felt the primary task was difficult had lower accuracy than those who felt the task was easy. This was not true in the other two conditions. Receiving interruptions during a difficult task impacted not only their primary task, but their secondary tasks as well. In the discretionary multitasking condition, the more subjects decided to multitask, the lower their accuracy scores. In fact, an additional analysis revealed that high multitaskers not only performed worse than low and medium multitaskers in the discretionary condition, but actually had the worst performance than subjects in any other condition. Medium multitaskers, however, had the highest productivity scores. While multitasking in that case was considered the best in terms of efficiency, it was not true in terms of effectiveness. Therefore, discretionary multitasking gives the illusion of high performance. Furthermore, this study also explored why people chose to multitask and the impact that had on performance. The results of this study can assist HCI researchers in developing a more comprehensive understanding of user multitasking which can lead to better interface designs.
Some Non-Classical Methods in Epistemic Logic and Games
Year of Dissertation:
2012
In this dissertation, we consider some non-classical methods in epistemic logic and games. We first consider, dynamic epistemic logics in topological and geometric semantics, and then extend such ideas to the cases where inconsistencies are allowed. Then, as a case example, we discuss a well known paradox in game theory which is essentially a two-person Russell's paradox. Finally, we conclude with considering an alternative approach to games where strategies are considered as the primitives of the theory, and advancing some results.
ENHANCING THE PERFORMANCE OF ACTIVE CONNECTIONS IN MANETS THROUGH DYNAMIC ROUTE AND POWER OPTIMIZATION
Year of Dissertation:
2010
In this thesis, we consider two significant problems that occur within active connections in mobile ad hoc networks (MANETs). These are: (A) degradation of path optimality in terms of hop count, and (B) failures on the constituents links of a path. Both phenomena occur over time because of node movement. Our investigation considers
Conflict-free coloring
Author:
Panagiotis Cheilaris
Year of Dissertation:
2009
Graph and hypergraph colorings constitute an important subject in
Efficient controls for finitely convergent sequential algorithms and their applications
Year of Dissertation:
2010
Finding a feasible point that satisfies a set of constraints or a point that optimizes a function subject to a set of constraints is a common task in scientific computing: examples are the linear/convex feasibility/optimization problems. Projection methods have been used widely in solving many such problems of large size much more efficiently than other alternatives. Finitely convergent sequential algorithms are projection methods that sequentially iterate on individual constraints, one at a time, but overall find a feasible point in a finite number of iterations. ART3 is an example using cyclic control in the sense that it repeatedly cycles through the given constraints. The skipping unnecessary checks on constraints that are likely to be satisfied, lead to the new algorithm ART3+, a variant of ART3 whose control is no longer cyclic, but which is still finitely convergent. Experiments in fitting pixel images by blob images show that ART3+ is statistically significantly faster than ART3. Furthermore, a general methodology is proposed for automatic transformation of any finitely convergent sequential algorithm in such a way that (1) finite convergence is retained and (2) the speed of finite convergence is improved. The first property is proved by mathematical theorems, the second is illustrated by applying the algorithms to practical problems. This transformation is applicable, for example, to the finitely convergent modified cyclic subgradient projection algorithm for solving convex feasibility problems.
COLLABORATIVE RANKING AND COLLABORATIVE CLUSTERING
Year of Dissertation:
2013
Ranking and clustering are two important problems in machine learning and have
The Bilinear Brain, Bilinear Methods For EEG Analysis And Brain Computer Interfaces.
Author:
Christoforos Christoforou
Year of Dissertation:
2009
Analysis of electro-encephalographic (EEG) signals has been proven an extremely useful research tool for studying the neural correlates of behavior. Single-trial analysis of these signals is essential for the development of non-invasive Brain Computer Interfaces. Analysis of these signals is often expressed as a single-trial classi¯cation problem. The goal is to infer the underling cognitive state
Hash Functions, Latin Squares and Secret Sharing Schemes
Year of Dissertation:
2010
A secret sharing scheme creates an effective method to safeguard a secret by dividing it among several participants. Since the original idea introduced by Shamir and Blakley in 1979, a variety of threshold secret sharing schemes and other types have been suggested by researchers. The first part of this thesis shows how to apply hash functions in secret sharing scheme designs. By using hash functions and the herding hashes technique, we first set up a (t + 1, n) threshold scheme which is perfect and ideal, and then extend it to schemes for any general access structure. The schemes can be further set up as verifiable if necessary. The secret can be quickly recovered due to the fast calculation of the hash function. In particular, secret sharing schemes based on Latin squares will be discussed.
Algorithms for Superiorization and their Applications to Image Reconstruction
Year of Dissertation:
2010
Computational tractability with limited computing resources is a major barrier