Filter Dissertations and Theses By:
Clustering Categorical Data Using Data Summaries and Spectral Techniques
Year of Dissertation:
Cluster analysis is an active area of research with applications in various fields including information retrieval, social sciences, bioinformatics, object recognition, and image segmentation (Jain et al., 1999). However, most algorithms are intended for numerical (continuous) data where proximity among data objects is naturally defined by virtue of their numerical properties. Although these algorithms can be used on categorical data, they are not designed to handle data properties typically found in this data type such as high dimensionality and lack of inherent relationships among attribute values. During the past decade, several algorithms have been designed for categorical data such as K-modes (Huang, 1998), STIRR (Gibson et al., 1998), CACTUS (Ganti et al., 1999), ROCK (Guha et al., 1999), COOLCAT (Barbara et al., 2002), LIMBO (Andritsos et al., 2004), CLICKS (Zaki et al., 2007), and others. Some of these algorithms exploit attribute relationships through data summaries such as attributes occurrence and co-occurrence frequencies while others use information entropy and links among data objects. In this thesis, we focus on using data summaries and spectral analysis to detect clustering structure in categorical data. Spectral techniques provide a relaxed solution to the discrete clustering problem which has been shown to be NP-hard (Drineas et al., 2004). Formulating the clustering problem as a graph partitioning problem and then finding the minimum normalized cut leads to a solution based on eigenvectors of the similarity matrix (i.e. Laplacian matrix). Spectral methods have been used in various algorithms and have been shown to find non-linearly separable clusters. Equally important, spectral analysis encompasses techniques for handling high-dimensional data since input data is projected into a lower-dimensional space where all computation/comparisons can be performed. Our approach is to extend spectral techniques to data summaries which are relatively less expensive to compute than data object similarity matrix for very large data sets. Our goal is to combine the benefits of spectral analysis with the relative low cost of computing data summaries. We have developed three algorithms for clustering categorical data using data summaries. Two of them use spectral techniques. Our test results on standard data sets and synthetic data sets show that our algorithms are competitive with current spectral and non-spectral algorithms for categorical data. Our algorithms provide a solution to the categorical data clustering problem that produces quality clustering and is scalable to large data sets.
Interactions and Complexity in Multi-Agent Justification Logic
Year of Dissertation:
We investigate the complexity of Justification Logic and Modal Logic when we allow multiple agents whose justifications affect each other – by including some combination of the axioms t :i φ → t :j φ and t :i φ → !t :j t :i φ (modal cases: [ ]i φ → [ ]j φ and [ ]i φ → [ ]j [ ]i φ ). We discover complexity jumps new for the field of Justification Logic: in addition to logics with their satisfiability problem in the second level of the polynomial hierarchy (as is the usual case until now), there are logics that have PSPACE-complete, EXP-complete and even NEXP-complete satisfiability problems. It is notable how the behavior of several of these justification logics mirrors the behavior of the corresponding multi-modal logics when we restrict modal formulas (in negation normal form) to use no diamonds. Thus we first study the complexity of such diamond-free modal logics and then we deduce complexity properties for the justification logic systems. On the other hand, it is similarly notable how certain lower complexity bounds – the NEXP-hardness bound and the general Sigma 2-hardness bound we present – are more dependent on the behavior of the justifications. The complexity results are interesting for Modal Logic as well, as we give hardness results that hold even for the diamond-free, 1-variable fragments of these multi-modal logics and then we determine the complexity of these logics in a general case.
Performance Effects of Computer-Based Multitasking Behavior
Year of Dissertation:
This research examines multitasking from the perspective of human-computer interaction (HCI). Multitasking is defined as the performance of multiple tasks concurrently. In a computer-based environment, users generally switch between multiple computer-based tasks either due to a personal decision to break from the current task or due to an external interruption, such as an electronic notification. This dissertation describes an in-depth empirical study, using a laboratory setting with different numeric, verbal, and visual computer-based tasks. Six hundred and thirty six subjects were randomly assigned into three conditions: discretionary multitasking, where participants were allowed to decide when and how often to switch tasks, forced multitasking, where participants were forced to switch tasks at certain allotted times, and non-multitasking, where participants performed the tasks sequentially and were not allowed to multitask. In order to investigate performance effectiveness (accuracy) and performance efficiency (productivity), participants' overall accuracy and productivity scores were compared across conditions. The results suggest that during difficult tasks, subjects who were forced to multitask had the lowest accuracy. In addition, those subjects in the forced multitasking condition who felt the primary task was difficult had lower accuracy than those who felt the task was easy. This was not true in the other two conditions. Receiving interruptions during a difficult task impacted not only their primary task, but their secondary tasks as well. In the discretionary multitasking condition, the more subjects decided to multitask, the lower their accuracy scores. In fact, an additional analysis revealed that high multitaskers not only performed worse than low and medium multitaskers in the discretionary condition, but actually had the worst performance than subjects in any other condition. Medium multitaskers, however, had the highest productivity scores. While multitasking in that case was considered the best in terms of efficiency, it was not true in terms of effectiveness. Therefore, discretionary multitasking gives the illusion of high performance. Furthermore, this study also explored why people chose to multitask and the impact that had on performance. The results of this study can assist HCI researchers in developing a more comprehensive understanding of user multitasking which can lead to better interface designs.
SCHEDULING AND RESOURCE ALLOCATION IN WIRELESS SENSOR NETWORKS
Year of Dissertation:
In computer science and telecommunications, wireless sensor networks are an active research area. Each sensor in a wireless sensor network has some pre-defined or on demand tasks such as collecting or disseminating data. Network resources, such as broadcast channels, number of sensors, power, battery life, etc., are limited. Hence, a schedule is required to optimally allocate network resources so as to maximize some profit or minimize some cost. This thesis focuses on scheduling problems in the wireless sensor networks environment. In particular, we study three scheduling problems in the wireless sensor networks: broadcast scheduling, sensor scheduling for area monitoring, and content distribution scheduling. For each problem the goal is to find efficient scheduling algorithms that have good approximation guarantees and perform well in practice.
Toward an Argumentation-based Dialogue framework for Human-Robot Collaboration
Year of Dissertation:
Successful human-robot collaboration with a common goal requires peer interaction in which humans and robots cooperate and complement each other's expertise. Formal human-robot dialogue in which there is peer interaction is still in its infancy, though. My research recognizes three aspects of human-robot collaboration that call for dialogue: responding to discovery, pre-empting failure, and recovering from failure. In these scenarios the partners need the ability to challenge, persuade, exchange and expand beliefs about a joint action in order to collaborate through dialogue. My research identifies three argumentation-based dialogues: a persuasion dialogue to resolve disagreement, an information-seeking dialogue to expand individual knowledge, and an inquiry dialogue to share knowledge. A theoretical logic-based framework, a formalized dialogue protocol based on argumentation theory, and argumentation-based dialogue games were developed to provide dialogue support for peer interaction. The work presented in this thesis is the first to apply argumentation theory and three different logic-based argumentation dialogues for use in human-robot collaboration. The research presented in this thesis demonstrates a practical, real-time implementation in which persuasion, inquiry, and information-seeking dialogues are applied to shared decision making for human-robot collaboration in a treasure hunt game domain. My research investigates if adding peer interaction enabled through argumentation-based dialogue to an HRI system improves system performance and user experience during a collaborative task when compared to an HRI system that is capable of only supervisory interaction with minimal dialogue. Results from user studies in physical and simulated human-robot collaborative environments, which involved 108 human participants who interacted with a robot as peer and supervisor, are presented in this thesis. My research contributes to both the human-robot interaction (HRI) and the argumentation communities. First, it brings into HRI a structured method for a robot to maintain its beliefs, to reason using those beliefs, and to interact with a human as a peer via argumentation-based dialogues. The structured method allows the human-robot collaborators to share beliefs, respond to discovery, expand beliefs to recover from failure, challenge beliefs, or resolve conflicts by persuasion. It allows a robot to challenge a human or a human to challenge a robot to prevent human or robot errors. Third, my research provides a comprehensive subjective and objective analysis of the effectiveness of an HRI System with peer interaction that is enabled through argumentation-based dialogue. I compare this peer interaction to a system that is capable of only supervisory interaction with minimal dialogue. My research contributes to the harder questions for human-robot collaboration: what kind of human-robot dialogue support can enhance peer-interaction? How can we develop models to formalize those features? How can we ensure that those features really help, and how do they help? Human-robot dialogue that can aid shared decision making, support the expansion of individual or shared knowledge, and resolve disagreements between collaborative human-robot teams will be much sought after as human society transitions from a world of robot-as-a-tool to robot-as-a-partner. My research presents a version of peer interaction enabled through argumentation-based dialogue that allows humans and robots to work together as partners.
Some Non-Classical Methods in Epistemic Logic and Games
Year of Dissertation:
In this dissertation, we consider some non-classical methods in epistemic logic and games. We first consider, dynamic epistemic logics in topological and geometric semantics, and then extend such ideas to the cases where inconsistencies are allowed. Then, as a case example, we discuss a well known paradox in game theory which is essentially a two-person Russell's paradox. Finally, we conclude with considering an alternative approach to games where strategies are considered as the primitives of the theory, and advancing some results.
ENHANCING THE PERFORMANCE OF ACTIVE CONNECTIONS IN MANETS THROUGH DYNAMIC ROUTE AND POWER OPTIMIZATION
Year of Dissertation:
In this thesis, we consider two significant problems that occur within active connections in mobile ad hoc networks (MANETs). These are: (A) degradation of path optimality in terms of hop count, and (B) failures on the constituents links of a path. Both phenomena occur over time because of node movement. Our investigation considers what can be done to minimize their occurrence of both, after the problem of initial route selection has been resolved by standard MANET routing protocols. In developing solutions to the aforementioned problems, we identified two broad and complementary approaches: (i) Variable topology, fixed power: These approaches assume that the transmission power of the nodes is kept fixed, but the topology of the connections is modifiable during their lifetimes. (ii) Variable power, fixed topology: These approaches assume that the topological structure of the connection must be kept fixed, but the transmission power levels used by constituent nodes is adjustable. Within approach (i), we developed (A) two new route optimization schemes that seek to shorten path lengths by eliminating inessential hops "on-the-fly", without relying on promiscuous mode of wireless cards, and (B) two new route maintenance schemes that circumvent impending link failures and heal broken links in an efficient way. We implemented our schemes in the ns2 packet level network simulator, as extension to the Ad hoc On Demand Distance Vector (AODV) routing protocol. Through extensive simulations, we show that our schemes are able to optimize path lengths, increase connection lifetime, reduce overall control traffic overhead, decrease end-to-end delay, and provide energy savings in packet transmissions. Within approach (ii), we developed (B) several new dynamic power budget distribution schemes. These were evaluated using a new model in which each connection is assigned a fixed power budget, and seeks to distribute this budget among its constituent nodes so as to increase the connection's lifetime. We implemented our schemes as a discrete event simulation. Through extensive simulation experiments, we showed that our schemes are able to consistently improve connection lifetimes without excessive additional control traffic overhead. The conclusions of both studies are seen to hold scalably as one varies situational parameters such as network size, number of connections, and node mobility levels.
PRIVACY-PRESERVING QUERY PROCESSING ON TEXT DOCUMENTS
Year of Dissertation:
Privacy-preserving query processing is an essential component for data processing, especially in outsourced databases, or in data operations which have special security and privacy requirements such as sharing of sensitive data. While cloud computing and data outsourcing attract an increasing number of customers, the security and privacy of sensitive data still remains an open problem. Encryption secures the data against unauthorized access, but it does not provide the ability to query the data unless the encryption scheme is searchable. Searchable encryption can be either private or public key depending on the needs of the user. In general, private-key solutions are faster but suffer from a key management problem. On the other hand, public-key solutions provide more flexibility but their running times are much higher than private-key protocols. Furthermore, parties may sometimes be forced to share data in order to comply with regulations or agreements. For example, different health care companies or intelligence agencies may need to find whether they have similar records in their databases without compromising privacy. Consequently, privacy-preserving similarity search between text documents is an emerging field as sensitive data sharing becomes inevitable. In this dissertation we present two privacy-preserving text processing protocols: (i) a ranked keyword search mechanism over outsourced public-key encrypted data and (ii) a similar document detection system. We introduce efficient algorithms for answering these query types and illustrate their feasibility in real-life applications.
Data Monitoring and Analysis in Wireless Networks
Year of Dissertation:
Various wireless network technologies have been created to meet the ever-increasing demand for wireless access to the Internet, such as wireless local area network, cellular network, sensor network and many more. The communication devices have transformed from large computational servers to small wireless hand-held devices, ranging from laptops, tablets, smartphones to small sensors. The advances of these wireless networks (e.g., faster network speed) and their intensive usages result in an enormous growth of network data in terms of volume, diversity, and complexity. All of these changes have raised complicated issues of network measurement and management. In the first part of this thesis, I study how WiFi network characteristics impact network forensics investigation and home security monitoring. I first focus on network forensics investigation and propose a wireless forensic monitoring system to collect trace digests of WiFi activities and facilitate cybercrime investigation. Then, I design and develop a low-cost home security system based on WiFi networks for physical intruder detection. Two methods - MAC-based detection and RSSI-variance-based detection, are proposed based on the characteristics of WiFi networks. In the second part, I study how to effectively and efficiently model multiple coevolving time series, which is ubiquitous in network measurement especially in wireless sensor networks. Two comprehensive algorithms are proposed to address three prominent challenges of mining coevolving sensor measured traces: (a) high order; (b) contextual constraints; and (c) temporal smoothness.
Quadratic Discriminant Analysis Revisited
Year of Dissertation:
In this thesis, we revisit quadratic discriminant analysis (QDA), a standard classification method. Specifically, we investigate the parameter estimation and dimension reduction problems for QDA. Traditionally, the parameters of QDA are estimated generatively; that is the parameters are estimated by maximizing the joint likelihood of observations and their labels. In practice, classical QDA, though computationally efficient, often underperforms discriminative classifiers, such as SVM, Boosting methods, and logistic regression. Motivated by recent research on hybrid generative/discriminative learning, we propose to estimate the parameters of QDA by minimizing a convex combination of negative joint log-likelihood and negative conditional log-likelihood of observations and their labels. For this purpose, we propose an iterative majorize-minimize (MM) algorithm for classifiers of which conditional distributions are from the exponential family; in each iteration of the MM algorithm, a convex optimization problem needs to be solved. To solve the convex problem specially derived for QDA, we propose a block-coordinate descent algorithm that sequentially updates the parameters of QDA; in each update, we present a trust region method for solving optimal estimations, of which we have closed form solutions in each iteration. Numerical experiments show: 1) the hybrid approach to QDA is competitive with, and in some cases significant better than other approaches to QDA, SVM with polynomial kernel ($d=2$) and logistic regression with linear and quadratic features; 2) in many cases, our optimization method converges faster to equal or better optimums than the conjugate gradient method used in the literature. Dimension reduction methods are commonly used to extract more compact features in the hope to build more efficient and possibly more robust classifiers. It is well known that Fisher's discriminant analysis generates optimal lower dimensional features for linear discriminant analysis. However, ``ldots for QDA, where so far there has been no universally accepted dimension-reduction technique in the literature'', though considerable efforts have been made. To construct a dimension reduction method for QDA, we generalize the Fukunaga-Koontz transformation, and propose novel affine feature extraction (AFE) methods for binary QDA. The proposed AFE methods have closed-form solutions and thus can be solved efficiently. We show that 1) the AFE methods have desired geometrical, statistical and information-theoretical properties; and 2) the AFE methods generalize dimension reduction methods for LDA and QDA with equal means. Numerical experiments show that the new proposed AFE method is competitive with, and in some cases significantly better than some commonly used linear dimension reduction techniques for QDA in the literature.