Mason Archival Repository Service

Query Consolidation: Interpreting Queries Sent to Independent Heterogenous Databases

Show simple item record

dc.contributor.author Acar, Aybar C.
dc.creator Acar, Aybar C.
dc.date 2008-07-23
dc.date.accessioned 2008-08-14T17:39:17Z
dc.date.available NO_RESTRICTION en
dc.date.available 2008-08-14T17:39:17Z
dc.date.issued 2008-08-14T17:39:17Z
dc.identifier.uri https://hdl.handle.net/1920/3223
dc.description.abstract This dissertation introduces the problem of query consolidation, which seeks to interpret a set of disparate queries submitted to independent databases with a single “global” query. The problem has multiple applications, from improving virtual database design, to aiding users in information retrieval, to protecting against inference of sensitive data from a seemingly innocuous set of apparently unrelated queries. The problem exhibits attractive duality with the much-researched problem of query decomposition, which has been addressed intensively in the context of multidatabase environments: How to decompose a query submitted to a virtual database into a set of local queries that are evaluated in individual databases. The new problem is set in the architecture of a canonical multidatabase system, using it in the “reverse” direction. The reversal is built on the assumption of conjunctive queries and source descriptions. A rational and efficient query decomposition strategy is also assumed, and this decomposition is reversed to arrive at the original query by analyzing the decomposed components The process incorporates several steps where a number of solutions must be considered, due to the fact that query decomposition is not injective. Initially, the problem of finding the most likely join plan between component queries is investigated. This is accomplished by leveraging the referential constraints available in the underlying multidatabase, or by approximating these constraints from the data when not available. This approximation is done using the information theoretic concept of conditional entropy. Furthermore, the most likely join plans are enhanced by the expansion of their projections and adding precision to their selection constraints by estimating the selection constraints that would be applied to these consolidations offline. Additionally, the extraction of a set of queries related to the same retrieval task from an ongoing sequence of incoming queries is investigated. A conditional random field model is trained to segment and label incoming query sequences. Finally, the candidate consolidations are re-encapsulated with a genetic programming approach to find simpler intentional descriptions that are extensionally equivalent to discover the original intent of the query. The dissertation explains and discusses all of the above operations and validates the methods developed with experimentation on synthesized and real-world data. The results are highly encouraging and verify that the accuracy, time performance, and scalability of the methods would make it possible to exploit query consolidation in production environments.
dc.language.iso en_US en
dc.subject Databases en_US
dc.subject Information Integration en_US
dc.subject Query Processing en_US
dc.subject Machine Learning en_US
dc.title Query Consolidation: Interpreting Queries Sent to Independent Heterogenous Databases en
dc.type Dissertation en
thesis.degree.name Doctor of Philosophy in Computer Science en
thesis.degree.level Doctoral en
thesis.degree.discipline Computer Science en
thesis.degree.grantor George Mason University en


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search MARS


Browse

My Account

Statistics