Qiang Zhu
Department of Computer and Information Science
The
University of Michigan
Qiang Zhu
Department of Computer and Information Science
The
University of Michigan - Dearborn
Dearborn, MI 48128-1491
Phone: (313)
593-4998
Fax : (313) 593-9967
Email: qzhu@umich.edu
http://www.engin.umd.umich.edu/~qzhu/nsf9811980/nsf98.html
Multidatabase system, global query optimization, dynamic cost model, query sampling, statistical methods, query execution strategy
A crucial challenge for global query optimization in a multidatabase system (MDBS) is that some required local information such as local cost models may not be available at the global level due to local autonomy of component database systems. A number of techniques to tackle this challenge have been suggested in the literature recently. However, they are suitable only for a static system environment. This research project explores theory and methods to establish dynamic cost models for an MDBS. Previous work on a query sampling method for establishing cost models in a static system environment is extended to a dynamic system environment. The methodologies adopted in this project include a qualitative approach, which introduces qualitative variables into cost models, and an adaptive approach, which adaptively incorporates dynamic cost information observed from execution of user queries into cost models. All approaches are evaluated and compared in order to determine the most promising one to establish dynamic cost models for multidatabase systems. Theoretical and experimental studies on query optimization that takes advantage of dynamic cost models are also conducted. The project will provide theory, algorithms, and implementation techniques for the design and development of an efficient multidatabase system.
The goal of this project is to study how to establish dynamic cost models for a multidatabase system so that a dynamically-changing system environment can be captured. It is expected that the quality of query optimization can be improved by using dynamic cost models instead of traditional static cost models. The two main objectives of this project are therefore (1) to develop theory and methods to build dynamic cost models for an MDBS and (2) to explore techniques to perform query optimization based on dynamic cost models in an MDBS.
In previous work, we developed a query sampling method to derive local cost models for global query optimization in a multidatabase system. The key idea is to group local queries that can be performed on a component database system into homogeneous classes based on limited information available at the global level. A sample of queries are then drawn from each query class and run against the user component database. The costs of sample queries are recorded and used to derive a cost model for each query class by multiple regression. The parameters of the derived cost models are kept in the MDBS catalog and utilized during global query optimization. Experimental results demonstrated that the query sampling method could derive good cost models for a static multidatabase environment.
It has been observed that many factors in a multidatabase environment may change dramatically over time and may have a significant impact on query costs. To establish dynamic cost models for an MDBS, we have developed a so-called qualitative approach in this project. The system contention level in a dynamic environment is divided into a number of discrete contention states based on the costs of a probing query. Algorithms based on iterative uniform partition and data clustering have been developed to determine proper system contention states for a dynamic environment. A qualitative variable is used to indicate system contention states. The techniques from our previous static query sampling method, including query sampling, automatic variable selection, regression analysis, and model validation, are extended so as to develop a cost model incorporating the qualitative variable for a dynamic environment. Experimental results have demonstrated that the qualitative approach is quite promising. To estimate the costs of large queries experiencing multiple system contention states, we are currently investigating two techniques, namely, fractional analysis and probabilistic approach. The former, which is suitable for a system environment that changes its contention states gradually and smoothly, estimates a query cost by analyzing its different fractions. The latter, which is suitable for a system environment that changes its contention states rapidly and randomly, utilizes the theory of Markov chains to estimate a query cost. The other ongoing activities include improving our dynamic environments generator and cost model builder, studying the correlation between query cost and different dynamic factors, investigating several other potential approaches to establishing dynamic cost models for an MDBS, including an adaptive approach and a neural network approach. The multidatabase query optimization strategies that take advantage of dynamic cost models will also be investigated in this project.
A multidatabase system integrates data from multiple pre-existing databases managed by heterogeneous component database management systems (DBMS) in a distributed environment. A key feature of an MDBS is local autonomy that individual DBMSs retain to serve existing applications. A user can issue global queries to retrieve data from multiple component databases without knowing where and how the data is located. To achieve good overall system performance, global query optimization is required. Although some traditional distributed query optimization techniques can be applied to an MDBS, local autonomy and heterogeneity raise new challenges for global query optimization in such a system. To tackle the major challenge that local cost information is lacking at the global level in an MDBS, a number of techniques including the calibration method, query sampling method, cost vector database, generic cost model with adjustments and fuzzy approach have been suggested in the literature. Other multidatabase query optimization techniques such as query modification and decomposition, parallel execution, entity join processing, join tree balance, schema-level optimization, semantic query optimization and outerjoin optimization to deal with different challenges have also been studied by researchers in the area.
Within the IDM program, two related projects are "Interoperable Query Processing with Networked Heterogeneous Information Servers" (PI: L. Raschid) and "Querying Heterogeneous and Multimedia Information Sources" (PI: Y. Papakonstantinou). We are interested in collaborating and exchanging ideas with colleagues in this area.