Establishing Dynamic Cost Models for Multidatabase Systems

Qiang Zhu
Department of Computer and Information Science
The University of Michigan

 
Contact Information

Qiang Zhu
Department of Computer and Information Science
The University of Michigan - Dearborn
Dearborn, MI 48128-1491
Phone: (313) 593-4998
Fax : (313) 593-9967
Email: qzhu@umich.edu

 
WWW PAGE

http://www.engin.umd.umich.edu/~qzhu/nsf9811980/nsf98.html

 
Project Award Information
 
Keywords

Multidatabase system, global query optimization, dynamic cost model, query sampling, statistical methods, query execution strategy

 
Project Summary

A crucial challenge for global query optimization in a multidatabase system (MDBS) is that some required local information such as local cost models may not be available at the global level due to local autonomy of component database systems. A number of techniques to tackle this challenge have been suggested in the literature recently. However, they are suitable only for a static system environment. This research project explores theory and methods to establish dynamic cost models for an MDBS. Previous work on a query sampling method for establishing cost models in a static system environment is extended to a dynamic system environment. The methodologies adopted in this project include a qualitative approach, which introduces qualitative variables into cost models, and an adaptive approach, which adaptively incorporates dynamic cost information observed from execution of user queries into cost models. All approaches are evaluated and compared in order to determine the most promising one to establish dynamic cost models for multidatabase systems. Theoretical and experimental studies on query optimization that takes advantage of dynamic cost models are also conducted. The project will provide theory, algorithms, and implementation techniques for the design and development of an efficient multidatabase system.

 
Publications and Products
 
Project Impact
 
Goals, Objectives, and Targeted Activities

The goal of this project is to study how to establish dynamic cost models for a multidatabase system so that a dynamically-changing system environment can be captured. It is expected that the quality of query optimization can be improved by using dynamic cost models instead of traditional static cost models. The two main objectives of this project are therefore (1) to develop theory and methods to build dynamic cost models for an MDBS and (2) to explore techniques to perform query optimization based on dynamic cost models in an MDBS.

In previous work, we developed a query sampling method to derive local cost models for global query optimization in a multidatabase system. The key idea is to group local queries that can be performed on a component database system into homogeneous classes based on limited information available at the global level. A sample of queries are then drawn from each query class and run against the user component database. The costs of sample queries are recorded and used to derive a cost model for each query class by multiple regression. The parameters of the derived cost models are kept in the MDBS catalog and utilized during global query optimization. Experimental results demonstrated that the query sampling method could derive good cost models for a static multidatabase environment.

It has been observed that many factors in a multidatabase environment may change dramatically over time and may have a significant impact on query costs. To establish dynamic cost models for an MDBS, we have developed a so-called qualitative approach in this project. The system contention level in a dynamic environment is divided into a number of discrete contention states based on the costs of a probing query. Algorithms based on iterative uniform partition and data clustering have been developed to determine proper system contention states for a dynamic environment. A qualitative variable is used to indicate system contention states. The techniques from our previous static query sampling method, including query sampling, automatic variable selection, regression analysis, and model validation, are extended so as to develop a cost model incorporating the qualitative variable for a dynamic environment. Experimental results have demonstrated that the qualitative approach is quite promising. To estimate the costs of large queries experiencing multiple system contention states, we are currently investigating two techniques, namely, fractional analysis and probabilistic approach. The former, which is suitable for a system environment that changes its contention states gradually and smoothly, estimates a query cost by analyzing its different fractions. The latter, which is suitable for a system environment that changes its contention states rapidly and randomly, utilizes the theory of Markov chains to estimate a query cost. The other ongoing activities include improving our dynamic environments generator and cost model builder,  studying the correlation between query cost and different dynamic factors, investigating several other potential approaches to establishing dynamic cost models for an MDBS, including an adaptive approach and a neural network approach. The multidatabase query optimization strategies that take advantage of dynamic cost models will also be investigated in this project.

 
Project References
  1. Qiang Zhu, Yu Sun and S. Motheram, Developing Cost Models with Qualitative Variables for Dynamic Multidatabase Environments, Proc. of 16th IEEE Int'l Conf. on Data Eng. (ICDE'00), pp 413 - 424, 2000
  2. Qiang Zhu and P.-A. Larson, Solving Local Cost Estimation Problem for Global Query Optimization in Multidatabase Systems, Distributed and Parallel Databases, Vol. 6, No. 4, pp 373-420, 1998
  3. Y. Chen, Qiang Zhu and N. Wang, Query Processing with Quality Control in the World Wide Web, World Wide Web, Vol.1, No. 4, pp 241-55, 1998
  4. Qiang Zhu and P.-A. Larson, A Fuzzy Query Optimization Approach for Multidatabase Systems, International Journal of Uncertainty, Fuzziness and Knowledge-Based System, Vol.5, No.6, pp 701-22, 1997
  5. Qiang Zhu and P.-A. Larson, Building Regression Cost Models for Multidatabase Systems, Proceedings of the 4th IEEE Int'l Conf. on Parallel and Distributed Information Systems (PDIS'96), pp 220-31, 1996
  6. Qiang Zhu and P.-A. Larson, Global Query Processing and Optimization in the CORDS Multidatabase System, Proc. of 9th Int'l Conf. on Parallel and Distributed Computing Systems (PDCS'96), pp 640-46, 1996
  7. G.K. Attaluri, D.P. Bradshaw, N. Coburn, P.A. Larson, P. Martin, A. Silberschatz, J. Slonim, Qiang Zhu, The CORDS Multidatabase Project, IBM Systems Journal, Vol. 34, No. 1, pp 39-62, 1995
  8. Qiang Zhu and P.-A. Larson, A Query Sampling Method for Estimating Local Cost Parameters in a Multidatabase System, Proc. of 10th IEEE Int'l Conf. on Data Eng. (ICDE'94), pp 144-53, 1994
 
Area Background

A multidatabase system integrates data from multiple pre-existing databases managed by heterogeneous component database management systems (DBMS) in a distributed environment. A key feature of an MDBS is local autonomy that individual DBMSs retain to serve existing applications. A  user can issue global queries to retrieve data from multiple component databases without knowing where and how the data is located. To achieve good overall system performance, global query optimization is required. Although some traditional distributed query optimization techniques can be applied to an MDBS, local autonomy and heterogeneity raise new challenges for global query optimization in such a system. To tackle the major challenge that local cost information is lacking at the global level in an MDBS, a number of techniques including the calibration method, query sampling method, cost vector database, generic cost model with adjustments and fuzzy approach have been suggested in the literature. Other multidatabase query optimization techniques such as query modification and decomposition, parallel execution, entity join processing, join tree balance, schema-level optimization, semantic query optimization and outerjoin optimization to deal with different challenges have also been studied by researchers in the area.

 
Area References
  1. M. T. Roth, F. Ozcan & L. M. Haas, Cost models DO matter: providing cost information for diverse data sources in a federated system, Proc. of 25th VLDB Conference, pp 599-610, 1999
  2. H. Naacke, G. Gardarin & A. Tomasic, Leveraging mediator cost models with heterogeneous data sources, Proc. of 14th Int'l Conf. on Data Eng., pp 351-60, 1998
  3. C. Lee & C. J. Chen, Query optimization in multidatabase systems considering schema conflicts, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 6, pp 941-55, 1998
  4. G. Gardarin, F. Sha & Z.-H. Tang, Calibrating the query optimizer cost model of IRO-DB, an object-oriented federated database system, Proc. of 22nd VLDB Conference, pp 378-389, 1996
  5. S. Adali, K. S. Candan, Y. Papakonstantinou & V. S. Subrahmanian, Query caching and optimization in distributed mediator systems, Proc. of SIGMOD, pp 137-148, 1996
  6. W. Du, R. Krishnamurthy & M. C. Shan, Query optimization in heterogeneous DBMS, Proc. of 18th VLDB Conference, pp 277-91, 1992
 
Potential Related Projects

Within the IDM program, two related projects are "Interoperable Query Processing with Networked Heterogeneous Information Servers" (PI: L. Raschid) and "Querying Heterogeneous and Multimedia Information Sources" (PI: Y. Papakonstantinou). We are interested in collaborating and exchanging ideas with colleagues in this area.