Research

My research primarily focuses on two areas of database systems: optimization of complex queries and database support for scientific data.

Future database applications pose several new challenges to query optimization. Queries will be significantly more complex than in traditional systems. The number of alternative evaluation algorithms will be much higher as well, especially with the use of parallelism or with attempts to optimize for several values of run-time parameters (parametric query optimization). Thus, the number of alternative access plans for processing a query will be extremely large, so the algorithms currently used to find the optimum among them will be inadequate. My research investigates the use of randomized optimization algorithms as a viable solution to this problem. I am primarily interested in simulated annealing and genetic algorithms, as well as other alternatives that take advantage of special properties of query optimization. I am also looking into complex query scheduling problems, especially those that arise in parallel and multimedia environments. Error propagation of size and cost estimates in complex queries is also part of my studies, where I am trying to identify the information that a database system must maintain to limit the propagation of error. To that end, I am primarily focusing on identifying the properties of optimal histograms that approximate the distribution of values in relation attributes.
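To make the randomized approach concrete, here is a minimal sketch of simulated annealing applied to join ordering. It is an illustration only, not the algorithms studied in this research: the toy catalog (CARD, SEL), the cost model (sum of estimated intermediate result sizes for a left-deep plan), the swap move, and the cooling parameters are all assumptions made for the example.

```python
import math
import random

# Hypothetical toy catalog: relation cardinalities and join selectivities.
CARD = {"A": 1000, "B": 200, "C": 5000, "D": 50}
SEL = {
    frozenset(("A", "B")): 0.01,
    frozenset(("B", "C")): 0.001,
    frozenset(("C", "D")): 0.05,
}


def plan_cost(order):
    """Assumed cost model: sum of estimated intermediate result sizes
    for a left-deep plan that joins relations in the given order."""
    joined = [order[0]]
    size = float(CARD[order[0]])
    cost = 0.0
    for rel in order[1:]:
        size *= CARD[rel]
        for prev in joined:
            # A missing entry means no join predicate (cross product).
            size *= SEL.get(frozenset((prev, rel)), 1.0)
        joined.append(rel)
        cost += size
    return cost


def neighbor(order, rng):
    """Random move: swap two positions in the join order."""
    i, j = rng.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return tuple(new)


def anneal(start, rng, temp=1e6, cooling=0.95, steps_per_temp=20, min_temp=1.0):
    """Return the cheapest join order seen and its cost."""
    current = best = tuple(start)
    cur_cost = best_cost = plan_cost(current)
    while temp > min_temp:
        for _ in range(steps_per_temp):
            cand = neighbor(current, rng)
            cand_cost = plan_cost(cand)
            delta = cand_cost - cur_cost
            # Always accept improvements; accept a worse plan with
            # probability exp(-delta / temp), which shrinks as temp drops.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, cur_cost = cand, cand_cost
                if cur_cost < best_cost:
                    best, best_cost = current, cur_cost
        temp *= cooling
    return best, best_cost
```

A call such as `anneal(("A", "B", "C", "D"), random.Random(0))` returns a join order and its estimated cost. With only four relations exhaustive search would be trivial; the randomized search becomes interesting only when the plan space is too large to enumerate.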

The computational mode of investigation is expected to be part of many experiments in various scientific disciplines in the future. The databases that these experiments generate need specialized support in many respects that current technology is not ready to provide. I am involved in the development of the ZOO Desktop Experiment Management Environment, which will help scientists throughout the life cycle of their experimental studies. A primary component of that system will be a database system. Two major issues that my work addresses are visual user interfaces and semantic heterogeneity. In the former, I am concentrating on identifying the right metaphors for representing complex database schemas, queries, and objects to scientists so that they are natural to them, and also on investigating the power of dynamic visual queries. In the latter, I am concentrating on developing visual tools that will facilitate translation and integration of different data formats or schemas. Although these issues are generic and arise in all experimental scientific disciplines, my efforts are guided by the needs of specific projects with which I am associated, in particular simulation-based performance studies of computer systems, simulation-based modeling of plant growth, NMR spectroscopy, DNA sequencing, and microscopic imaging.