Wolniewicz.com

Wolniewicz family experiences

Summary of Thesis Research

My thesis research investigated the application of extensible database technology, particularly query processing and optimization, to the requirements of scientific database systems. The research was conducted with funding from the National Science Foundation (NSF), and was a joint project between the University of Colorado's Space Grant College (SGC) and the Computer Science Department. In addition to the research results and papers, including a paper presented at the 1993 VLDB conference in Dublin, Ireland, the project included the construction of a prototype scientific database system with database analysis operators appropriate for supporting space science investigations.

The research was directed by two Co-Principal Investigators: Goetz Graefe, on the faculty of the University of Colorado Computer Science Department, and Elaine Hansen, director of the University of Colorado's Space Grant College. I was assisted in the research by the staff and students at SGC, including Jack Faber, Tony Colaprete, Jennifer Ray, and Dan Rodier. The work itself built on Goetz Graefe's Volcano System and the extensible query optimizer component built by Bill McKenna, and received input from other graduate students working on Volcano, including Rick Cole and Diane Davison.

Abstract

Although scientific data analysis increasingly requires access to and manipulation of large quantities of data, current database technology fails to meet the needs of scientific processing. Shortcomings include inadequate data modeling facilities for scientific data types, inadequate physical storage structures for these types, and a lack of scientific analysis operations on data objects. Database systems for scientific users must address these shortcomings.

A database system can offer numerous functionality improvements over the current combinations of scientific programs and file systems commonly used in scientific data analysis. Unfortunately, the inclusion of a database layer between the application and the file system holding the application's data can result in degraded performance. To overcome acceptance problems among scientists, scientific databases must provide performance comparable to, and functionality superior to, current systems used by scientists.

Algebraic query optimization is one of many techniques used within database systems to improve performance. This technique has not been explored for scientific data types and operations. I have proposed expanding the concept of a database query to include numeric computations over scientific databases, thereby allowing algebraic query optimization to be applied to the full scientific computation and data access operations.

This research introduces an integrated algebra that includes traditional database operators for pattern matching and search as well as numeric operators for scientific analysis. The use of a single integrated algebra enables automatic optimization of computations, realizing all of the benefits provided by optimization in traditional database systems.

To experiment with this integrated algebra, a prototype system has been implemented for use at the University of Colorado's Space Grant College. The prototype supports many basic scientific operations such as interpolation and digital filtering, in addition to standard relational operations. I identify a set of transformation rules for this algebra, and show that these transformations can be used to achieve significant performance improvements.
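The idea of a transformation rule over an integrated algebra can be illustrated with a small sketch. It is not taken from the thesis prototype or from Volcano: the operator classes (`Scan`, `Select`, `Map`), the rule `push_select_below_map`, the calibration function, and the sample data are all hypothetical names chosen for illustration. The rule moves a relational selection below a pointwise numeric operator, which is valid here only because the selection tests a column that the numeric operator does not modify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class Scan:
    """Leaf operator: produces stored rows (hypothetical, in-memory)."""
    rows: List[Row]


@dataclass
class Select:
    """Relational operator: keep rows with low <= row[column] <= high."""
    column: str
    low: float
    high: float
    child: object


@dataclass
class Map:
    """Numeric operator: apply fn pointwise to one column."""
    column: str
    fn: Callable[[float], float]
    child: object


def push_select_below_map(plan):
    """Rewrite Select(Map(x)) as Map(Select(x)).

    The rewrite is only applied when the selection column differs from
    the mapped column, so the predicate sees the same values either way.
    """
    if (isinstance(plan, Select) and isinstance(plan.child, Map)
            and plan.column != plan.child.column):
        m = plan.child
        return Map(m.column, m.fn,
                   Select(plan.column, plan.low, plan.high, m.child))
    return plan


def execute(plan, stats):
    """Evaluate a plan, counting how many rows the numeric operator touches."""
    if isinstance(plan, Scan):
        return [dict(r) for r in plan.rows]
    if isinstance(plan, Select):
        rows = execute(plan.child, stats)
        return [r for r in rows if plan.low <= r[plan.column] <= plan.high]
    if isinstance(plan, Map):
        rows = execute(plan.child, stats)
        stats["map_calls"] += len(rows)
        return [{**r, plan.column: plan.fn(r[plan.column])} for r in rows]
    raise TypeError(f"unknown operator: {plan!r}")


def calibrate(counts):
    """Made-up instrument calibration, for illustration only."""
    return counts * 0.5 + 1.0


# Made-up time series: 1000 samples with a time stamp and a raw reading.
data = [{"time": float(t), "counts": float(2 * t)} for t in range(1000)]

# Query as written: calibrate every sample, then keep a time window.
original = Select("time", 100.0, 199.0, Map("counts", calibrate, Scan(data)))
optimized = push_select_below_map(original)

original_stats = {"map_calls": 0}
optimized_stats = {"map_calls": 0}
original_result = execute(original, original_stats)
optimized_result = execute(optimized, optimized_stats)

print(original_result == optimized_result)
print(original_stats["map_calls"], optimized_stats["map_calls"])
```

Both plans return the same rows, but the rewritten plan applies the numeric operator only to the samples inside the time window. Operators such as interpolation or digital filtering read neighboring samples, so a rule that moves a selection past them would need additional conditions that this sketch does not attempt to model.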

The results from the prototype demonstrate that scientific database computations can be effectively optimized, permitting performance gains that could not be realized without the integration of scientific operators into database systems. These results suggest that future scientific database systems should be based on integrated retrieval and computational algebras.

Download Thesis

To download a zipped PostScript copy of my thesis, select the icon below. This version is single-spaced with smaller margins to conserve paper; the official University of Colorado thesis format requires double-spaced print and larger margins.

94 pages.