A data grid is a computing infrastructure that provides intensive computation and analysis of shared data files and very large databases, from hundreds of terabytes to petabytes or more of data, across widely distributed user communities. The ability to do massive computations is very useful, but nearly all the power of the grid relies on data. Without data, there is little for all that computational power to do.
How Databases Enter the Picture
While database management systems have been the repository of business information for decades, academia has been somewhat less apt to store data in the same manner. Traditionally, the primary data source and data sink on the grid have been flat files. But databases are core to many thousands of organizations, both large and small, and these databases (often heterogeneous and spread across different operating systems) house the vast majority of the data that drives these organizations. Venturing into the larger worlds of business and science, the grid concept is shifting to access the vast stores of data that reside in worldwide databases, along with the information residing in flat files. This shift is one of the primary reasons that the grid is making forays into commercial and business applications.
Among the scientific uses of the grid is providing access to multiple types of data in multiple areas of interest (weather, physics, astronomy, seismology, and so on) to find trends, attempt predictions, and determine the nature of the given topic. The same structure, on a smaller scale, can be used to access the data housed in databases across an organization.
Ultimately, a grid may need to access many different databases on many different platforms, housing vastly different types of data. Financial data housed in an Oracle database, employee data in a SQL Server database, operational data in a DB2 database, and e-business client information in a MySQL database can all be brought together centrally to answer computationally intensive questions. The grid promises to help bring this data together simply and elegantly, without having it be a programming feat, and without the end user having to know where the data is, how to access it, or how it's stored.
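As a rough illustration of this location transparency, the sketch below lets a caller ask for data by a logical name and combine the results without knowing which database holds them. It is a minimal sketch under stated assumptions, not how any particular grid middleware works: the DataCatalog class, the dataset names, and the table layouts are all hypothetical, and two in-memory SQLite databases stand in for the separate Oracle and SQL Server systems mentioned above.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for a remote DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


class DataCatalog:
    """Hypothetical catalog mapping logical dataset names to physical tables."""

    def __init__(self):
        self._entries = {}

    def register(self, name, conn, table):
        # The caller of fetch() never sees the connection or table name.
        self._entries[name] = (conn, table)

    def fetch(self, name):
        """Return all rows of a logical dataset as a list of dicts."""
        conn, table = self._entries[name]
        cur = conn.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]


# Two independent "databases" (assumed schemas, illustrative data only).
finance_db = make_source(
    "CREATE TABLE payroll (emp_id INTEGER, salary REAL)",
    "INSERT INTO payroll VALUES (?, ?)",
    [(1, 50000.0), (2, 65000.0), (3, 40000.0)],
)
hr_db = make_source(
    "CREATE TABLE employees (emp_id INTEGER, dept TEXT)",
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "ops"), (2, "sales"), (3, "ops")],
)

catalog = DataCatalog()
catalog.register("financial", finance_db, "payroll")
catalog.register("employee", hr_db, "employees")


def salary_by_dept(cat):
    """Combine the two logical datasets: total salary per department."""
    dept_of = {r["emp_id"]: r["dept"] for r in cat.fetch("employee")}
    totals = {}
    for r in cat.fetch("financial"):
        dept = dept_of[r["emp_id"]]
        totals[dept] = totals.get(dept, 0.0) + r["salary"]
    return totals


result = salary_by_dept(catalog)
```

The question "total salary per department" is answered in terms of the logical names "financial" and "employee" only. Moving either dataset to a different database would change the catalog registration but not the query code, which is the property the paragraph above describes.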