Types of Data Storage
If you always generated, used, and stored data in a standard format, you wouldn't need to do much data transformation. But as data is used for different purposes, it is stored in different formats. A data transformation tool has to be able to take data in one format and move it to another.
This section outlines the support provided by DTS for transforming data in some of the primary types of storage. More technical information about creating DTS connections for various types of data storage systems can be found in Chapter 5, "DTS Connections."
Text Files
Text files remain one of the primary ways for storing data. They are often used to transfer data from one system to another.
DTS provides access to text files through an OLE DB provider. You can import data from or export data to text files. You can also use the Bulk Insert task with text files, but only to transfer data from a text file into SQL Server.
Listing 3.1 shows a standard text file of clickstream data from a web server.
Listing 3.1 Clickstream Data in a Text File
cacheflow19.isu.net.sa - - [01/Jan/2000:02:04:52 -0600] "GET /whatis.htm HTTP/1.0" 200 13589 cacheflow19.isu.net.sa - - [01/Jan/2000:02:04:53 -0600] "GET /image4.gif HTTP/1.0" 200 6276 cacheflow19.isu.net.sa - - [01/Jan/2000:02:04:54 -0600] "GET /navibar.gif HTTP/1.0" 200 1031 cacheflow19.isu.net.sa - - [01/Jan/2000:02:04:54 -0600] "GET /image6.gif HTTP/1.0" 200 991 cacheflow19.isu.net.sa - - [01/Jan/2000:02:04:54 -0600] "GET /home.gif HTTP/1.0" 200 1272 cacheflow19.isu.net.sa - - [01/Jan/2000:02:04:54 -0600] "GET /image5.gif HTTP/1.0" 200 1552
XML
XML is an Internet-ready form of text file data storage. It provides a powerful format for sharing data between different applications.
SQL Server 2000 provides XML support in a number of ways:
-
You can export relational data to XML by using the FOR XML clause in a SELECT statement.
-
You can import data into a relational database by using the OpenXML rowset provider.
-
You can access SQL Server through a URL using XML.
-
You can use XML-Data schemas and Xpath queries.
-
You can set XML documents as command text and return result sets as a stream.
Listing 3.2 shows the output of the following SELECT query:
select top 8 o.OrderID, o.CustomerID, od.ProductID, od.Quantity from orders o join [order details] od on o.orderid = od.orderid order by o.OrderID
Listing 3.2 Result Set from Querying the Orders and Order Details Tables
OrderID CustomerID ProductID Quantity ----------- ---------- ----------- -------- 10248 VINET 11 12 10248 VINET 42 10 10248 VINET 72 5 10249 TOMSP 14 9 10249 TOMSP 51 40 10250 HANAR 41 10 10250 HANAR 51 35 10250 HANAR 65 15
Listing 3.3 shows the same query with the FOR XML AUTO clause:
select top 8 o.OrderID, o.CustomerID, od.ProductID, od.Quantity from orders o join [order details] od on o.orderid = od.orderid order by o.OrderID for xml auto
Listing 3.3 The Same Result Set Returned with FOR XML AUTO
XML_F52E2B61-18A1-11d1-B105-00805F49916B <o OrderID="10248" CustomerID="VINET"> <od ProductID="11" Quantity="12"/> <od ProductID="42" Quantity="10"/> <od ProductID="72" Quantity="5"/> </o> <o OrderID="10249" CustomerID="TOMSP"> <od ProductID="14" Quantity="9"/> <od ProductID="51" Quantity="40"/> </o> <o OrderID="10250" CustomerID="HANAR"> <od ProductID="41" Quantity="10"/> <od ProductID="51" Quantity="35"/> <od ProductID="65" Quantity="15"/> </o>
The Parallel Data Pump task (described in Chapter 10) is a useful tool for handling XML data because it provides support for hierarchical recordsets. The Transform Data task (see Chapter 6) has transformations designed to load text into a database, which can be used for loading XML files. There are third-party companies that are offering OLE DB providers that read XML data so that it can be used as the source for a transformation task.
Spreadsheets
Many companies store some of their data, and especially their business analysis data, in spreadsheets. DTS uses the Jet OLE DB provider to connect to Microsoft Excel spreadsheets. DTS can import and export data to Excel.
Relational Database Management Systems
A significant portion of corporate data is stored in Relational Database Management Systems (RDBMSs). These systems provide powerful tools for querying and updating the data, using SQL.
DTS has an OLE DB provider for Microsoft SQL Server, Microsoft Access, and Oracle. The OLE DB provider for ODBC can be used for other database systems that have ODBC drivers available, but not OLE DB providers.
Normalized Database Schema
A normalized database schema is the most efficient format for storing OLTP data. In a normalized database, each piece of information is only stored once, so it can be updated in a single location. It is the more efficient schema for accessing individual records.
Figure 3.8 shows a typical normalized database schema.
Multidimensional (Star) Database Schema
A multidimensional, or star, schema is the most efficient format for storing data used for Business Analysis. In a star schema, the facts being analyzed are stored in a central fact table. This table is surrounded by dimension tables, which contain all the perspectives by which the facts are being analyzed.
Figure 3.9 shows a typical star database schema.
Chapter 4, "Using DTS to Move Data into a Data Mart," discusses the use of the star schema.
Multidimensional Database Management Systems (OLAP)
You can create a multidimensional database schema in a relational database system. There are also database systems that are specifically designed to hold multidimensional data. These systems are typically called OLAP servers. Microsoft Analysis Server is an example of an OLAP server.
The primary unit of data storage in a relational database system is a two-dimensional table. In an OLAP system, the primary unit of storage is a multidimensional cube. Each cell of a cube holds the data for the intersection of a particular value for each of the cube's dimensions.
The actual data storage for an OLAP system can be in a relational database system. Microsoft Analysis Services gives three data storage options:
-
MOLAPMultidimensional OLAP. Data and calculated aggregations stored in a multidimensional format.
-
ROLAPRelational OLAP. Data and calculated aggregations stored in a relational database.
-
HOLAPHybrid OLAP. Data stored in a relational database and calculated aggregations stored in multidimensional format.