Usage Examples
Describing features is not sufficient to grasp the potential of this technology. This section covers a sample of projects that illustrate some interesting use of the technology.
Airline Analysis
This company needed to analyze data and quickly react to market demands. One of their processing tasks involved loading 30 million to 50 million records daily. The processing on these records is very complex and requires 500 megabytes of historical information. Using the plain relational database capabilities, they had to process all this in a client application. They had to abandon this approach because their ESQL/C program could process less than 10% of the data in 16 hours. The implementation in the server provided the complete processing in 2.5 hours, an 80 times performance improvement. For them, it was the difference between dreaming of doing the analysis and doing it.
What caused this jump in performance? At a high level, we can identify UDAs and UDFs and the implementation of new types that use a better internal representation of the business data. With that, they could also take advantage of the database framework for set processing and parallel processing. Finally, they did not have to move the data back and forth between the database and the application, saving the transfer time.
Travel Reservation
This project involved building a customer-centric travel management system for corporations. The environment was Java, CORBA, and IDS 9.x. There was also a need to interact with a mainframe system to verify some information.
The system analysis and design followed an object-oriented methodology. It turned out that it facilitated the implementation in the database because we could use table inheritance when it was deemed appropriate.
We needed to be able to locate hotels within a certain distance of a desired location. This was easily done, as longitude and latitude were already part of the information. To provide this functionality, we used the code discussed in Chapter 6 with the addition of a function that calculated the distance between two points.
The last major thing we did was implement a hierarchical policy system that could find the policies that apply to a specific hotel or flight origin or destination or find the resources that are within policies in a specific area. This was done with the Node type, discussed in Chapter 2.
Capital Market Solution
The challenge in this application was to provide near real-time access for stock ticker analysis. The company's previous attempt, using a competing relational database management system (RDBMS), achieved a total of 500 ticks per second. The problem was that they estimated that they needed 12,000 ticks per second!
The Informix solution came based on two commercially available DataBlades: TimeSeries and the NAG statistical DataBlade. Informix worked closely with the customer and created a new DataBlade: the real-time loader. Features include a tighter integration with the application receiving the ticks and a VTI interface that provides access to the values even before they are stored in the time series. This solution was able to achieve 25,000 ticks per second on the same platform used previously that achieved only 500 ticks per second. This represents a 50 times performance improvement.
Interestingly, many of the characteristics found in this packaging could be applied to other industries with likely performance benefits.
Fabric Classification
In this project, we wanted to store fabric characteristics in a database so people could search on these characteristics to find the subset of fabrics that interests them. As the number of fabrics increases, the indexing efficiency becomes more important. Two characteristics were very interesting: material and color.
The material search involved a hierarchical representation as discussed in Chapter 2. Instead of being able to search on the material itself, such as nylon and cotton, you could search for a category of fabric such as synthetic and natural. The hierarchy could extend multiple levels based on the desired classification. Other types of hierarchies were also required. The hierarchical search was indexed using a new type that facilitated the indexing and search of these hierarchies. This provided efficient indexing of this subproblem, a facility not available in standard relational databases.
The second problem involved the search on color. For people not familiar with these types of searches, the first thought that comes up is to have the color name available for indexing. In this industry, a color is a much more precise value. It consists of three numbers representing the color. Changing any of the three values slightly changes the color. This means that a color search becomes a search on a range for each of the three values. There were other requirements that made the search a range on five values. This was a perfect candidate for an R-tree index. As discussed before, the R-tree indexing discriminates enough on the values to be selected by the optimizer where multiple B-tree indexes would not. A modified version of the code discussed in Chapter 6 was used to provide the proper indexing.
These indexing schemes provided the basis for efficient and scalable searches for the project.
Area Management
This project was interesting because it involved replacing an existing solution, written in SPL, with a simpler one, still written in SPL. The original solution involved 83 lines of complex SPL with three SELECT statements in two embedded FOREACH statements. The problem with this approach is that it is complex and also specific to a single problem because the SQL statements had to be hard-coded in the SPL code. Any similar problem that requires either at least one different table or a slight variation in the selection of the rows would also require a new 83 lines of stored procedure.
An analysis of the code revealed that the entire process was in fact aggregating objects together. The problem became how to augment the database capabilities so it could process the information in an aggregation fashion: a UDA.
The new approach included 23 lines of SPL and could be applied to any SQL statement, making it a generic solution. The execution required only one SQL statement instead of looping multiple SQL statements in a stored procedure. The new solution was then simpler and more efficient.