Resolving the Challenges
The challenge of unknown performance implications is difficult to overcome. The best, and really the only, way to find out is to test. Much of the reference literature will tell you which Tasks are slower and which are faster, but the answer varies depending on the Task and the alternative you compare it against. For example, almost every reference tells you that data manipulations slow down a transformation tremendously. I had one Task that imported the data and used a stored procedure to verify and manipulate some of it. While testing, I moved the data manipulations into the transformation instead of calling the stored procedure, and the total transfer time dropped to roughly one-fifth of what it had been! I hadn't even considered manipulating this data in the transformation because the work was fairly intensive, and I thought for sure that the stored procedure would be much faster. The ActiveX Script Task is the slowest Task in DTS, so for that reason alone I try to avoid using it if possible. Fortunately, when writing Packages through the Object Model, I've found that I never need one because I can do what I need in the VB code.
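When you want to compare two approaches, a simple way is to time the Package itself from VB. The following is a minimal sketch, not code from this project; it assumes a reference to the Microsoft DTSPackage Object Library and a Package saved on the local server under the placeholder name "ImportList".

Dim oPackage As New DTS.Package
Dim sngStart As Single

' Load a saved Package from the local server (names are placeholders).
oPackage.LoadFromSQLServer "(local)", , , _
    DTSSQLStgFlag_UseTrustedConnection, , , , "ImportList"

sngStart = Timer                 ' seconds since midnight
oPackage.Execute                 ' runs synchronously
Debug.Print "Elapsed: " & Format$(Timer - sngStart, "0.00") & " seconds"

oPackage.UnInitialize
Set oPackage = Nothing

Run each variation against the same test file a few times and compare the averages; a single run can be misleading.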
Coding a DTS solution in Visual Basic has become significantly easier in SQL Server 2000 than it was with SQL Server 7.0. The difference is that you can now save your Packages as VB modules after creating them through the Package designer GUI. This way, if you're not sure about a particular Task, you can build yourself a "cheat sheet" by creating the Package in the DTS designer and then saving it as a VB file. To do this, click Package, click Save As, choose Visual Basic File for the location, name the Package file, and then click OK. Not only is the Package designer your best friend when writing code, it also reveals a few "Easter eggs." A good example is the useDSL property, which is not documented in any reference literature I've found. If you save your Package as a VB file, each connection has this property, but it doesn't show up in IntelliSense, and it isn't in the Object Model reference. I asked Microsoft about this; they told me that it's for a Data Source Link file, to hold the name of a data source file if you use one, but the property takes a Boolean value, so I have yet to find out what it's really for.
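To give you an idea of what that "cheat sheet" looks like, here is roughly the kind of connection code the designer emits when you save a Package as a VB file. The listing is trimmed, the server and database names are placeholders, and goPackage is the Package variable the generated module declares.

Dim oConnection As DTS.Connection2

Set oConnection = goPackage.Connections.New("SQLOLEDB")
oConnection.Name = "Connection 1"
oConnection.ID = 1
oConnection.DataSource = "(local)"
oConnection.Catalog = "ListManager"        ' placeholder database name
oConnection.UseTrustedConnection = True
oConnection.UseDSL = False                 ' the undocumented property mentioned above

goPackage.Connections.Add oConnection
Set oConnection = Nothing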
In fact, if you don't have the luxury of working with SQL Server 2000, you can still script your Packages, but the utility for doing so is on the SQL Server 7.0 CD and is very well hidden. It doesn't work as well as the built-in support in SQL Server 2000, but it gives you a file to use as a reference.
NOTE
If you want to learn more about the DTS scripting utility available with SQL Server version 7.0, check out the following link:
The DTS capabilities and the Object Model have been extended quite a bit in SQL Server 2000. SQL Server 2000 adds several Tasks that were not available in SQL Server 7.0: an FTP Task that lets you pull files from an FTP site (this is one-way; you cannot post files to an FTP site, only get them), a Dynamic Properties Task that lets you change different properties within the Package at runtime, several new transfer Tasks for moving jobs, stored procedures, entire databases, and error messages, and an Execute Package Task that lets one DTS Package run another. In addition, many of the Tasks that were in SQL Server 7.0 have been enhanced in SQL Server 2000.
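As an illustration of using one of the new Tasks from VB, here is a minimal sketch that adds an Execute Package Task to a Package. It assumes an existing DTS.Package2 object named oPackage; the Task name, child Package name, and server are placeholders, not code from this project.

Dim oTask As DTS.Task
Dim oExecPkg As DTS.ExecutePackageTask

Set oTask = oPackage.Tasks.New("DTSExecutePackageTask")
Set oExecPkg = oTask.CustomTask

oExecPkg.Name = "RunChildPackage"
oExecPkg.PackageName = "ChildPackage"      ' placeholder; the Package to run
oExecPkg.ServerName = "(local)"
oExecPkg.UseTrustedConnection = True

oPackage.Tasks.Add oTask
Set oExecPkg = Nothing
Set oTask = Nothing

You still need to add a Step whose TaskName points at this Task before the Package will actually run it.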
My business rules stated that any one column could contain up to 4,000 bytes of data, but obviously no more than one column could hold that much if many of the other fields also had data. Text files were the only file format that could potentially cause a problem here, because they have no limits on the size of a column or row; the other file formats I accepted were more restrictive than SQL Server's. To overcome this, I checked the length of the data to ensure that it would fit into a SQL Server row. If a row was too long, I reported it to the user, so they had a chance to modify the file and reimport it. Because the number of columns and the column names (the names the user would see) were not known until a file was loaded, we had another challenge: how to build stored procedures to access this data. I implemented a meta data table for the main data table to track information about a user's imported data. This forced many of the stored procedures to resort to dynamic SQL execution, but the speed was still well within acceptable limits.
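Because the column names live in the meta data table, the SQL has to be built at runtime. The following is a hedged sketch of that pattern from VB using ADO (it assumes a reference to the Microsoft ActiveX Data Objects library); the database, table, and column names (ListManager, ImportMetaData, ImportData, ImportID) are hypothetical, not the actual schema from the project.

Dim cn As New ADODB.Connection
Dim rsMeta As ADODB.Recordset
Dim rsData As ADODB.Recordset
Dim sCols As String
Dim sSQL As String

cn.Open "Provider=SQLOLEDB;Data Source=(local);" & _
        "Initial Catalog=ListManager;Integrated Security=SSPI;"

' Read the column names recorded for this particular import.
Set rsMeta = cn.Execute("SELECT ColumnName FROM ImportMetaData WHERE ImportID = 42")
Do While Not rsMeta.EOF
    If Len(sCols) > 0 Then sCols = sCols & ", "
    sCols = sCols & "[" & rsMeta("ColumnName") & "]"
    rsMeta.MoveNext
Loop

' Build and execute the dynamic statement against the main data table.
sSQL = "SELECT " & sCols & " FROM ImportData WHERE ImportID = 42"
Set rsData = cn.Execute(sSQL)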
I initially thought that no list would have more than about 10,000 rows, so I used 15,000 when testing, just to be safe. This "maximum" eventually grew to 100,000, so I adjusted my testing and went back to tweak any bottlenecks we found. Some bottlenecks, such as running files through the third-party software, were beyond my control, so in some places I ran them in a background process and gave the user other options while the file was being processed. Most of the time, the file was finished by the time the user got to the point of using the data, so they never noticed that the system was chugging away at their data in the background.