- Natural or Surrogate Keys
- A New Database
- Which Datatype Should Be Used for Surrogate Keys?
- Pros and Cons of GUIDs as Primary Keys
- The Tests
- The Results
- A Few Tweaks
- Other Approaches
- Conclusion
Which Datatype Should Be Used for Surrogate Keys?
I'm going to focus on two different datatypes for surrogate keys, INTEGER and GUID. Let's start with INTEGER.
NOTE
In the specific example shown in Figure 1, the users need a column in the ordermaster table to identify each order uniquely. The most correct solution to this is add an orderno column, which is given a value separately from the ID column. In the case of INTEGER+IDENTITY for the primary key, it would be fairly common to skip the orderno column, but I prefer not to let the user know the real key value.
To avoid affecting the test too much, I decided to skip the user-known column here. Otherwise, I would have had to maintain the orderno value manually for when the IDENTITY has already been used for the ID. When GUID is used for the ID, I could have let IDENTITY handle the orderno column. I did not do that, though, to avoid testing too many apples and oranges (or apples and pears, as we say in Sweden).
INTEGER as the Datatype
Even though we have decided to use surrogate keys, we are not yet finishedwe still have to choose a specific datatype. The most typical one to use is INTEGER. An INTEGER is 4 bytes. You can declare INTEGER columns as IDENTITY and thereby let the database provide new row values automatically. Listing 1 illustrates an example of a new row being INSERTed into the ordermaster table and the new ID being fetched for later use when INSERTing into the orderdetail table.
Listing 1: Stored Procedure for Inserting a Row into ordermaster, INT Version
CREATE PROCEDURE a_OrderINT_InsertMaster (@customerId INT, @orderId INT OUTPUT) AS DECLARE @anError INT INSERT INTO orderINTmaster (customer_id, orderdate, status) VALUES (@customerId, GETDATE(), 1) SET @anError = @@ERROR SET @orderId = @@IDENTITY RETURN @anError
Before we continue, I'd like to say a couple of things about the code in Listing 1 and the other code in this article:
The error handling is not of production quality, but I decided to do it this way to avoid distracting you from the topic of the article and to get as clear test results as possible. You will find more information about appropriate error handling in my book, .NET Enterprise Design with Visual Basic .NET and SQL Server 2000.
I am very fond of using user-defined data types, but I skipped them in the code snippet here, to make the code as obvious and clear as possible.
In real code, you obviously must use a lot of comments! For example, I always start all stored procedures with a comment block, in which I state the purpose, tell who and when, and also document.
You can download all the code for this article at http://www.jnsk.se/informit/pkguid/.
NOTE
You can let the application server decide what the next value should be, but then you would have to be careful when INTEGER is used as the datatype for the primary keys that you don't get too much chatter between the application server and the database server. Other risks are creating a hot spot, which reduces concurrency, and creating a solution that makes it impossible to clone the application server.
A rather long and interesting discussion addresses different solutions for "manually" creating surrogate INTEGERs, but that is beyond the scope of this article.
UNIQUEIDENTIFIER as the Datatype
A GUID is called UNIQUEIDENTIFIER in SQL Server. It has been supported as a native datatype in SQL Server since version 7. A GUID is 16 bytes long (four times the size of an INTEGER).
In Listing 2, you can see an example of a new row being INSERTed into the ordermaster table when GUIDs are used for the keys. In this case, the ID is generated before the INSERT statement, with the help of NEWID().
NOTE
For reasons of simplicity, there are two sets of tables and stored procedures: one set with "INT" in the name and one set with "GUID" in the name.
Listing 2: Stored Procedure for Inserting a Row into ordermaster, GUID Version
CREATE PROCEDURE a_OrderGUID_InsertMaster (@customerId UNIQUEIDENTIFIER , @orderId UNIQUEIDENTIFIER OUTPUT) AS DECLARE @anError INT SET @orderId = NEWID() INSERT INTO orderGUIDmaster (id, customer_id , orderdate, status) VALUES (@orderId, @customerId , GETDATE(), 1) SET @anError = @@ERROR RETURN @anError