- SQL Server Reference Guide
- Introduction
- SQL Server Reference Guide Overview
- Table of Contents
- Microsoft SQL Server Defined
- SQL Server Editions
- SQL Server Access
- Informit Articles and Sample Chapters
- Online Resources
- Microsoft SQL Server Features
- SQL Server Books Online
- Clustering Services
- Data Transformation Services (DTS) Overview
- Replication Services
- Database Mirroring
- Natural Language Processing (NLP)
- Analysis Services
- Microsoft SQL Server Reporting Services
- XML Overview
- Notification Services for the DBA
- Full-Text Search
- SQL Server 2005 - Service Broker
- Using SQL Server as a Web Service
- SQL Server Encryption Options Overview
- SQL Server 2008 Overview
- SQL Server 2008 R2 Overview
- SQL Azure
- The Utility Control Point and Data Application Component, Part 1
- The Utility Control Point and Data Application Component, Part 2
- Microsoft SQL Server Administration
- The DBA Survival Guide: The 10 Minute SQL Server Overview
- Preparing (or Tuning) a Windows System for SQL Server, Part 1
- Preparing (or Tuning) a Windows System for SQL Server, Part 2
- Installing SQL Server
- Upgrading SQL Server
- SQL Server 2000 Management Tools
- SQL Server 2005 Management Tools
- SQL Server 2008 Management Tools
- SQL Azure Tools
- Automating Tasks with SQL Server Agent
- Run Operating System Commands in SQL Agent using PowerShell
- Automating Tasks Without SQL Server Agent
- Storage – SQL Server I/O
- Service Packs, Hotfixes and Cumulative Upgrades
- Tracking SQL Server Information with Error and Event Logs
- Change Management
- SQL Server Metadata, Part One
- SQL Server Meta-Data, Part Two
- Monitoring - SQL Server 2005 Dynamic Views and Functions
- Monitoring - Performance Monitor
- Unattended Performance Monitoring for SQL Server
- Monitoring - User-Defined Performance Counters
- Monitoring: SQL Server Activity Monitor
- SQL Server Instances
- DBCC Commands
- SQL Server and Mail
- Database Maintenance Checklist
- The Maintenance Wizard: SQL Server 2000 and Earlier
- The Maintenance Wizard: SQL Server 2005 (SP2) and Later
- The Web Assistant Wizard
- Creating Web Pages from SQL Server
- SQL Server Security
- Securing the SQL Server Platform, Part 1
- Securing the SQL Server Platform, Part 2
- SQL Server Security: Users and other Principals
- SQL Server Security – Roles
- SQL Server Security: Objects (Securables)
- Security: Using the Command Line
- SQL Server Security - Encrypting Connections
- SQL Server Security: Encrypting Data
- SQL Server Security Audit
- High Availability - SQL Server Clustering
- SQL Server Configuration, Part 1
- SQL Server Configuration, Part 2
- Database Configuration Options
- 32- vs 64-bit Computing for SQL Server
- SQL Server and Memory
- Performance Tuning: Introduction to Indexes
- Statistical Indexes
- Backup and Recovery
- Backup and Recovery Examples, Part One
- Backup and Recovery Examples, Part Two: Transferring Databases to Another System (Even Without Backups)
- SQL Profiler - Reverse Engineering An Application
- SQL Trace
- SQL Server Alerts
- Files and Filegroups
- Partitioning
- Full-Text Indexes
- Read-Only Data
- SQL Server Locks
- Monitoring Locking and Deadlocking
- Controlling Locks in SQL Server
- SQL Server Policy-Based Management, Part One
- SQL Server Policy-Based Management, Part Two
- SQL Server Policy-Based Management, Part Three
- Microsoft SQL Server Programming
- An Outline for Development
- Database
- Database Services
- Database Objects: Databases
- Database Objects: Tables
- Database Objects: Table Relationships
- Database Objects: Keys
- Database Objects: Constraints
- Database Objects: Data Types
- Database Objects: Views
- Database Objects: Stored Procedures
- Database Objects: Indexes
- Database Objects: User Defined Functions
- Database Objects: Triggers
- Database Design: Requirements, Entities, and Attributes
- Business Process Model Notation (BPMN) and the Data Professional
- Business Questions for Database Design, Part One
- Business Questions for Database Design, Part Two
- Database Design: Finalizing Requirements and Defining Relationships
- Database Design: Creating an Entity Relationship Diagram
- Database Design: The Logical ERD
- Database Design: Adjusting The Model
- Database Design: Normalizing the Model
- Creating The Physical Model
- Database Design: Changing Attributes to Columns
- Database Design: Creating The Physical Database
- Database Design Example: Curriculum Vitae
- NULLs
- The SQL Server Sample Databases
- The SQL Server Sample Databases: pubs
- The SQL Server Sample Databases: NorthWind
- The SQL Server Sample Databases: AdventureWorks
- The SQL Server Sample Databases: AdventureWorks Derivatives
- UniversalDB: The Demo and Testing Database, Part 1
- UniversalDB: The Demo and Testing Database, Part 2
- UniversalDB: The Demo and Testing Database, Part 3
- UniversalDB: The Demo and Testing Database, Part 4
- Getting Started with Transact-SQL
- Transact-SQL: Data Definition Language (DDL) Basics
- Transact-SQL: Limiting Results
- Transact-SQL: More Operators
- Transact-SQL: Ordering and Aggregating Data
- Transact-SQL: Subqueries
- Transact-SQL: Joins
- Transact-SQL: Complex Joins - Building a View with Multiple JOINs
- Transact-SQL: Inserts, Updates, and Deletes
- An Introduction to the CLR in SQL Server 2005
- Design Elements Part 1: Programming Flow Overview, Code Format and Commenting your Code
- Design Elements Part 2: Controlling SQL's Scope
- Design Elements Part 3: Error Handling
- Design Elements Part 4: Variables
- Design Elements Part 5: Where Does The Code Live?
- Design Elements Part 6: Math Operators and Functions
- Design Elements Part 7: Statistical Functions
- Design Elements Part 8: Summarization Statistical Algorithms
- Design Elements Part 9: Representing Data with Statistical Algorithms
- Design Elements Part 10: Interpreting the Data—Regression
- Design Elements Part 11: String Manipulation
- Design Elements Part 12: Loops
- Design Elements Part 13: Recursion
- Design Elements Part 14: Arrays
- Design Elements Part 15: Event-Driven Programming Vs. Scheduled Processes
- Design Elements Part 16: Event-Driven Programming
- Design Elements Part 17: Program Flow
- Forming Queries Part 1: Design
- Forming Queries Part 2: Query Basics
- Forming Queries Part 3: Query Optimization
- Forming Queries Part 4: SET Options
- Forming Queries Part 5: Table Optimization Hints
- Using SQL Server Templates
- Transact-SQL Unit Testing
- Index Tuning Wizard
- Unicode and SQL Server
- SQL Server Development Tools
- The SQL Server Transact-SQL Debugger
- The Transact-SQL Debugger, Part 2
- Basic Troubleshooting for Transact-SQL Code
- An Introduction to Spatial Data in SQL Server 2008
- Performance Tuning
- Performance Tuning SQL Server: Tools and Processes
- Performance Tuning SQL Server: Tools Overview
- Creating a Performance Tuning Audit - Defining Components
- Creating a Performance Tuning Audit - Evaluation Part One
- Creating a Performance Tuning Audit - Evaluation Part Two
- Creating a Performance Tuning Audit - Interpretation
- Creating a Performance Tuning Audit - Developing an Action Plan
- Understanding SQL Server Query Plans
- Performance Tuning: Implementing Indexes
- Performance Monitoring Tools: Windows 2008 (and Higher) Server Utilities, Part 1
- Performance Monitoring Tools: Windows 2008 (and Higher) Server Utilities, Part 2
- Performance Monitoring Tools: Windows System Monitor
- Performance Monitoring Tools: Logging with System Monitor
- Performance Monitoring Tools: User Defined Counters
- General Transact-SQL (T-SQL) Performance Tuning, Part 1
- General Transact-SQL (T-SQL) Performance Tuning, Part 2
- General Transact-SQL (T-SQL) Performance Tuning, Part 3
- Performance Monitoring Tools: An Introduction to SQL Profiler
- Performance Tuning: Introduction to Indexes
- Performance Monitoring Tools: SQL Server 2000 Index Tuning Wizard
- Performance Monitoring Tools: SQL Server 2005 Database Tuning Advisor
- Performance Monitoring Tools: SQL Server Management Studio Reports
- Performance Monitoring Tools: SQL Server 2008 Activity Monitor
- The SQL Server 2008 Management Data Warehouse and Data Collector
- Performance Monitoring Tools: Evaluating Wait States with PowerShell and Excel
- Practical Applications
- Choosing the Back End
- The DBA's Toolbox, Part 1
- The DBA's Toolbox, Part 2
- Scripting Solutions for SQL Server
- Building a SQL Server Lab
- Using Graphics Files with SQL Server
- Enterprise Resource Planning
- Customer Relationship Management (CRM)
- Building a Reporting Data Server
- Building a Database Documenter, Part 1
- Building a Database Documenter, Part 2
- Data Management Objects
- Data Management Objects: The Server Object
- Data Management Objects: Server Object Methods
- Data Management Objects: Collections and the Database Object
- Data Management Objects: Database Information
- Data Management Objects: Database Control
- Data Management Objects: Database Maintenance
- Data Management Objects: Logging the Process
- Data Management Objects: Running SQL Statements
- Data Management Objects: Multiple Row Returns
- Data Management Objects: Other Database Objects
- Data Management Objects: Security
- Data Management Objects: Scripting
- Powershell and SQL Server - Overview
- PowerShell and SQL Server - Objects and Providers
- Powershell and SQL Server - A Script Framework
- Powershell and SQL Server - Logging the Process
- Powershell and SQL Server - Reading a Control File
- Powershell and SQL Server - SQL Server Access
- Powershell and SQL Server - Web Pages from a SQL Query
- Powershell and SQL Server - Scrubbing the Event Logs
- SQL Server 2008 PowerShell Provider
- SQL Server I/O: Importing and Exporting Data
- SQL Server I/O: XML in Database Terms
- SQL Server I/O: Creating XML Output
- SQL Server I/O: Reading XML Documents
- SQL Server I/O: Using XML Control Mechanisms
- SQL Server I/O: Creating Hierarchies
- SQL Server I/O: Using HTTP with SQL Server XML
- SQL Server I/O: Using HTTP with SQL Server XML Templates
- SQL Server I/O: Remote Queries
- SQL Server I/O: Working with Text Files
- Using Microsoft SQL Server on Handheld Devices
- Front-Ends 101: Microsoft Access
- Comparing Two SQL Server Databases
- English Query - Part 1
- English Query - Part 2
- English Query - Part 3
- English Query - Part 4
- English Query - Part 5
- RSS Feeds from SQL Server
- Using SQL Server Agent to Monitor Backups
- Reporting Services - Creating a Maintenance Report
- SQL Server Chargeback Strategies, Part 1
- SQL Server Chargeback Strategies, Part 2
- SQL Server Replication Example
- Creating a Master Agent and Alert Server
- The SQL Server Central Management System: Definition
- The SQL Server Central Management System: Base Tables
- The SQL Server Central Management System: Execution of Server Information (Part 1)
- The SQL Server Central Management System: Execution of Server Information (Part 2)
- The SQL Server Central Management System: Collecting Performance Metrics
- The SQL Server Central Management System: Centralizing Agent Jobs, Events and Scripts
- The SQL Server Central Management System: Reporting the Data and Project Summary
- Time Tracking for SQL Server Operations
- Migrating Departmental Data Stores to SQL Server
- Migrating Departmental Data Stores to SQL Server: Model the System
- Migrating Departmental Data Stores to SQL Server: Model the System, Continued
- Migrating Departmental Data Stores to SQL Server: Decide on the Destination
- Migrating Departmental Data Stores to SQL Server: Design the ETL
- Migrating Departmental Data Stores to SQL Server: Design the ETL, Continued
- Migrating Departmental Data Stores to SQL Server: Attach the Front End, Test, and Monitor
- Tracking SQL Server Timed Events, Part 1
- Tracking SQL Server Timed Events, Part 2
- Patterns and Practices for the Data Professional
- Managing Vendor Databases
- Consolidation Options
- Connecting to a SQL Azure Database from Microsoft Access
- SharePoint 2007 and SQL Server, Part One
- SharePoint 2007 and SQL Server, Part Two
- SharePoint 2007 and SQL Server, Part Three
- Querying Multiple Data Sources from a Single Location (Distributed Queries)
- Importing and Exporting Data for SQL Azure
- Working on Distributed Teams
- Professional Development
- Becoming a DBA
- Certification
- DBA Levels
- Becoming a Data Professional
- SQL Server Professional Development Plan, Part 1
- SQL Server Professional Development Plan, Part 2
- SQL Server Professional Development Plan, Part 3
- Evaluating Technical Options
- System Sizing
- Creating a Disaster Recovery Plan
- Anatomy of a Disaster (Response Plan)
- Database Troubleshooting
- Conducting an Effective Code Review
- Developing an Exit Strategy
- Data Retention Strategy
- Keeping Your DBA/Developer Job in Troubled Times
- The SQL Server Runbook
- Creating and Maintaining a SQL Server Configuration History, Part 1
- Creating and Maintaining a SQL Server Configuration History, Part 2
- Creating an Application Profile, Part 1
- Creating an Application Profile, Part 2
- How to Attend a Technical Conference
- Tips for Maximizing Your IT Budget This Year
- The Importance of Blue-Sky Planning
- Application Architecture Assessments
- Transact-SQL Code Reviews, Part One
- Transact-SQL Code Reviews, Part Two
- Cloud Computing (Distributed Computing) Paradigms
- NoSQL for the SQL Server Professional, Part One
- NoSQL for the SQL Server Professional, Part Two
- Object-Role Modeling (ORM) for the Database Professional
- Business Intelligence
- BI Explained
- Developing a Data Dictionary
- BI Security
- Gathering BI Requirements
- Source System Extracts and Transforms
- ETL Mechanisms
- Business Intelligence Landscapes
- Business Intelligence Layouts and the Build or Buy Decision
- A Single Version of the Truth
- The Operational Data Store (ODS)
- Data Marts – Combining and Transforming Data
- Designing Data Elements
- The Enterprise Data Warehouse — Aggregations and the Star Schema
- On-Line Analytical Processing (OLAP)
- Data Mining
- Key Performance Indicators
- BI Presentation - Client Tools
- BI Presentation - Portals
- Implementing ETL - Introduction to SQL Server 2005 Integration Services
- Building a Business Intelligence Solution, Part 1
- Building a Business Intelligence Solution, Part 2
- Building a Business Intelligence Solution, Part 3
- Tips and Troubleshooting
- SQL Server and Microsoft Excel Integration
- Tips for the SQL Server Tools: SQL Server 2000
- Tips for the SQL Server Tools – SQL Server 2005
- Transaction Log Troubles
- SQL Server Connection Problems
- Orphaned Database Users
- Additional Resources
- Tools and Downloads
- Utilities (Free)
- Tool Review (Free): DBDesignerFork
- Aqua Data Studio
- Microsoft SQL Server Best Practices Analyzer
- Utilities (Cost)
- Quest Software's TOAD for SQL Server
- Quest Software's Spotlight on SQL Server
- SQL Server on Microsoft's Virtual PC
- Red Gate SQL Bundle
- Microsoft's Visio for Database Folks
- Quest Capacity Manager
- SQL Server Help
- Visual Studio Team Edition for Database Professionals
- Microsoft Assessment and Planning Solution Accelerator
- Aggregating Server Data from the MAPS Tool
It's rare that a single data store (such as an Oracle or SQL Server database) holds all of the data a particular application might need. In many cases a single system has data that could be correlated with another system, so users are left to collate the information themselves, such as bringing up two applications and possibly even copying and pasting into a spreadsheet or third data tool to combine the data they need.
Add to this that companies are using not only on-premise applications but also applications on the web, and users are left with even more disconnected data sources. As a data professional, from the Database Administrator to the Database Developer, you will be called on to stitch these systems together to ensure that your users have access to what they need.
You may also be asked to work with a Hybrid Cloud Solution such as Windows Azure and SQL Azure. To take advantage of the power and flexibility of these solutions while migrating slowly, or to keep private and secure data locally, you'll need a way to connect to two or more data sources at once within a single user interface. In this tutorial I'll show you a few options for doing that.
Like everything else in computing, there is no single solution that fits every set of requirements. It's important when you think about any of these options that they fit the particular situation at hand. Evaluate the needs first, then layer the solutions and find the one that fits best. It's not uncommon, or incorrect, to use a mix of these solutions across the applications you work with; one solution will work best for one application, and a different one for another.
As I explain each of these options, remember that within your application requirements there are other factors to think about when connecting to multiple data sources: security, performance, and data types, to name the more important ones. Each system handles these items differently, and you need to understand what those impacts are, especially in the security area. Many break-ins are the result of a "weak link" created by combining more than one data source.
There are two main ways to access data on multiple servers: using a Data Access Layer (DAL), or using Distributed Queries originated from SQL Server. In this tutorial I'll show you each method; each has its own strengths and weaknesses.
Whether you're a DBA or a Developer, read through both of these sections to understand how each works, and then you can work with your administrator or developer counterpart to come up with the best solution.
For this tutorial, I'll use the standard Purchase Order example to illustrate. Purchase orders have a "header" set of information that doesn't change very frequently, and a "details" area that holds all of the line item information for a particular order. These are usually stored in at least two sets of tables in a Relational Database Management System (RDBMS) and are tied together with a key. In this simplified example, I'll just use a company "number" as the key that ties them together.
Using A Data Access Layer or "Middle Tier"
This approach is called a Data Tier or, more commonly, a Data Access Layer (DAL). The basic idea is that you replace the client's direct connection to a database with a central program that accesses the data on the client's behalf. The presentation client then calls the DAL to return data back to the screen. It basically moves the query from the client to another program.
When you use this approach, normally you're representing a data structure as a programming Class. This is an Object-Oriented view of data: a Class defines Objects that hold properties (in many ways similar to a column in a database table) and expose actions such as set (write) and get (read).
To bring several datasets together into a single object, the general process is to set up a Class that holds the combined data structure; this is what is returned to the calling program. The Class connects to the various datasets and joins them on a common key. So basically you set up something called "A," connect to "B" and "C," and put the data they hold in "A." The client program then works from "A" as if it were a single object. This is similar to a standard SQL Join, except that it isn't done in the SQL language and the data sources can be virtually anything.
You might wonder why you wouldn't just solve every data integration problem this way — and some developers do. There are some issues with this approach, most notably Data Integrity, Performance, and Data Type mismatches.
Just because two datasets share a distinct value doesn't mean you can join them together safely. Using the simple Purchase Order example: a dataset may have a company number as a field in a table, but that number isn't necessarily the right one to join on when you programmatically match it to a company number in the Details line items of another dataset, perhaps an XML document. So the first danger, and it is a big one, is that data might be joined without ensuring that no "false connections" are made. This issue gets more problematic the more data sets you add. Just as in electronics, the fact that you can plug something in doesn't mean you won't end up with a nasty shock.
Assuming you have ensured that the data sets maintain integrity between them, you also need to consider performance. You might have an extremely well-tuned database system, but if it is joined in a program to a huge text file accessed over an Internet connection, the combined result comes back only as fast as the slowest source allows. There are ways to mitigate this issue, but it is something to consider.
Data type mismatches are a real issue with this method. It's almost certain that data will be returned from different systems using different data types, and even a small difference can be catastrophic. You can mitigate this issue by carefully examining how each data source handles a given data type and then casting or converting it in code to a common, consistent format and numeric precision.
Security is listed last here, but it is arguably the most important consideration when accessing data from a DAL. The DAL should not allow more data access than is strictly necessary for a given operation. Once again, understand the data security in each source, and ensure that your code does not allow access to more than absolutely necessary. Also, perform a full risk analysis on your code to ensure that it is protected from attacks that could turn that authorized access into unauthorized access.
Reading a combined dataset is relatively simple. However, you have to handle adding, editing, or deleting data separately. Normally you capture the changes on the form the user edits, and then perform a write operation back through each dataset individually. In essence, your code takes over the transaction handling normally done by the RDBMS engine. That's not a problem; it just needs to be thought through carefully. There are constructs, especially within the .NET Framework libraries, to help you handle this, as sketched below.
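Here's a minimal sketch of that idea, assuming the same server, database, table, and column names used in the C# example later in this tutorial (FirstServer, SecondServer, the PurchaseOrders database, and the Company and LineItem tables); it uses the System.Transactions.TransactionScope class so that the writes to both sources commit or roll back together. It's an illustration of the pattern, not a prescription:
using System;
using System.Data.SqlClient;
using System.Transactions;

class SaveAcrossSources
{
    static void Main()
    {
        // Wrap writes to two different sources so they succeed or fail together.
        // Connection strings and table/column names are assumed for illustration.
        using (TransactionScope scope = new TransactionScope())
        {
            using (SqlConnection companyConnection = new SqlConnection(
                "Data Source=FirstServer;Integrated Security=SSPI;Initial Catalog=PurchaseOrders"))
            {
                companyConnection.Open();
                SqlCommand updateHeader = new SqlCommand(
                    "UPDATE Company SET CompanyName = @name WHERE companyID = @id",
                    companyConnection);
                updateHeader.Parameters.AddWithValue("@name", "Awesome Examples RUs");
                updateHeader.Parameters.AddWithValue("@id", 1);
                updateHeader.ExecuteNonQuery();
            }

            using (SqlConnection lineItemConnection = new SqlConnection(
                "Data Source=SecondServer;Integrated Security=SSPI;Initial Catalog=PurchaseOrders"))
            {
                // Enlisting a second connection usually escalates the transaction
                // to a distributed (MSDTC) transaction.
                lineItemConnection.Open();
                SqlCommand insertLine = new SqlCommand(
                    "INSERT INTO LineItem (companyID, LineItem) VALUES (@id, @item)",
                    lineItemConnection);
                insertLine.Parameters.AddWithValue("@id", 1);
                insertLine.Parameters.AddWithValue("@item", "Distributed Query Example 3");
                insertLine.ExecuteNonQuery();
            }

            // If an exception is thrown before this call, neither write is committed.
            scope.Complete();
        }
    }
}
Note that this only keeps the individual writes consistent with each other; the validation and security checks discussed above are still your code's responsibility.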
With all of those caveats, you can access data from multiple sources in code quite simply. I'll stick with the .NET constructs in this tutorial as an example, although there are several examples here at InformIT and elsewhere for Java, C++ and other languages.
Using a .NET DataAdapter and DataSet
The first and simplest method is to use a .NET DataSet object. You set up two (or more) connections, query the data from each, and set up a joining operation that combines them into a third object that acts as the aggregated single source. Here's a C# example of that process:
// Requires: using System; using System.Data; using System.Data.SqlClient;

//Set up two connections; more are possible
SqlConnection companyConnection = new SqlConnection("Data Source=FirstServer;Integrated Security=SSPI;Initial Catalog=PurchaseOrders");
SqlConnection lineItemConnection = new SqlConnection("Data Source=SecondServer;Integrated Security=SSPI;Initial Catalog=PurchaseOrders");

SqlDataAdapter headerAdapter = new SqlDataAdapter("SELECT * FROM Company", companyConnection);

//Note: if the second source is something different such as Oracle or text files,
//set up different connection information above and a different adapter below
SqlDataAdapter lineitemAdapter = new SqlDataAdapter("SELECT * FROM LineItem", lineItemConnection);

// This is the combined dataset
DataSet companyLineItems = new DataSet();

// Go get the data from each source
headerAdapter.Fill(companyLineItems, "companies");
lineitemAdapter.Fill(companyLineItems, "lineItems");

// Join them
DataRelation relation = companyLineItems.Relations.Add("CustLineItems",
    companyLineItems.Tables["companies"].Columns["companyID"],
    companyLineItems.Tables["lineItems"].Columns["companyID"]);

// Display the combined data
foreach (DataRow pRow in companyLineItems.Tables["companies"].Rows)
{
    Console.WriteLine(pRow["companyID"]);
    foreach (DataRow cRow in pRow.GetChildRows(relation))
        Console.WriteLine("\t" + cRow["LineItem"]);
}
Using Language Integrated Query (LINQ)
Another method is to use Language Integrated Query (LINQ). LINQ is a unified query syntax built into the .NET programming languages and fully integrated into the .NET Framework, so developers use this construct quite often. I have an entire article on this technology here, so I won't repeat that information in this article.
Here's a C# example that joins two text files (the SQL Server examples are longer; to keep things clear, I'm just using text files here):
using System;
using System.Collections.Generic;
using System.Linq;

class PurchaseOrder
{
    public string CompanyName { get; set; }
    public string CompanyAddress { get; set; }
    public int ID { get; set; }
    public List<string> LineItems { get; set; }
}

class PopulateCollection
{
    static void Main()
    {
        // File example: Bucksoft,1234 Mockingbird Lane,1001
        string[] POHeaders = System.IO.File.ReadAllLines(@"../../../POHeaders.csv");

        // File example: 1001, Product 234, Product 237, Product 341, Product 795
        string[] LineItems = System.IO.File.ReadAllLines(@"../../../LineItems.csv");

        // Merge the data sources - this is the main part of the work
        IEnumerable<PurchaseOrder> queryNamesLineItems =
            from nameLine in POHeaders
            let splitName = nameLine.Split(',')
            from itemLine in LineItems
            let splitLineItemLine = itemLine.Split(',')
            where splitName[2] == splitLineItemLine[0]
            select new PurchaseOrder()
            {
                CompanyName = splitName[0],
                CompanyAddress = splitName[1],
                ID = Convert.ToInt32(splitName[2]),
                // Skip(1) so the key doesn't show up again in the item list
                LineItems = (from itemAsText in splitLineItemLine.Skip(1)
                             select itemAsText.Trim()).ToList()
            };

        List<PurchaseOrder> companies = queryNamesLineItems.ToList();

        // Display each company's name, address and items
        foreach (var company in companies)
        {
            Console.WriteLine("{0}, {1}: {2}",
                company.CompanyName,
                company.CompanyAddress,
                string.Join(", ", company.LineItems.ToArray()));
        }

        // Keep console window open in debug mode
        Console.WriteLine("Press any key to exit.");
        Console.ReadKey();
    }
}
This example is similar to one in Microsoft documentation on LINQ, and there is a full explanation of that code here: http://msdn.microsoft.com/en-us/library/bb513866.aspx
Using Entity Framework
Entity Framework (EF) is a newer, model-first development mechanism, and I've described it further here. I won't show a full example of this kind of query, primarily because it requires more than just code: you'll modify the application's configuration files to add the various providers so that you can call them in code. Because it's a more complex configuration, and since I'm trying to focus more on a data professional audience than a developer one, I'll point you to this article, which has more information on this process specifically, and to these articles for more general EF information. There's also an excellent book here to learn more as well.
Using Distributed Queries
Instead of (or even in combination with) using a Data Access Layer, you can have SQL Server connect to other datasets to bring data together in a single query or Stored Procedure, Function, or View. Microsoft calls this a Distributed Query, and you can learn more about how they view this process here: http://msdn.microsoft.com/en-us/library/ms188721.aspx
Using this method you can handle some of the performance and data type issues, but even so there are things to consider. There's still the security angle, and the simple fact that querying across multiple data sources takes more time than querying a single source. It also gets more difficult to create dissimilar joins in SQL Server as you add more and more sources into the mix.
Four Part Queries
The simplest way to access more than one data set is to qualify the name in four parts in the query:
server.database.schema.object
While it's simple, there are quite a few caveats. First, if you want to use the server name part, you'll have to set up a linked server — which I'll talk about in a moment. So I'll drop the server name for now —
database.schema.object
Now you can see that the other data source actually has to be another database on the same server, which implies of course that your other data source is on SQL Server, and more specifically that it is on the same Instance of SQL Server. Also, from the security standpoint you may have to allow "Cross Database Ownership Chaining," a very dangerous thing to do.
With all of those caveats, I set my test instance of SQL Server to allow Cross Database Ownership Chaining and ran this code:
/* Super uber-brain-dead example - cross database ownership must be enabled which can be evil */
USE master;
GO
CREATE DATABASE TestPurchaseOrderDatabaseA;
GO
CREATE DATABASE TestPurchaseOrderDatabaseB;
GO

/* Set up the first database to hold the company data */
USE TestPurchaseOrderDatabaseA;
GO
CREATE TABLE PurchaseOrderHeader
    (CompanyID int PRIMARY KEY
    ,CompanyName varchar(50));
GO
INSERT INTO PurchaseOrderHeader VALUES (1, 'Awesome Examples RUs');
GO
INSERT INTO PurchaseOrderHeader VALUES (2, 'Security Is Overrated, Inc.');
GO

/* Set up the second database to hold the details data */
USE TestPurchaseOrderDatabaseB;
GO
CREATE TABLE PurchaseOrderDetail
    (LineID int PRIMARY KEY
    ,CompanyID int
    ,LineItem varchar(50));
GO
INSERT INTO PurchaseOrderDetail VALUES (1, 1, 'Distributed Query Example 1');
GO
INSERT INTO PurchaseOrderDetail VALUES (2, 1, 'Distributed Query Example 2');
GO
INSERT INTO PurchaseOrderDetail VALUES (3, 2, 'Let Me In');
GO
INSERT INTO PurchaseOrderDetail VALUES (4, 2, 'My Password is 1234');
GO

/* Now put it all together */
SELECT A.CompanyName, B.LineItem
FROM TestPurchaseOrderDatabaseA.dbo.PurchaseOrderHeader AS A
INNER JOIN TestPurchaseOrderDatabaseB.dbo.PurchaseOrderDetail AS B
    ON A.CompanyID = B.CompanyID;

/* Clean Up, Clean Up, everybody do your share */
USE master;
GO
DROP DATABASE TestPurchaseOrderDatabaseA;
GO
DROP DATABASE TestPurchaseOrderDatabaseB;
GO
Once again, only under rare circumstances is this a good idea.
OPENROWSET
If you need to access data from a different database, a different server, or even a different type of data source than SQL Server, you can use the OPENROWSET function in a SQL Server query. You will need to supply the type of connection you want to make, send the query, and once again join the results. Here's an example using the same table layout, but this time I'll put the details in a text file. The instructions for trying this yourself are in the comments.
/* Use OPENROWSET to connect a text file to a database.

   First, you'll need a text file called LineItems.txt in c:\temp that has this data in it:

   1    Distributed Query Example 1
   1    Distributed Query Example 2
   2    Let Me In
   2    My Password is 1234

   Now create another text file in c:\temp called LineItems.fmt that has this data:

   9.0
   2
   1 SQLCHAR 0 10 "\t"   1 ID          SQL_Latin1_General_Cp437_BIN
   2 SQLCHAR 0 40 "\r\n" 2 Description SQL_Latin1_General_Cp437_BIN
*/
USE master;
GO
CREATE DATABASE TestPurchaseOrderDatabase;
GO

/* Set up the database to hold the company data */
USE TestPurchaseOrderDatabase;
GO
CREATE TABLE PurchaseOrderHeader
    (CompanyID int PRIMARY KEY
    ,CompanyName varchar(50));
GO
INSERT INTO PurchaseOrderHeader VALUES (1, 'Awesome Examples RUs');
GO
INSERT INTO PurchaseOrderHeader VALUES (2, 'Security Is Overrated, Inc.');
GO

/* Now put it together */
SELECT A.CompanyName, B.Description
FROM PurchaseOrderHeader AS A
INNER JOIN OPENROWSET(
    BULK 'c:\temp\LineItems.txt',
    FORMATFILE = 'c:\temp\LineItems.fmt') AS B
    ON A.CompanyID = B.ID;
GO
This method works well for well-defined OLE DB data sources, and there are quite a few of those. You can learn more about your options here: http://msdn.microsoft.com/en-gb/library/ms190312.aspx
But a simple examination of the Execution Plan for this query shows that it takes quite a bit of time to "spin up" that remote query provider to access the data. While it works, it can be a bit slow, especially if you use it over and over. For repeated use, you can use a Linked Server.
Linked Servers and OPENQUERY
A "Linked Server" is essentially a definition of another data source within SQL Server. The engine opens a channel to the other data source and allows you to query it.
To create a Linked Server, you can use the graphical tools or the sp_addlinkedserver Stored Procedure. Once you've defined that server you can use the four-part query I mentioned earlier. You can reference this link to locate the exact parameters you need, based on the type of data source you want to connect to. For instance, here is an example of connecting to an Oracle server known by the SQL*Net name PurchasingServer, and giving it the name TestPurchaseOrderDatabaseB in SQL Server:
EXEC sp_addlinkedserver
    @server = N'TestPurchaseOrderDatabaseB',
    @srvproduct = N'Oracle',
    @provider = N'MSDAORA',
    @datasrc = N'PurchasingServer';
GO
Now you can reference the server in a SQL Server query with the four-part name, like this (for an Oracle source the database part is left empty, which gives the double period):
SELECT * FROM TestPurchaseOrderDatabaseB..PurchaseOrderGroup.PurchaseOrderDetails;
GO
And you use it just as I showed earlier with the three-part name. While Linked Servers simplify connections, they can be problematic. Connection requirements (like having the proper Oracle Client installed on the server) can pile up quickly. Also, performance can suffer because of the methods the engine chooses to query the other server.
To help with the second issue, you can create a "pass-through" query. This construct sends the query text directly to the other data system, which can theoretically process it faster than SQL Server would and return the data set more quickly. To create a pass-through query after you create your Linked Server, you can use the OPENQUERY command.
Essentially you treat this command as a sub-SELECT operation, like this:
SELECT *
FROM OPENQUERY(TestPurchaseOrderDatabaseB,
    'SELECT ID, Description FROM PurchaseOrderGroup.LineItems');
GO
This can be an interesting exercise when you want to edit or delete the data — read the full description of that process here: http://msdn.microsoft.com/en-us/library/ms188427.aspx
While Linked Servers are very simple to implement, I find them quite "brittle," meaning that they malfunction often. Also, it is quite frustrating to learn that the query you've been battling with for a while is on a Linked Server — look for those four-part-names if you see a misbehaving query.
Security can also be an issue. Once the server is linked, queries against it run under the security context configured for the link (the credentials the code uses), not necessarily the identity of the end user.
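You can control that mapping explicitly with the sp_addlinkedsrvlogin Stored Procedure. Here's a minimal sketch against the Linked Server defined earlier; the local login and the remote account and password are hypothetical placeholders, not values from a real system:
/* Map one local login to a specific account on the remote source.
   The login names and password below are placeholders. */
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'TestPurchaseOrderDatabaseB',
    @useself = N'FALSE',
    @locallogin = N'DOMAIN\POReportingUser',
    @rmtuser = N'po_reader',
    @rmtpassword = N'StrongPasswordHere';
GO

/* With @locallogin = NULL this call sets the default mapping for every other
   login; leaving the remote credentials NULL generally keeps unmapped logins
   from reaching the remote source at all. */
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'TestPurchaseOrderDatabaseB',
    @useself = N'FALSE',
    @locallogin = NULL,
    @rmtuser = NULL,
    @rmtpassword = NULL;
GO
That way the Linked Server only works for the logins you intend, instead of inheriting whatever the default mapping allows.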
OPENDATASOURCE
If you need the power and convenience of a Linked Server but don't want to define one ahead of time, the OPENDATASOURCE command is often a better choice. This command allows you to use a four-part name in a query without having to create a Linked Server first. It's most useful with another SQL Server, because you can send along the Windows Authentication that the SQL Server Service is using. If you use another provider, like Oracle, you need to protect this code because you may have to send along your name and password, something you don't want in visible code. I have also used this with "security-less" systems like Excel spreadsheets, like this:
SELECT *
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
    'Data Source=C:\temp\POLineItems.xls;Extended Properties=EXCEL 5.0')...[Sheet1$];
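For the SQL Server-to-SQL Server case mentioned above, the query might look like the sketch below. The server name (SecondServer), database (PurchaseOrders), and LineItem table are reused from the earlier DataSet example and are assumptions, not a fixed pattern; also note that on most installations you'll need to enable the "Ad Hoc Distributed Queries" configuration option before OPENDATASOURCE (or the non-BULK form of OPENROWSET) will run:
/* One-time setup: allow ad hoc access to remote OLE DB providers.
   This has security implications, so only enable it deliberately. */
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
GO

/* Join a local table (PurchaseOrderHeader from the earlier example) to a table
   on another SQL Server, passing along Windows Authentication. Server, database,
   and table names are assumed from the earlier examples in this tutorial. */
SELECT A.CompanyName, B.LineItem
FROM PurchaseOrderHeader AS A
INNER JOIN OPENDATASOURCE('SQLNCLI',
    'Data Source=SecondServer;Integrated Security=SSPI'
    ).PurchaseOrders.dbo.LineItem AS B
    ON A.CompanyID = B.companyID;
GO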
As you can see, you have quite a few options for connecting to multiple sources of data. My personal take is that a programmatic method is often the most secure and best performing option, but that assumes that the developers have done their homework on making that true. Each situation is different, and knowing more about what options you have is the key.