- SQL Server Reference Guide
- Introduction
- SQL Server Reference Guide Overview
- Table of Contents
- Microsoft SQL Server Defined
- SQL Server Editions
- SQL Server Access
- Informit Articles and Sample Chapters
- Online Resources
- Microsoft SQL Server Features
- SQL Server Books Online
- Clustering Services
- Data Transformation Services (DTS) Overview
- Replication Services
- Database Mirroring
- Natural Language Processing (NLP)
- Analysis Services
- Microsoft SQL Server Reporting Services
- XML Overview
- Notification Services for the DBA
- Full-Text Search
- SQL Server 2005 - Service Broker
- Using SQL Server as a Web Service
- SQL Server Encryption Options Overview
- SQL Server 2008 Overview
- SQL Server 2008 R2 Overview
- SQL Azure
- The Utility Control Point and Data Application Component, Part 1
- The Utility Control Point and Data Application Component, Part 2
- Microsoft SQL Server Administration
- The DBA Survival Guide: The 10 Minute SQL Server Overview
- Preparing (or Tuning) a Windows System for SQL Server, Part 1
- Preparing (or Tuning) a Windows System for SQL Server, Part 2
- Installing SQL Server
- Upgrading SQL Server
- SQL Server 2000 Management Tools
- SQL Server 2005 Management Tools
- SQL Server 2008 Management Tools
- SQL Azure Tools
- Automating Tasks with SQL Server Agent
- Run Operating System Commands in SQL Agent using PowerShell
- Automating Tasks Without SQL Server Agent
- Storage – SQL Server I/O
- Service Packs, Hotfixes and Cumulative Upgrades
- Tracking SQL Server Information with Error and Event Logs
- Change Management
- SQL Server Metadata, Part One
- SQL Server Meta-Data, Part Two
- Monitoring - SQL Server 2005 Dynamic Views and Functions
- Monitoring - Performance Monitor
- Unattended Performance Monitoring for SQL Server
- Monitoring - User-Defined Performance Counters
- Monitoring: SQL Server Activity Monitor
- SQL Server Instances
- DBCC Commands
- SQL Server and Mail
- Database Maintenance Checklist
- The Maintenance Wizard: SQL Server 2000 and Earlier
- The Maintenance Wizard: SQL Server 2005 (SP2) and Later
- The Web Assistant Wizard
- Creating Web Pages from SQL Server
- SQL Server Security
- Securing the SQL Server Platform, Part 1
- Securing the SQL Server Platform, Part 2
- SQL Server Security: Users and other Principals
- SQL Server Security – Roles
- SQL Server Security: Objects (Securables)
- Security: Using the Command Line
- SQL Server Security - Encrypting Connections
- SQL Server Security: Encrypting Data
- SQL Server Security Audit
- High Availability - SQL Server Clustering
- SQL Server Configuration, Part 1
- SQL Server Configuration, Part 2
- Database Configuration Options
- 32- vs 64-bit Computing for SQL Server
- SQL Server and Memory
- Performance Tuning: Introduction to Indexes
- Statistical Indexes
- Backup and Recovery
- Backup and Recovery Examples, Part One
- Backup and Recovery Examples, Part Two: Transferring Databases to Another System (Even Without Backups)
- SQL Profiler - Reverse Engineering An Application
- SQL Trace
- SQL Server Alerts
- Files and Filegroups
- Partitioning
- Full-Text Indexes
- Read-Only Data
- SQL Server Locks
- Monitoring Locking and Deadlocking
- Controlling Locks in SQL Server
- SQL Server Policy-Based Management, Part One
- SQL Server Policy-Based Management, Part Two
- SQL Server Policy-Based Management, Part Three
- Microsoft SQL Server Programming
- An Outline for Development
- Database
- Database Services
- Database Objects: Databases
- Database Objects: Tables
- Database Objects: Table Relationships
- Database Objects: Keys
- Database Objects: Constraints
- Database Objects: Data Types
- Database Objects: Views
- Database Objects: Stored Procedures
- Database Objects: Indexes
- Database Objects: User Defined Functions
- Database Objects: Triggers
- Database Design: Requirements, Entities, and Attributes
- Business Process Model Notation (BPMN) and the Data Professional
- Business Questions for Database Design, Part One
- Business Questions for Database Design, Part Two
- Database Design: Finalizing Requirements and Defining Relationships
- Database Design: Creating an Entity Relationship Diagram
- Database Design: The Logical ERD
- Database Design: Adjusting The Model
- Database Design: Normalizing the Model
- Creating The Physical Model
- Database Design: Changing Attributes to Columns
- Database Design: Creating The Physical Database
- Database Design Example: Curriculum Vitae
- NULLs
- The SQL Server Sample Databases
- The SQL Server Sample Databases: pubs
- The SQL Server Sample Databases: NorthWind
- The SQL Server Sample Databases: AdventureWorks
- The SQL Server Sample Databases: AdventureWorks Derivatives
- UniversalDB: The Demo and Testing Database, Part 1
- UniversalDB: The Demo and Testing Database, Part 2
- UniversalDB: The Demo and Testing Database, Part 3
- UniversalDB: The Demo and Testing Database, Part 4
- Getting Started with Transact-SQL
- Transact-SQL: Data Definition Language (DDL) Basics
- Transact-SQL: Limiting Results
- Transact-SQL: More Operators
- Transact-SQL: Ordering and Aggregating Data
- Transact-SQL: Subqueries
- Transact-SQL: Joins
- Transact-SQL: Complex Joins - Building a View with Multiple JOINs
- Transact-SQL: Inserts, Updates, and Deletes
- An Introduction to the CLR in SQL Server 2005
- Design Elements Part 1: Programming Flow Overview, Code Format and Commenting your Code
- Design Elements Part 2: Controlling SQL's Scope
- Design Elements Part 3: Error Handling
- Design Elements Part 4: Variables
- Design Elements Part 5: Where Does The Code Live?
- Design Elements Part 6: Math Operators and Functions
- Design Elements Part 7: Statistical Functions
- Design Elements Part 8: Summarization Statistical Algorithms
- Design Elements Part 9: Representing Data with Statistical Algorithms
- Design Elements Part 10: Interpreting the Data—Regression
- Design Elements Part 11: String Manipulation
- Design Elements Part 12: Loops
- Design Elements Part 13: Recursion
- Design Elements Part 14: Arrays
- Design Elements Part 15: Event-Driven Programming Vs. Scheduled Processes
- Design Elements Part 16: Event-Driven Programming
- Design Elements Part 17: Program Flow
- Forming Queries Part 1: Design
- Forming Queries Part 2: Query Basics
- Forming Queries Part 3: Query Optimization
- Forming Queries Part 4: SET Options
- Forming Queries Part 5: Table Optimization Hints
- Using SQL Server Templates
- Transact-SQL Unit Testing
- Index Tuning Wizard
- Unicode and SQL Server
- SQL Server Development Tools
- The SQL Server Transact-SQL Debugger
- The Transact-SQL Debugger, Part 2
- Basic Troubleshooting for Transact-SQL Code
- An Introduction to Spatial Data in SQL Server 2008
- Performance Tuning
- Performance Tuning SQL Server: Tools and Processes
- Performance Tuning SQL Server: Tools Overview
- Creating a Performance Tuning Audit - Defining Components
- Creating a Performance Tuning Audit - Evaluation Part One
- Creating a Performance Tuning Audit - Evaluation Part Two
- Creating a Performance Tuning Audit - Interpretation
- Creating a Performance Tuning Audit - Developing an Action Plan
- Understanding SQL Server Query Plans
- Performance Tuning: Implementing Indexes
- Performance Monitoring Tools: Windows 2008 (and Higher) Server Utilities, Part 1
- Performance Monitoring Tools: Windows 2008 (and Higher) Server Utilities, Part 2
- Performance Monitoring Tools: Windows System Monitor
- Performance Monitoring Tools: Logging with System Monitor
- Performance Monitoring Tools: User Defined Counters
- General Transact-SQL (T-SQL) Performance Tuning, Part 1
- General Transact-SQL (T-SQL) Performance Tuning, Part 2
- General Transact-SQL (T-SQL) Performance Tuning, Part 3
- Performance Monitoring Tools: An Introduction to SQL Profiler
- Performance Tuning: Introduction to Indexes
- Performance Monitoring Tools: SQL Server 2000 Index Tuning Wizard
- Performance Monitoring Tools: SQL Server 2005 Database Tuning Advisor
- Performance Monitoring Tools: SQL Server Management Studio Reports
- Performance Monitoring Tools: SQL Server 2008 Activity Monitor
- The SQL Server 2008 Management Data Warehouse and Data Collector
- Performance Monitoring Tools: Evaluating Wait States with PowerShell and Excel
- Practical Applications
- Choosing the Back End
- The DBA's Toolbox, Part 1
- The DBA's Toolbox, Part 2
- Scripting Solutions for SQL Server
- Building a SQL Server Lab
- Using Graphics Files with SQL Server
- Enterprise Resource Planning
- Customer Relationship Management (CRM)
- Building a Reporting Data Server
- Building a Database Documenter, Part 1
- Building a Database Documenter, Part 2
- Data Management Objects
- Data Management Objects: The Server Object
- Data Management Objects: Server Object Methods
- Data Management Objects: Collections and the Database Object
- Data Management Objects: Database Information
- Data Management Objects: Database Control
- Data Management Objects: Database Maintenance
- Data Management Objects: Logging the Process
- Data Management Objects: Running SQL Statements
- Data Management Objects: Multiple Row Returns
- Data Management Objects: Other Database Objects
- Data Management Objects: Security
- Data Management Objects: Scripting
- Powershell and SQL Server - Overview
- PowerShell and SQL Server - Objects and Providers
- Powershell and SQL Server - A Script Framework
- Powershell and SQL Server - Logging the Process
- Powershell and SQL Server - Reading a Control File
- Powershell and SQL Server - SQL Server Access
- Powershell and SQL Server - Web Pages from a SQL Query
- Powershell and SQL Server - Scrubbing the Event Logs
- SQL Server 2008 PowerShell Provider
- SQL Server I/O: Importing and Exporting Data
- SQL Server I/O: XML in Database Terms
- SQL Server I/O: Creating XML Output
- SQL Server I/O: Reading XML Documents
- SQL Server I/O: Using XML Control Mechanisms
- SQL Server I/O: Creating Hierarchies
- SQL Server I/O: Using HTTP with SQL Server XML
- SQL Server I/O: Using HTTP with SQL Server XML Templates
- SQL Server I/O: Remote Queries
- SQL Server I/O: Working with Text Files
- Using Microsoft SQL Server on Handheld Devices
- Front-Ends 101: Microsoft Access
- Comparing Two SQL Server Databases
- English Query - Part 1
- English Query - Part 2
- English Query - Part 3
- English Query - Part 4
- English Query - Part 5
- RSS Feeds from SQL Server
- Using SQL Server Agent to Monitor Backups
- Reporting Services - Creating a Maintenance Report
- SQL Server Chargeback Strategies, Part 1
- SQL Server Chargeback Strategies, Part 2
- SQL Server Replication Example
- Creating a Master Agent and Alert Server
- The SQL Server Central Management System: Definition
- The SQL Server Central Management System: Base Tables
- The SQL Server Central Management System: Execution of Server Information (Part 1)
- The SQL Server Central Management System: Execution of Server Information (Part 2)
- The SQL Server Central Management System: Collecting Performance Metrics
- The SQL Server Central Management System: Centralizing Agent Jobs, Events and Scripts
- The SQL Server Central Management System: Reporting the Data and Project Summary
- Time Tracking for SQL Server Operations
- Migrating Departmental Data Stores to SQL Server
- Migrating Departmental Data Stores to SQL Server: Model the System
- Migrating Departmental Data Stores to SQL Server: Model the System, Continued
- Migrating Departmental Data Stores to SQL Server: Decide on the Destination
- Migrating Departmental Data Stores to SQL Server: Design the ETL
- Migrating Departmental Data Stores to SQL Server: Design the ETL, Continued
- Migrating Departmental Data Stores to SQL Server: Attach the Front End, Test, and Monitor
- Tracking SQL Server Timed Events, Part 1
- Tracking SQL Server Timed Events, Part 2
- Patterns and Practices for the Data Professional
- Managing Vendor Databases
- Consolidation Options
- Connecting to a SQL Azure Database from Microsoft Access
- SharePoint 2007 and SQL Server, Part One
- SharePoint 2007 and SQL Server, Part Two
- SharePoint 2007 and SQL Server, Part Three
- Querying Multiple Data Sources from a Single Location (Distributed Queries)
- Importing and Exporting Data for SQL Azure
- Working on Distributed Teams
- Professional Development
- Becoming a DBA
- Certification
- DBA Levels
- Becoming a Data Professional
- SQL Server Professional Development Plan, Part 1
- SQL Server Professional Development Plan, Part 2
- SQL Server Professional Development Plan, Part 3
- Evaluating Technical Options
- System Sizing
- Creating a Disaster Recovery Plan
- Anatomy of a Disaster (Response Plan)
- Database Troubleshooting
- Conducting an Effective Code Review
- Developing an Exit Strategy
- Data Retention Strategy
- Keeping Your DBA/Developer Job in Troubled Times
- The SQL Server Runbook
- Creating and Maintaining a SQL Server Configuration History, Part 1
- Creating and Maintaining a SQL Server Configuration History, Part 2
- Creating an Application Profile, Part 1
- Creating an Application Profile, Part 2
- How to Attend a Technical Conference
- Tips for Maximizing Your IT Budget This Year
- The Importance of Blue-Sky Planning
- Application Architecture Assessments
- Transact-SQL Code Reviews, Part One
- Transact-SQL Code Reviews, Part Two
- Cloud Computing (Distributed Computing) Paradigms
- NoSQL for the SQL Server Professional, Part One
- NoSQL for the SQL Server Professional, Part Two
- Object-Role Modeling (ORM) for the Database Professional
- Business Intelligence
- BI Explained
- Developing a Data Dictionary
- BI Security
- Gathering BI Requirements
- Source System Extracts and Transforms
- ETL Mechanisms
- Business Intelligence Landscapes
- Business Intelligence Layouts and the Build or Buy Decision
- A Single Version of the Truth
- The Operational Data Store (ODS)
- Data Marts – Combining and Transforming Data
- Designing Data Elements
- The Enterprise Data Warehouse — Aggregations and the Star Schema
- On-Line Analytical Processing (OLAP)
- Data Mining
- Key Performance Indicators
- BI Presentation - Client Tools
- BI Presentation - Portals
- Implementing ETL - Introduction to SQL Server 2005 Integration Services
- Building a Business Intelligence Solution, Part 1
- Building a Business Intelligence Solution, Part 2
- Building a Business Intelligence Solution, Part 3
- Tips and Troubleshooting
- SQL Server and Microsoft Excel Integration
- Tips for the SQL Server Tools: SQL Server 2000
- Tips for the SQL Server Tools – SQL Server 2005
- Transaction Log Troubles
- SQL Server Connection Problems
- Orphaned Database Users
- Additional Resources
- Tools and Downloads
- Utilities (Free)
- Tool Review (Free): DBDesignerFork
- Aqua Data Studio
- Microsoft SQL Server Best Practices Analyzer
- Utilities (Cost)
- Quest Software's TOAD for SQL Server
- Quest Software's Spotlight on SQL Server
- SQL Server on Microsoft's Virtual PC
- Red Gate SQL Bundle
- Microsoft's Visio for Database Folks
- Quest Capacity Manager
- SQL Server Help
- Visual Studio Team Edition for Database Professionals
- Microsoft Assessment and Planning Solution Accelerator
- Aggregating Server Data from the MAPS Tool
It's rare that a single data store (such as an Oracle or SQL Server database) holds all of the data a particular application might need. In many cases a single system has data that could be correlated with another system, so users are left to collate the information themselves, such as bringing up two applications and possibly even copying and pasting into a spreadsheet or third data tool to combine the data they need.
Add to this that companies are using not only on-premise applications but also applications on the web, and users are left with even more disconnected data sources. As a data professional, from the Database Administrator to the Database Developer, you will be called on to stitch these systems together to ensure that your users have access to what they need.
You may also be asked to work with a Hybrid Cloud Solution such as Windows Azure and SQL Azure. To take advantage of the power and flexibility of these solutions while migrating slowly, or to keep private and secure data locally, you'll need a way to connect to two or more data sources at once within a single user interface. In this tutorial I'll show you a few options for doing that.
Like everything else in computing, there is no single solution that fits every set of requirements. It's important when you think about any of these options that they fit the particular situation at hand. Evaluate the needs first, then layer the solutions and find the one that fits best. It's not uncommon, or incorrect, to use a mix of these solutions across the applications you work with; one solution will work best for one application, and a different one for another.
As I explain each of these options, remember that within your application requirements there are other factors to think about when connecting to multiple data sources: security, performance, and data types, to name the more important ones. Each system handles these items differently, and you need to understand what those impacts are, especially in the security area. Many break-ins are the result of a "weak link" created by combining more than one data source.
There are two main ways to access data on multiple servers: using a Data Access Layer (DAL), or using Distributed Queries originated from SQL Server. In this tutorial I'll show you each method; each has its own strengths and weaknesses.
Whether you're a DBA or a Developer, read through both of these sections to understand how each works, and then you can work with your administrator or developer counterpart to come up with the best solution.
For this tutorial, I'll use the standard Purchase Order example to illustrate. Purchase orders have a "header" set of information that doesn't change very frequently, and a "details" area that holds all of the line item information for a particular order. These are usually stored in at least two sets of tables in a Relational Database Management System (RDBMS) and are tied together with a key. In this simplified example, I'll just use a company "number" as the key that ties them together.
Using A Data Access Layer or "Middle Tier"
This approach is called a Data Tier or, more commonly, a Data Access Layer (DAL). The basic idea is that you replace the client's direct connection to a database with a central program that accesses the data on the client's behalf. The presentation client then calls the DAL to return data back to the screen. It basically moves the query from the client to another program.
When you use this approach, normally you're representing a data structure as a programming Class. This is an Object-Oriented view of data: a Class defines Objects that hold properties (in many ways similar to a column in a database table) and expose actions such as set (write) and get (read).
To bring several datasets together into a single object, the general process is to set up a Class that holds the combined data structure; this is what is returned to the calling program. The Class connects to the various datasets and joins them on a common key. So basically you set up something called "A," connect to "B" and "C," and put the data they hold in "A." The client program then works from "A" as if it were a single object. This is similar to a standard SQL Join, except that it isn't done in the SQL language and the data sources can be virtually anything.
You might wonder why you wouldn't just solve every data integration problem this way — and some developers do. There are some issues with this approach, most notably Data Integrity, Performance, and Data Type mismatches.
Just because two datasets share a distinct value doesn't mean you can join them together safely. Using the simple Purchase Order example: a dataset may have a company number as a field in a table, but that number isn't necessarily the right one to join on when you programmatically match it to a company number in the Details line items of another dataset, perhaps an XML document. So the first danger, and it is a big one, is that data might be joined without ensuring that no "false connections" are made. This issue gets more problematic the more data sets you add. Just as in electronics, the fact that you can plug something in doesn't mean you won't end up with a nasty shock.
Assuming you have ensured that the data sets maintain integrity between them, you also need to consider performance. You might have an extremely well-tuned database system, but if it is joined in a program to a huge text file accessed over an Internet connection, the combined result comes back only as fast as the slowest source allows. There are ways to mitigate this issue, but it is something to consider.
Data type mismatches are a real issue with this method. It's almost certain that data will be returned from different systems using different data types, and even a small difference can be catastrophic. You can mitigate this issue by carefully examining how each data source handles a given data type and then casting or converting it in code to a common, consistent format and numeric precision.
Security is listed last here, but it is arguably the most important consideration when accessing data from a DAL. The DAL should not allow more data access than is strictly necessary for a given operation. Once again, understand the data security in each source, and ensure that your code does not allow access to more than absolutely necessary. Also, perform a full risk analysis on your code to ensure that it is protected from attacks that could turn that authorized access into unauthorized access.
Reading a combined dataset is relatively simple. However, you have to handle adding, editing, or deleting data separately. Normally you capture the changes on the form the user edits, and then perform a write operation back through each dataset individually. In essence, your code takes over the transaction handling normally done by the RDBMS engine. That's not a problem; it just needs to be thought through carefully. There are constructs, especially within the .NET Framework libraries, to help you handle this, as sketched below.
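Here's a minimal sketch of that idea, assuming the same server, database, table, and column names used in the C# example later in this tutorial (FirstServer, SecondServer, the PurchaseOrders database, and the Company and LineItem tables); it uses the System.Transactions.TransactionScope class so that the writes to both sources commit or roll back together. It's an illustration of the pattern, not a prescription:
using System;
using System.Data.SqlClient;
using System.Transactions;

class SaveAcrossSources
{
    static void Main()
    {
        // Wrap writes to two different sources so they succeed or fail together.
        // Connection strings and table/column names are assumed for illustration.
        using (TransactionScope scope = new TransactionScope())
        {
            using (SqlConnection companyConnection = new SqlConnection(
                "Data Source=FirstServer;Integrated Security=SSPI;Initial Catalog=PurchaseOrders"))
            {
                companyConnection.Open();
                SqlCommand updateHeader = new SqlCommand(
                    "UPDATE Company SET CompanyName = @name WHERE companyID = @id",
                    companyConnection);
                updateHeader.Parameters.AddWithValue("@name", "Awesome Examples RUs");
                updateHeader.Parameters.AddWithValue("@id", 1);
                updateHeader.ExecuteNonQuery();
            }

            using (SqlConnection lineItemConnection = new SqlConnection(
                "Data Source=SecondServer;Integrated Security=SSPI;Initial Catalog=PurchaseOrders"))
            {
                // Enlisting a second connection usually escalates the transaction
                // to a distributed (MSDTC) transaction.
                lineItemConnection.Open();
                SqlCommand insertLine = new SqlCommand(
                    "INSERT INTO LineItem (companyID, LineItem) VALUES (@id, @item)",
                    lineItemConnection);
                insertLine.Parameters.AddWithValue("@id", 1);
                insertLine.Parameters.AddWithValue("@item", "Distributed Query Example 3");
                insertLine.ExecuteNonQuery();
            }

            // If an exception is thrown before this call, neither write is committed.
            scope.Complete();
        }
    }
}
Note that this only keeps the individual writes consistent with each other; the validation and security checks discussed above are still your code's responsibility.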
With all of those caveats, you can access data from multiple sources in code quite simply. I'll stick with the .NET constructs in this tutorial as an example, although there are several examples here at InformIT and elsewhere for Java, C++ and other languages.
Using a .NET DataAdapter and DataSet
The first and simplest method is to use a .NET DataSet object. You set up two (or more) connections, query the data from each, and set up a joining operation that combines them into a third object that acts as the aggregated single source. Here's a C# example of that process:
// Requires: using System; using System.Data; using System.Data.SqlClient;

//Set up two connections; more are possible
SqlConnection companyConnection = new SqlConnection("Data Source=FirstServer;Integrated Security=SSPI;Initial Catalog=PurchaseOrders");
SqlConnection lineItemConnection = new SqlConnection("Data Source=SecondServer;Integrated Security=SSPI;Initial Catalog=PurchaseOrders");

SqlDataAdapter headerAdapter = new SqlDataAdapter("SELECT * FROM Company", companyConnection);

//Note: if the second source is something different such as Oracle or text files,
//set up different connection information above and a different adapter below
SqlDataAdapter lineitemAdapter = new SqlDataAdapter("SELECT * FROM LineItem", lineItemConnection);

// This is the combined dataset
DataSet companyLineItems = new DataSet();

// Go get the data from each source
headerAdapter.Fill(companyLineItems, "companies");
lineitemAdapter.Fill(companyLineItems, "lineItems");

// Join them
DataRelation relation = companyLineItems.Relations.Add("CustLineItems",
    companyLineItems.Tables["companies"].Columns["companyID"],
    companyLineItems.Tables["lineItems"].Columns["companyID"]);

// Display the combined data
foreach (DataRow pRow in companyLineItems.Tables["companies"].Rows)
{
    Console.WriteLine(pRow["companyID"]);
    foreach (DataRow cRow in pRow.GetChildRows(relation))
        Console.WriteLine("\t" + cRow["LineItem"]);
}
Using Language Integrated Query (LINQ)
Another method is to use Language Integrated Query (LINQ). LINQ is a unified query syntax built into the .NET programming languages and fully integrated into the .NET Framework, so developers use this construct quite often. I have an entire article on this technology here, so I won't repeat that information in this article.
Here's a C# example that joins two text files (the SQL Server examples are longer; to keep things clear, I'm just using text files here):
using System;
using System.Collections.Generic;
using System.Linq;

class PurchaseOrder
{
    public string CompanyName { get; set; }
    public string CompanyAddress { get; set; }
    public int ID { get; set; }
    public List<string> LineItems { get; set; }
}

class PopulateCollection
{
    static void Main()
    {
        // File example: Bucksoft,1234 Mockingbird Lane,1001
        string[] POHeaders = System.IO.File.ReadAllLines(@"../../../POHeaders.csv");

        // File example: 1001, Product 234, Product 237, Product 341, Product 795
        string[] LineItems = System.IO.File.ReadAllLines(@"../../../LineItems.csv");

        // Merge the data sources - this is the main part of the work
        IEnumerable<PurchaseOrder> queryNamesLineItems =
            from nameLine in POHeaders
            let splitName = nameLine.Split(',')
            from itemLine in LineItems
            let splitLineItemLine = itemLine.Split(',')
            where splitName[2] == splitLineItemLine[0]
            select new PurchaseOrder()
            {
                CompanyName = splitName[0],
                CompanyAddress = splitName[1],
                ID = Convert.ToInt32(splitName[2]),
                // Skip(1) so the key doesn't show up again in the item list
                LineItems = (from itemAsText in splitLineItemLine.Skip(1)
                             select itemAsText.Trim()).ToList()
            };

        List<PurchaseOrder> companies = queryNamesLineItems.ToList();

        // Display each company's name, address and items
        foreach (var company in companies)
        {
            Console.WriteLine("{0}, {1}: {2}",
                company.CompanyName,
                company.CompanyAddress,
                string.Join(", ", company.LineItems.ToArray()));
        }

        // Keep console window open in debug mode
        Console.WriteLine("Press any key to exit.");
        Console.ReadKey();
    }
}
This example is similar to one in Microsoft documentation on LINQ, and there is a full explanation of that code here: http://msdn.microsoft.com/en-us/library/bb513866.aspx
Using Entity Framework
Entity Framework (EF) is a newer, model-first development mechanism, and I've described it further here. I won't show a full example of this kind of query, primarily because it requires more than just code: you'll modify the application's configuration files to add the various providers so that you can call them in code. Because it's a more complex configuration, and since I'm trying to focus more on a data professional audience than a developer one, I'll point you to this article, which has more information on this process specifically, and to these articles for more general EF information. There's also an excellent book here to learn more as well.
Using Distributed Queries
Instead of (or even in combination with) using a Data Access Layer, you can have SQL Server connect to other datasets to bring data together in a single query or Stored Procedure, Function, or View. Microsoft calls this a Distributed Query, and you can learn more about how they view this process here: http://msdn.microsoft.com/en-us/library/ms188721.aspx
Using this method you can handle some of the performance and data type issues, but even so there are things to consider. There's still the security angle, and the simple fact that querying across multiple data sources takes more time than querying a single source. It also gets more difficult to create dissimilar joins in SQL Server as you add more and more sources into the mix.
Four Part Queries
The simplest way to access more than one data set is to qualify the name in four parts in the query:
server.database.schema.object
While it's simple, there are quite a few caveats. First, if you want to use the server name part, you'll have to set up a linked server — which I'll talk about in a moment. So I'll drop the server name for now —
database.schema.object
Now you can see that the other data source actually has to be another database on the same server, which implies of course that your other data source is on SQL Server, and more specifically that it is on the same Instance of SQL Server. Also, from the security standpoint you may have to allow "Cross Database Ownership Chaining," a very dangerous thing to do.
With all of those caveats, I set my test instance of SQL Server to allow Cross Database Ownership Chaining and ran this code:
/* Super uber-brain-dead example - cross database ownership must be enabled which can be evil */
USE master;
GO
CREATE DATABASE TestPurchaseOrderDatabaseA;
GO
CREATE DATABASE TestPurchaseOrderDatabaseB;
GO

/* Set up the first database to hold the company data */
USE TestPurchaseOrderDatabaseA;
GO
CREATE TABLE PurchaseOrderHeader
    (CompanyID int PRIMARY KEY
    ,CompanyName varchar(50));
GO
INSERT INTO PurchaseOrderHeader VALUES (1, 'Awesome Examples RUs');
GO
INSERT INTO PurchaseOrderHeader VALUES (2, 'Security Is Overrated, Inc.');
GO

/* Set up the second database to hold the details data */
USE TestPurchaseOrderDatabaseB;
GO
CREATE TABLE PurchaseOrderDetail
    (LineID int PRIMARY KEY
    ,CompanyID int
    ,LineItem varchar(50));
GO
INSERT INTO PurchaseOrderDetail VALUES (1, 1, 'Distributed Query Example 1');
GO
INSERT INTO PurchaseOrderDetail VALUES (2, 1, 'Distributed Query Example 2');
GO
INSERT INTO PurchaseOrderDetail VALUES (3, 2, 'Let Me In');
GO
INSERT INTO PurchaseOrderDetail VALUES (4, 2, 'My Password is 1234');
GO

/* Now put it all together */
SELECT A.CompanyName, B.LineItem
FROM TestPurchaseOrderDatabaseA.dbo.PurchaseOrderHeader AS A
INNER JOIN TestPurchaseOrderDatabaseB.dbo.PurchaseOrderDetail AS B
    ON A.CompanyID = B.CompanyID;

/* Clean Up, Clean Up, everybody do your share */
USE master;
GO
DROP DATABASE TestPurchaseOrderDatabaseA;
GO
DROP DATABASE TestPurchaseOrderDatabaseB;
GO
Once again, only under rare circumstances is this a good idea.
OPENROWSET
If you need to access data from a different database, a different server, or even a different type of data source than SQL Server, you can use the OPENROWSET function in a SQL Server query. You will need to supply the type of connection you want to make, send the query, and once again join the results. Here's an example using the same table layout, but this time I'll put the details in a text file. The instructions for trying this yourself are in the comments.
/* Use OPENROWSET to connect a text file to a database.

   First, you'll need a text file called LineItems.txt in c:\temp that has this data in it:

   1    Distributed Query Example 1
   1    Distributed Query Example 2
   2    Let Me In
   2    My Password is 1234

   Now create another text file in c:\temp called LineItems.fmt that has this data:

   9.0
   2
   1 SQLCHAR 0 10 "\t"   1 ID          SQL_Latin1_General_Cp437_BIN
   2 SQLCHAR 0 40 "\r\n" 2 Description SQL_Latin1_General_Cp437_BIN
*/
USE master;
GO
CREATE DATABASE TestPurchaseOrderDatabase;
GO

/* Set up the database to hold the company data */
USE TestPurchaseOrderDatabase;
GO
CREATE TABLE PurchaseOrderHeader
    (CompanyID int PRIMARY KEY
    ,CompanyName varchar(50));
GO
INSERT INTO PurchaseOrderHeader VALUES (1, 'Awesome Examples RUs');
GO
INSERT INTO PurchaseOrderHeader VALUES (2, 'Security Is Overrated, Inc.');
GO

/* Now put it together */
SELECT A.CompanyName, B.Description
FROM PurchaseOrderHeader AS A
INNER JOIN OPENROWSET(
    BULK 'c:\temp\LineItems.txt',
    FORMATFILE = 'c:\temp\LineItems.fmt') AS B
    ON A.CompanyID = B.ID;
GO
This method works well for well-defined OLE DB data sources, and there are quite a few of those. You can learn more about your options here: http://msdn.microsoft.com/en-gb/library/ms190312.aspx
But a simple examination of the Execution Plan for this query shows that it takes quite a bit of time to "spin up" that remote query provider to access the data. While it works, it can be a bit slow, especially if you use it over and over. For repeated use, you can use a Linked Server.
Linked Servers and OPENQUERY
A "Linked Server" is essentially a definition of another data source within SQL Server. The engine opens a channel to the other data source and allows you to query it.
To create a Linked Server, you can use the graphical tools or the sp_addlinkedserver Stored Procedure. Once you've defined that server you can use the four-part query I mentioned earlier. You can reference this link to locate the exact parameters you need, based on the type of data source you want to connect to. For instance, here is an example of connecting to an Oracle server known by the SQL*Net name PurchasingServer, and giving it the name TestPurchaseOrderDatabaseB in SQL Server:
EXEC sp_addlinkedserver
    @server = N'TestPurchaseOrderDatabaseB',
    @srvproduct = N'Oracle',
    @provider = N'MSDAORA',
    @datasrc = N'PurchasingServer';
GO
Now you can reference the server in a SQL Server query with the four-part name, like this (for an Oracle source the database part is left empty, which gives the double period):
SELECT * FROM TestPurchaseOrderDatabaseB..PurchaseOrderGroup.PurchaseOrderDetails;
GO
And you use it just as I showed earlier with the three-part name. While Linked Servers simplify connections, they can be problematic. Connection requirements (like having the proper Oracle Client installed on the server) can pile up quickly. Also, performance can suffer because of the methods the engine chooses to query the other server.
To help with the second issue, you can create a "pass-through" query. This construct sends the query text directly to the other data system, which can theoretically process it faster than SQL Server would and return the data set more quickly. To create a pass-through query after you create your Linked Server, you can use the OPENQUERY command.
Essentially you treat this command as a sub-SELECT operation, like this:
SELECT *
FROM OPENQUERY(TestPurchaseOrderDatabaseB,
    'SELECT ID, Description FROM PurchaseOrderGroup.LineItems');
GO
This can be an interesting exercise when you want to edit or delete the data — read the full description of that process here: http://msdn.microsoft.com/en-us/library/ms188427.aspx
While Linked Servers are very simple to implement, I find them quite "brittle," meaning that they malfunction often. Also, it is quite frustrating to learn that the query you've been battling with for a while is on a Linked Server — look for those four-part-names if you see a misbehaving query.
Security can also be an issue. Once the server is linked, queries against it run under the security context configured for the link (the credentials the code uses), not necessarily the identity of the end user.
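You can control that mapping explicitly with the sp_addlinkedsrvlogin Stored Procedure. Here's a minimal sketch against the Linked Server defined earlier; the local login and the remote account and password are hypothetical placeholders, not values from a real system:
/* Map one local login to a specific account on the remote source.
   The login names and password below are placeholders. */
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'TestPurchaseOrderDatabaseB',
    @useself = N'FALSE',
    @locallogin = N'DOMAIN\POReportingUser',
    @rmtuser = N'po_reader',
    @rmtpassword = N'StrongPasswordHere';
GO

/* With @locallogin = NULL this call sets the default mapping for every other
   login; leaving the remote credentials NULL generally keeps unmapped logins
   from reaching the remote source at all. */
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'TestPurchaseOrderDatabaseB',
    @useself = N'FALSE',
    @locallogin = NULL,
    @rmtuser = NULL,
    @rmtpassword = NULL;
GO
That way the Linked Server only works for the logins you intend, instead of inheriting whatever the default mapping allows.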
OPENDATASOURCE
If you need the power and convenience of a Linked Server but don't want to define one ahead of time, the OPENDATASOURCE command is often a better choice. This command allows you to use a four-part name in a query without having to create a Linked Server first. It's most useful with another SQL Server, because you can send along the Windows Authentication that the SQL Server Service is using. If you use another provider, like Oracle, you need to protect this code because you may have to send along your name and password, something you don't want in visible code. I have also used this with "security-less" systems like Excel spreadsheets, like this:
SELECT *
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
    'Data Source=C:\temp\POLineItems.xls;Extended Properties=EXCEL 5.0')...[Sheet1$];
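For the SQL Server-to-SQL Server case mentioned above, the query might look like the sketch below. The server name (SecondServer), database (PurchaseOrders), and LineItem table are reused from the earlier DataSet example and are assumptions, not a fixed pattern; also note that on most installations you'll need to enable the "Ad Hoc Distributed Queries" configuration option before OPENDATASOURCE (or the non-BULK form of OPENROWSET) will run:
/* One-time setup: allow ad hoc access to remote OLE DB providers.
   This has security implications, so only enable it deliberately. */
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
GO

/* Join a local table (PurchaseOrderHeader from the earlier example) to a table
   on another SQL Server, passing along Windows Authentication. Server, database,
   and table names are assumed from the earlier examples in this tutorial. */
SELECT A.CompanyName, B.LineItem
FROM PurchaseOrderHeader AS A
INNER JOIN OPENDATASOURCE('SQLNCLI',
    'Data Source=SecondServer;Integrated Security=SSPI'
    ).PurchaseOrders.dbo.LineItem AS B
    ON A.CompanyID = B.companyID;
GO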
As you can see, you have quite a few options for connecting to multiple sources of data. My personal take is that a programmatic method is often the most secure and best performing option, but that assumes that the developers have done their homework on making that true. Each situation is different, and knowing more about what options you have is the key.