SQL Server Transactional Replication Agents
In my previous articles, I showed you how to configure, maintain, and troubleshoot transactional replication. This article digs a little deeper into issues associated with transactional replication agents. I'll discuss replication agents' properties as well as profiles. You'll learn how replication agents work under the hood and how you can tweak their parameters for performance tuning and troubleshooting.
Transactional replication is served by three agents. First, the snapshot agent takes the snapshot of the replicated articles (tables, views, stored procedures or user-defined functions) on the publisher. When you replicate tables, the snapshot agent creates the file containing the schema of the table, including any constraints, triggers, and indexes. The snapshot agent also creates the snapshot of any data contained in the tables. When you replicate non-table articles, the snapshot agent only scripts the schema of replicated objects.
The log reader agent scans the transaction log of the published database for any transactions affecting replicated objects; when such transactions are found, the log reader agent moves them into the distribution database.
The distribution agent reads replicated transactions in the distribution database and applies those transactions to the subscriber(s).
Let's look under the covers and see what each of these agents does more closely.
Snapshot Agent
You can examine the details of each replication agent by navigating to the Replication Monitor, selecting the publication you're interested in, right-clicking any of the agents, and choosing Agent Properties. A dialog box displays that shows you the properties of the SQL Server job associated with a particular agent, as shown in the following figure.
If you examine the Steps tab of the snapshot agent job, you'll see that this agent is implemented by three steps. First, the agent runs the sp_MSadd_snapshot_history system stored procedure. This procedure is usually executed with the following parameters:
EXEC sp_MSadd_snapshot_history @perfmon_increment = 0, @agent_id = 10, @runstatus = 1, @comments = 'Starting agent.'
The @perfmon_increment parameter is used to update replication performance counters for the agent associated with @agent_id; the latter is a reference to the id column in the MSsnapshot_agents table. Generic performance counter SQLServer:Replication Agents displays all replication agents that currently run on the server, whereas Snapshot:Delivered Cmds/sec and Snapshot:Delivered Trans/sec counters report the average number of commands and transactions delivered by the snapshot agent. In addition to updating the values of these performance counters, sp_MSadd_snapshot_history also records the history of the snapshot agent in the MSsnapshot_history table in the distribution database.
Not surprisingly, the @runstatus parameter determines the status of the snapshot agent; the following table lists the possible values for this parameter:
@runstatus Value |
Meaning |
1 |
The agent is starting up |
2 |
The agent has succeeded |
3 |
The agent is in the process of executing |
4 |
The agent is idle |
5 |
The agent has encountered errors and is retrying its operations |
6 |
The agent has failed |
Finally the @comments parameter determines the message that is displayed in Enterprise Manager in the Last Action column of the snapshot agent.
So the first step simply updates performance counters and displays the snapshot agent's startup message. The next step is called Run Agent; this is the step that actually executes the snapshot agent. Here is the typical command for running the snapshot agent:
-Publisher [D10ZF411] -PublisherDB [pubs] ' -Distributor [ D10ZF411] -Publication [pubs1] -DistributorSecurityMode 1
This command advises the snapshot agent to use the D10ZF411 server both as publisher and distributor, to read the pubs1 publication within the pubs database, and to impersonate the SQL Server agent account when connecting to the publisher (security mode = 1). The security mode of 0 would advise the snapshot agent to use a SQL Server login for connecting to the publisher.
If the second step runs successfully, the entire job completes reporting success. If the second step fails, the snapshot agent must have encountered problems, so the last step (detect non-logged agent shutdown) is activated. This last step of the snapshot agent runs the sp_MSdetect_nonlogged_shutdown stored procedure within the distribution database. The sp_MSdetect_nonlogged_shutdown procedure is executed if the snapshot, distribution, or log reader agent encounters problems. This procedure attempts to detect the cause of the agent's failure and reports an appropriate message within the Replication Monitor, as well as within the job's history.
You can examine the job's history by right-clicking the job within Enterprise Manager and choosing View Job History. This applies to any SQL Server job, not just replication jobs. For example, the snapshot agent's job is likely to have a history similar to the following figure.
Now that we've examined the details of the snapshot agent, we can learn more about the profile the runs this agent. Recall from my earlier article that the agent profile simply specifies the value of the parameters used for calling the replication agent.
The default values of the snapshot agent are explained in the following table.
Parameter Name |
Default Value |
Explanation |
BCPBatchSize |
100000 |
Number of rows read by the snapshot agent in a single batch. Also the number of rows read/written before progress is reported within the Replication Monitor. |
HistoryVerboseLevel |
2 |
Amount of history logged. 0 is the least amount; 3 is the most. |
LoginTimeout |
15 |
Number of seconds the agent will attempt connecting to the publisher before timing out. |
MaxBCPThreads |
1 |
Number of threads a snapshot agent can use for scripting data and schema. A higher number of BCP threads typically means more parallel operations. |
QueryTimeout |
300 |
Number of seconds the agent will attempt to query replicated objects before timing out. |
Query timeout and login timeout settings are fairly self-explanatory. If your agent can't log in to the server within 15 seconds, you obviously have a problem: Either your server is too busy to accept a new connection or you have specified invalid login credentials. If you can't run a particular snapshot agent query within 5 minutes (300 seconds), you're probably running the snapshot agent when the server is too busy performing other operations.
If possible, you should attempt to take the snapshot of the replicated objects during a period of little activity on the publishing server. If the publishing server must be available 24/7, and you can't afford lengthy downtime, you should consider alternative methods of delivering the initial snapshot. For example, you can simply back up the published database and restore it on the subscriber.
The history verbose level parameter is available for snapshot, distribution, and log reader agents. This parameter can either overwrite the existing values in the agent's history or create new records, thereby creating a more lengthy history. You should start with a detailed history logging to make sure that your replication setup works correctly. However, after you're certain that everything works as expected, you can get some performance advantage by reducing the history verbose level to 0.
The MaxBCPThreads parameter is available for snapshot agents as well as for distribution agents. This factor determines the number of parallel reads by the snapshot agent or parallel writes by the distribution agent when applying the snapshot at the subscriber. You can experience significant performance improvement on multiprocessor servers by bumping up the MaxBCPThreads option from its default value of 1. In my experience the MaxBCPThreads of 10 can work twice as fast as MaxBCPThreads of 1 on a 4-processor server. Note, however, that the number of parallel reads and writes you perform depends on the availability of CPU resources. If you have only a single CPU on your server, and CPU cycles are already tied up with other user activity, tweaking this option might not provide any performance benefit.
The BCPBatchSize parameter is the number of rows read or written by the snapshot or distribution agent within a single transaction and before the progress is reported within the Replication Monitor. If you have millions of rows to copy, you might want to increase the value of this parameterthis way the snapshot and distribution agents don't have to keep reporting progress and might work slightly faster.