An introduction to CDC can be found here.
When CDC (Change Data Capture) is enabled on a database, a SQL Agent job is automatically created called cdc.XXX_capture. The capture job reads asynchronously from the SQL Server transaction log to retrieve all modifications (DML statements) made to all CDC enabled tables. The changes are then written to a change table.
In this blog post I’ll describe the CDC capture job and how it can be configured.
The technology that the CDC capture job is utilizing is the stored procedure sp_replcmds. This is the same procedure that is used by SQL Server replication technology. This means that if both replication and change data capture is enabled, they can share information from this log reader, minimizing contention when reading from the log.
However, the SQL Agent job doesn’t call the procedure sp_replcmds directly. The procedure called from the SQL Agent job is sys.sp_MScdc_capture_job (also created when CDC is enabled). In turn, this procedure calls sys.sp_cdc_scan (also created when CDC is enabled).
sys.sp_cdc_scan has a few parameters that can be altered to change some aspects of how the capture job works. These parameters are read from the settings that are stored in the table dbo.cdc_jobs. To view the current settings in this table, use the script below:
SELECT * FROM msdb.dbo.cdc_jobs
To change these configuration values, the system stored procedure sys.sp_cdc_change_job can be used. Besides being used to configure capture jobs, this stored procedure is also used to configure capture jobs. The first parameter is @job_type that can be either “capture” or “cleanup”. In this case, it must be “capture”. The parameters @retention and @threshold concerns the cleanup job. Parameters @maxtrans, @maxscans, @continuous and @pollinginterval are used to configure the capture job.
The maxtrans configuration option set the maximum number of transactions to read from the transaction log in each scan cycle (see continuous). The default is 500. If this parameter is set to NULL, it is interpreted as no change.
Configures how many scans will be done to capture all rows from the transaction log. Each scan reads the number of transaction specified in the maxtrans option. The default is 10. 10 scans times 500 transactions means 5000 rows will be read at the most each time. If this parameter is set to NULL, it is interpreted as no change.
A bit where 1 means the job will run continuously (the default) or only one time (0). If the job is set to run continuously, maxtrans rows will be read from the log maxscans times. The job will then wait pollinginterval seconds and then read from the log again. If this parameter is set to NULL, it is interpreted as no change. Normally, 1 would be used. One time reads (0) are for testing purposes only and must not be used in a production environment since records in the transaction log will be kept active until CDC captures them.
The number of seconds between log scan cycles. The default is 5. This parameter is only used if continuous = 1. If this parameter is set to NULL, it is interpreted as no change. In effect, a WAITFOR is issued between reads from the log, and this 5 second wait is the main reason why you may have to wait a few seconds before the change is captured to the change table.
So, to run the procedure, the following script can be used (I’m using default values for the demonstration):
EXEC sp_cdc_change_job @job_type='capture', @maxtrans = 500, @maxscans = 10, @continuous = 1, @pollinginterval = 5
The script must be run from the CDC enabled database.
If you make changes to these settings, a restart of the capture job is required. To stop the capture job, use:
EXEC sys.sp_cdc_stop_job @job_type = 'capture'
And to start the job again, use:
EXEC sys.sp_cdc_start_job @job_type = 'capture'
The whitepaper Tuning the Performance of Change Data Capture in SQL Server 2008 discusses CDC performance in more depth.