SQL Server Change Data Capture (CDC) Basics

Microsoft SQL Server

Change Data Capture (CDC) was first introduced in SQL Server 2008, and is a tool that is used to keep track of data changes (DML statements) in tables. CDC must be enabled on the database level, then on the table level, and lastly on the column level (optional).

When CDC is enabled on a table, a new similar table is created, but under the cdc schema. In the new table all changes to to base table are recorded.

Traditionally, triggers have been created to track changes in tables. But CDC is a more light weight technology since the transaction log is read asynchronously for changes. Triggers are part of the original transaction.

CDC is only supported in the Enterprise or Developer editions of SQL Server.

Read on to see a demonstration of CDC basics.

CDC Demo

In the following script I’ll demonstrate the CDC basics. First, create a database and a table:

CREATE DATABASE CDCTest
GO

USE CDCTest
GO

CREATE TABLE dbo.CDC
	(
		Pk INT IDENTITY(1,1) NOT NULL CONSTRAINT CDC_Pk PRIMARY KEY CLUSTERED
		,TextValue VARCHAR(20) NOT NULL
		,DateChanged DATETIME2 DEFAULT(GETDATE()) NOT NULL
	)
GO

To enable CDC on the current database, use the system procedure sys.sp_cdc_enable_db:

EXEC sys.sp_cdc_enable_db 
GO

To see which databases has CDC enabled, the system table sys.databases can be used:

SELECT [name], is_cdc_enabled FROM sys.databases ORDER BY is_cdc_enabled DESC
CDC_is_cdc_enabled

CDC_is_cdc_enabled

To enable CDC on an individual table, use the system procedure sys.sp_cdc_enable_table:

EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'tblCDC', @role_name = NULL, @captured_column_list = NULL
GO

@source_schema is the schema the table belongs to.
@source_name is the table name.
@role_name is used if access to the changed data needs to be restricted by a role. The parameter is mandatory , so NULL is used to indicate that there is no restriction in access to changed data.
@captured_column_list the default is NULL so the parameter can be left out. But this parameter is used to restrict the columns that are included in the change table. NULL means all columns are included.

Job ‘cdc.CDCTest_capture’ started successfully.
Job ‘cdc.CDCTest_cleanup’ started successfully.

As the results from executing sp_cdc_enable_table indicate, two separate SQL Agent jobs are created to handle CDC:

CDC SQL Agent Jobs

CDC SQL Agent Jobs

There is also a new table created, cdc.dbo_tblCDC_CT, that will hold all changes in the base table dbo.tblCDC:

CDCTable_CT

CDCTable_CT

Note that the new CDC table is located under System Tables.

The new CDC table has all the columns of the base table (if @captured_column_list was set to NULL), and 5 additional columns:

CDC Table Columns

CDC Table Columns

The first three, __$start_lsn, __$end_lsn and __$seqval are references to positions in the transaction log and will not be covered in this blog post.

__$operation can be one of the following values: 1 = delete, 2 = insert, 3 = update (old values), 4 = update (new values).
__$update_mask is a bit mask showing which columns was updated.

CDC example

Let’s insert a value into the demonstration table created earlier, dbo.tblCDC:

INSERT dbo.tblCDC (TextValue) VALUES ('TestValue1')

There will now be a corresponding row in the cdc.dbo_tblCDC_CT table:

SELECT * FROM cdc.dbo_tblCDC_CT
CDC Insert Base Table

CDC Insert Base Table

As can be seen in the picture above, the operation is = 2, which is an insert.

If we update the value in the base table:

UPDATE dbo.tblCDC SET TextValue = 'TestValue2' WHERE Pk = 1

The CDC table will have the following contents:

SELECT * FROM cdc.dbo_tblCDC_CT
CDC Update Base Table

CDC Update Base Table

Because of the update, two new rows were created in the CDC table. One with the operation = 3 (old value) and one with the operation = 4 (new value).

When a row is deleted from the base table:

DELETE FROM dbo.tblCDC WHERE Pk = 1

The CDC table has a new row:

SELECT * FROM cdc.dbo_tblCDC_CT
CDC Delete Base Table

CDC Delete Base Table

The operation on the new row is = 1 (delete).

CDC table cleanup

The rows in the CDC tables are removed after a configurable time interval. The default is 3 days. This value can be configured with the sys.sp_cdc_change_job system stored procedure.

CDC compared to Change Tracking

Change Tracking is another tool in SQL Server that keeps track of changes in tables. The major differences between CDC and Change Tracking are:

* CDC uses an asynchronous process to read from the transaction log. Change Tracking captures changes as part of the original transaction. As a consequence, there may be a small delay before the changes appear in the CDC table.
* CDC keeps history, Change Trancking only keeps track of rows that has been changed. As a consequence, CDC uses more disk space than Change Tracking.

2 comments » Write a comment

  1. Pingback: CDC Cleanup job - Tomas Lind

  2. Pingback: CDC Capture job - Tomas Lind

Leave a Reply

Required fields are marked *.