Change Data Capture (CDC) was first introduced in SQL Server 2008, and is a tool that is used to keep track of data changes (DML statements) in tables. CDC must be enabled on the database level, then on the table level, and lastly on the column level (optional).
When CDC is enabled on a table, a new similar table is created, but under the cdc schema. In the new table all changes to to base table are recorded.
Traditionally, triggers have been created to track changes in tables. But CDC is a more light weight technology since the transaction log is read asynchronously for changes. Triggers are part of the original transaction.
CDC is only supported in the Enterprise or Developer editions of SQL Server.
Read on to see a demonstration of CDC basics.
CDC Demo
In the following script I’ll demonstrate the CDC basics. First, create a database and a table:
CREATE DATABASE CDCTest GO USE CDCTest GO CREATE TABLE dbo.CDC ( Pk INT IDENTITY(1,1) NOT NULL CONSTRAINT CDC_Pk PRIMARY KEY CLUSTERED ,TextValue VARCHAR(20) NOT NULL ,DateChanged DATETIME2 DEFAULT(GETDATE()) NOT NULL ) GO
To enable CDC on the current database, use the system procedure sys.sp_cdc_enable_db:
EXEC sys.sp_cdc_enable_db GO
To see which databases has CDC enabled, the system table sys.databases can be used:
SELECT [name], is_cdc_enabled FROM sys.databases ORDER BY is_cdc_enabled DESC
To enable CDC on an individual table, use the system procedure sys.sp_cdc_enable_table:
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'tblCDC', @role_name = NULL, @captured_column_list = NULL GO
@source_schema is the schema the table belongs to.
@source_name is the table name.
@role_name is used if access to the changed data needs to be restricted by a role. The parameter is mandatory , so NULL is used to indicate that there is no restriction in access to changed data.
@captured_column_list the default is NULL so the parameter can be left out. But this parameter is used to restrict the columns that are included in the change table. NULL means all columns are included.
[white_box]
Job ‘cdc.CDCTest_capture’ started successfully.
Job ‘cdc.CDCTest_cleanup’ started successfully.
[/white_box]
As the results from executing sp_cdc_enable_table indicate, two separate SQL Agent jobs are created to handle CDC:
There is also a new table created, cdc.dbo_tblCDC_CT, that will hold all changes in the base table dbo.tblCDC:
Note that the new CDC table is located under System Tables.
The new CDC table has all the columns of the base table (if @captured_column_list was set to NULL), and 5 additional columns:
The first three, __$start_lsn, __$end_lsn and __$seqval are references to positions in the transaction log and will not be covered in this blog post.
__$operation can be one of the following values: 1 = delete, 2 = insert, 3 = update (old values), 4 = update (new values).
__$update_mask is a bit mask showing which columns was updated.
CDC example
Let’s insert a value into the demonstration table created earlier, dbo.tblCDC:
INSERT dbo.tblCDC (TextValue) VALUES ('TestValue1')
There will now be a corresponding row in the cdc.dbo_tblCDC_CT table:
SELECT * FROM cdc.dbo_tblCDC_CT
As can be seen in the picture above, the operation is = 2, which is an insert.
If we update the value in the base table:
UPDATE dbo.tblCDC SET TextValue = 'TestValue2' WHERE Pk = 1
The CDC table will have the following contents:
SELECT * FROM cdc.dbo_tblCDC_CT
Because of the update, two new rows were created in the CDC table. One with the operation = 3 (old value) and one with the operation = 4 (new value).
When a row is deleted from the base table:
DELETE FROM dbo.tblCDC WHERE Pk = 1
The CDC table has a new row:
SELECT * FROM cdc.dbo_tblCDC_CT
The operation on the new row is = 1 (delete).
CDC table cleanup
The rows in the CDC tables are removed after a configurable time interval. The default is 3 days. This value can be configured with the sys.sp_cdc_change_job system stored procedure.
CDC compared to Change Tracking
Change Tracking is another tool in SQL Server that keeps track of changes in tables. The major differences between CDC and Change Tracking are:
* CDC uses an asynchronous process to read from the transaction log. Change Tracking captures changes as part of the original transaction. As a consequence, there may be a small delay before the changes appear in the CDC table.
* CDC keeps history, Change Trancking only keeps track of rows that has been changed. As a consequence, CDC uses more disk space than Change Tracking.
Pingback: CDC Cleanup job - Tomas Lind
Pingback: CDC Capture job - Tomas Lind