CDC Cleanup job

Microsoft SQL Server

An introduction to CDC can be found here.

All data modifications in a CDC enabled table are saved in a separate change table. But in order to limit the size of this change table, a SQL Agent job is automatically created, named cdc.XXX_cleanup, that deletes rows older than the configurable “retention value”. The job is created once per database at the moment when CDC is enabled.

The SQL Agent job calls the procedure sys.sp_MScdc_cleanup_job which accepts no parameters.

To configure the retention value, the system stored procedure sys.sp_cdc_change_job is used. Besides being used to configure cleanup jobs, this stored procedure is also used to configure capture jobs.

The first parameter is @job_type that can be either “capture” or “cleanup”. In this case, it must be “cleanup”. There are other parameters (@maxtrans, @maxscans, @continuous, @pollinginterval) that are only used to configure capture jobs that I won’t describe here. The two parameters @retention and @threshold concerns cleanup jobs however.

Retention

This value specifies, in minutes, how long rows in the change table are kept. If NULL is used as value, this is interpreted as “no change”. The default is 4320 minutes (or 72 hours / 3 days). The maximum value is 52494800 minutes (100 years).

Threshold

This parameter is used to limit the number of rows that can be deleted in a cleanup delete statement. The default is 5000 rows. If NULL is used as value, this is interpreted as “no change”. According to this whitepaper from Microsoft, it may be beneficial for cleanup performance to increase this value. Also, if you keep the default on this parameter and change more than 5000 rows per day, the change table will keep growing. So I strongly suggest that this value is evaluated and adjusted to a value suitable to your environment.

So, to run the procedure, the following script can be used (I’m using default values for the demonstration):

EXEC sp_cdc_change_job @job_type='cleanup', @retention = 4320, @threshold = 5000

Note that you must run the script from the CDC enabled database.

If you want to know the current values for retention and threshold, the table msdb.dbo.cdc_jobs can be queried:

 SELECT * FROM msdb.dbo.cdc_jobs
msdb.dbo.cdc_jobs

msdb.dbo.cdc_jobs

Note that there will be two rows for each database where CDC is enabled.

Cleanup schedule

The SQL Agent job is by default scheduled to run at 02.00 in the morning. Depending on the number of rows expected to be deleted and other server activity, this job should be scheduled to a time when system pressure is low. Also, if you want to cleanup with a lower retention than 24 hours (1440 minutes) than once per night, the schedule for this job needs to be scheduled more frequently.

Leave a Reply

Required fields are marked *.