Recently we were asked to implement auditing of OLTP database and that lead us to thinking of different solution in place. We came across many solutions from triggers, timestamps and complicated queries (stored procedures) to audit data.
I remember that SQL Server 2005 introduced the new features of ‘after update’, ‘after insert’ and ‘after delete’ triggers that almost solved the problem of tracking changes in data.
Fortunately I was asked to implement this on Sql Server 2008 and Microsoft has provided two nice features to achieve above tasks without additional programming for developers. This feature is known as “Change Tracking” and “Change Data Capture” aka CDC.
Since Change tracking system does not allow you to capture historical data and also it works synchronously so we opted CDC.
This BLOG covers all necessary steps to set up CDC, So i have majorly categorized it in following section:
1) What is CDC?
2) Why to Use CDC?
3) When to Use CDC?
4) How to Use CDC?
What is Change Data Capture (CDC)
In a simplified CDC context, one computer system has data believed to have changed from a previous point in time, and a second computer system needs to take action based on that changed data. The former is the source, the latter is the target. It is possible that the source and target are the same system physically, but that does not change the design patterns logically.
In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. Also, it is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources.
Change Data Capture records INSERTs, UPDATEs, and DELETEs applied to SQL Server tables, and makes a record available of what changed, where, and when, in simple relational change. These change tables contain columns that reflect the column structure of the source table you have chosen to track, along with the metadata needed to understand the changes that have been made. When you apply Change Data Capture features on a database table, a mirror of the tracked table is created with the same column structure of the original table, but with additional columns that include the metadata used to summarize the nature of the change in the database table row.
Why to Use CDC
Why one should use CDC? There are many ways to achieve CDC like feature. Microsoft has already provided ‘after update’, ‘after insert’ and ‘after delete’ triggers that almost solved the problem of tracking changes in data. As we all know triggers are always costly and are synchronous. We can also achieve it by implementing TimeStamp or writing some stored procedures to handle data changes.
CDC feature of Sql Server 2008 gives us leverage to capture data changes asynchronously without using any trigger or timestamp.
When to Use CDC
CDC solutions occur most often in data-warehouse environments since capturing and preserving the state of data across time is one of the core functions of a data warehouse, but CDC can be utilized in any database or data repository system for the audit purpose.. If user wants to track data level changes then CDC can be used whereas if user needs to capture at row level changes then it can be achieved by change tracking mechanism provided by Sql Server 2008. So CDC can be used for capturing historical changes of data.
·CDC can be used when low latency is required as it synchronizes the changes asynchronously.
·It can be used when there is no need of programming changes to the application that use the database.
·CDC can also be used when transactional integrity is required.
·CDC can also be used when no need to change the database schema.
How to use Change Data Capture (CDC)
There are certain steps to follow for creating CDC in database.
There are different functions, stored procedures, tables and jobs involved for CDC implementation.
CDC Implementation - Prerequisites
·SQL Server Agent Service should be running in order to get notified of the CDC actions
·Enable CDC for database by a member of the sysadmin fixed server role
·Enable CDC for individual table by the members of the db_owner fixed database role
Steps:
CDC first has to be enabled for the database. Because CDC is a table-level feature, it then has to be enabled for each table to be tracked.
1) Create the Database, if not exists
USE master
GO
IF NOT EXISTS(SELECT 1 FROM sys.databases WHERE name = ‘TestCDC’)
CREATE DATABASE TestCDC
GO
2) Enable CDC for the newly created database TestCDC
USE TestCDC
GO
EXEC sys.sp_cdc_enable_db
GO
Once CDC has been enabled for a database, a new column is_cdc_enabled is added to sys.databases table.
To check whether the CDC has been enabled for a database:
SELECT is_cdc_enabled
FROM sys.databases
WHERE name = ‘TestDb’
It should return 1, if enabled; 0 otherwise.
Besides this, a new schema named ‘cdc’ and a new user named ‘cdc’ will be created.
Also, following tables will be created. (related to CDC). These tables are populated when the first table is enabled for CDC.
cdc.captured_columns – This table returns result for list of captured column.
cdc.change_tables – This table returns list of all the tables which are enabled for capture.
cdc.ddl_history – This table contains history of all the DDL changes since capture data enabled.
cdc.index_columns – This table contains indexes associated with change table.
cdc.lsn_time_mapping – This table maps Log Sequence Number (LSN) number .
3) Create a table, if not exists
CREATE TABLE Users
(
UserID INTEGER IDENTITY (1,1)
, UserName VARCHAR(100)
, UserLocation VARCHAR(100)
CONSTRAINT [PK_User_UserID] PRIMARY KEY CLUSTERED (UserID ASC)
)
GO
4) Enable CDC for the table
USE TestCDC
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N’dbo’,
@source_name = N’Users’,
@role_name = NULL,
@captured_column_list = ‘[UserID],[UserName]‘
GO
The contents of the abovementioned tables are as below (after enabling CDC for Employee table) –
Since I am using TestCDC database, it creates the jobs with following names. when these jobs are created they are automatically enabled as well.
· cdc.TestCDC_capture – When this job is executed it runs the system stored procedure sys.sp_MScdc_capture_job. The procedure sys.sp_cdc_scan is called internally by sys.sp_MScdc_capture_job. This procedure cannot be executed explicitly when a change data capture log scan operation is already active or when the database is enabled for transactional replication. This system SP enables SQL Server Agent, which in facts enable Change Data Capture feature.
· cdc.TestCDC_cleanup – When this job is executed it runs the system stored procedure sys.sp_MScdc_cleanup_job. This system SP cleans up database changes tables.
When a table has CDC enabled, 2 new fucntions are created
· fn_cdc_get_all_changes_ dbo_Users_CT
· fn_cdc_get_net_changes _dbo_Users_CT (Note: this function will be created if @supports_net_changes = 1 while enabling CDC for the table)
default, all the columns of the specified table is taken into consideration of this operation. If you want to only few columns of this table to be tracked in that case you can specify the columns as one of the parameters of above mentioned SP.
SELECT * FROM cdc.captured_columns
· If a table is CDC enabled, and If you make any schema changes to the table, you need to disable CDC at the table level and then enable again in order to capture the data changes for the new columns
When everything is successfully completed, check the system tables again and you will find a new table called cdc.dbo_Users_CT. This table will contain all the changes in the table dbo.Users. If you expand this table, you will find five additional columns as well.
There are two values which are very important to us are __$operation and __$update_mask.
Column _$operation contains value which corresponds to DML Operations. Following is quick list of value and its corresponding meaning.
· Delete Statement = 1
· Insert Statement = 2
· Value before Update Statement = 3
· Value after Update Statement = 4
The column _$update_mask shows, via a bitmap, which columns were updated in the DML operation that was specified by _$operation. If this was a DELETE or INSERT operation, all columns are updated and so the mask contains value which has all 1’s in it. This mask contains value which is formed with Bit values.
Understanding Update mask
It is important to understand the Update mask column in the tracking table. It is named as _$update_mask. The value displayed in the field is hexadecimal but is stored as binary.
If operations are listed marked masked with 0×1F is translated in binary as 0b11111, which means all the five columns of the table.
If UPDATE on only two columns – the second and fifth column then this is represented with 0×12 in hexadecimal value ( 0b10010 in binary). Here, this value stands for second and fifth value if you look at it from the right, as a bitmap. This is a useful way of finding out which columns are being updated or changed.
The tracking table shows two columns which contains the suffix lsn in them i.e. _$start_lsn and _$end_lsn. These two values correspond to the Log Sequential Number. This number is associated with committed transaction of the DML operation on the tracked table.
5) Disabling CDC for the table.
As we have seen earlier, if we have to enable CDC we have to do this in two steps – at table level and at database level, In the same way, when we have to disable this feature, we can do this at same two levels.
For dropping any tracking of any table we need three values the Source Schema, the Source Table name, and the Capture Instance. It is very easy to get schema and table name. In our case, the schema is dbo and table name is Users, however we do not know the name of the Capture Instance. We can retrieve it very easily by running following T-SQL Query.
Use TestCDC
Go
EXEC sys.sp_cdc_help_change_data_capture
GO
Once this is run we get capture instance as “dbo_Users”. Once we have name of capture instance we can disable tracking of the table by running following TSQL query:
USE TestCDC;
GO
EXECUTE sys.sp_cdc_disable_table
@source_schema = N’dbo’,
@source_name = N’Users’,
@capture_instance = N’dbo_Users’;
GO
Once Change Data Capture is disabled on any table, it drops the change data capture table as well all the functions which were associated with them. It also deletes all the rows and data associated with this feature from all the system tables and changes relevant data in catalog views.
6) Disabling CDC from database
Running following T-SQL query will disable CDC on whole database.
USE TestCDC;
GO
EXEC sys.sp_cdc_disable_db
GO
This Stored Procedure will delete all the data, functions, tables related to CDC. If this data is needed for any reason, you must take a backup before dropping CDC from any database.
7) Testing Change Data Capture
I performed some Data Modifications (Inserts and Updates) on the main table. You can see that these opetations are entered into the change table automatically.
Querying the change data from the change table:
DECLARE @begin_time datetime
DECLARE @end_time datetime
DECLARE @from_lsn binary(10)
DECLARE @to_lsn binary(10);
declare @rowfilter varchar(10)
SET @rowfilter = ‘all’
SET @begin_time = DATEADD(hh,-1,GETDATE()); –going 1 hr back
SET @end_time = GETDATE();
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn(’smallest greater than or equal’, @begin_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn(‘largest less than or equal’, @end_time);
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Users(@from_lsn,@to_lsn,‘all’)
Above result shows two Inserts with Name “John” and “Johny” for User ID 3 and 4 respectively, Third row shows the update of User Name “Johny” to “Nathan”.
We have used relational operations in the function sys.fn_cdc_map_time_to_lsn. There can be total of four different relational operations available to use in that function:
· largest less than
· largest less than or equal
· smallest greater than
· smallest greater than or equal
This way captured data can be queried very easily and query based on time interval.
Automatic Clean Up Process
If we track every change of all the data in our database, there is very good chance that we will outgrow the hard drive of main server. This will also lead to issues with maintenance and input/output buffer issues.
In CDC there is automatic cleanup process that runs at regular intervals. By default the interval is of 3 days but it can be configured. We have observed that, when we enable CDC on the database, there is one additional system stored procedure created with the name sys.sp_cdc_cleanup_change_table which cleans up all the tracked data at interval.
Performance Recommendation
· When planning change data capture architecture, take a significant increase in log size and log volume I/O operations into account.
· Consider changing the default filegroup for the database before you execute sys.sp_cdc_enble_db so that change data capture metadata and especially cdc.lsn_time_mappings are located on a different filegroup than PRIMARY.
· Try to avoid scenarios where a row needs to be updated immediately after insert.
· Try to avoid using change data capture to capture changes to tables that have frequent large update transactions.
· Always reduce the list of columns captured by change data capture to only the columns you really need to track.
· If possible, run cleanup when there is no other workload active.
Advantage:
· Avoids the complexity of writing triggers, time stamp columns and complex joins.
· Has better performance than earlier mechanism at it does not directly work with source table.
· Works asynchronously on the logs to capture the changes in tracking table.
· Provides an automatic way of purging the tracking tables.
What one should also know:
· It gives historical information on the changed table and hence storage requirement is more.
· Same as replication, You cannot use truncate option on which CDC is enabled.
· Additional maintenance for the capture, clean up jobs on the change tracking table.
Popularity: 63% [?]





March 2nd, 2011 at 3:19 am
This is such a great blog!. It saved me a life time of work. Thank you and please continue writing more
March 2nd, 2011 at 8:28 am
Very well detailed and the examples makes it easy to follow. Would be expecting next blog in this series eagerly
March 4th, 2011 at 8:29 am
Good article Sir..
Please post some more useful articles on SSRS.
March 8th, 2011 at 11:03 am
Excellent article on CDC - must for DBAs. Detailed explanation of the feature along with step by step instructions to enable/get it working. Looking forward to next article on CDC fine tuning. Keep up the good work.
March 28th, 2011 at 12:55 am
Really very good article with step by step details of how to use CDC. Very helpful for those who are new to CDC. Keep on posting such articles.
April 5th, 2011 at 2:14 pm
Wow..now you sounds like scientist
Good article…keep up the good work
Cheers !!!!!!!!
May 25th, 2011 at 1:40 pm
for over 6 years I have been waiting for native data change/audit support.
yay!
FINALLY
Great article, thanks very much!
November 1st, 2011 at 1:40 pm
There´s a way to know who changed that data? Because CDC give us what data has changed, but who make the change?
December 2nd, 2011 at 12:40 pm
Marcio:
I\’ve been working on that exact problem with no solution. You can add a column to the CDC table and default the user to ORIGINAL_LOGIN but it enters the grantor and not the user. I\’ve tried a trigger on the CDC table but that seems to stop CDC from working altogether. The info is available via a convoluted set of joins, but without a working trigger it\’s useless. Besides writing a trigger defeats the purpose of CDC. MS needs to just add this parameter which comes from sys.dm_exec_sessions.
December 16th, 2011 at 8:25 pm
Gr8 Article Sir!!! Its good to have new and unique things regarding database side.This is really good for me,do post such type of articles..
Thanks