Enforcement of the new GDPR will begin in May 2018. One of the cornerstones in GDPR is that an organization must be able to protect sensitive personal information. This can be achieved in many ways and at many different levels, from infrastructure (e.g. monitoring traffic) to protecting individual database records. To protect individual data records, some different techniques are available. Examples are encryption and masking. Masking can be done in a permanent way (e.g. actually replacing a name with random characters), or it can be done dynamically (not replacing the original data, but to mask information at query time). SQL Server 2016 ships with a capability to do this dynamic data masking, DDM, and in this article I’ll cover the basics.
Dynamic Data Masking Basics
DDM is dynamic because it does not alter the way the data is stored in the database. Instead, DDM will mask the results from a query. This makes DDM a great tool to use on existing databases because only the query results are affected. Existing application are therefore normally unaffected (beware of updates however, see below). Masking is applied on the column level, and is always performed unless the current user has the privileges to unmask the data.
Let’s get on with an actual example. Run the following T-SQL to create a demo database:
CREATE DATABASE DDMDemo GO USE DDMDemo GO CREATE TABLE DDM ( FirstName VARCHAR(100) MASKED WITH (FUNCTION = 'default()') NOT NULL, LastName VARCHAR(100) MASKED WITH (FUNCTION = 'partial(1, "XXXXX", 1)') NOT NULL, BirthDay DATETIME2 MASKED WITH (FUNCTION = 'default()') NOT NULL, IQ INT MASKED WITH (FUNCTION = 'default()') NOT NULL, Email VARCHAR(100) MASKED WITH (FUNCTION = 'email()') NOT NULL, Salary INT MASKED WITH (FUNCTION = 'random(1, 100)') NOT NULL ) GO INSERT DDM VALUES ('Tomas', 'Lind', '1971-12-03', 62, 'email@example.com', 50)
The columns in the example above demonstrates the four available masking functions:
The default masking function replaces strings with “xxxx”. There will be less than four x’s if the string is shorter than four characters. Numeric datatypes are shown as “0”. Date and time datatypes are shown as “1900-01-01 00:00:00.0000000”. The columns in the example that uses the default function are FirstName, BirthDay and IQ.
The partial function reveals a user defined number of characters in the beginning and in the end of the string. Characters in between are replaced with a user defined padding. In the previous example, the first and last characters are shown, but everything in between is masked with “XXXXX”. This function can only be used on string columns. In the example, the LastName column uses the partial function.
The email function shows the first character and then appends “XXX@XXXX.com”. Note that this always happens no matter if the record is an email address or not. The email column in the example uses the email function.
The random function works on numeric columns and replaces the actual value with a value within a selected range. The Salary column in the example uses the random function. Note that this could serve as a very useful alternative to the T-SQL rand() function that does not work on data sets.
To continue the demonstration, the next step is to create a user that has no rights to unmask:
CREATE USER NoPrivUser WITHOUT LOGIN GRANT SELECT ON DDM TO NoPrivUser
Now if we issue the following select statement with the new users privileges:
EXECUTE AS USER = 'NoPrivUser' SELECT * FROM DDM REVERT
…you can see the result is masked like in the picture below:
I you have a user that you want to grant unmask permissions, the statement
GRANT UNMASK TO ExampleUser
is used. To revoke that permission the statement
REVOKE UNMASK TO ExampleUser
Updates on masked columns
Updates on masked columns works as usual. Note that if you have an application that for instance “saves” forms data back into the database, that behavior will replace the actual value with the masked value (unless the user have unmask permissions). So, a bit of a warning here.
What columns are masked?
The column is_masked in table sys.columns has the value 1 for masked columns. Alternatvely, the view sys.masked_columns shows only masked columns. It also shows the masking function in the column named masking_function.