Before a fact table is loaded, a common pattern is to do a lookup in SSIS before the loading. The lookup can be some key from a dimension for instance, or maybe a repository of primary keys in a MDM solution. But what if the lookup doesn’t return a matching value? Sometimes when this happens, we want to insert new values to the lookup table (to get the key) before the loading of the fact table starts. Here I’ll show one way to handle this scenario.
A common pattern for handling late arriving lookups is to replace the lookup value with a marker, for instance -1, and simply continue the loading of the fact table. After the loading finishes, a special routine picks up unmatched rows in the fact table and replace the markers with new and correct keys from the lookup table.
Another approach is to use an OLE DB Command in the data flow to add the rows to the dimension. This is the method I’ll describe in this blog post. In this simple demonstration, I have a source table with customers. I also have a lookup table that contains the primary key that I want to use in the data flow as key in the destination table. Lastly, there is a destination table. (Imagine there are lots of data sources in this example, otherwise it doesn’t make sense of course.)
In the source, there will be two customers. One with the name “Tomas Lind”, and another customer with the name “A new customer!”. The customer names are used for the lookup. In the lookup table, Tomas Lind will already exist so there will be a lookup match. The customer “A new customer!” will not exist in the lookup table. Finally, the destination table will be empty. But since this solution is made to handle late arriving lookups, when the data flow finishes, both customers will be in the destination table.
To create the demo database, run the following T-SQL script:
CREATE DATABASE LAL GO USE LAL GO CREATE TABLE dbo.SourceFact ( CustomerName VARCHAR(100) ) GO INSERT dbo.SourceFact (CustomerName) VALUES ('Tomas Lind'), ('A new customer!') GO CREATE TABLE dbo.CustomerLookup ( CustomerId INT IDENTITY(1,1) NOT NULL CONSTRAINT CustomerLookup_CustomerId_PK PRIMARY KEY CLUSTERED ,CustomerName VARCHAR(100) ) GO INSERT dbo.CustomerLookup (CustomerName) VALUES ('Tomas Lind') GO CREATE TABLE dbo.DestinationFact ( CustomerId INT NOT NULL CONSTRAINT DestinationFact_CustomerId_PK PRIMARY KEY CLUSTERED ,CustomerName VARCHAR(100) ) GO
The first component we need in SSIS is a dataflow. The starting point in the dataflow is simply to read the content in the source table. The next step is to add the CustomerId (primary key) column to the data flow with the use of a Derived Column:
The CustomerId could have been added in the source instead, but the point is that it initially is set to 0. The next step is to add the lookup component:
As you can see in the screen shots above, the settings are: not to use cache (of no use here), to redirect unmatching entries, and to replace the CustomerId with the key found in the lookup table. Also note that the customer name is used as the lookup value and the CustomerId is the result from the lookup.
The lookup component will now return two data flows. One with matching/existing customers, and another with unmatched/new customers. For the unmatching rows we will use an OLE DB Command to insert the new customers, and for the already existing customers, we don’t need to do anything yet. In the OLE DB Command we can use a T-SQL statement, but in this demonstration I will use a stored procedure that adds new rows to the CustomerLookup table:
CREATE PROCEDURE dbo.AddCustomerLookup ( @CustomerName VARCHAR(100) ,@CustomerId INT OUTPUT ) AS SET NOCOUNT ON INSERT dbo.CustomerLookup (CustomerName) VALUES (@CustomerName) SELECT @CustomerId = SCOPE_IDENTITY() GO
On the Component Properties page in the OLE DB Command, the stored procedure is called and the CustomerId is returned as an OUTPUT parameter:
Now we have CustomerId’s for both the old and the new customer. Both these data flows needs to be merged with a Union all component before the insert to the destination:
That’s it! If we run the package there will be one old customer in the “left” data flow, and one new customer in the other data flow:
The destination table is now populated with both rows: