Sorted data flows in SSIS

SSIS Sort

Sometimes a data flow in SSIS needs to be sorted. It may be because the business rules require sorted data, or it may be that you want to use either a Merge or Merge join component. Those component won’t work if the data isn’t sorted previously.

So, how does SSIS know if the data is sorted?

If we add a sort component in a data flow, SSIS will set the property IsSorted to true:

SSIS Sort Operator IsSorted property

SSIS Sort Operator IsSorted property

And as I’ll show later this property can be set manually.

If you add a Merge or Merge join component SSIS checks if IsSorted is true. If it is false, you’ll get an error message:

SSIS Merge Data Not Sorted

SSIS Merge Data Not Sorted

Validation error. Data Flow Task: Data Flow Task: “Merge.Inputs[Merge Input 1]” must be sorted. If possible, sort the data at the source and mark the data as sorted by setting the IsSorted and SortKeyPosition properties in the Advanced Editor. Otherwise, add a Sort Transformation to the path before the merge.

However, a sort can be a very costly operation. If you know that the data is sorted already, the sort operation can be avoided. If you are using a database as a data source it is sometimes a better idea to use an ORDER BY statement. There may also be other data sources that you know are sorted already, for instance raw files or text files. But then SSIS doesn’t know the data is sorted. So to tell SSIS that the data is sorted, you need to use the Advanced Editor on the component. On the “Input and Output Properties” page, select the “Output” node and switch the IsSorted property to True:

SSIS IsSorted Property

SSIS IsSorted Property

Next, set the SortKeyPosition property on each column that is part of the sort:

SSIS SortKeyPosition Property

SSIS SortKeyPosition Property

A positive value for the SortKeyPosition property means an ascending sort value, and a negative value indicates a descending value.

What we have done now is to give SSIS a hint that the data is sorted. Setting this property doesn’t mean that SSIS sorts actually the data. So if the data is marked as sorted but isn’t, the results will become very unpredictable.

1 comment » Write a comment

Leave a Reply

Required fields are marked *.