Monday 29 April 2013

Using the SSIS Merge Join

The Merge Join Transform in SSIS is a great way to load a Data warehouse quickly and an easy way to join two data sources together. There are a few requirements to join these two data sources. The data sources must be sorted and there must be a key that you can join them with. This can all be done in the data flow of the SSIS package.
Here are some examples of a Merge Join with some different situations so you can see the outputs. The two images below show two tables. The first table contains the first names of people and the second table contains there last names. Each table contains an ID column that will be used in the join.
Table one:
image
Table two:
image

These are OLEDB sources. The queries in the sources are calling the data and sorting it with an order by clause. The source does not know the data is sorted so you have to manually tell the source it is sorted. You do this by right clicking on the source and selecting Show Advanced Editor.

image

Then click on the Input Output Properties Tab. Click on the OLE DB Source Output and Set the IsSorted Property to true.

image

Then click on the column that is the used to sort the data, in this case the ID column, then set the SortKeyPosition to 1. If you are sorting by more than one column, place a 2 on the next and so on.

image

Then you connect both sources to the Merge Join. When it asks which input you set the First name table as the left input. You can always swap the inputs later by clicking the swap input button in the Merge Join Transform.

image

In the Merge Join, Map the Id together and select the columns you want to pass through to the output. In this example the Join type is set to Left Outer Join. The Results will look like the below image.

image

Notice the Null in ID 5 under last name. There was no match for ID 5, but since it was set to Left Outer Join we still get to keep the ID 5. If you set the Merge Join to Inner Join the ID 5 row would be dropped.
One of the other scenarios you will encounter is when the right side of the join has more than one match for some of the ID columns. For example, let’s add another row to the Last name column.

image

This new table has an extra row with the ID 1 and the last name Davis 2. If you run this through the Merge Join with the Join set to Left Outer Join the results will match the image below.

image

If you change the Join to Inner Join the Results will be in the below image. Notice the ID 1 is still on the output twice and ID 5 is still in the output.

image

Now let’s add another row to the Last name table.
image

When you run this table through the Merge Join with the Join set to Full Outer Join the results will match the figure below:

image


Here you can see the Null Id. That is because we passed the ID through from the First name table. The Last name table ID was not passed through. If you want to ensure you get an Id then pass through each Id and use a derived column to check for nulls.
I hope this clarifies how to use the Merge Join Transform. This is a very powerful transform and can make joining data very fast.

references:http://mikedavissql.com/2011/09/06/using-the-ssis-merge-join/

No comments:

Post a Comment