Tuesday 2 July 2013

Incremental uploads using SSIS with Lookup and Conditional Split transformations


Incremental uploads using SQL Server Integration Services (SSIS)

Incremental Uploads:

We received a request for an SSIS package to perform incremental uploads between two SQL Server instances in two different locations.

Requirement:

There are two databases, each hosted on a different instance.
An SSIS package is required to accomplish the tasks below:
  1. Sync data for the Employee table from source to destination
  2. Compare records based on a key value
  3. Insert any new records found into the destination
  4. Update all existing records from source to destination
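
For reference, the insert-or-update logic these requirements describe corresponds to a single T-SQL MERGE statement. The sketch below is illustrative only: it assumes both tables were reachable from one instance, which here they are not, and that is exactly why the SSIS data flow built in this post is needed.

MERGE dbo.Dest_Employee AS d
USING dbo.Source_Employee AS s
    ON d.ID = s.ID
WHEN MATCHED THEN -- task 4: update existing rows
    UPDATE SET First_Name = s.First_Name,
               Last_Name  = s.Last_Name,
               SSL        = s.SSL,
               DLNO       = s.DLNO,
               UpdatedOn  = s.UpdatedOn,
               CreatedOn  = s.CreatedOn
WHEN NOT MATCHED BY TARGET THEN -- task 3: insert new rows
    INSERT (ID, First_Name, Last_Name, SSL, DLNO, UpdatedOn, CreatedOn)
    VALUES (s.ID, s.First_Name, s.Last_Name, s.SSL, s.DLNO, s.UpdatedOn, s.CreatedOn);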

Environment:
Source: SQL_Instance: INHYDUDAYA; Database: Source
Destination: SQL_Instance: INHYDUDAYA\SQL2008R2; Database: Destination

Table information:

Connect to Source Instance:
USE Source
GO
CREATE TABLE Source_Employee (
ID INT IDENTITY NOT NULL PRIMARY KEY,
[First_Name] VARCHAR(50) NOT NULL,
[Last_Name] VARCHAR(50),
[SSL] VARCHAR(18),
[DLNO] VARCHAR(25),
[UpdatedOn] DATETIME NULL,
[CreatedOn] DATETIME NOT NULL DEFAULT(GETDATE())
)
GO
INSERT INTO Source_Employee([First_Name],[Last_Name],[SSL],[DLNO])
SELECT 'Jason','Mag','SA-MYk9989001','DL-SA0545678'
UNION
SELECT 'Carry','Uyon','WC-KAP9989001','DL-WC0545887'
UNION
SELECT 'Chrish','Lott','AT-LKU8788954','DL-AT059675'
UNION
SELECT 'Kourav','Mishra','NY-NYU5669877','DL-NY0073987'
GO
SELECT * FROM Source_Employee

Connect to Destination Instance:

USE Destination
GO
CREATE TABLE Dest_Employee (
ID INT NOT NULL PRIMARY KEY,
[First_Name] VARCHAR(50) NOT NULL,
[Last_Name] VARCHAR(50),
[SSL] VARCHAR(18),
[DLNO] VARCHAR(25),
[UpdatedOn] DATETIME NULL,
[CreatedOn] DATETIME NOT NULL DEFAULT(GETDATE())
)
GO
SELECT * FROM Dest_Employee


Now that the tables are ready, we can start building the SSIS package.

  1. Open SQL Server 2008 R2 Business Intelligence Development Studio.
  2. Create a new SSIS package and name it “Incremental Uploads.dtsx” as below.
  3. Add a new data flow task and name it “Data_Flow_Sync_SourceNDestination”.
  4. Create two OLE DB connection managers, one for the source and one for the destination. Since these are test instances, the SA account is used in both connection managers here.
  5. Open the data flow task. Add an “OLE DB Source”, map it to the source connection manager, and select the table “Source_Employee”.

  6. The data flow task now looks like the below:
  7. Add a “Lookup” transformation and connect it from the “OLE DB Source”.
  8. Double-click the “Lookup” component. Choose the “Connection type” as “OLE DB connection manager” and set “Specify how to handle rows with no matching entries” to “Ignore failure”.

  9. Go to the next tab, “Connection”. Select the destination connection manager as the OLE DB connection manager and choose “Dest_Employee” as the table.

  10. Go to the next tab, “Columns”. Map the columns on which the lookup has to be performed; this mapping acts as the join condition (the WHERE clause) during the sync. Here we map “ID” from “Available Input Columns” to “Available Lookup Columns” and check all the columns under “Available Lookup Columns”.

  11. For clarity, update the “Output Alias” column names to “Out_ID”, “Out_First_Name”, “Out_Last_Name”, etc. as below.

  12. Go to the “Error Output” tab and choose “Ignore Failure” for the “Error” column.
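
Conceptually, a Lookup configured to ignore failures behaves like a LEFT OUTER JOIN: source rows with no match flow on with NULLs in the lookup columns instead of failing the component. As a rough T-SQL sketch of what the data flow carries at this point (written as if both databases sat on a single instance, purely for illustration):

SELECT s.ID, s.First_Name, s.Last_Name, s.SSL, s.DLNO, s.UpdatedOn, s.CreatedOn,
       d.ID         AS Out_ID, -- NULL when no destination row matches
       d.First_Name AS Out_First_Name,
       d.Last_Name  AS Out_Last_Name,
       d.SSL        AS Out_SSL,
       d.DLNO       AS Out_DLNO,
       d.UpdatedOn  AS Out_UpdatedOn,
       d.CreatedOn  AS Out_CreatedOn
FROM dbo.Source_Employee AS s
LEFT OUTER JOIN dbo.Dest_Employee AS d
    ON d.ID = s.ID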

  13. Click OK. The data flow task now looks like the below.

  14. Add a “Conditional Split” transformation and connect it from the “Lookup” transformation.
  15. While connecting from the “Lookup”, choose “Lookup Match Output”.

  16. The data flow task now looks like the below:

  17. Double-click the Conditional Split transformation, set the “Output Name” to “New Rows”, and assign the condition ISNULL(Out_ID). A NULL Out_ID means there is no corresponding row at the destination, so the row is new.

  18. Now add a condition to find the modified rows. Set the Output Name to “Updated” and compare the source and destination columns with the || (OR) operator, which catches rows where any of these columns has changed: (([First_Name] != [Out_First_Name]) || ([Last_Name] != [Out_Last_Name]) || ([SSL] != [Out_SSL]) || ([DLNO] != [Out_DLNO]) || ([UpdatedOn] != [Out_UpdatedOn]) || ([CreatedOn] != [Out_CreatedOn]))
  19. Change the default output name to “Unchanged Rows”; rows that fall into neither of the two conditions above are unchanged records.
  20. Click “Configure Error Output” and set “Ignore failure” for the “Updated” output. The != comparisons raise an error when a NULL is involved (UpdatedOn, for instance, is NULL until a row is first updated), so ignoring the failure lets the package complete cleanly when source and destination are already in sync and there is nothing to insert or update.
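
Extending the earlier join sketch, the three split outputs classify rows the way the following T-SQL would. Note that T-SQL's ISNULL(a, b) replaces a NULL value, unlike the boolean ISNULL() test in the SSIS expression above, and the sentinel replacement values used here are only illustrative:

SELECT s.ID,
       CASE
           WHEN d.ID IS NULL THEN 'New Rows'
           WHEN s.First_Name <> d.First_Name
             OR ISNULL(s.Last_Name, '') <> ISNULL(d.Last_Name, '')
             OR ISNULL(s.SSL, '')       <> ISNULL(d.SSL, '')
             OR ISNULL(s.DLNO, '')      <> ISNULL(d.DLNO, '')
             OR ISNULL(s.UpdatedOn, 0)  <> ISNULL(d.UpdatedOn, 0)
             OR s.CreatedOn             <> d.CreatedOn THEN 'Updated'
           ELSE 'Unchanged Rows'
       END AS Split_Output
FROM dbo.Source_Employee AS s
LEFT OUTER JOIN dbo.Dest_Employee AS d
    ON d.ID = s.ID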

  21. Click OK. The data flow task now looks like the below.
  22. Add an “OLE DB Destination” and connect it from the Conditional Split transformation. While connecting, choose “New Rows” as the output.
  23. Click OK. Now open the “OLE DB Destination”, map it to the destination connection manager, and select the table “Dest_Employee”.
  24. Go to the “Mappings” tab and map the columns as shown in the figure below.

  25. Click OK. The data flow now looks like the below:
  26. Add an “OLE DB Command” transformation to the data flow and name it “OLE DB Command_Update Changed Rows”.
  27. Connect the “OLE DB Command” from the Conditional Split transformation. While connecting, choose “Updated” as the output (“New Rows” is already consumed by the OLE DB Destination).
  28. Click OK. Now open the OLE DB Command and select the destination connection manager.
  29. Go to the “Component Properties” tab and enter the SQL command below.
UPDATE dbo.Dest_Employee
SET
First_Name = ?
,Last_Name = ?
,SSL = ?
,DLNO = ?
,UpdatedOn = ?
,CreatedOn = ?
WHERE ID = ?


  30. Go to the next tab, “Column Mappings”, and map the columns according to the parameter order in the UPDATE statement. The parameters are positional (Param_0 through Param_6, in the order the ? placeholders appear), so “ID”, the last parameter in the statement, maps to the last “Available Destination Column” as below.
  31. Click OK. The package is now ready, and the data flow task looks like the below.

  32. Now execute the package. Remember there are no rows at the destination yet, so the first run inserts all rows from source to destination.
  33. Now check the destination table.
  34. You can see that all rows have been loaded into the destination from the source.
  35. Now insert 3 new rows and update two existing rows at the source, then run the package again.
USE Source
GO
INSERT INTO Source_Employee (First_Name,Last_Name,SSL,DLNO)
SELECT 'Chan','Yano','CH-PP89977345','DL-CH0587332'
UNION
SELECT 'Krishnan','Gopal','ID-IN8854687','DL-IN994532'
UNION
SELECT 'Krish','Manlon','KD-KP8814356','DL-ASJ9KI0112'
USE Source
GO
UPDATE Source_Employee
SET First_Name = First_Name+' Updated',
Last_Name = Last_Name+' Updated',
UpdatedOn = GETDATE()
WHERE ID IN (2,4)
USE Source
GO
SELECT * FROM Source_Employee
  36. After inserting the new records and updating two rows, the source table looks like:
  37. Now execute the package:
  38. You can see that 3 new rows have been inserted and 2 rows updated.
  39. Go to the destination table and verify that both tables are in sync; a quick check is sketched below.
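
One indicative way to confirm the sync, since the two tables live on different instances, is to compare a row count and an aggregate checksum computed on each side (checksums can collide in rare cases, so treat a match as strong evidence rather than proof):

-- Run on the source instance:
SELECT COUNT(*) AS Row_Count,
       CHECKSUM_AGG(BINARY_CHECKSUM(*)) AS Data_Checksum
FROM Source.dbo.Source_Employee;

-- Run on the destination instance:
SELECT COUNT(*) AS Row_Count,
       CHECKSUM_AGG(BINARY_CHECKSUM(*)) AS Data_Checksum
FROM Destination.dbo.Dest_Employee;

-- Matching values on both sides indicate the tables hold the same data.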
