Monday, 29 April 2013

Dynamic SSIS Package to Import Excel files into SQL Server Database

Problem/ScenarioNeed to import Excel files to a SQL table. Everyday one file is created at specified location in the source system. We need to copy that file from Source System to Local system and then load to SQL Table.

Conditions:
1. Each file should be loaded only once. Everynight Job should be executed to load data into reporting Data mart.
2. Source system will maintain all the history files so files at souce should not be deleted.
3. If Job failed due to some reason (schema changes, server down, connection issues etc.), it should load all the files from last run date to current date in next successul run. For example, job didn't run last one week then whenever job runs next time successfully, it should load current file as well as all the files of last week which were missing.
4. All the source files will have the same structure (Schema)
5. Nomenclature - Each file will have name Transaction followed by current date in YYYY-MM-DD format. For example, if a file was created on 01-Aug-2010 then its name should be Transaction2010-08-01.

Solution
I will take advantage of the ForEach Loop Container. I'll create a new SSIS package to solve the above problem.

STEP 1:
Create following tables in your local database (destination database).
CREATE TABLE [TransactionHistory](
[TransactionID] [int] NOT NULL,
[ProductID] [int] NOT NULL,
[ReferenceOrderID] [int] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionType] [nchar](1) NOT NULL,
[Quantity] [int] NOT NULL,
[ActualCost] [money] NOT NULL,
CONSTRAINT [PK_TransactionHistory_TransactionID]
PRIMARY KEY CLUSTERED ([TransactionID] ASC)
) ON [PRIMARY]
GO

CREATE TABLE [ExcelFilesLog](
[FileName] [varchar](100),
[FileCreatedDate] [datetime],
[FileLoadDate] [datetime],
[RecordCount] [int],
) ON [PRIMARY]
GO

STEP 2: Create a new SSIS package and rename it to LoadDynamicExcelFiles.dtsx

STEP 3: Add following package variables:

VariableName Description Examle
LastRunDate To store last run date 2010-07-30
CurrentDate To hold running date 2010-08-10
Directory
Source
To store source
directory path
D:\SSIS\Hari\SourceFiles
Directory
Destination
To store local
directory path
D:\SSIS\Hari\DestinationFiles
CurrentFile
Name
To store current
file name
D:\SSIS\Hari\SourceFiles\
Transaction2010-08-01


Create one OLE DB connection (I will use (local).TestHN connection manager) for local database where you want to load excel files data. Create one Excel Connection Manager for excel files located in D:\SSIS\Hari\DestinationFiles. At least one file should be there to create excel connection manager.


Click on Excel Connection Manager --> go to Properties window --> Select Expression and set ExcelFilePath with package variable User::CurrentFileName as shown below:




For this article, I'll use two directories - one for sources files and one for destination files. Location of these files are given below:
Source Files: D:\SSIS\Hari\SourceFiles
Destination Files: D:\SSIS\Hari\DestinationFiles


I have created few excel files from Production.TransactionHistory table of AdventureWorks2008R2 database as shown below:



I have created files from 2010-07-20 to 2010-08-01. After executing the package first time, I will create files from 2010-08-02 to 2010-08-09 to test the package.




STEP 4:

Drag and drop File System Task and double click to open File System Task Editor. Enter FST - Delete destination directory content in name and select Delete directory content as Operation. Set IsSourcePathVariable to True and select SourceVariable as User::DirectoryDestination. Finally click on OK and save changes.




Drag and drop Execute SQL Task and double click to open Execute SQL Task Editor. Enter Execute SQL Task - Get LastRunDate in Name, select Single row as Result Set, Conection Type as OLE DB and Connection as (local).TestHN and SQLSourceType as Direct input. Enter below query in SQLStatement:

SELECT ISNULL(MAX([FileCreatedDate]),'2010-01-01') AS LastRunDate
FROM [dbo].[ExcelFilesLog] (NOLOCK)

In Result Set tab, set Result Name 0 to variable User::LastRunDate.
Finally click OK and save changes.



Drag and drop Script Task and double click to open Script Task Editor. Select User::DirectoryDestination, User::DirectorySource, User::LastRunDate in ReadOnlyVariables. Click on Edit Script... and paste below code:

using System.IO;

public void Main()
{
try
{
string DirectorySource = Dts.Variables["User::DirectorySource"].Value.ToString();
string DirectoryDestination = Dts.Variables["User::DirectoryDestination"].Value.ToString();
DateTime LastRunDate = (DateTime)Dts.Variables["User::LastRunDate"].Value;


string fileName,fileExtension;
string[] files = Directory.GetFiles(DirectorySource);
//Copy source files to destination
foreach (string f in files)
{
fileName = Path.GetFileName(f);
fileExtension = Path.GetExtension(f);
DateTime CurrentDate = DateTime.Parse(fileName.Substring("Transaction".Length, 10));


if ((DateTime.Compare(CurrentDate,LastRunDate)>0) && (fileExtension == ".xlsx"))
{
//MessageBox.Show(fileName.ToString());
File.Copy(f, Path.Combine(DirectoryDestination, fileName), true);
}
}
Dts.TaskResult = (int)ScriptResults.Success;
}
catch (Exception ex)
{
Dts.Log(ex.Message, 0, null);
Dts.TaskResult = (int)ScriptResults.Failure;
}
}



Drag and drop Foreach Loop container. Select Foreach File Enumerator as Enumerator, enter D:\SSIS\Hari\DestinationFiles in Folder path and *.xlsx in files textbox. Select Fully qualified as Retrieve file name. In Variable Mappings, Select User::CurrentFileName for Index 0 to store current file name for each iteration.




Now drag and drop Data Flow Task inside Foreach loop container. Use Excel Source reader to read excel files from destination directory. Use Excel Connection Manager as connection manager for excel files. In connection properties, Select @[User::CurrentFileName] as ExcelFilePath. Use Data Conversion, if required. Use Row Count Task to count number of rows in data flow and store it in User::RecordCount variable. Use OLE DB Destination to load data into SQL table.




Drag and drop Execute SQL Task inside Foreach loop container to log information about current file.

Double click on Execute SQL Task to open Execute SQL Task Editor, enter Execute SQL Task - Insert info into Log table as Name, None as Result Set, OLE DB as connection type, (local).TestHN as connection, Direct input as SQLSourceType and below query as SQLStatement.



DECLARE

@FileName varchar(500)

,@FilePath varchar(500)

,@DestinationPath varchar(500)

SET @DestinationPath = ?
SET @FilePath = ?
SET @FileName = REPLACE(@FilePath,@DestinationPath + '\','')

INSERT INTO [ExcelFilesLog]
(
[FileName]
,[FileCreatedDate]
,[FileLoadDate]
,[RecordCount]
)
SELECT
@FileName [FileName]
,CAST(SUBSTRING(@FileName,12,10) as datetime) [FileCreatedDate]
,GETDATE() [FileLoadDate]
,?


In Parameter mapping, map User::DirectoryDestination, User::CurrentFileName, and User::RecordCount with parameter 0,1,2 respectively as shown below.


Finally click OK and save changes.

Below is the final layout of our package:




STEP 6:
We are done with the package development. To execute the package, go to SSIS package location, (in this example, it is D:\SSIS\Hari\Sample\SSIS-Sample1), right click on LoadDynamicExcelFiles.dtsx --> Open with --> SQL Server 2008 Integration Services Package Execution Utility. This will open Execute Package Utility. Click on Execute button to run ssis package. Now you can see the progress of package execution in Package Execution Progress window.




Now you can check ExcelFileLog table to cross check the result of package.



Now I'll add few more excel files in source location - from

Transaction2010-08-02 to Transaction2010-08-09 as shown below:


When I execute this package next time, it will load only new files. Yon can check ExcelFileLog for each iteration:

No comments:

Post a Comment