Problem/Scenario
Need to import Excel files into a SQL table. Every day one file is created at a specified location in the source system. We need to copy that file from the source system to the local system and then load it into a SQL table.
Conditions:
1. Each file should be loaded only once. Every night a job should be executed to load the data into the reporting data mart.
2. The source system will maintain all the history files, so files at the source should not be deleted.
3. If the job fails for some reason (schema changes, server down, connection issues, etc.), the next successful run should load all the files from the last run date up to the current date. For example, if the job didn't run for the last week, then the next time it runs successfully it should load the current file as well as all the files missed during that week.
4. All the source files will have the same structure (schema).
5. Nomenclature - each file is named Transaction followed by its creation date in YYYY-MM-DD format. For example, a file created on 01-Aug-2010 is named Transaction2010-08-01 (see the sketch after this list).
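To make the nomenclature concrete, the file date can be recovered by skipping the 11-character "Transaction" prefix and reading the next 10 characters. Below is a minimal T-SQL sketch of that rule (the same expression is reused later when logging loaded files); the sample file name is only an example:
DECLARE @FileName varchar(100)
SET @FileName = 'Transaction2010-08-01.xlsx'
-- "Transaction" is 11 characters, so the date starts at position 12 and is 10 characters long
SELECT CAST(SUBSTRING(@FileName, 12, 10) AS datetime) AS FileCreatedDate
-- Returns 2010-08-01 00:00:00.000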
Solution
I will take advantage of the Foreach Loop Container and create a new SSIS package to solve the above problem.
STEP 1:
Create the following tables in your local database (the destination database).
CREATE TABLE [TransactionHistory](
[TransactionID] [int] NOT NULL,
[ProductID] [int] NOT NULL,
[ReferenceOrderID] [int] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionType] [nchar](1) NOT NULL,
[Quantity] [int] NOT NULL,
[ActualCost] [money] NOT NULL,
CONSTRAINT [PK_TransactionHistory_TransactionID]
PRIMARY KEY CLUSTERED ([TransactionID] ASC)
) ON [PRIMARY]
GO
CREATE TABLE [ExcelFilesLog](
[FileName] [varchar](100),
[FileCreatedDate] [datetime],
[FileLoadDate] [datetime],
[RecordCount] [int]
) ON [PRIMARY]
GO
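The ExcelFilesLog table is what enforces conditions 1 and 3: every successfully loaded file is logged with its FileCreatedDate, and the maximum logged date becomes the watermark for the next run. A minimal sketch of the idea, using hypothetical log rows (the actual watermark query appears again in STEP 4):
-- Hypothetical rows after two successful loads (row counts are made up)
INSERT INTO [ExcelFilesLog] ([FileName],[FileCreatedDate],[FileLoadDate],[RecordCount])
VALUES ('Transaction2010-07-30.xlsx','2010-07-30',GETDATE(),113)
INSERT INTO [ExcelFilesLog] ([FileName],[FileCreatedDate],[FileLoadDate],[RecordCount])
VALUES ('Transaction2010-07-31.xlsx','2010-07-31',GETDATE(),97)
-- Watermark for the next run: only files with a newer date are copied and loaded.
-- On an empty log, ISNULL falls back to 2010-01-01 so every file qualifies.
SELECT ISNULL(MAX([FileCreatedDate]),'2010-01-01') AS LastRunDate
FROM [ExcelFilesLog]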
STEP 2: Create a new SSIS package and rename it to LoadDynamicExcelFiles.dtsx
STEP 3: Add the following package variables:
VariableName | Description | Example
LastRunDate | To store the last run date | 2010-07-30
CurrentDate | To hold the running date | 2010-08-10
DirectorySource | To store the source directory path | D:\SSIS\Hari\SourceFiles
DirectoryDestination | To store the local directory path | D:\SSIS\Hari\DestinationFiles
CurrentFileName | To store the current file name | D:\SSIS\Hari\SourceFiles\Transaction2010-08-01
RecordCount | To store the row count of the current file | 0
Create one OLE DB connection manager (I will use (local).TestHN) for the local database where you want to load the Excel file data. Create one Excel Connection Manager for the Excel files located in D:\SSIS\Hari\DestinationFiles. At least one file should already be there in order to create the Excel connection manager.
Click on the Excel Connection Manager --> go to the Properties window --> select Expressions and set the ExcelFilePath property to the package variable User::CurrentFileName as shown below:
For this article, I'll use two directories - one for source files and one for destination files. Their locations are given below:
Source Files: D:\SSIS\Hari\SourceFiles
Destination Files: D:\SSIS\Hari\DestinationFiles
I have created a few Excel files from the Production.TransactionHistory table of the AdventureWorks2008R2 database, as shown below. I have created files from 2010-07-20 to 2010-08-01. After executing the package the first time, I will create files from 2010-08-02 to 2010-08-09 to test the package.
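For reference, the content of each day's file can be pulled with a query along the lines below and exported to Excel. This is only a sketch - the column list matches the destination table, but the date filter is illustrative and the actual AdventureWorks2008R2 transaction dates will not match the 2010 file names:
-- Illustrative extract for one day's Excel file
SELECT TransactionID, ProductID, ReferenceOrderID, TransactionDate,
       TransactionType, Quantity, ActualCost
FROM AdventureWorks2008R2.Production.TransactionHistory
WHERE TransactionDate >= '2008-07-20'
  AND TransactionDate <  '2008-07-21'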
STEP 4:
Drag and drop a File System Task and double-click it to open the File System Task Editor. Enter FST - Delete destination directory content as Name and select Delete directory content as Operation. Set IsSourcePathVariable to True and select User::DirectoryDestination as SourceVariable. Finally, click OK and save changes.
Drag and drop an Execute SQL Task and double-click it to open the Execute SQL Task Editor. Enter Execute SQL Task - Get LastRunDate as Name, select Single row as Result Set, OLE DB as Connection Type, (local).TestHN as Connection, and Direct input as SQLSourceType. Enter the query below in SQLStatement (on the very first run the log table is empty, so ISNULL falls back to 2010-01-01 and every file qualifies for loading):
SELECT ISNULL(MAX([FileCreatedDate]),'2010-01-01') AS LastRunDate
FROM [dbo].[ExcelFilesLog] (NOLOCK)
In the Result Set tab, set Result Name 0 to the variable User::LastRunDate. Finally, click OK and save changes.
Drag and drop a Script Task and double-click it to open the Script Task Editor. Select User::DirectoryDestination, User::DirectorySource, and User::LastRunDate as ReadOnlyVariables. Click on Edit Script... and paste the code below:
using System;
using System.IO;

public void Main()
{
    try
    {
        string DirectorySource = Dts.Variables["User::DirectorySource"].Value.ToString();
        string DirectoryDestination = Dts.Variables["User::DirectoryDestination"].Value.ToString();
        DateTime LastRunDate = (DateTime)Dts.Variables["User::LastRunDate"].Value;

        string fileName, fileExtension;
        string[] files = Directory.GetFiles(DirectorySource);

        // Copy to the destination folder every .xlsx file whose date in the
        // name (TransactionYYYY-MM-DD) is later than the last run date.
        foreach (string f in files)
        {
            fileName = Path.GetFileName(f);
            fileExtension = Path.GetExtension(f);
            if (fileExtension == ".xlsx")
            {
                // The date starts right after the "Transaction" prefix and is 10 characters long.
                DateTime CurrentDate = DateTime.Parse(fileName.Substring("Transaction".Length, 10));
                if (DateTime.Compare(CurrentDate, LastRunDate) > 0)
                {
                    //MessageBox.Show(fileName.ToString());
                    File.Copy(f, Path.Combine(DirectoryDestination, fileName), true);
                }
            }
        }
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception ex)
    {
        Dts.Log(ex.Message, 0, null);
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}
Drag and drop a Foreach Loop Container. Select Foreach File Enumerator as the Enumerator, enter D:\SSIS\Hari\DestinationFiles as the Folder path and *.xlsx in the Files textbox. Select Fully qualified as Retrieve file name. In Variable Mappings, select User::CurrentFileName for Index 0 to store the current file name on each iteration.
Now drag and drop a Data Flow Task inside the Foreach Loop Container. Use an Excel Source to read the Excel files from the destination directory, with the Excel Connection Manager as its connection manager; in the connection properties, ExcelFilePath is driven by the expression @[User::CurrentFileName]. Use a Data Conversion transformation if required (the Excel driver typically returns numbers as float and text as Unicode strings). Use a Row Count transformation to count the number of rows in the data flow and store it in the User::RecordCount variable. Use an OLE DB Destination to load the data into the SQL table.
Drag and drop an Execute SQL Task inside the Foreach Loop Container to log information about the current file. Double-click the Execute SQL Task to open the Execute SQL Task Editor; enter Execute SQL Task - Insert info into Log table as Name, None as Result Set, OLE DB as Connection Type, (local).TestHN as Connection, Direct input as SQLSourceType, and the query below as SQLStatement.
DECLARE
@FileName varchar(500)
,@FilePath varchar(500)
,@DestinationPath varchar(500)
SET @DestinationPath = ?
SET @FilePath = ?
SET @FileName = REPLACE(@FilePath,@DestinationPath + '\','')
INSERT INTO [ExcelFilesLog]
(
[FileName]
,[FileCreatedDate]
,[FileLoadDate]
,[RecordCount]
)
SELECT
@FileName [FileName]
,CAST(SUBSTRING(@FileName,12,10) as datetime) [FileCreatedDate]
,GETDATE() [FileLoadDate]
,?
In the Parameter Mapping tab, map User::DirectoryDestination, User::CurrentFileName, and User::RecordCount to parameters 0, 1, and 2 respectively, as shown below. Finally, click OK and save changes.
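To see how the logging statement resolves, here is a hypothetical worked example with the three parameters filled in the same order as the mapping above (the row count of 113 is made up):
-- Parameter 0 (User::DirectoryDestination), Parameter 1 (User::CurrentFileName), Parameter 2 (User::RecordCount)
DECLARE @FileName varchar(500), @FilePath varchar(500), @DestinationPath varchar(500)
SET @DestinationPath = 'D:\SSIS\Hari\DestinationFiles'
SET @FilePath = 'D:\SSIS\Hari\DestinationFiles\Transaction2010-08-01.xlsx'
SET @FileName = REPLACE(@FilePath, @DestinationPath + '\', '')
-- @FileName is now 'Transaction2010-08-01.xlsx', and SUBSTRING(@FileName,12,10) = '2010-08-01'
SELECT @FileName                                    AS [FileName]
      ,CAST(SUBSTRING(@FileName,12,10) AS datetime) AS [FileCreatedDate]
      ,GETDATE()                                    AS [FileLoadDate]
      ,113                                          AS [RecordCount]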
Below is the final layout of our package:
STEP 6:
We are done with the package development. To execute the package, go to the SSIS package location (in this example, D:\SSIS\Hari\Sample\SSIS-Sample1), right-click on LoadDynamicExcelFiles.dtsx --> Open with --> SQL Server 2008 Integration Services Package Execution Utility. This will open the Execute Package Utility. Click the Execute button to run the SSIS package. You can then watch the progress of the package execution in the Package Execution Progress window.
Now you can check the ExcelFilesLog table to cross-check the result of the package.
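A couple of quick checks against the tables created in STEP 1 can be used for this cross-check, for example:
-- Which files have been loaded, and when
SELECT [FileName], [FileCreatedDate], [FileLoadDate], [RecordCount]
FROM [ExcelFilesLog]
ORDER BY [FileLoadDate]

-- The logged row counts should add up to the rows landed in the destination table
SELECT SUM([RecordCount]) AS LoggedRows FROM [ExcelFilesLog]
SELECT COUNT(*) AS LoadedRows FROM [TransactionHistory]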
Now I'll add a few more Excel files to the source location - Transaction2010-08-02 through Transaction2010-08-09 - as shown below:
When I execute the package the next time, it will load only the new files. You can check ExcelFilesLog after each run: