Case:
The Data Conversion
transformation works fine for converting for example a non-unicode
string to an unicode string, but it has one annoying feature. It creates
an extra column (with the prefix "Copy of ") which can be confusing. Is there an alternative?
Solution:
There is, but only for database sources (not for flatfiles and suchlike). You can use the Cast or Convert statement in your source query for data conversions. It moves some of the 'workload' from SSIS to your database engine.
The difference between the Cast and Convert is that the CAST is an ANSI standard and the CONVERT is Microsoft specific and the CONVERT has an extra optional Style option. More information about that is available at MSDN.
Performance
My simple test case was processing a million records and convert a column via a CAST and via the standard SSIS component. And run those two different packages a thousand times.
After a thousand times the CAST had an average of 4 seconds and the Data Conversion transformation had an average of 5,3 seconds. A negligible result on simple test like this, that was not that thoroughly. But no annoying, unnecessary and confusing columns! :-)
What to do for the non-database sources?
Well, Todd McDermid developed a pattern to bypass the confusing extra columns. His workaround consists of using a Data Conversion transformation immediately followed by an Union All filtering the unnecessary columns and renaming the converted columns. And he even developed an open source replacement for the Data Conversion transformation if the CAST or CONVERT option is not possible for your source. But be warned. These solution will decrease the performance because they are asynchronous.
Solution:
There is, but only for database sources (not for flatfiles and suchlike). You can use the Cast or Convert statement in your source query for data conversions. It moves some of the 'workload' from SSIS to your database engine.
Data Conversion in source |
And of cource it also works in a Lookup |
1
2
3
| --CAST (ANSI standard) SELECT CAST (someColumn AS nvarchar(50)) AS someColumn FROM someTable |
1
2
3
| --CONVERT (Microsoft specific) SELECT CONVERT (nvarchar(50), someColumn) AS someColumn FROM someTable |
The difference between the Cast and Convert is that the CAST is an ANSI standard and the CONVERT is Microsoft specific and the CONVERT has an extra optional Style option. More information about that is available at MSDN.
Performance
My simple test case was processing a million records and convert a column via a CAST and via the standard SSIS component. And run those two different packages a thousand times.
Testing with a million records |
After a thousand times the CAST had an average of 4 seconds and the Data Conversion transformation had an average of 5,3 seconds. A negligible result on simple test like this, that was not that thoroughly. But no annoying, unnecessary and confusing columns! :-)
What to do for the non-database sources?
Well, Todd McDermid developed a pattern to bypass the confusing extra columns. His workaround consists of using a Data Conversion transformation immediately followed by an Union All filtering the unnecessary columns and renaming the converted columns. And he even developed an open source replacement for the Data Conversion transformation if the CAST or CONVERT option is not possible for your source. But be warned. These solution will decrease the performance because they are asynchronous.
No comments:
Post a Comment