Monday, 6 May 2013

Fun with SSIS - Part 1 - Troubleshooting (Dirty Read)

Fun with SSIS - Part 1 - Troubleshooting

Although I still consider myself a DBA, the last few years have seen me spend more and more time in the Business Intelligence development space (SSIS development in particular). In this the first of a 2-part SSIS series, I’ll be focusing on a number of SSIS “issues” I’ve discovered the hard way.

1: Case Sensitivity
I’ve never been particularly good at using case consistently, the perfect example of which was my publisher asking me to edit hundreds of figure captions in my recently published book. Like most people of my age who took a college course in programming, I studied C, and spent countless hours debugging problems traced back to the incorrect case of variable names (and using “=” instead of “==” !).
When I decided to specialize in SQL Server administration, I rejoiced at the lack of case sensitivity involved, although I occasionally see databases with the case sensitive option, in most cases (pardon the pun) installed my accident.
Imagine my angst when I discovered that a tool I’m spending more and more time with is rampantly case sensitive. Almost everything about SSIS is case sensitive; the expression language, variable names and one that gets me every time; lookups.

A very common SSIS data flow transformation is lookup. As the name implies, it’s most often used to lookup a key value based on a text value in the data flow. For example, in the figure below, we’re returning the supplierKey based on the supplierName value. The first screen defines the lookup source, and the second defines the join condition and what value is to be returned.

1


This, for all intents and purposes, in a simple join condition with a column from the joined table being returned. However, unlike T-SQL, the SSIS lookup is case sensitive.

The obvious way to address this is to ensure both the lookup column (vendorName in this example) and the matching data flow field (Supplier) are set to the same case using the UPPER (or LOWER) function. For the lookup, that’s a simple case of changing the lookup source T-SQL to use one of those functions. For the data flow field, we could use the derived column transformation using the same function, and use that column in the lookup process.

2: Strong Typing

If there was any remaining doubt that SSIS was designed for developers (and not classic DBAs) it’s confirmed in its (very) strong typing. As an example of that, let’s revisit the lookup example from above. A previous version of the database had the Supplier data type set to varchar(50) and vendorName set to nvarchar(255). When we try and connect these in the lookup, we’ll get the following error;

2
 
Another common issue with mismatched data types is when inserting rows into a table at the end of a data flow task. If the destination table’s column length is shorter than the source, you’ll get a warning such as this one;
3

Unless you have the luxury of an enterprise wide data dictionary (that every one adheres to), data type variances such as these are common, with the most frequent SSIS workarounds being to either CAST the columns in the data source selection (e.g.; using a view) and/or using type casts in a derived column transformation (more on that in part 2 of this blog series)

3: 64-bit Trickery


There will presumably come a day when everything works on 64-bit, and that day can’t come soon enough. In the meantime, we have to deal with a variety of annoying problems, one of which is 64-bit driver support in SSIS.

A common data source for SSIS packages are Excel files. The current (and all previous) versions of Excel do not include 64-bit drivers. This presents a problem when SSIS packages run in 64-bit environments, and the error messages returned are far from helpful (unless you consider “The AcquireConnection method call to the connection manager blah failed with error code blah” helpful).

Fortunately, there’s an easy workaround for this issue, via a property called Run64BitRuntime accessed through the Debugging page of the project’s properties window. As per the figure below, setting this property to False (it’s true by default) will use the 32-bit driver.

4


Note that there are a whole range of other considerations for 64-bit mode, for example when calling child packages from a parent with a different setting for this value. For a great blog post on these issues visit Todd McDermid’s post  from last year.

4: Package Configurations


One of the real SSIS horror stories that I was very close to (but not responsible for!) was an SSIS package that was run against a production database by accident (blowing away gigabytes of production data). The package used a production configuration file instead of a test/development file. There’s a couple of aspects to this (all too common) problem; configuration technique and security.
Firstly, configuration technique. SSIS allows a number of ways of configuring settings such as server and database name. The most common one is to use an XML configuration file, the path to which is either stored within the package itself, or set through an environment variable.
I’ve seen problems with both of these techniques. Using a configuration file location (as per the example below) assumes that all servers on which the package runs has the same path available, which is quite often not the case.

5


The environment variable option can also present some problems, notably the need to reboot the server for the variable to come into effect. Depending on the environment, rebooting production servers is not the easiest thing to do.
Another alternative to both of the above is one that I discovered while listening to Greg Low’s SQLDownUnder podcast with Jamie Thompson  in which Greg spoke of the idea of connecting to a special configuration database which stored the configuration settings. The settings returned were based on the calling machine’s context and then used to set the value of package variables for subsequent use.
Regardless of the configuration technique, one of the really important considerations is security. In each of the above techniques, mistakes can still be made. For example, a production configuration file can be copied over the top of a development file, or the incorrect settings stored in a configuration database.
To prevent these types of configuration errors from killing a production database, we need to ensure that the security context of an SSIS package prevents it from accessing the wrong environment. There are at least two ways of achieving this; using separate service accounts in each environment (and making sure the account only has database permissions in the appropriate environment), or better still, having the production database in a separate domain with trust permissions removed to the other domain(s).
With the appropriate security setup, even if the wrong configuration file is used, the package will fail to run due to the lack of database permissions.
5: Miscellaneous Silly Business

Finally, there’s a number of really silly things I continue to do that I shouldn’t (I’m a slow learner). Firstly, the evaluate as expression property.
A common design pattern in SSIS packages is to use a variable as the source for an Execute SQL task. The variable’s value is then set using an expression which references another variable. A common example of this is a variable that contains an update command with a where clause containing the value of a BatchID variable. The expression property contains the code that references the other variable.
If you use this type of arrangement, make sure the variable's evaluate as expression property is set to TRUE, as per the example below. I’ve spent many hours debugging problems where I’ve simply forgotten to set this properly.

6

To wrap up this post, I frequently want to add an existing .dtsx package (e.g.; from another project) to the current project. By default I right click the SSIS Packages node in solution explorer and choose the .dtsx package that I’ve already copied into the project directory. I’m then confused as to why the package is renamed e.g.; from Package.dtsx to Package(1).dtsx.
It turns out that the Add Existing Package menu is really designed for adding a package from a different location (i.e.; remote server), and if you use it locally, it makes a copy (and renames) the file.
For the situation I described, using the Add > Existing Item option (instead of Add Existing Package from the SSIS Packages menu) is the correct option. The figure below compares these 2 options side by side.
7
In the next post, I’ll talk about some important SSIS performance tuning tips.

Cheers

No comments:

Post a Comment