Case
I have a column with multiple values and I want to split them into multiple records.
Solution
You could solve it with a fancy TSQL query. I saw split functions with common table expressions, but a relatively easy script could to the trick in SSIS as well. (don't hesitate to post your query/solution in the comments)
1) Source
Add your source to the Data Flow Task
2) Script Component - input columns
This solution uses an asynchronous Script Component, so add a Script Component (type transformation) to your Data Flow Task. Edit it, go to the Input Columns pane and select all the columns you need downstream the data flow as readonly. In this case we need the columns Teacher and Students.
3) Script Component - output port
Go to the Inputs and Outputs pane and click on Output 0. Set the SynchronousInputID property to none to make this Script Component asynchronous.
4) Script Component - output columns
Add output columns for each input column that you need downstream the data flow. In this case we need Teacher (same datatype and size) and a new column named Student which will contain one value from the input column Students (same datatype, but size could probably be smaller).
5) The script
Copy the Inputs0 _ProcessInputRow method to your script and remove all the other methods (PreExecute, PostExecute and CreateNewOutputRows) because we don't need them.
or VB.Net
6) The result
For testing purposes I added a derived column and a couple of data viewer
Solution
You could solve it with a fancy TSQL query. I saw split functions with common table expressions, but a relatively easy script could to the trick in SSIS as well. (don't hesitate to post your query/solution in the comments)
1) Source
Add your source to the Data Flow Task
2) Script Component - input columns
This solution uses an asynchronous Script Component, so add a Script Component (type transformation) to your Data Flow Task. Edit it, go to the Input Columns pane and select all the columns you need downstream the data flow as readonly. In this case we need the columns Teacher and Students.
Input columns (readonly) |
3) Script Component - output port
Go to the Inputs and Outputs pane and click on Output 0. Set the SynchronousInputID property to none to make this Script Component asynchronous.
asynchronous |
4) Script Component - output columns
Add output columns for each input column that you need downstream the data flow. In this case we need Teacher (same datatype and size) and a new column named Student which will contain one value from the input column Students (same datatype, but size could probably be smaller).
Output columns |
5) The script
Copy the Inputs0 _ProcessInputRow method to your script and remove all the other methods (PreExecute, PostExecute and CreateNewOutputRows) because we don't need them.
// C# Code using System; using System.Data; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { // Method that will execute for each row passing public override void Input0_ProcessInputRow(Input0Buffer Row) { // First we are converting the comma seperated list into a string array. // You can change the comma if you are using an other seperator like | or ; string [] Students = Row.Students.ToString().Split( new char [] { ',' }, StringSplitOptions.None); // Counter var used the loop through the string array int i = 0; // Looping through string array with student names while (i < Students.Length) { // Start a new row in the output Output0Buffer.AddRow(); // Pass through all columns that you need downstream the data flow Output0Buffer.Teacher = Row.Teacher; // This is the splitted column. Take the [n] element from the array // and put it in the new column. Output0Buffer.Student = Students[i]; // Increase counter to go the next value i++; } } } |
or VB.Net
Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper Imports Microsoft.SqlServer.Dts.Runtime.Wrapper <Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _ <CLSCompliant( False )> _ Public Class ScriptMain Inherits UserComponent ' Method that will execute for each row passing Public Overrides Sub Input0_ProcessInputRow(Row As Input0Buffer) ' First we are converting the comma seperated list into a string array. ' You can change the comma if you are using an other seperator like | or ; Dim Students As String () = Row.Students.ToString().Split( New Char () { "," c}, StringSplitOptions.None) ' Counter var used the loop through the string array Dim i As Integer = 0 ' Looping through string array with student names While i < Students.Length ' Start a new row in the output Output0Buffer.AddRow() ' Pass through all columns that you need downstream the data flow Output0Buffer.Teacher = Row.Teacher ' This is the splitted column. Take the [n] element from the array ' and put it in the new column. Output0Buffer.Student = Students(i) ' Increase counter to go the next value i += 1 End While End Sub End Class |
6) The result
For testing purposes I added a derived column and a couple of data viewer
The result |
No comments:
Post a Comment