Saturday 27 July 2013

3 Options to insert data into the SQL Server database table & remove unwanted Quotation Marks while importing a data file

How to remove unwanted Quotation Marks while importing a data file


This post describes how to strip off unwanted quotation marks around the character strings in the data file.
Let’s assume that you have a text file which has the following data and you want to import the data into a SQL Server database table. 

ABC”,”DEF”,”Adventure Works
PQR”,”XYZ”,” Adventure Works
ABCD”,”MNO”,” Adventure Works


 
The table Structure is as follows:
create table tblPeople
(FName varchar(5),LName varchar(5),Country varchar(50)) 


You can use the following 3 options to insert data into the SQL Server database table:
1.       Bulk Insert (T-SQL command)
2.       BCP (Command Line Utility)
3.       Export/Import Wizard (GUI)
The only way to remove the quotation marks would be to modify the column delimiters specified during the import operation. The only drawback here is that if you inspect the data to be inserted, you will very quickly realize that the column delimiters are different for each column (Delimiters highlighted above).
So to specify different column delimiters for each column, you would need to use a format file if you plan to use Bulk Insert or BCP. If you generate a format file for the above table structure, it would be as follows:
9.0
3
1       SQLCHAR       0       5       "\t"     1     FName              SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR       0       5       "\t"     2     LName              SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       50      "\r\n"   3     Company            SQL_Latin1_General_CP1_CI_AS
Modify the format file to represent the correct column delimiters for each column. The new format file to be used will look like this:
9.0
4
1       SQLCHAR       0       0     "\""      0     FIRST_QUOTE      SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR       0       5     "\",\""   1     FNAME               SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       5     "\",\""   2     LNAME            SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR       0       50    "\"\r\n"  3     COMPANY          SQL_Latin1_General_CP1_CI_AS
The logic behind this is as follows:
1.       The first dummy column is added to get rid of the first double quotation mark in the beginning of each row
2.       Each character column data is separated by “,”. So, each column’s delimiter is given as “\”,\””. The “\” character is an escape sequence tells BCP that the character following the backslash is part of the delimiter.
3.       The last column delimiter is again a single double quotation mark. So, the column delimiter for the last column is “\”\r\n” to signify the single double quotation mark and end of the row.
4.       The number of columns has to be changed from 3 to 3+1 due to the introduction of a dummy column (highlighted in Red). 

Option 1: Bulk Insert Command
If you use the following Bulk Insert command to import the data without using a format file, then you will land up with a quotation mark prefix to the first column value and a quotation mark suffix for the last column values and a quotation mark prefix for the first column values. The command used would be:
BULK INSERT tblPeople
   FROM 'bcp.txt'
   WITH (
      DATAFILETYPE='char',
      FIELDTERMINATOR='","',
      ROWTERMINATOR = '\n');

The inserted data would look like this:

image
You would need to use the REPLACE function to get rid to the quotation suffix/prefix for the last and first columns respectively. 
 
In case, you decided to use the modified format file above, then you will be able to import data correctly into the table using the following command:

BULK INSERT tblPeople
   FROM 'bcp.txt'
   WITH (
      DATAFILETYPE='char',
      FIELDTERMINATOR='","',
      ROWTERMINATOR = '\n',
      FORMATFILE = 'bcp.fmt');


Option 2: BCP
If you use BCP utility to insert data into the table, then you would run into the same issue as inserting data using BULK INSERT without a format file. So, again you would need to use the format file to insert the data. The BCP command would be as follows for inserting data into the default instance if executed from a SSMS query window:
EXEC xp_cmdshell 'bcp <database name>.<schema name>.tblPeople in "bcp.txt" -T -f "bcp.fmt"'
In case you do not want to use a T-SQL query, then navigate to the <System Drive:>\Program Files\Microsoft SQL Server\90\Tools\binn folder and execute the command enclosed in single quotes to import the data into the table using BCP. 

Option 3: Export/Import Wizard
If you are planning to use the Export/Import Wizard, then you would have to do a few modifications after you select the flat file to be imported in the GUI.
The first thing that you would need to do after selecting the data file in the GUI, is to add two dummy columns before and after the number of columns already present.
image
In the example that we are using, we would add a dummy Column 0 and then specify the Column Delimiter as () to remove the first double quotation mark.
Then you would need to specify ("{,}") as the Column Delimiters for Columns 1 & 2. For Column 3, you would need to specify the Column Delimiter as (). After that a dummy column, Column 4 needs to be added to take care of the row delimiter. For this column, you would specify the Column Delimiter as ({CR}{LF}).
Once this is done, you would need to click on the Edit Mappings option under the Select Source Tables and Views screen. You would need to specify the Destination for Columns 0 and 4 as <ignore>, so that the SSIS package doesn’t import the data for Columns 0 & 4 into the table.
image
Note: An assumption is being made here that all the rows in the flat file have equal number of column delimiters.


Additional Information
In case the data had non-character values like Integer data, then format file would be as follows:
Sample data:
ABC”,1,”USA
PQR”,2,”UK
Then your format file would be as follows to account for the different delimiters:
9.0
4
1       SQLCHAR       0       0     "\""      0     FIRST_QUOTE      SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR       0       5     "\","     1     FNAME            SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       5     ",\""     2     LNAME            SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR       0       50    "\"\r\n"  3     COMPANY          SQL_Latin1_General_CP1_CI_AS


Such information existed for older BCP versions, so I thought I would augment the options provided along with the other utilities that are currently available:
132463  INF: Using BCP When Data is in Quotation Marks
73182   INFO: Using BCP When Data Is Within Quotation Marks

No comments:

Post a Comment