How to remove unwanted Quotation Marks while importing a data file
This post describes how to strip off unwanted quotation marks around the character strings in the data file.
Let’s
assume that you have a text file which has the following data and you
want to import the data into a SQL Server database table.
“ABC”,”DEF”,”Adventure Works”
“PQR”,”XYZ”,” Adventure Works”
“ABCD”,”MNO”,” Adventure Works”
The table Structure is as follows:
create table tblPeople
(FName varchar(5),LName varchar(5),Country varchar(50))
You can use the following 3 options to insert data into the SQL Server database table:
1. Bulk Insert (T-SQL command)
2. BCP (Command Line Utility)
3. Export/Import Wizard (GUI)
The only way to remove the quotation marks would be to modify the column delimiters
specified during the import operation. The only drawback here is that
if you inspect the data to be inserted, you will very quickly realize
that the column delimiters are different for each column (Delimiters
highlighted above).
So
to specify different column delimiters for each column, you would need
to use a format file if you plan to use Bulk Insert or BCP. If you generate a format file for the above table structure, it would be as follows:
9.0
3
1 SQLCHAR 0 5 "\t" 1 FName SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 5 "\t" 2 LName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 50 "\r\n" 3 Company SQL_Latin1_General_CP1_CI_AS
Modify
the format file to represent the correct column delimiters for each
column. The new format file to be used will look like this:
9.0
4
1 SQLCHAR 0 0 "\"" 0 FIRST_QUOTE SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 5 "\",\"" 1 FNAME SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 5 "\",\"" 2 LNAME SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 50 "\"\r\n" 3 COMPANY SQL_Latin1_General_CP1_CI_AS
The logic behind this is as follows:
1. The first dummy column is added to get rid of the first double quotation mark in the beginning of each row
2. Each
character column data is separated by “,”. So, each column’s delimiter
is given as “\”,\””. The “\” character is an escape sequence tells BCP
that the character following the backslash is part of the delimiter.
3. The
last column delimiter is again a single double quotation mark. So, the
column delimiter for the last column is “\”\r\n” to signify the single
double quotation mark and end of the row.
4. The number of columns has to be changed from 3 to 3+1 due to the introduction of a dummy column (highlighted in Red).
Option 1: Bulk Insert Command
If you use the following Bulk Insert
command to import the data without using a format file, then you will
land up with a quotation mark prefix to the first column value and a
quotation mark suffix for the last column values and a quotation mark
prefix for the first column values. The command used would be:
BULK INSERT tblPeople
FROM 'bcp.txt'
WITH (
DATAFILETYPE='char',
FIELDTERMINATOR='","',
ROWTERMINATOR = '\n');
The inserted data would look like this:
You would need to use the REPLACE function to get rid to the quotation suffix/prefix for the last and first columns respectively.
In
case, you decided to use the modified format file above, then you will
be able to import data correctly into the table using the following
command:
BULK INSERT tblPeople
FROM 'bcp.txt'
WITH (
DATAFILETYPE='char',
FIELDTERMINATOR='","',
ROWTERMINATOR = '\n',
FORMATFILE = 'bcp.fmt');
Option 2: BCP
If you use BCP utility
to insert data into the table, then you would run into the same issue
as inserting data using BULK INSERT without a format file. So, again you
would need to use the format file to insert the data. The BCP command
would be as follows for inserting data into the default instance if
executed from a SSMS query window:
EXEC xp_cmdshell 'bcp <database name>.<schema name>.tblPeople in "bcp.txt" -T -f "bcp.fmt"'
In case you do not want to use a T-SQL query, then navigate to the <System Drive:>\Program Files\Microsoft SQL Server\90\Tools\binn folder and execute the command enclosed in single quotes to import the data into the table using BCP.
Option 3: Export/Import Wizard
If you are planning to use the Export/Import Wizard, then you would have to do a few modifications after you select the flat file to be imported in the GUI.
The
first thing that you would need to do after selecting the data file in
the GUI, is to add two dummy columns before and after the number of
columns already present.
In the example that we are using, we would add a dummy Column 0 and then specify the Column Delimiter as (“) to remove the first double quotation mark.
Then you would need to specify ("{,}") as the Column Delimiters for Columns 1 & 2. For Column 3, you would need to specify the Column Delimiter as (“).
After that a dummy column, Column 4 needs to be added to take care of
the row delimiter. For this column, you would specify the Column
Delimiter as ({CR}{LF}).
Once this is done, you would need to click on the Edit Mappings option under the Select Source Tables and Views screen. You would need to specify the Destination for Columns 0 and 4 as <ignore>, so that the SSIS package doesn’t import the data for Columns 0 & 4 into the table.
Note: An assumption is being made here that all the rows in the flat file have equal number of column delimiters.
Additional Information
In case the data had non-character values like Integer data, then format file would be as follows:
Sample data:
“ABC”,1,”USA”
“PQR”,2,”UK”
Then your format file would be as follows to account for the different delimiters:
9.0
4
1 SQLCHAR 0 0 "\"" 0 FIRST_QUOTE SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 5 "\"," 1 FNAME SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 5 ",\"" 2 LNAME SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 50 "\"\r\n" 3 COMPANY SQL_Latin1_General_CP1_CI_AS
Such
information existed for older BCP versions, so I thought I would
augment the options provided along with the other utilities that are
currently available:
132463 INF: Using BCP When Data is in Quotation Marks
73182 INFO: Using BCP When Data Is Within Quotation Marks