In this post, I hope to show how to load in a CSV (comma-separated values) flat-file with an optional double quote (
") text qualifier and a XML format file using
BULK INSERT and
The sample data we’ll work with looks like:
First,Last,Age,Address "Tim","Brooks",50,"105 Bond Street, New York" "Bob","Jones",45,"81 Main Road, Florida"
You can see that only the
Age column is not qualified with double quotes. If you were to only use
, as a terminator, the data in the
Address column would overflow. The XML format file allows us to use single quotes for the
TERMINATOR attribute. This means we can specify the double quotes as the terminator. SQL Server also allows for non-XML format files, which are plain text-files. A XML format file is more beneficial because it’s backward compatible, flexible, and extendable.
The XML format file structure contains two main components;
RECORDdescribes the data as it is stored in the data file.
RECORDelement contains a set of one or more
FIELDelements. These elements correspond to fields in the data file.
ROWdescribes how to construct data rows from a data file when the data from the file is imported into a SQL Server table.
ROWelement contains a set of
COLUMNelements. These elements correspond to table columns.
FIELD attributes, the terminator specifies how each column is terminated. Let’s look at the first row of data:
"Tim","Brooks",50,"105 Bond Street, New York"
Let’s highlight the terminators:
"Tim","Brooks",50,"105 Bond Street, New York" ^^^ ^^ ^^ ^^
Each row is typically terminated using a non-visible new-line feed (
This will give us the following format file:
<?xml version="1.0"?> <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' /> <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' /> <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='",' /> <FIELD ID="4" xsi:type="CharTerm" TERMINATOR='"\n' /> </RECORD> <ROW> <COLUMN SOURCE="1" NAME="First" xsi:type="SQLVARYCHAR" /> <COLUMN SOURCE="2" NAME="Last" xsi:type="SQLVARYCHAR" /> <COLUMN SOURCE="3" NAME="Age" xsi:type="SQLINT" /> <COLUMN SOURCE="4" NAME="Address" xsi:type="SQLVARYCHAR" /> </ROW> </BCPFORMAT>
ROW, I used the
SQLVARYCHAR data type for the strings, and
SQLINT type for the integer. You can find the full list of
xsi:types for the
COLUMN attribute here. You can also use the
NAME attribute to identify the column in the result set.
This query will retrieve the data using
SELECT a.* FROM OPENROWSET( BULK 'c:\test\values.txt', FORMATFILE = 'c:\test\values.xml', FIRSTROW = 2) AS a;
First Last Age Address ----- ------- ---- -------------------------- "Tim Brooks 50 105 Bond Street, New York "Bob Jones 45 81 Main Road, Florida
You can see that the
First column has a leading double quote. This is because we did not handle the double quote in the format file. It’s possible to add another
FILED attribute for a fixed length to handle this, but it won’t be covered in this post. Once this data is loaded into a staging table, you can use a
REPLACE function to replace the double quote:
UPDATE #stage SET First = REPLACE(First, '"', '')
Using BULK INSERT
This query will insert the data using
BULK INSERT #import FROM 'c:\test\values.txt' WITH ( FIRSTROW = 2, FORMATFILE = 'c:\test\values.xml' )
After selecting the imported data from
#import, we get the same result set as above. Again, the
First column has a leading double quote and can be handled in the format file or using the