In this post, I hope to show how to load in a CSV (comma-separated values) flat-file with an optional double quote (") text qualifier and a XML format file using BULK INSERT and OPENROWSET.
The sample data we’ll work with looks like:
First,Last,Age,Address "Tim","Brooks",50,"105 Bond Street, New York" "Bob","Jones",45,"81 Main Road, Florida"
You can see that only the Age column is not qualified with double quotes. If you were to only use , as a terminator, the data in the Address column would overflow. The XML format file allows us to use single quotes for the TERMINATOR attribute. This means we can specify the double quotes as the terminator. SQL Server also allows for non-XML format files, which are plain text-files. A XML format file is more beneficial because it’s backward compatible, flexible, and extendable.
The XML format file structure contains two main components; RECORD and ROW.
RECORDdescribes the data as it is stored in the data file.
EachRECORDelement contains a set of one or moreFIELDelements. These elements correspond to fields in the data file.ROWdescribes how to construct data rows from a data file when the data from the file is imported into a SQL Server table.
AROWelement contains a set ofCOLUMNelements. These elements correspond to table columns.
For the FIELD attributes, the terminator specifies how each column is terminated. Let’s look at the first row of data:
"Tim","Brooks",50,"105 Bond Street, New York"
Let’s highlight the terminators:
"Tim","Brooks",50,"105 Bond Street, New York"
^^^ ^^ ^^ ^^
Each row is typically terminated using a non-visible new-line feed (\n).
| Column | Terminator |
|---|---|
| First | “,” |
| Last | “, |
| Age | ,” |
| Address | “\n |
This will give us the following format file:
<?xml version="1.0"?> <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' /> <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' /> <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='",' /> <FIELD ID="4" xsi:type="CharTerm" TERMINATOR='"\n' /> </RECORD> <ROW> <COLUMN SOURCE="1" NAME="First" xsi:type="SQLVARYCHAR" /> <COLUMN SOURCE="2" NAME="Last" xsi:type="SQLVARYCHAR" /> <COLUMN SOURCE="3" NAME="Age" xsi:type="SQLINT" /> <COLUMN SOURCE="4" NAME="Address" xsi:type="SQLVARYCHAR" /> </ROW> </BCPFORMAT> |
In the ROW, I used the SQLVARYCHAR data type for the strings, and SQLINT type for the integer. You can find the full list of xsi:types for the COLUMN attribute here. You can also use the NAME attribute to identify the column in the result set.
Using OPENROWSET
This query will retrieve the data using OPENROWSET:
SELECT a.* FROM OPENROWSET( BULK 'c:\test\values.txt', FORMATFILE = 'c:\test\values.xml', FIRSTROW = 2) AS a; |
Result
First Last Age Address ----- ------- ---- -------------------------- "Tim Brooks 50 105 Bond Street, New York "Bob Jones 45 81 Main Road, Florida
You can see that the First column has a leading double quote. This is because we did not handle the double quote in the format file. It’s possible to add another FILED attribute for a fixed length to handle this, but it won’t be covered in this post. Once this data is loaded into a staging table, you can use a REPLACE function to replace the double quote:
UPDATE #stage SET First = REPLACE(First, '"', '') |
Using BULK INSERT
This query will insert the data using BULK INSERT:
BULK INSERT #import FROM 'c:\test\values.txt' WITH ( FIRSTROW = 2, FORMATFILE = 'c:\test\values.xml' ) |
After selecting the imported data from #import, we get the same result set as above. Again, the First column has a leading double quote and can be handled in the format file or using the REPLACE function.
References
What if the unquoted age column was the FIRST column in the data. How woul dthe xml file look then?
“SQLVARYCHAR”?
Correct. See the following doc: https://docs.microsoft.com/en-us/sql/relational-databases/import-export/xml-format-files-sql-server