Using OPENROWSET to read large files into SQL Server

Problem

OPENROWSET is a T-SQL function that allows for reading data from many sources including using the SQL Server’s BULK import capability. One of the useful features of the BULK provider is its ability to read individual files from the file system into SQL Server, such as loading a data from a text file or a Word document into a SQL Server table. This capability is the subject of this tip.

Solution

The BULK option was added to T-SQL in SQL Server 2005.

When used with the BULK provider keyword you can name a data file to read as one of three types of objects:

  • SINGLE_BLOB, which reads a file as varbinary(max)
  • SINGLE_CLOB, which reads a file as varchar(max)
  • SINGLE_NCLOB, which reads a file as nvarchar(max)

OPENROWSET returns a single column, named BulkColumn, as its result. Here’s an example that reads a text file:

SELECT BulkColumn 
FROM OPENROWSET (BULK 'c:\temp\mytxtfile.txt', SINGLE_CLOB) MyFile 

The correlation name, in this case MyFile, is required by OPENROWSET.

There are additional requirements when reading single files that must also be observed as mentioned below.

Access control is always a concern. The operating system level file operations to read the file are executed with the privileges of the account that the SQL Server data engine is using. Therefore, only files accessible to that account may be read. This includes network drives or UNC paths, which are permitted if the account has the privileges. If you want to read network files, run SQL Server as a domain user.

The BULK provider won’t convert between Unicode and plain ASCII files. It must be told which type of encoding is used in the file. If you don’t the result is error 4806 as seen here:

SELECT BulkColumn 
FROM OPENROWSET (BULK 'c:\temp\SampleUnicode.txt', SINGLE_CLOB) MyFile 

Msg 4806, Level 16, State 1, Line 1
SINGLE_CLOB requires a double-byte character set (DBCS) (char) input file. The file specified is Unicode.

Unicode files must be read with the SINGLE_NCLOB option shown here:

SELECT BulkColumn 
FROM OPENROWSET (BULK 'c:\temp\SampleUnicode.txt', SINGLE_NCLOB) MyFile 

Similarly, files with non text structures, such as Word documents are not converted. They must be converted by some other mechanism before being read or they can be read as binary files with the SINGLE_BLOB option.

OPENROWSET isn’t flexible about how you provide the name of the file. It must be a string constant. That requirement forces the use of dynamic SQL when the file name isn’t known in advance.

Here’s a stored procedure that reads any text file and returns the contents as an output variable:

SET ANSI_NULLS ON 
GO 
SET QUOTED_IDENTIFIER ON 
GO 
CREATE PROC [dbo].[ns_txt_file_read]  
    @os_file_name NVARCHAR(256) 
   ,@text_file VARCHAR(MAX) OUTPUT  
/* Reads a text file into @text_file 
* 
* Transactions: may be in a transaction but is not affected 
* by the transaction. 
* 
* Error Handling: Errors are not trapped and are thrown to 
* the caller. 
* 
* Example: 
    declare @t varchar(max) 
    exec ns_txt_file_read 'c:\temp\SampleTextDoc.txt', @t output 
    select @t as [SampleTextDoc.txt] 
* 
* History: 
* WHEN       WHO        WHAT 
* ---------- ---------- --------------------------------------- 
* 2007-02-06 anovick    Initial coding 
**************************************************************/  
AS  
DECLARE @sql NVARCHAR(MAX) 
      , @parmsdeclare NVARCHAR(4000)  
SET NOCOUNT ON  
SET @sql = 'select @text_file=(select * from openrowset ( 
           bulk ''' + @os_file_name + ''' 
           ,SINGLE_CLOB) x 
           )' 
SET @parmsdeclare = '@text_file varchar(max) OUTPUT'  
EXEC sp_executesql @stmt = @sql 
                 , @params = @parmsdeclare 
                 , @text_file = @text_file OUTPUT 

To see how it works, just execute the example script: First create a text file called “SampleTextDoc.txt” and add some text data to the file. For our example we added the following text “The quick brown fox jumped over the lazy dog.”.

DECLARE @t VARCHAR(MAX) 
EXEC ns_txt_file_read 'c:\temp\SampleTextDoc.txt', @t output 
SELECT @t AS [SampleTextDoc.txt]  

The results are:

SampleTextDoc.txt
The quick brown fox jumped over the lazy dog. 
(1 row(s) affected)

The performance of reading text files is remarkably fast because the files are read sequentially. Using a 64 bit SQL Server on a development machine, reading a file of 750,000,000 bytes took only 7 seconds.

Next Steps

  • If there is a need to bulk insert large text files or binary objects into SQL Server look at using OPENROWSET

Leave a Reply

Your email address will not be published. Required fields are marked *