Monday 1 August 2011

Read data from Microsoft Excel using Microsoft Visual C# .NET

There is many ways to read excel file. This article describes several methods for transferring data from Microsoft Excel using a Microsoft Visual C# .NET program. It also presents the advantages and disadvantages of each method.

State of the Art

Read data via COM Interop.

To use the Excel object you must first set a COM reference to the Microsoft Excel Object Library which is an ActiveX component. Behind the scenes .NET uses the Interop.Excel DLL to talk to Excel. This DLL must be distributed with your application.[1]
You can format your data as a TAB delimited string, use the Excel object to start an instance of Microsoft Excel, select the worksheet and range to populate, and paste in your data.
An issue exists with the Excel object when the user closes your application before closing the instance of Excel that you started. This instance becomes an orphaned process - i.e. when you look at the Processes tab in Windows Task Manager you will see a process exists for Excel.Exe yet there will be no visible Excel applications or windows present.[2]
This method of reading Excel is very slow, especially in case when input files have more than 100,000 rows and required Excel to be installed on the target machine.[3]

'Excel's OleDb Driver

Another way of reading an Excel file and loading it to a DataTable is by using OleDbConnection, OleDbCommand and OleDbDataReaders. Basically you will treat the Excel file like a database where each work sheet represents a table.[4]
The interop assemblies provide a lot of options for working with Excel data, both reading and writing. Some experimenting with the object model can yield decent performance and expose some very useful options which are not obvious from reading the documentation. OleDB provides fast read access to Excel data, but it didn't meet my specific needs, which included accessing only certain columns and data validation.[5]
When DataReader ran through data rows without reading them (i.e. while (rd.Read());) it was quite fast, but retrieving data was still very slow.[3]

Custom Reader

  • Alex Reader. Reader will produce a DataSet object, with sheets represented as DataTables within it.[3]
  • Dusty Candland Reader. So in looking for a better way to accomplish my goals with reasonable CPU and speed, I experimented with the Office 2003 Interop Assemblies. I found, in my opinion, a decent way to accomplish the needed speed. CPU usage can still be high, but at an acceptable trade off for my needs.The interop assemblies provide a lot of options for working with Excel data, both reading and writing. Some experimenting with the object model can yield decent performance and expose some very useful options which are not obvious from reading the documentation.[5]

Best Practices

  • Turn Off Everything But the Essentials While Your Code is Running[6]
  • Read/Write Large Blocks of Cells in a Single Operation
  • Avoid Selecting / Activating Objects
  • Use Range.SpecialCells() to scope down the number of cells your code needs to work with.
  • For simple functions used by your code in high frequency, implement them yourself in VBA instead of using the WorksheetFunction object.
  • Declare variables with explicit types to avoid the overhead of determining the data type (repetitively if used in a loop) during code execution.

References


No comments: