Reading and Writing to Excel 2007 or Excel 2010 from C# - Part II: Basics

[Note: See the series index for a list of all parts in this series.]

image

To get support for the technologies we will use in this we need to add a few assembly references to our solution:

  • WindowsBase.dll
  • System.Xml
  • System.Xml.Linq
  • System.Core

Next make sure you have the following namespaces added to your using/imports:

  • System.IO.Packaging: This provides the functionality to open the files.
  • System.Xml
  • System.Xml.Linq
  • System.Linq
  • System.IO

Right next there is a XML namespace (not to be confused with .NET code name spaces) we need to use for most of our queries: http://schemas.openxmlformats.org/spreadsheetml/2006/main and there is a second one we will use seldom http://schemas.openxmlformats.org/officeDocument/2006/relationships. So I dumped this into a nice static class as follows:

namespace XlsxWriter
{
    using System.Xml.Linq;

    internal static class ExcelNamespaces
    {
        internal static XNamespace excelNamespace = XNamespace.Get("http://schemas.openxmlformats.org/spreadsheetml/2006/main");
        internal static XNamespace excelRelationshipsNamepace = XNamespace.Get("http://schemas.openxmlformats.org/officeDocument/2006/relationships");
    }
}

Next we need to create an instance of the System.IO.Packaging.Package class (from WindowsBase.dll) and instantiate it by calling the static method Open.

 Package xlsxPackage = Package.Open(fileName, FileMode.Open, FileAccess.ReadWrite);

Note: It is at this point that the file is opened, this is important since Excel will LOCK an open file. This is an important issue to be aware of because when you open a file that is locked a lovely exception is thrown. To correct that you must make sure to call the close method on the package, for example:

xlsxPackage.Close();

When you open the XLSX file manually, the first file you’ll see is the [Content_Types].xml file which is a manifest of all the files in the ZIP archive. What is nice with using Packaging is that you can call the GetParts method to get a collection of Parts, which are actually just the files within the XLSX file.

image
The contents of the XLSX if renamed to a ZIP file and opened.
image
The various files listed in the [Content_Types].xml file.




What we will use during this is the ContentType parameter to filter the parts to the specific item we want to work with. The second image above to identify the value for the ContentType. For example the ContentType for a worksheet is: application/vnd.openxmlformats-officedocument.speadsheetml.worksheet+xml.

Once we have all the parts of the XLSX file we can navigate through it to get the bits we need to read the content, which involves two steps:

  1. Finding the shared strings part. This is another XML file which allows for strings of values to shared between worksheets. This is optional for writing, to use but does save space and speed up loading. For reading values it is required as Excel will use it.
  2. Finding the worksheet that we want to read from, this is a separate part from the shared strings.

Lets start with reading the shared strings part, this will be basis for reading any part later in series. What we need to do is get the first PackagePart with the type: application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml

PackagePart sharedStringsPart = (from part in allParts
    where part.ContentType.Equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml")
    select part).Single();

Now we need to get the XML content out of the PackagePart, which is easy with the GetStream method, which we load into an XmlReader so that it can be loaded into a XElement. This is a bit convoluted but it is just one line to get it from one type to another and the benefits of using LINQ to XML are worth it:

XElement sharedStringsElement = XElement.Load(XmlReader.Create(sharedStringsPart.GetStream()));

Now we have the ability to work with the XElement and do some real work. In the next parts, we’ll look at what we can do with it and how to get from a single part to an actual sheet.

Hi,

I am trying to unlink a SharePoint table from an Excel spreadsheet. I am new at this and somehow I don't read the DOM correctly. Any recommendations would be appreciated.

Sara

Hi,

I have been reading through these articles and working on the code and I am close to having it working, but there are still parts that I am not doing properly. It would help to see the final code for parts 1 through 4 (part 4 doesn't contain it) so I can determine where I went wrong.

Thanks,
Hamilton

Done - see http://www.sadev.co.za/content/reading-and-writing-excel-2007-or-excel-2010-c-part-v-full-source-reading

You can't solve it using Package.Open(fileName, FileMode.Open, FileAccess.ReadWrite).
Actually the flag FileAccess.ReadWrite is not supported and if used - you would get System.NotSupportedException with "Only FileShare.Read and FileShare.None are supported."

Alex Pinsker,
http://alexpinsker.blogspot.com

My code is running fine with those attributes, so I disagree that you will get an exception using them. I agree that they are not a solution to having the file locked, and that is not what I was trying to imply. The solution to that is to call close and I have updated the post to reflect that.

Hi,
I just want to know how do I go about reading an Excel file (created via SpreadsheetML) into a DataSet?
I have tried to use OleDbConnection with my connection string being something like:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source="C:\excelFiles\sampleFileCreatedwithSpreadsheetML.xls";Extended Properties="Excel 8.0;IMEX=1;HDR=Yes"
But then I get an OleDbException with "External table is not in the expected format."

All I am given is a filename and told that its an Excel file (it was created using XSLT transformation).

Any idea on what I can try?

The same method as I did will work, except where I populate my custom class you should create a dataset and populate that instead. It is possible to use OleDbConnection, but that is not something I have tried so not sure why you are having a problem.

[...] public class Cell { public Cell(string column, int row, string data) { this.Column = column; this.Row = row; this.Data = data; } public string Column { get; set; } public int Row { get; set; } public string Data { get; set; } }How do we find the right worksheet? In the same way as we did get the shared strings in part II. [...]

[...] Part II - Basics [...]

[...] element we loaded at the end of part 2 is that shared strings file, which in the archive is \xl\sharedstrings.xml. If we look at it, it [...]

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Post new comment

The content of this field is kept private and will not be shown publicly. If you have a Gravatar account associated with the e-mail address you provide, it will be used to display your avatar.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.
  • Syntax highlight code surrounded by the <pre class="brush: lang">...</pre> tags, where lang is one of the following language brushes: as3, applescript, bash, csharp, coldfusion, cpp, css, delphi, diff, erlang, groovy, jscript, java, javafx, perl, php, plain, powershell, python, ruby, sass, scala, sql, vb, xml.

More information about formatting options