Reading and writing to Excel 2007 or Excel 2010 from C# - Part V: Full source for reading

[Note: See the series index for a list of all parts in this series.]

Clipboard08

A few people have battled with getting all the bits of code scattered in the series together to actually work. This is not only due to the fact they are scattered, but part III for example was not showing the code correctly and there was a bug in part IV. I have gone back and fixed those issues and to help further here is the full code in one big view (click read more if needed to see it).

using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

namespace ReadFromExcel
{
    class Program
    {
        static void Main(string[] args)
        {
            List<Cell> parsedCells = new List<Cell>();
            string fileName = @"C:\Users\bbdnet0758\Desktop\Demo.xlsx";
            Package xlsxPackage = Package.Open(fileName, FileMode.Open, FileAccess.ReadWrite);
            try
            {
                PackagePartCollection allParts = xlsxPackage.GetParts();

                PackagePart sharedStringsPart = (from part in allParts
                                                 where part.ContentType.Equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml")
                                                 select part).Single();
                
                XElement sharedStringsElement = XElement.Load(XmlReader.Create(sharedStringsPart.GetStream()));

                Dictionary<int, string> sharedStrings = new Dictionary<int, string>();
                ParseSharedStrings(sharedStringsElement, sharedStrings);                

                XElement worksheetElement = GetWorksheet(1, allParts);

                IEnumerable<XElement> cells = from c in worksheetElement.Descendants(ExcelNamespaces.excelNamespace + "c")
                                              select c;

                foreach (XElement cell in cells)
                {
                    string cellPosition = cell.Attribute("r").Value;
                    int index = IndexOfNumber(cellPosition);
                    string column = cellPosition.Substring(0, index);
                    int row = Convert.ToInt32(cellPosition.Substring(index, cellPosition.Length - index));
                    int valueIndex = Convert.ToInt32(cell.Descendants(ExcelNamespaces.excelNamespace + "v").Single().Value);

                    parsedCells.Add(new Cell(column, row, sharedStrings[valueIndex]));
                }
            }
            finally
            {
                xlsxPackage.Close();
            }

            //From here is additional code not covered in the posts, just to show it works
            foreach (Cell cell in parsedCells)
            {
                Console.WriteLine(cell);
            }
        }

        private static void ParseSharedStrings(XElement SharedStringsElement, Dictionary<int, string> sharedStrings)
        {
            IEnumerable<XElement> sharedStringsElements = from s in SharedStringsElement.Descendants(ExcelNamespaces.excelNamespace + "t")
                                                          select s;

            int Counter = 0;
            foreach (XElement sharedString in sharedStringsElements)
            {
                sharedStrings.Add(Counter, sharedString.Value);
                Counter++;
            }
        }

        private static XElement GetWorksheet(int worksheetID, PackagePartCollection allParts)
        {
            PackagePart worksheetPart = (from part in allParts
                                         where part.Uri.OriginalString.Equals(String.Format("/xl/worksheets/sheet{0}.xml", worksheetID))
                                         select part).Single();

            return XElement.Load(XmlReader.Create(worksheetPart.GetStream()));
        }

        private static int IndexOfNumber(string value)
        {
            for (int counter = 0; counter < value.Length; counter++)
            {
                if (char.IsNumber(value[counter]))
                {
                    return counter;
                }
            }

            return 0;
        }                         
    }

    internal static class ExcelNamespaces
    {
        internal static XNamespace excelNamespace = XNamespace.Get("http://schemas.openxmlformats.org/spreadsheetml/2006/main");
        internal static XNamespace excelRelationshipsNamepace = XNamespace.Get("http://schemas.openxmlformats.org/officeDocument/2006/relationships");
    }

    public class Cell
    {
        public Cell(string column, int row, string data)
        {
            this.Column = column;
            this.Row = row;
            this.Data = data;
        }

        public override string ToString()
        {
            return string.Format("{0}:{1} - {2}", Row, Column, Data);
        }

        public string Column { get; set; }
        public int Row { get; set; }
        public string Data { get; set; }
    }
}

I really think this is a really good solution, fast and efficient. I did have issues trying to get this to work when alpha and numeric characters are mixed in the same spreadsheet. From what I can tell excel 2010 stores the alpha characters in sharedstrings.xml and numeric remain in worksheet. worksheet has pointers to the index it appears sharedstrings.xml. When it is a referenced, the value is the index and column attribute t="s". I fixed and added the ability to start on another row other than first row (maybe dont want to capture the spreadsheet headers).

use it simply by using an Ajax fileUploader control
---------------------------------------------

Stream theStream = fupLoader.PostedFile.InputStream;
List lstResults = proExcel.GetListFromCells(theStream,2); //first row to capture, indexed at 1

-------------------------------------

modified class
----------------------------------

using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
namespace Project.Utils
{
public static class proExcel {
public static List GetListFromCells(Stream excelReader, int startRow)
{
List parsedCells = new List();
//string fileName = @"C:\Users\bbdnet0758\Desktop\Demo.xlsx";
Package xlsxPackage = Package.Open(excelReader, FileMode.Open, FileAccess.Read);

try
{
PackagePartCollection allParts = xlsxPackage.GetParts();
PackagePart sharedStringsPart = (from part in allParts
where part.ContentType.Equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml")
select part).Single();
XElement sharedStringsElement = XElement.Load(XmlReader.Create(sharedStringsPart.GetStream()));
Dictionary sharedStrings = new Dictionary();
ParseSharedStrings(sharedStringsElement, sharedStrings);
XElement worksheetElement = GetWorksheet(1, allParts);

IEnumerable cells = from c in worksheetElement.Descendants(ExcelNamespaces.excelNamespace + "c")
where Convert.ToInt32(c.Parent.Attribute("r").Value)>=startRow
select c;

foreach (XElement cell in cells)
{
string cellPosition = cell.Attribute("r").Value;
int index = IndexOfNumber(cellPosition);
string column = cellPosition.Substring(0, index);
int row = Convert.ToInt32(cellPosition.Substring(index, cellPosition.Length - index));

//if t exists and t=s then value is index in shared strings, else v is value
string value = "";
value = cell.Descendants(ExcelNamespaces.excelNamespace + "v").Single().Value.ToString();
if (cell.Attribute("t") != null)
{
if (cell.Attribute("t").Value == "s")
{
value = sharedStrings[Convert.ToInt32(value)];
}
}
parsedCells.Add(new Cell(column, row, value));
}

}
finally{
xlsxPackage.Close();
}
return parsedCells;

}
private static void ParseSharedStrings(XElement SharedStringsElement, Dictionary sharedStrings)
{
IEnumerable sharedStringsElements = from s in SharedStringsElement.Descendants(ExcelNamespaces.excelNamespace + "t")
select s;
int Counter = 0;
foreach (XElement sharedString in sharedStringsElements)
{
sharedStrings.Add(Counter, sharedString.Value);
Counter++;
}
}
private static XElement GetWorksheet(int worksheetID, PackagePartCollection allParts) {
PackagePart worksheetPart = (from part in allParts
where part.Uri.OriginalString.Equals(String.Format("/xl/worksheets/sheet{0}.xml", worksheetID))
select part).Single();
return XElement.Load(XmlReader.Create(worksheetPart.GetStream()));
}
private static int IndexOfNumber(string value) {
for (int counter = 0; counter < value.Length; counter++) {
if (char.IsNumber(value[counter])) {
return counter;
}
}
return 0;
}
}
internal static class ExcelNamespaces {
internal static XNamespace excelNamespace = XNamespace.Get("http://schemas.openxmlformats.org/spreadsheetml/2006/main");
internal static XNamespace excelRelationshipsNamepace = XNamespace.Get("http://schemas.openxmlformats.org/officeDocument/2006/relationships");
}
public class Cell
{
public Cell(string column, int row, string data)
{
this.Column = column;
this.Row = row;
this.Data = data;
}
public override string ToString()
{
return string.Format("{0}:{1} - {2}", Row, Column, Data);
}
public string Column { get; set; }
public int Row { get; set; }
public string Data { get; set; }
}

}

Hi Mike, have you covered the "write" portion of this? If so, where can I find it?

BTW, excellent series. I am currently using OpenXml but when I save the document and try to open it using OpenXml later it says it is the wrong format :-( After opening it and saving with Excel, it works fine...also your way of reading (above) cannot read it ... errors here on Linq:

PackagePart sharedStringsPart = (from part in allParts
where part.ContentType.Equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml")select part).Single();

Thx,
DDan

To fill in the empty cells: Yoosuf, see my added coded for

if (cell.HasElements) ...

in the update I submitted. I think it solves your issue.

It's really helpfull. worked like a jetplane. I have implemented this code of urs and found that if any cells are empty in between, it will ignore that cell itself instead of returning any null value or something like that. Kindly clarify this and advise how to handle empty cells in between.

Simply brilliant. Thanks very much!

Thanks for this, much nicer than using JET drivers etc. I found in my case the xlsx contained a combination of shared strings and data values embedded directly in the cells. I guessed that the cell's atribute "t" means "type", and "s" means shared. Also found that in some cases the cell had no elements at all, and in my case I wanted an empty cell. Thus I changed the inner loop to:

foreach (XElement cell in cells)
{
string cellPosition = cell.Attribute("r").Value;
int index = IndexOfNumber(cellPosition);
string column = cellPosition.Substring(0, index);
int row = Convert.ToInt32(cellPosition.Substring(index, cellPosition.Length - index));
if (cell.HasElements)
{

if (cell.Attribute("t") != null && cell.Attribute("t").Value == "s")
{
// Shared value
int valueIndex = Convert.ToInt32(cell.Descendants(ExcelNamespaces.excelNamespace + "v").Single().Value);
parsedCells.Add(new Cell(column, row, sharedStrings[valueIndex]));
}
else
{
string value = cell.Descendants(ExcelNamespaces.excelNamespace + "v").Single().Value;
parsedCells.Add(new Cell(column, row, value));
}
}
else
{
parsedCells.Add(new Cell(column, row, ""));
}

}

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Post new comment

The content of this field is kept private and will not be shown publicly. If you have a Gravatar account associated with the e-mail address you provide, it will be used to display your avatar.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.
  • Syntax highlight code surrounded by the <pre class="brush: lang">...</pre> tags, where lang is one of the following language brushes: as3, applescript, bash, csharp, coldfusion, cpp, css, delphi, diff, erlang, groovy, jscript, java, javafx, perl, php, plain, powershell, python, ruby, sass, scala, sql, vb, xml.

More information about formatting options