Reading and writing to Excel 2007 or Excel 2010 from C# - Part III: Shared Strings

[Note: See the series index for a list of all parts in this series.]

Clipboard08

Excel’s file format is an interesting one compared to the rest of the Office Suite in that it can store data in two places where most others store the data in a single place. The reason Excel supports this is for good performance while keeping the size of the file small. To illustrate the scenario lets pretend we had a single sheet with some info in it:

Clipboard02

Now for each cell we need to process the value and the total size would be 32 characters of data. However with a shared strings model we get something that looks like this:

Clipboard03

The result is the same however we are processing values once and the size is less, in this example 24 characters.

The Excel format is pliable, in that it will let you do either way. Note the Excel client will always use the shared strings method, so for reading you should support it. This brings up an interesting scenario, say you are filling a spreadsheet using direct input and then you open it in Excel, what happens? Well Excel identifies the structure, remaps it automatically and then when the user wishes to close (regardless if they have made a change or not) will prompt them to save the file.

The element we loaded at the end of part 2 is that shared strings file, which in the archive is \xl\sharedstrings.xml. If we look at it, it looks something similar to this:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

  
    Some
  
  
    Data
  
  
    Belongs
  
  
    Here
  
Each <t> node is a value and it corresponds to a value in the sheet which we will parse later. The sheet will have a value in it, which is the key to the item in the share string. The key is an zero based index. So in the above example the first <t> node (Some) will be stored as 0, the second (Data) will be 1 and so on. The code to parse it which I wrote looks like this:
private static void ParseSharedStrings(XElement SharedStringsElement, Dictionary<int, string>sharedStrings)
{
    IEnumerable<XElement> sharedStringsElements = from s in SharedStringsElement.Descendants(ExcelNamespaces.excelNamespace + "t")
                                                  select s;

    int Counter = 0;
    foreach (XElement sharedString in sharedStringsElements)
    {
        sharedStrings.Add(Counter, sharedString.Value);
        Counter++;
    }
}

Using this I am parsing the node and putting the results into a Dictionary<int,string>.

Pingback

Visitor's picture

[...] part III we looked the interesting part of Excel - Shared Strings which is just a central store for unique [...]

LINQ to XML

Visitor's picture
Hi I've done my whole solution in LINQ to XML but I realised today that it doesn't cater for the instance when a user doesn't enter a value in the field. If I look at the xml of the Excel file, it leaves out the cell tag. Doesn't even give an empty tag. Does your solution cater for this? It sounds like it might based on your nodes being numbered.. S

Correct, it is another way to

rmaclean's picture
Correct, it is another way to keep the file small. The solution does cater for that scenario.

Pingback

Visitor's picture

[...] Part III - Shared Strings rmaclean's blog [...]

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Post new comment

The content of this field is kept private and will not be shown publicly. If you have a Gravatar account, used to display your avatar.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

More information about formatting options

Free Visual Studio training
StackOverflowServerFaultSuperUser Loading my StackExchange stats...
I am a community lead in this group.
We discuss SharePoint, Office and all things IW.
Twitter
Loading my tweets...