Reading and writing to Excel 2007 or Excel 2010 from C# - Part III: Shared Strings
[Note: See the series index for a list of all parts in this series.]
Excel’s file format is an interesting one compared to the rest of the Office suite in that it can store data in two places, where most others store the data in a single place. Excel supports this to get good performance while keeping the size of the file small. To illustrate the scenario, let’s pretend we had a single sheet with some info in it:

Now for each cell we need to process the value and the total size would be 32 characters of data. However with a shared strings model we get something that looks like this:
The result is the same however we are processing values once and the size is less, in this example 24 characters.
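To make the idea concrete, here is a minimal sketch of how a shared strings table could be built from raw cell values. The names SharedStringSketch and Build are hypothetical (they are not part of Excel or .NET), and the sketch assumes the values are non-null. Each distinct string is stored once, and every cell keeps only the zero-based index of its string.

```csharp
using System.Collections.Generic;

internal static class SharedStringSketch
{
    // Returns the unique strings in first-seen order. cellIndexes receives,
    // for each input cell, the zero-based position of its value in that list.
    internal static List<string> Build(IEnumerable<string> cellValues, out List<int> cellIndexes)
    {
        var table = new List<string>();
        var lookup = new Dictionary<string, int>();
        cellIndexes = new List<int>();

        foreach (string value in cellValues)
        {
            int index;
            if (!lookup.TryGetValue(value, out index))
            {
                // First time we see this string: append it to the table.
                index = table.Count;
                lookup.Add(value, index);
                table.Add(value);
            }
            cellIndexes.Add(index);
        }

        return table;
    }
}
```

Repeated values cost only an integer per cell, which is where the size saving comes from.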
The Excel format is flexible, in that it lets you use either approach. Note that the Excel client always uses the shared strings method, so for reading you must support it. This brings up an interesting scenario: say you fill a spreadsheet using direct input and then open it in Excel. What happens? Excel identifies the structure, remaps it automatically and then, when the user closes the file (regardless of whether they have made a change), prompts them to save it.
The element we loaded at the end of part 2 is that shared strings file, which in the archive is \xl\sharedStrings.xml. If we look at it, it looks something similar to this:
Each <t> node is a value, and it corresponds to a value in the sheet which we will parse later. The sheet stores a key that points to the item in the shared strings file. The key is a zero-based index. So in the above example, whose four values are Some, Data, Belongs and Here, the first <t> node (Some) will be stored as 0, the second (Data) will be 1 and so on. The code I wrote to parse it looks like this:
private static void ParseSharedStrings(XElement SharedStringsElement, Dictionary<int, string> sharedStrings)
{
    IEnumerable<XElement> sharedStringsElements =
        from s in SharedStringsElement.Descendants(ExcelNamespaces.excelNamespace + "t")
        select s;

    int Counter = 0;
    foreach (XElement sharedString in sharedStringsElements)
    {
        sharedStrings.Add(Counter, sharedString.Value);
        Counter++;
    }
}
Using this I am parsing the node and putting the results into a Dictionary<int,string>.
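One caveat with counting <t> nodes: in SpreadsheetML each shared string is an <si> element, and an <si> can hold more than one <t> (rich text is split into <r> runs, and phonetic hints live in <rPh>). In that case the <t> count and the shared string index drift apart. The following is a hedged variant, not the code from this series, that indexes by <si> instead. The class name SharedStringsBySi is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

internal static class SharedStringsBySi
{
    private static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns one string per <si> element; the list position is the shared string index.
    internal static List<string> Parse(XElement sharedStringsElement)
    {
        return sharedStringsElement
            .Elements(Main + "si")
            .Select(si => string.Concat(
                si.Descendants(Main + "t")
                  // Skip phonetic runs, which are hints rather than cell text.
                  .Where(t => t.Parent.Name != Main + "rPh")
                  .Select(t => t.Value)))
            .ToList();
    }
}
```

For files that contain only plain strings, this gives the same values as the Dictionary<int,string> approach above.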
Reading and Writing to Excel 2007 or Excel 2010 from C# - Part II: Basics
[Note: See the series index for a list of all parts in this series.]

To get support for the technologies we will use in this series, we need to add a few assembly references to our solution:
- WindowsBase.dll
- System.Xml
- System.Xml.Linq
- System.Core
Next make sure you have the following namespaces added to your using/imports:
- System.IO.Packaging: This provides the functionality to open the files.
- System.Xml
- System.Xml.Linq
- System.Linq
- System.IO
Next, there is an XML namespace (not to be confused with .NET code namespaces) that we need for most of our queries: http://schemas.openxmlformats.org/spreadsheetml/2006/main. There is also a second one that we will seldom use: http://schemas.openxmlformats.org/officeDocument/2006/relationships. I put both into a static class as follows:
namespace XlsxWriter
{
    using System.Xml.Linq;

    internal static class ExcelNamespaces
    {
        internal static XNamespace excelNamespace = XNamespace.Get("http://schemas.openxmlformats.org/spreadsheetml/2006/main");
        internal static XNamespace excelRelationshipsNamepace = XNamespace.Get("http://schemas.openxmlformats.org/officeDocument/2006/relationships");
    }
}
Next we need to create an instance of the System.IO.Packaging.Package class (from WindowsBase.dll) and instantiate it by calling the static method Open.
Package xlsxPackage = Package.Open(fileName, FileMode.Open, FileAccess.ReadWrite);
Note: The file is opened at this point. This is important because Excel will LOCK an open file, and opening a file that is locked throws a lovely exception. The package likewise holds the file open until you release it, so make sure to call the Close method on the package when you are done, for example:
xlsxPackage.Close();
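An alternative to calling Close by hand is a using block, which releases the file even if an exception is thrown part way through. The sketch below assumes read-only access is enough; the class and method names (PackageUsageSketch, CountParts) are hypothetical.

```csharp
using System.IO;
using System.IO.Packaging;

internal static class PackageUsageSketch
{
    // Opens the workbook read-only, counts its parts and always releases the file.
    internal static int CountParts(string fileName)
    {
        using (Package xlsxPackage = Package.Open(fileName, FileMode.Open, FileAccess.Read))
        {
            int count = 0;
            foreach (PackagePart part in xlsxPackage.GetParts())
            {
                count++;
            }
            return count;
        } // The package is closed here, so the file is no longer held open.
    }
}
```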
When you open the XLSX file manually, the first file you’ll see is the [Content_Types].xml file which is a manifest of all the files in the ZIP archive. What is nice with using Packaging is that you can call the GetParts method to get a collection of Parts, which are actually just the files within the XLSX file.
The contents of the XLSX if renamed to a ZIP file and opened.
The various files listed in the [Content_Types].xml file.
What we will use throughout the series is the ContentType property to filter the parts down to the specific item we want to work with. Use the second image above to identify the value for the ContentType. For example, the ContentType for a worksheet is: application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml.
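As a rough sketch of that filtering, the helper below calls GetParts and keeps only the parts whose ContentType matches. The names PartFilterSketch and FindParts are hypothetical, and comparing the content type case-insensitively is my own assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Packaging;
using System.Linq;

internal static class PartFilterSketch
{
    internal const string WorksheetContentType =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml";

    // Returns every part in the package with the requested content type.
    internal static List<PackagePart> FindParts(Package xlsxPackage, string contentType)
    {
        IEnumerable<PackagePart> allParts = xlsxPackage.GetParts();
        return allParts
            .Where(part => string.Equals(part.ContentType, contentType, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Calling FindParts(xlsxPackage, PartFilterSketch.WorksheetContentType) would return one PackagePart per worksheet in the workbook.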
Once we have all the parts of the XLSX file we can navigate through it to get the bits we need to read the content, which involves two steps:
- Finding the shared strings part. This is another XML file which allows string values to be shared between worksheets. It is optional to use when writing, but it does save space and speed up loading. For reading values it is required, as Excel will use it.
- Finding the worksheet that we want to read from, this is a separate part from the shared strings.
Let’s start with reading the shared strings part, as this will be the basis for reading any part later in the series. What we need to do is get the first PackagePart with the type: application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml
PackagePart sharedStringsPart = (from part in allParts where part.ContentType.Equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml") select part).Single();
Now we need to get the XML content out of the PackagePart. This is easy with the GetStream method, whose result we wrap in an XmlReader so that it can be loaded into an XElement. This is a bit convoluted, but it is just one line to get from one type to another, and the benefits of using LINQ to XML are worth it:
XElement sharedStringsElement = XElement.Load(XmlReader.Create(sharedStringsPart.GetStream()));
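The one-liner leaves the stream and the reader open until they are garbage collected. If that matters in your scenario, a slightly longer variant disposes both explicitly. This is a sketch only, and the names PartXmlSketch and LoadPartXml are hypothetical.

```csharp
using System.IO;
using System.IO.Packaging;
using System.Xml;
using System.Xml.Linq;

internal static class PartXmlSketch
{
    // Loads the XML content of a part and releases the stream and reader afterwards.
    internal static XElement LoadPartXml(PackagePart part)
    {
        using (Stream partStream = part.GetStream(FileMode.Open, FileAccess.Read))
        using (XmlReader reader = XmlReader.Create(partStream))
        {
            return XElement.Load(reader);
        }
    }
}
```

With this helper, the line above would become XElement sharedStringsElement = PartXmlSketch.LoadPartXml(sharedStringsPart);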
Now we have the ability to work with the XElement and do some real work. In the next parts, we’ll look at what we can do with it and how to get from a single part to an actual sheet.
Gallery2 + C# - Beta 2 Available
A few weeks back I posted beta 2 of the Gallery2 .NET toolkit, on which I have done considerably more work than I ever expected I would. There are lots of neat bits of code and features available. What’s in it now:
There are four items currently available:
- (Tool) For people just wanting to export all their images out of Gallery2, there is g2Export which is a command line tool to export images.
- (Tool) For people wanting to get information out of Gallery2 into a sane format, there is g2 Album Management which is an Excel 2007 add-in to export information about albums and images to Excel.
- (API) For developers wanting to write their own tools or integrations, there is the SADev.Gallery2.Protocol which wraps the Gallery2 remote API. Please see the What you should know? page for information on using the API.
- (Source) Lastly, for developers needing some help, there is the source code for the g2 Export Tool and the g2 Album Management Excel Add-in.
Here is a screen shot of g2Export in action:
If you are interested in how much of the Gallery2 API is catered for, it’s most of it (the file upload parts are the only major outstanding ones). The key thing to note in the table is the Tested column. While the code is written, it may not be tested and may not work at all. I have found the documentation is not 100% in line with the actual Gallery2 code, so sometimes it needs considerable rework for it to actually work.
API Call | Basic Request | Basic Response | Tested | Advanced Request | Advanced Response |
login | done | done | done | done | done |
fetch-albums | done | done | done | done | done |
fetch-albums-prune | done | done | done | done | done |
add-item (upload) | done | done | | | |
add-item (url) | done | done | done | done | |
album-properties | done | done | done | done | done |
new-album | done | done | done | done | |
fetch-album-images | done | done | done | done | done |
move-album | done | done | done | done | |
increment-view-count | done | done | done | done | |
image-properties | done | done | done | done | done |
no-op | done | done | done | done | done |
Proven Source Control Practises Poster
Maybe one of the toughest things in software development to get right all the time: source control. Well now with this nice bright A3 poster printed on your wall (or maybe above the monitor of the guy who breaks the builds daily) you’ll never go wrong again.
It covers 17 proven practises broken into 5 key areas:
Things YOU should do
- Keep up to date
- Be light and quick with checkouts
- Don’t check in unneeded binaries
- Working folders should be disposable
- Use undo/revert sparingly
Branching
- Plan your branching
- Own the merge
- Look after branches
Management
- Useful & meaningful check in messages
- Don’t use the audit trail for blame
Repository
- Don’t break the build
- Separate your repo
- Don’t forget to shelve
- Use labels
Technology
- Try concurrent access
- Don’t be afraid of branching concepts
- Automerge for checkout only
For more posters go to www.drp.co.za
ADO.NET Data Services Cheat Sheet (WCF Data Service)
Above is a screen shot of an A3 cheat sheet I created for ADO.NET Data Services (version 1). The poster covers the filters, methods and gives plenty of examples in a nice bright poster.
Gallery2 + C#
Gallery2 is a web based PHP gallery system with a remote API for doing many things. I have been using it for a while, but have decided to change and so I wanted to export my images, which is harder than it sounds. To actually get this done I ended up writing a basic wrapper for the Gallery2 remote API and implementing a small console application to do the export.
If you are interested in the wrapper or the tool itself, I have set up a CodePlex project for it where you can download them: http://gallery2.codeplex.com/
I have decided to open source it because it is useful to people besides me, and since I have gotten what I need from it, I doubt I’ll spend much time getting it feature complete. This way someone else can get the tool (if that is all they need) or get the source and add to it.
Screen shot of the tool running.
Reading and writing to Excel 2007 or Excel 2010 from C# - Part I: Primer
[Note: See the series index for a list of all parts in this series.]
Over the past week I have been learning about the complexity of working with the Excel 2007 native file format: XLSX or, as it is correctly known, SpreadsheetML. There are three ways to work with it. The first is to build your own parser, which is just too much work for me. The second is to use the Open XML SDK which Microsoft provides. The current version of the SDK at the time of writing, version 1, is not great: there is very little (if any) benefit to using it over the third method. There is a V2 SDK currently in beta which looks brilliant and, frankly, would be the recommended route once released.
The third way, which is the way I chose, uses the new features introduced in the .NET Framework 3.0.
What is an XLSX file? An XLSX file is actually just a ZIP file which contains a number of XML files.
This means all you need to do is open the XLSX file as a ZIP file, get the right XML files (or parts as they are referred to) out of it and parse those.
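To see the "just a ZIP file" point for yourself, here is a small sketch that lists the files inside a workbook using the direct ZIP route. It relies on System.IO.Compression.ZipArchive, which arrived in .NET Framework 4.5 and is not what this series uses; the names ZipListingSketch and ListEntries are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

internal static class ZipListingSketch
{
    // Returns the path of every file stored in the XLSX (ZIP) stream.
    internal static List<string> ListEntries(Stream xlsxStream)
    {
        // leaveOpen: true keeps the caller's stream usable after listing.
        using (var archive = new ZipArchive(xlsxStream, ZipArchiveMode.Read, true))
        {
            return archive.Entries.Select(entry => entry.FullName).ToList();
        }
    }
}
```

For a typical workbook you would expect to see entries such as [Content_Types].xml and files under the xl folder.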
If you are thinking this is a .NET-only solution, the chart below from Doug Mahugh shows a number of ways to do the same thing across a number of technologies and operating systems. This series will focus on the .NET way.
What is nice about using System.IO.Packaging to read the file over the direct ZIP options, is that there are some helper methods to make it easier when working with any of the new formats (docx, xlsx etc...)
My Presentation @ Dev4Devs
If you are attending Dev4Devs on Saturday (or are here after the event) and you are looking for a copy of the slides and code, you can get them below! If you are looking for the ADO.NET Data Services cheat sheet I mentioned, then you need to go here.
Code
The code here is also different from what I presented in the following ways:
- There is a timer control in it - so if you add items to the DB while on the site, it updates and shows those changes within 5 seconds.
- The layout is slightly bigger (bigger header) and has buttons (to make it look like an email client). These were removed for the presentation because they don’t work at 1024x768 (aka the projector resolution), so now they are back in their graphical beauty.
- There is a feed button which links to an Atom feed for the last 10 emails. This is something I mentioned you could do, and now you can see it.
- There is a database creation script, but no data. You need to create your own data.
Slide Show
Some Free Posters I've Created Recently
One of the things that I do at BB&D is produce guidance posters. So far I have produced two of them, and both are publicly available on the BB&D developer guidance site DRP. The first is “Outlook + Exchange = Better Together” and the second is “ADO.NET Data Services Cheat Sheet”, not two things you think about together often.
Outlook + Exchange = Better Together
The title is a bit marketing-ly, but the poster is really a nice overview of the 8 key areas of Outlook, namely:
- Contacts
- Outlook Web Access
- SharePoint
- Calendars
- Tasks
- Outlook Features
- RSS
The poster looks a little busy, but when printed at A3 it’s not bad at all. It also includes three areas (OWA, Mailbox size, SharePoint) where you can write in your organization details so if you print them out and put them on the wall they have some organization context.
ADO.NET Data Services Cheat Sheet
I developed the next one while learning ADO.NET Data Services, and it’s a bright and fun cheat sheet for it. It includes information on the query operators (with samples along the border), a list of functions, a list of comparison operators (like less than), query order, keys, and $value. I’ve found it very useful to print out and put up. It is designed for A3, but I have it printed at A4 (in grey scale) and it works just as well.
Free SharePoint Developer Training
On May the 23rd InformationWorker and Inobits are getting together and offering FREE SharePoint developer training. It is a full day event where Inobits will be providing a number of lab based training sessions focused on developing with SharePoint. Also being a full day event there will be some lunch provided for everyone (special thanks to 3Fifteen for sponsoring it). It should be a really great event for everyone, because you pick the lab/s you want to do which means if you are a beginner or expert you do the ones that will benefit you.
Because of all the effort to prep for this, you must register for it at http://www.informationworker.co.za/Pages/SLABSRegistration.aspx
A sampling of labs available is:
- Web Parts
- Data Lists
- Event Handlers
- Workflow
- Silverlight
- Page Navigation
- Page Branding
- Web Services
- Content Types
- User Management