| Robert MacLean

Two days in and what's been happening?

Robert MacLean

23 August 2009

Two days into the new S.A. Architect site and what's been happening?

Added Willie Roberts to the community leads page - please blame me for forgetting to have him there in the first place.
Got our first hosted blogger - Bramzer! Very keen to see what content he provides.
Profile images are disabled - we having a technical issue with them which I hope to fix in the next week.
Seeing more traffic than usual for a weekend on the site, which is in part due to the mails and so on, but more than that people are making use of things like the contact forms and so on! This is really great to see.

If you have suggestions or ideas on how to improve the site going forward, I'm looking at those two people who voted the site is average :) please let me know!

VSTS Rangers - Using PowerShell for Automation - Part I: Structure & Build

Robert MacLean

20 August 2009

powershell_2

As Willy-Peter pointed out, a lot of my evenings have been filled with a type of visual cloud computing, that being PowerShell’s white text on a blue (not quiet azure) background that make me think of clouds for the purpose of automating the virtual machines that the VSTS Rangers are building.

So how are we doing this? Well there is a great team involved, it’s not just me and there are a few key scripts we are trying build which should come as no surprise it’s configuration before and after you install VSTS/VS/Required software, tailoring of the environment and installing software.

The Structure

So how have structured this? Well since each script is developed by a team of people I didn’t want to have one super script that everyone is working on and fighting conflicts and merges all day and yet at the same time I don’t want to ship 100 scripts that do small functions and call each other - I want to ship one script but have development done on many. So how are we doing that? Step one was to break down the tasks of a script into functions, assign them to people and assign a number to them - like this:

*Note this is close to reality, but the names have been changed to protect the innocent team members and tasks at this point.

Task	Assigned To	Reference Number
Install SQL	Team Member A	2015
Install WSS	Team Member B	2020
Install VSTS	Team Member C	2025

And out of that people produce the scripts in the smallest way and they name them beginning with the reference number - so I get a bunch of files like:

2015-InstallSQL.ps1
2020-InstallWSS.ps1
2025-InstallVSTS.ps1

The Build Script

These all go into a folder and then I wrote another PowerShell script which combines them into a super script which is what is used. The build script is very simply made up of a number PowerShell commands that get the content and outputs it to a file, which looks like:

Get-Content .\SoftwareInstall\*.ps1 | Out-File softwareInstall.ps1

That handles the combining of the scripts and the reference number keeps the order of the scripts correct.

As an aside I didn’t use PowerShell for the build script originally, I used the old school style DOS copy command - however it had a few bugs.

The Numbering

What’s up with that numbering you may ask? Well for those younger generation who never coded with line numbers and GOTO statements it may seem weird to leave gaps in the numbering and should rather have sequential numbering - but what happens when someone realises we have missed something? Can you image going to each file after that and having to change numbers -EEK! So leaving gaps is leaving the ability to deal with mistakes in a non-costly way.

Next why I am starting with 1015? Well each script is given a 1000 numbers (so pre install would be 1000, software install 2000 etc…) so that I can look at script and know what it’s for and if it’s in the wrong place. I start at 15 as 00, 05 and 10 are already taken:

00 - Header. The header for the file explaining what it is.
05 - Functions. Common functions for all scripts.
10 - Introduction. This is a bit of text that will be written to the screen explaining the purpose of the script and ending with a pause. The pause is important because if you ran the wrong script you can hit Ctrl+C at that point and nothing will have been run.

So that is part I. In future parts I will be looking at some of the scripts and learning's I have been getting.

File attachments

powershell_2.jpg (35.25 KB)

What is a South African ID number made up of?

Robert MacLean

19 August 2009

Update 11 August 2011: Want this as an app for your smartphone? Click here

Update 30 March 2012: Details of the racial identifier can be found at: http://www.sadev.co.za/content/south-african-id-numbers-racial-identifier-flag

Cecil Tshikedi asked a great question - what is actually in an ID number:

The first six numbers are the birth date of the person in YYMMDD format - so no surprise that my ID number starts 820716.
The next four are a gender, 5000 and above is male and below 5000 is female. So my ID number would have a number of 5000 or greater.
The next number is the country ID, 0 is South Africa and 1 is not. My ID number with have 0 here.
The second last number used to be a racial identifier but now means nothing.
The last number is a check bit. Which verifies the rest of the number.

So for my ID number it would look something like: 820716[5000-9999]0??

There you go, it’s that easy.

Reading and writing to Excel 2007 or Excel 2010 from C# - Part IV: Putting it together

Robert MacLean

19 August 2009

[Note: See the series index for a list of all parts in this series.]

In part III we looked the interesting part of Excel, Shared Strings, which is just a central store for unique values that the actual spreadsheet cells can map to. Now how do we take that data and combine it with the sheet to get the values?

What makes up a sheet?

First lets look at what a sheet looks like in the package:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2008/2/ac">
<dimension ref="A1:A4" /> 
<sheetViews>
<sheetView tabSelected="1" workbookViewId="0">
<selection activeCell="A5" sqref="A5" /> 
</sheetView>
</sheetViews>
<sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25" /> 
<sheetData>
<row r="1" spans="1:1" x14ac:dyDescent="0.25">
<c r="A1" t="s">
<v>0</v> 
</c>
</row>
<row r="2" spans="1:1" x14ac:dyDescent="0.25">
<c r="A2" t="s">
<v>1</v> 
</c>
</row>
<row r="3" spans="1:1" x14ac:dyDescent="0.25">
<c r="A3" t="s">
<v>2</v> 
</c>
</row>
<row r="4" spans="1:1" x14ac:dyDescent="0.25">
<c r="A4" t="s">
<v>3</v> 
</c>
</row>
</sheetData>
<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" /> 
</worksheet>

Well there is a lot to understand in the XML, but for now we care about the <row> (which is the rows in our speadsheet) and within that the cells which the first one looks like:

<c r="A1" t="s">
<v>0</v> 
</c>

First that t=”s” attribute is very important, it tells us the value is stored in the shared strings. Then the index to the shared string is in the v node, in this example it is index 0. It is also important to note the r attribute for both rows and cells contains the position in the sheet.

As an aside what would this look like if we didn’t use shared strings?

<c r="A1">
<v>Some</v> 
</c>

The v node contains the actual value now and we no longer have the t attribute on the c node.

The foundation code for parsing the data

Now that we understand the structure and we have this Dictionary<int,string> which contains the shared strings we can combine them - but first we need a class to store the data in, then we need to get to the right worksheet part and a way to parse the column and row info, once we have that we can parse the data.

Before we read the data, we need a simple class to put the info into:

public class Cell
{
    public Cell(string column, int row, string data)
    {
        this.Column = column;
        this.Row = row;
        this.Data = data;
    }

    public override string ToString()
    {
       return string.Format("{0}:{1} - {2}", Row, Column, Data);
    }

    public string Column { get; set; }
    public int Row { get; set; }
    public string Data { get; set; }
}

How do we find the right worksheet? In the same way as we did get the shared strings in part II.

private static XElement GetWorksheet(int worksheetID, PackagePartCollection allParts)
{
   PackagePart worksheetPart = (from part in allParts
                                 where part.Uri.OriginalString.Equals(String.Format("/xl/worksheets/sheet{0}.xml", worksheetID))
                                 select part).Single();

    return XElement.Load(XmlReader.Create(worksheetPart.GetStream()));
}

How do we know the column and row? Well the c node has that in the r attribute. We’ll pull that data out as part of getting the data, we just need a small helper function which tells us were the column part ends and the row part begins. Thankfully that is easy since rows are always numbers and columns always letters. The function looks like this:

private static int IndexOfNumber(string value)
{
    for (int counter = 0; counter < value.Length; counter++)
    {
        if (char.IsNumber(value[counter]))
        {
            return counter;
        }
    }
    return 0;
}

Finally - we get the data!

We got the worksheet, then we got the cells using LINQ to XML and then we looped over them in a foreach loop. We then got the location from the r attribute - split it into columns and rows using our helper function and then grabbed the index, which we then go to the shared strings object and retrieve the value. The following code puts all those bits together and should go in your main method:

    List<Cell> parsedCells = new List<Cell>();

    XElement worksheetElement = GetWorksheet(1, allParts);

    IEnumerable<XElement> cells = from c in worksheetElement.Descendants(ExcelNamespaces.excelNamespace + "c")
                                  select c;

    foreach (XElement cell in cells)
    {
        string cellPosition = cell.Attribute("r").Value;
        int index = IndexOfNumber(cellPosition);
        string column = cellPosition.Substring(0, index);
        int row = Convert.ToInt32(cellPosition.Substring(index, cellPosition.Length - index));
        int valueIndex = Convert.ToInt32(cell.Descendants(ExcelNamespaces.excelNamespace +  "v").Single().Value);

        parsedCells.Add(new Cell(column, row, sharedStrings[valueIndex]));
    }

And finally we get a list back with all the data in a sheet!

S.A. Architect's new website

Robert MacLean

18 August 2009

Coming this weekend is the S.A. Architect website! So I thought I would drop a quick teaser to explain all the good new features, the bad of the change and the pretty.

The Good

The new site allows members to register without admin intervention.
Ability to use Google Analytics for the states.
We can offer blogs to members very easily now.
Users can use the traditional username & password or OpenId for authentication.
Proper event management system – We can now create an event and have invites sent out, lists of who is coming, how many they are brining etc...
Printing support on every page.
Automatic Google group registration – so when someone signs up to the site they join the Google group too (new members can untick that option).
Looks great in all browsers, no more browser specific options.

The Bad

Every user needs to be migrated, so this weekend all users will be moved and their passwords reset. So you will get an email from the system with the new password.

The Pretty

Below is a sneak peak at what to expect:

Some flair!

Robert MacLean

1 August 2009

In the last week I decided to add some flair to my site, as I am spending a lot of time in various communities be they online, like StackOverflow, or offline, like InformationWorker - so I wanted to add flair for those communities to my website.

However the problem is that I’m in a lot of them and don’t want a page that just scrolls and scrolls so I decided to make a rotating flair block, i.e. it shows a piece of flair for a few seconds and then rotates to another one. Thankfully I already have jQuery setup on my site so this was fairly easy. One thing that caused some headache was getting away from the idea of having a loop, where I’d show one flair then wait then hide it and show the next one. This is a very bad idea because it means that it runs forever which is what I want - but not an endless loop because browsers will detect that and stop the script. Also from a performance point of view JavaScript in a loop tends to make a browser run slowly.

The solution is to use events and kick them off in a staggered fashion - thankfully JavaScript natively has a function for that: setTimeout which takes a string which it will execute and an integer which is the milliseconds delay to wait for. Then on it’s turn show it, wait (using setTimeout again), then hide it and lastly wait again to show it. Because that cycle is the same for each item the staggering ensures that they do not overlap and you get a nice, smooth flowing and non-loop loop :)

The technical bits

My HTML is made up of a lot of divs - each one for a flair:

<div class="flair-badge">
    <div class="flair-title">
        <a class="flair-title" href="http://www.stackoverflow.com">StackOverflow.com</a></div>
    <iframe src="http://stackoverflow.com/users/flair/53236.html" marginwidth="0" marginheight="0" frameborder="0" scrolling="no" width="210px" height="60px"></iframe>
</div>

A dash of CSS for the styling, most importantly hiding all of them initially.

And the JavaScript:

var interval = 5000;

$(document).ready(function() {
    var badges = $(".flair-badge").length;
    var counter = 0;
    for (var counter = 0; counter<badges; counter++) {
        setTimeout('BadgeRotate(' + counter + ',' + badges + ')', counter * interval);
    }
});

function BadgeRotate(badge, badgeCount) {
    $(".flair-badge:nth(" + badge + ")").fadeIn("slow");
    setTimeout('BadgeRotateEnd(' + badge + ',' + badgeCount + ')', interval);
}

function BadgeRotateEnd(badge, badgeCount) {
    $(".flair-badge:nth(" + badge + ")").hide();
    setTimeout('BadgeRotate(' + badge + ',' + badgeCount + ')', (badgeCount * interval) - interval);
}

Reading and writing to Excel 2007 or Excel 2010 from C# - Part III: Shared Strings

Robert MacLean

29 July 2009

[Note: See the series index for a list of all parts in this series.]

Excel’s file format is an interesting one compared to the rest of the Office Suite in that it can store data in two places where most others store the data in a single place. The reason Excel supports this is for good performance while keeping the size of the file small. To illustrate the scenario lets pretend we had a single sheet with some info in it:

Now for each cell we need to process the value and the total size would be 32 characters of data. However with a shared strings model we get something that looks like this:

The result is the same however we are processing values once and the size is less, in this example 24 characters.

The Excel format is pliable, in that it will let you do either way. Note the Excel client will always use the shared strings method, so for reading you should support it. This brings up an interesting scenario, say you are filling a spreadsheet using direct input and then you open it in Excel, what happens? Well Excel identifies the structure, remaps it automatically and then when the user wishes to close (regardless if they have made a change or not) will prompt them to save the file.

The element we loaded at the end of part 2 is that shared strings file, which in the archive is \xl\sharedstrings.xml. If we look at it, it looks something similar to this:



  
    Some
  
  
    Data
  
  
    Belongs
  
  
    Here

Each <t> node is a value and it corresponds to a value in the sheet which we will parse later. The sheet will have a value in it, which is the key to the item in the share string. The key is an zero based index. So in the above example the first <t> node (Some) will be stored as 0, the second (Data) will be 1 and so on. The code to parse it which I wrote looks like this:

private static void ParseSharedStrings(XElement SharedStringsElement, Dictionary<int, string>sharedStrings)
{
    IEnumerable<XElement> sharedStringsElements = from s in SharedStringsElement.Descendants(ExcelNamespaces.excelNamespace + "t")
                                                  select s;

    int Counter = 0;
    foreach (XElement sharedString in sharedStringsElements)
    {
        sharedStrings.Add(Counter, sharedString.Value);
        Counter++;
    }
}

Using this I am parsing the node and putting the results into a Dictionary<int,string>.

Reading and Writing to Excel 2007 or Excel 2010 from C# - Part II: Basics

Robert MacLean

25 July 2009

[Note: See the series index for a list of all parts in this series.]

To get support for the technologies we will use in this we need to add a few assembly references to our solution:

WindowsBase.dll
System.Xml
System.Xml.Linq
System.Core

Next make sure you have the following namespaces added to your using/imports:

System.IO.Packaging: This provides the functionality to open the files.
System.Xml
System.Xml.Linq
System.Linq
System.IO

Right next there is a XML namespace (not to be confused with .NET code name spaces) we need to use for most of our queries: http://schemas.openxmlformats.org/spreadsheetml/2006/main and there is a second one we will use seldom http://schemas.openxmlformats.org/officeDocument/2006/relationships. So I dumped this into a nice static class as follows:

namespace XlsxWriter
{
    using System.Xml.Linq;

    internal static class ExcelNamespaces
    {
        internal static XNamespace excelNamespace = XNamespace.Get("http://schemas.openxmlformats.org/spreadsheetml/2006/main");
        internal static XNamespace excelRelationshipsNamepace = XNamespace.Get("http://schemas.openxmlformats.org/officeDocument/2006/relationships");
    }
}

Next we need to create an instance of the System.IO.Packaging.Package class (from WindowsBase.dll) and instantiate it by calling the static method Open.

 Package xlsxPackage = Package.Open(fileName, FileMode.Open, FileAccess.ReadWrite);

Note: It is at this point that the file is opened, this is important since Excel will LOCK an open file. This is an important issue to be aware of because when you open a file that is locked a lovely exception is thrown. To correct that you must make sure to call the close method on the package, for example:

xlsxPackage.Close();

When you open the XLSX file manually, the first file you’ll see is the [Content_Types].xml file which is a manifest of all the files in the ZIP archive. What is nice with using Packaging is that you can call the GetParts method to get a collection of Parts, which are actually just the files within the XLSX file.

The contents of the XLSX if renamed to a ZIP file and opened.

The various files listed in the [Content_Types].xml file.

What we will use during this is the ContentType parameter to filter the parts to the specific item we want to work with. The second image above to identify the value for the ContentType. For example the ContentType for a worksheet is: application/vnd.openxmlformats-officedocument.speadsheetml.worksheet+xml.

Once we have all the parts of the XLSX file we can navigate through it to get the bits we need to read the content, which involves two steps:

Finding the shared strings part. This is another XML file which allows for strings of values to shared between worksheets. This is optional for writing, to use but does save space and speed up loading. For reading values it is required as Excel will use it.
Finding the worksheet that we want to read from, this is a separate part from the shared strings.

Lets start with reading the shared strings part, this will be basis for reading any part later in series. What we need to do is get the first PackagePart with the type: application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml

PackagePart sharedStringsPart = (from part in allParts
    where part.ContentType.Equals("application/vnd.openxmlformats-officedocument.spreadsheetml.sharedStrings+xml")
    select part).Single();

Now we need to get the XML content out of the PackagePart, which is easy with the GetStream method, which we load into an XmlReader so that it can be loaded into a XElement. This is a bit convoluted but it is just one line to get it from one type to another and the benefits of using LINQ to XML are worth it:

XElement sharedStringsElement = XElement.Load(XmlReader.Create(sharedStringsPart.GetStream()));

Now we have the ability to work with the XElement and do some real work. In the next parts, we’ll look at what we can do with it and how to get from a single part to an actual sheet.

Gallery2 + C# - Beta 2 Available

Robert MacLean

25 July 2009

A few weeks back I posted beta 2 of the gallery 2 .net toolkit where I have done considerable more work on it than I ever expected I would. Lots of need bits of code and features available. What’s in it now:

There are four items currently available:

(Tool) For people just wanting to export all their images out of Gallery2, there is g2Export which is a command line tool to export images.
(Tool) For people wanting to get information out of Gallery2 into a sane format, there is g2 Album Management which is an Excel 2007 add-in to export information about albums and images to Excel.
(API) For developers wanting to write their own tools or integrations, there is the SADev.Gallery2.Protocol which wraps the Gallery2 remote API. Please see the What you should know? page for information on using the API.
(Source) Lastly for developers needing some help, there is the source code for the the g2 Export Tool and the g2 Album Management Excel Add-in

Here is a screen shot of g2 Album Management in action:

Here is a screen shot of g2Export in action:

If you are interested in how much of the Gallery2 API is catered for, it’s most of it (the file upload parts are the only major outstanding ones). The key thing to note on the table is the tested column. While the code is written, it may not be tested and may not work at all. I have found the documentation is not 100% in line with the actual gallery2 code so something it needs considerable rework for it to actually work.

API Call	Basic Request	Basic Response	Tested	Advanced Request	Advanced Response
login	done	done	done	done	done
fetch-albums	done	done	done	done	done
fetch-albums-prune	done	done	done	done	done
add-item (upload)		done			done
add-item (url)	done	done		done	done
album-properties	done	done	done	done	done
new-album	done	done		done	done
fetch-album-images	done	done	done	done	done
move-album	done	done		done	done
increment-view-count	done	done		done	done
image-properties	done	done	done	done	done
no-op	done	done	done	done	done

Proven Source Control Practises Poster

Robert MacLean

25 July 2009

Maybe one of the toughest things in software development to get right all the time: source control. Well now with this nice bright A3 poster printed on your wall (or maybe above the monitor of the guy who breaks the builds daily) you’ll never go wrong again.

It covers 17 proven practises broken into 5 key areas:

Things YOU should do

Keep up to date
Be light and quick with checkouts
Don’t check in unneeded binaries
Working folders should be disposable
Use undo/revert sparingly

Branching

Plan your branching
Own the merge
Look after branches

Management

Useful & meaningful check in messages
Don’t use the audit trial for blame

Repository

Don’t break the build
Separate your repo
Don’t forget to shelve
Use labels

Technology

Try concurrent access
Don’t be afraid of branching concepts
Automerge for checkout only

Direct Download

www.drp.co.za

Subscribe to

Search

The Structure

The Build Script

The Numbering

What makes up a sheet?

The foundation code for parsing the data

Finally - we get the data!

The Good

The Bad

The Pretty

The technical bits