23 May 2016

SFTPK: Array

This post is one in a series about stuff formally trained programmers know – the rest of the series can be found here.

Array

This is the first in the data structure reviews and likely the simplest; the humble array. The first issue is the term Array -  it term differs depending on who uses it Sad smile but we will get to that a bit later.

Generally I think of an array like this:

An array is a container object that holds a fixed number of values of a single type. The length of an array is established when the array is created. After creation, its length is fixed. Oracle Java Documentation

Seems simple enough. There are two limits placed on our container: single type & fixed length and both relate to how the array is handled in memory. When an array is created it looks at the type & length and uses that to calculate how much memory is needed to store all of that. For example if we had an array of 8 items we would get a block of memory allocated for the array like this:

image

In some systems arrays can just grow but allocating more memory at the end, these are called dynamic arrays. However many systems do not allow this because the way memory is handled is there might not be any space after the last item to grow into, thus the array length is fixed as there isn’t any memory allocated for that array instance.

This has a major the advantage to read performance, since I can quickly calculate the where the item will be in memory – thus skipping having to read/navigate all other items. For example:

If my arrays values start at position 100 in memory and I want the 4th item in an int[], it would be 4 (for the position) multiplied by 4 (for the int size) + 100 (for the start address) & boom value!

This makes every read an O(1) operation!

Object[]

What happens when we can’t know the size of the items in the array, for example if we created an object[] which can hold anything?

In this scenario, when the array is created, rather than allocating memory based on length multiplied by type size, it allocates length multiplied by the size of a pointer and rather than storing the values themselves in the array memory, it stores pointers to other locations in memory where the value is.

Obviously this has a slightly worse performance than an array where we can have the values in it – but it is slight. Below is some output from BenchmarkDotNet comparing sequential reads of an int[] vs. object[] (code here) and it is close:

                     Method |     Median |    StdDev |
--------------------------- |----------- |---------- |
    IntArraySequentialReads | 52.2905 us | 4.9374 us |
ObjectArraySequentialReads | 58.3718 us | 5.4106 us |

Associative Arrays/Dictionary

As mentioned above, not every array is an array – some languages (PHP & JavaScript for example) do not allocate a block of memory like described above. These languages use what is called an associative array, also known as a map (PHP likes to refer to it this way) or a dictionary.

Basically these all have a key and a value associated to them and you can lookup the value by using the key. Implementation details differ though from platform to platform.

For example on C#, Dictionary<TKey,TValue> it is handled with an array under the covers, however in JavaScript it is a normal object. When an item is added to the array in JavaScript, it merely adds a new property to the object and that property is the index in the array.

Associative arrays do take up more memory than a traditional array (good example here of PHP where it was 18 times larger).

Multi-dimensional arrays

Multi-dimensional arrays also differ platform to platform. The Java version of it is an array of arrays, which achieves the same goal is basically implemented the same as as object[] was described above. In C# these are known as jagged arrays.

C# and other languages have proper multi-dimensional arrays which work differently – they actually take all the dimensions, multiply them together and use that for the length of an array. The dimensions just give different offsets.

Example:

image

Jagged arrays do have one benefit over a multi-dimensional array, since each internal array is independent, they can be different sizes where a multi-dimensional array all the dimensions must be the same size.

C# – List<T>

If you are working in C#, you might be asking yourself what List<T> is and how it relates to Array since it can grow forever! List<T> is just an array with initial size of 4! When you call .Add to add a 5th item, it then does the following:

  1. Create second array of where the length is double the current array
  2. Copy all items from first array to second array
  3. Use second array now

This is SUPER expensive and also why there is an optional constructor where you can override the initial size which helps a lot. Once again using BenchmarkDotNet you can see that it makes a nice difference (code):

                  Method |      Median |     StdDev |
------------------------ |------------ |----------- |
  DefaultConstructorUsed | 701.7312 us | 38.5573 us |
ConstructorWithSizeUsed | 548.5436 us | 13.1122 us |

JavaScript Arrays

As mentioned above, the standard JavaScript array is an associative array. However, JavaScript (from ES5) does contain support for typed arrays. The supported methods differ so this isn’t an easy replacement and it only supports a limited number numeric of types. Might make a lot of sense to use these from a performance reason since they are implemented as actual arrays by the JavaScript runtimes that support it.

20 May 2016

GitHub vs. VSTS Pricing, in more than 140 characters

GitHub has introduced a flat rate structure for unlimited private repos and I wanted to understand how it compares to the Visual Studio Team Services (VSTS – previously Visual Studio Online (VSO)) pricing where you get that already. I drew up a quick picture and tweeted it:

I have had mostly positive feedback for it, however there has been some confusion in it.

Date

Yes, it says 2017. I’m too lazy to change that to 2016, really. If it bugs you, just look away. Or pretend I’m a time traveler.

VSTS is cheaper yet more confusing

The title is my summary for the pricing difference and people have interpreted that to mean so many things. Including that I mean VSTS is a more confusing platform and ignoring the fact this is about price. I only meant the pricing is confusing. For example here is the math for GitHub vs. VSTS at 10 users:

GitHub VSTS
Price $70 $30
Math ((10-5)*9)+25 (10-5)*6

At this point it seems simple – GitHub is $25 for the first five users, so we subtract 5 from the total number of users and multiply by 9 for the remaining and add that to the $25 for the first five users. VSTS is even easier. Your first five are free, so we subtract those from the total and multiply the remaining by 6 which is the price for that tier.

The problem is VSTS is a tiered pricing, where GitHub is a fixed pricing. At 1500 users the math for GitHub remains the same but VSTS is way more complex.

GitHub VSTS
Price $13480 $5350
Math ((1500-5)*9)+25 (5*6)+(90*8)+(900*4)+(500*2)

You’ll note the VSTS math is way different. First I’m not even bothering with subtract 5 for free, so the total users is 1495. The first five are charged at $6 a month, the next 90 at $8 a month, the next 900 at $4 and the remaining 500 users are charged at $2. Once added up you get the total.

And it gets more complex, because if you have an EA (Enterprise Agreement – something your company signs with Microsoft to pay differently & pay less for licensing), then none of that applies – it is a flat $4 per user.

GitHub is also easier in user types – there is one. In VSTS there is three (note, these are my names for the user types – not official):

  • Dev: This is the paid ones we have been talking about.
  • MSDN: This is the same, except they have a TFS on-premise CAL (i.e. you have a user license for local TFS) or they have a MSDN subscription which includes VSTS.
  • Stake Holder: These are free – but really are about work item management only. This is what you give your customer who needs to prioritize the backlog but doesn’t need code or build access.

How would these types impact the cost? Lets see an example

Example

Let us pretend that we have a dev team of 40 people, split into 5 feature teams of 1x PM, 1x Tester, 6 x devs. In each feature team 2 of the devs are outside consultants and the tester & PM do not have MSDN cause the company only has MSDN for devs. Your gut might be you need 40 licenses, so $270 according the calculator. The reality is you won’t pay for the 5 PMs as they use stake holder licenses. You get 5 free licenses which you assign to your testers. Your 20 devs have MSDN so they don’t need anything extra. That just means the 10 consultants need licenses – so the price is $70 not $270 i.e. (5*6)+(5*8).

For GitHub, that would be $315 per month i.e. (40-5)*9.

Platform Confusion

To answer the trolls about is VSTS a more confusing platform? If you coming from GitHub, yes I think it might be more confusing as VSTS offers more, there is more to learn and it will be a bit off from what you know. The core, Git repos, remains the same. If you can learn Git, you can learn VSTS so in the medium term it is not more confusing at all.

13 May 2016

SFTPN: Big O Notation

The series post, which contains more stuff formally trained programmers know, can be found here.

Big O Notation

This one has had me always confused and always seemed to be something out of my reach. It really is simple once I actually sat down and worked through it. Lets start with the syntax:

O(n)

The "O" just is a indicator that we using big O notation and the n is the cost. Cost could mean a variety of things, memory, cpu cycles but mostly people think of it as the number of times the code will execute. The best cost would be code that never runs (i.e. `O(0)`) but that likely has no value.
To help explain it, let's look at a simple example:

Console.WriteLine("Hello 1");

The cost for that is 1, so we could write `O(1)`. If we put that in a for loop like this:

for (var counter = 0; counter < 10; counter++) 
{ 
    Console.WriteLine("Hello "+counter); 
} 


The cost would be 10, so we could write `O(10)`.

n

Rather than having to be explicit with number (like 10 above) we can use a short hand notation. The common one is ```n``` which means it will run once per item. For our for loop example about that means it could be written as `O(n)` so that regardless if we looping 10 times or a 100 times the relative cost is the same and can be referenced the same. From this point on it really is just about adding math to it.

If we were to have a loop inside a loop as follows, which will run 100 times (10 X 10) we could write this as `O(n<sup>2</sup>)`.

var n = 10; 
for (var outerCounter = 0; outerCounter<n; outerCounter++) 
{ 
    for (var counter = 0; counter < n; counter++) 
    { 
        Console.WriteLine("Hello "+counter); 
    } 
} 

The other common one used with Big O Notation is `log`, i.e. Logarithm, which could be written like this: O(log n). In this case the cost per item gets less (relative to earlier items) as we add more items.

bigO_thumb5

Further reading

The best guide I found was from Rob Bell.

12 May 2016

Stuff formally trained programmers know

This is going to be a series of posts where I intend to dive into the stuff which "formally trained" programmers seem to know.   

What do I mean by "formally trained"?

The easy way to think of it is programmers who have a university education, or similar, where the focus on theory matters a lot. It also feels to me that the old & wise men of programming all just know this and the upcoming generation doesn't seem to have this knowledge. I don't put myself in that group of formally trained, and even after 20 years I don't know these things well enough to hold a conversation about them.

What will I be covering? (these will be linked as the posts go up)

Language

The biggest pain for me in 20 years of programming is not everyone speaks the same language. I am not referring to C# or JavaScript, rather terminology that we use. Is an Array always an Array? How do we talk about measuring performance?

Big O Notation

Data Structures

The way we structure data, the advantages and disadvantages of each.

Array

Linked List

Binary Tree

Binary Search Tree

Red/Black Tree

Heap

Hash Table

Stack

Queue

Trie

Graph (directed or undirected)

Algorithms

Algorithms are ways of working with data and data structures in a consistent way. The advantage to knowing them is two-fold; First it helps communication since we can all use the same names and secondly it expands our thinking about programming.

Bubble Sort

Merge Sort

Quick Sort

Radix Sort

Depth First Search

Breadth First Search

Shunting Yard

11 May 2016

Losing weight, the developer way

August 2015, I decided I need to lose some weight and get healthier. I wasn’t happy with my image and I wasn’t happy that my son kicks my ass in soccer when I get tired after 10 minutes. At that point I was about 105kg – today I am 71kg Smile

I’ve had a few people ask how I did it, so here is the steps I took and why I went this route rather than a specific diet.

My scrum board

I started off by just tracking what I ate. I had a Windows Phone at the time and found some apps for it. They were pretty quality. What I found useful is to have rough data, use it to compare different food, help understand my choices and that the data should correlate day to day, so you get a feel that today is a good day or a bad day in comparison to history. I moved to Samsung Galaxy S7 which comes with S Health and that is what I use now, it is way better than any Windows Phone option.

Each and every meal was tracked just so I could understand how much is going into me, where the calories, carbs etc.. are coming from and to start to have a way to improve.

Iterative improvement

The next step happened naturally – I started picking what I ate differently because I had more knowledge. My portions also got smaller. This really brought me down to about 90kg just with making smarter eating choices.

I am also not shying away from certain foods, to me there are no bad foods. There are bad amounts and what that amount is is based on the food and varies person to person. You can’t take what works for you and assume it will work for others.

That said though, personally I feel better now that I have lowered the following foods in my diet:

  • Wheat
  • Dairy
  • Sugary drinks

The difference is not to say no, so you end up just thinking of them rather it is taking one piece of toast instead of two cause it makes me feel healthier. This applies to cheat days too, they are kinda smart at helping with willpower management but they just don’t work for me. If I want cake – I eat cake… I just need to work it into the plan for the day.

Remove technical debt

Next was cleaning up my house. No snack foods. No sugary drinks. *sigh*

This was hard but my willpower at 11pm is low. I know my weaknesses so I remove the issues when I am strong so I don’t make mistakes when I am weak.

Cycles

This isn’t just about weight loss, although that is what I have covered so far, it equally is about fitness too. For me, it meant starting to cycle again. This brought up my fitness and helped with about 20kg of weight loss too. It isn’t easy, but it is essential for me. As McDonalds reminds you – it is what you eat and what you do.

Diets don’t work

The problem with diets is that as you lose weight you body needs less calories yet your mind and life don’t change, which leads to fast drops and fast gains. Willpower is hard to do all the time, yet I find when you have data to assist you, it isn’t so much about raw willpower as thinking and I find that is easier. I am approaching this like I would a dev project – learn, implement, review, improve (LIRI).

02 Mar 2016

DevConf - Survival Guide

One role I have often had in companies is assisting teams to get ready for a conference, and with DevConf being next week & my team attending I needed to build a survival guide for them. If you are attending DevConf, then this guide may help you as well!

01 Mar 2016

Configuring Open Live Writer with Drupal 7

logoI’ve been using Drupal for 9 years 1 month for this blog and it has served me really well, except working with Windows Live Writer. Every time I reinstalled Windows I had to go through the stupid jumps to get it working again, but thankfully people had documented the process so it was never an issue.

With the introduction of Open Live Writer it changed AGAIN, so this is a guide for you (& me for my next reinstall) on how to configure it.

Drupal

On the Drupal side you need to install the BlogAPI module: https://www.drupal.org/project/blogapi 

This provides a bunch of additional features that are needed for it to work.

Make sure the BlogAPI is configured to use the MetaWeblog mode.

Open Live Writer

This is a lot simplier once Drupal is setup

  1. When adding a blog, select "Other services"
  2. Set your web address to be similar to this:
    http://EXAMPLE.COM/blog/1
    < The important part here is /blog/1. Blog should refer to the content type name & 1 should refer to the blog ID.
  3. OLW won't auto detect Drupal, so you need to choose select Movable Type API from the list of options.
  4. Next, set the remote posting URL to:
    http://EXAMPLE.COM/blogapi/xmlrpc
  5. And move on to finishing the process. A word of note here, it might work getting the theme for the blog yet, this largely depends on your Drupal config & other modules.

Fix the theme loading

I have Taxonomy setup on my blog and it is a required field. The test post that was done to detect the theme would post with a category ‘Uncategorized’, which I did not have.

The second step was once I had setup the blog in Open Live Writer, I had to make a registry change to:

HKEY_CURRENT_USER\SOFTWARE\OpenLiveWriter\Weblogs\<LONG NUMBER>

Inside there is a key called ‘HomepageUrl’ and I had to change that to be where the blog could be found. In my case it pointed to

http://EXAMPLE.COM/users/username
and changed it to just
http://EXAMPLE.COM
.
28 Jan 2016

Join us at DevConf!

Do you miss TechEd? Do you miss a big conference where the passage conversations with the best presenters in the country and from around the world can happen? Do you miss having too many choices for topics to attend cause they all sound great?

I do. The conference space in SA has shifted a lot in the last few years with niche events happening, but very little broad events focused on networking, skilling up and the challenges faced by the modern developer in South Africa who must wear multiple hats. Together with the Developer User Group we are join to fix that!

Come 8th March, in Johannesburg, we will have a new full day conference called DevConf! It has multiple tracks jammed full of content for you including talks convering programming techniques, tools & frameworks, databases, DevOps and the softer skill stuff (like dealing with teams). The event has over 40 speakers including the best from South Africa and internationally. Personally I am so excited to see Willy-Peter Schaub from Microsoft in Canada come out to share about how they use Agile!

Tickets are on sale right now and all the details can be found at http://www.devconf.co.za

Tags: 

Pages