This post is one in a series about stuff formally trained programmers know – the rest of the series can be found here.
This is the first in the data structure reviews and likely the simplest; the humble array. The first issue is the term Array - it term differs depending on who uses it but we will get to that a bit later.
Generally I think of an array like this:
An array is a container object that holds a fixed number of values of a single type. The length of an array is established when the array is created. After creation, its length is fixed. – Oracle Java Documentation
Seems simple enough. There are two limits placed on our container: single type & fixed length and both relate to how the array is handled in memory. When an array is created it looks at the type & length and uses that to calculate how much memory is needed to store all of that. For example if we had an array of 8 items we would get a block of memory allocated for the array like this:
In some systems arrays can just grow but allocating more memory at the end, these are called dynamic arrays. However many systems do not allow this because the way memory is handled is there might not be any space after the last item to grow into, thus the array length is fixed as there isn’t any memory allocated for that array instance.
This has a major the advantage to read performance, since I can quickly calculate the where the item will be in memory – thus skipping having to read/navigate all other items. For example:
If my arrays values start at position 100 in memory and I want the 4th item in an int, it would be 4 (for the position) multiplied by 4 (for the int size) + 100 (for the start address) & boom value!
This makes every read an O(1) operation!
What happens when we can’t know the size of the items in the array, for example if we created an object which can hold anything?
In this scenario, when the array is created, rather than allocating memory based on length multiplied by type size, it allocates length multiplied by the size of a pointer and rather than storing the values themselves in the array memory, it stores pointers to other locations in memory where the value is.
Obviously this has a slightly worse performance than an array where we can have the values in it – but it is slight. Below is some output from BenchmarkDotNet comparing sequential reads of an int vs. object (code here) and it is close:
Method | Median | StdDev |
--------------------------- |----------- |---------- |
IntArraySequentialReads | 52.2905 us | 4.9374 us |
ObjectArraySequentialReads | 58.3718 us | 5.4106 us |
Basically these all have a key and a value associated to them and you can lookup the value by using the key. Implementation details differ though from platform to platform.
Associative arrays do take up more memory than a traditional array (good example here of PHP where it was 18 times larger).
Multi-dimensional arrays also differ platform to platform. The Java version of it is an array of arrays, which achieves the same goal is basically implemented the same as as object was described above. In C# these are known as jagged arrays.
C# and other languages have proper multi-dimensional arrays which work differently – they actually take all the dimensions, multiply them together and use that for the length of an array. The dimensions just give different offsets.
Jagged arrays do have one benefit over a multi-dimensional array, since each internal array is independent, they can be different sizes where a multi-dimensional array all the dimensions must be the same size.
C# – List<T>
If you are working in C#, you might be asking yourself what List<T> is and how it relates to Array since it can grow forever! List<T> is just an array with initial size of 4! When you call .Add to add a 5th item, it then does the following:
- Create second array of where the length is double the current array
- Copy all items from first array to second array
- Use second array now
This is SUPER expensive and also why there is an optional constructor where you can override the initial size which helps a lot. Once again using BenchmarkDotNet you can see that it makes a nice difference (code):
Method | Median | StdDev |
------------------------ |------------ |----------- |
DefaultConstructorUsed | 701.7312 us | 38.5573 us |
ConstructorWithSizeUsed | 548.5436 us | 13.1122 us |