Quick tip: Grep in Git

Submitted by Robert MacLean on Wed, 06/06/2018 - 09:00

This quick tip is about two small features of Git I wish I had known about earlier, as they make searching through a repository much easier.

git-grep

git-grep is a way to search through your tracked files for whatever you provide. For example, if we want all files with the word index in them: git grep index

Demo of git grep

We can limit the search to specific files. For example, if we want to filter the above example to just JSON files: git grep index -- '*.json'

Demo of git grep with filter

We can search for multiple items in a single file. For example, if we want to find all files with both index and model in them: git grep --all-match -e index -e model

Demo of git grep with multiple filters

git-log grep

git-log has a grep function too which is awesome for finding commit messages with a specific word or words in it. For example, if I want to find all commits about Speakers for DevConf I could do: git log --all --grep "Speaker"

Git log grep example

Drupal Geshi Cheatsheet

Submitted by Robert MacLean on Tue, 06/05/2018 - 14:16

Since redoing this blog, I switched out the syntax highlighting to use the Drupal Geshi Module.

For the love of everything I can't remember the tricks for using it, so here is a cheatsheet; mostly for myself but maybe you get value too. These are all HTML attributes you can add to your code block.

  • language this controls the language for rendering.
  • line_numbering controls if line numbering is off, on or fancy with the values off, normal and fancy respectively.
    • With fancy line numbers you can use the attribute interval to control how often to show the line numbers.
  • title adds a title to the code block.
  • special_lines takes a comma-separated list of numbers and highlights them.
  • linenumbers_start controls what the first line number is.

Information worked out from this code.

Quick tip: The handy command line calculator

Submitted by Robert MacLean on Tue, 06/05/2018 - 09:00

Who needs a GUI to do math when we have options for Unix like (MacOS, Linux etc...), Command Prompt, and PowerShell?

Unix like OSs

Unix-like OSs, including MacOS, WSL & Linux, include an awesome calculator called bc. From the man page:

bc is a language that supports arbitrary precision numbers with interactive execution of statements. There are some similarities in the syntax to the C programming language. A standard math library is available by command line option. If requested, the math library is defined before processing any files. bc starts by processing code from all the files listed on the command line in the order listed. After all the files have been processed, bc reads from the standard input. All code is executed as it is read.

The only caveat is the input: bc reads from files or standard input, so you can't just pass the equation in as a parameter... but you can use echo to pipe the equation in. For example:

> echo '1 + 2 + 3 + 4' | bc
> 10

bc can also work with different number bases, for example:

> echo "obase=2; ibase=10; 42" | bc
> 101010

obase stands for output base & ibase stands for input base. So in the example, we are turning 42 (base 10) to binary.

Division is a weirdness with bc. By default its scale (the number of digits kept after the decimal point) is 0, so results are truncated. For example, you would expect the answer to be 0.4 below but it is 0:

> echo "2/5" | bc
> 0

The solution is to use the math library switch -l:

> echo "2/5" | bc -l
> .40000000000000000000

If 20 decimal places is more than you need, you can use scale to control it:

> echo "scale=3; 2/5" | bc -l
> .400

Windows Command Prompt

Command prompt has a similar tool with the set command.

>set /a 3+3
6
>set /a (3+3)*3
18
>set /a "203>>3"
25

PowerShell

PowerShell natively supports basic arithmetic, and if you want something more advanced you can call anything in the System.Math class.

> 4+5
9
> 6*7
42
> [Math]::Sin(50)
-0,262374853703929
> [Math]::Max([Math]::Tan(40), [Math]::Cos(40))
-0,666938061652262

Quick tip, the Say command

Submitted by Robert MacLean on Mon, 06/04/2018 - 09:00

MacOS has a great tool called say, which just says what you pass it. For example, run say "Hello" and you'll hear your device say Hello.

Where this is really useful is when you want to do a long running action and get notified when it is done. For example git clone https://github.com/torvalds/linux.git && say "clone complete"


So, what about for Windows? You can do something similar with Powershell. First the setup:

Add-Type -AssemblyName System.speech
$say = New-Object System.Speech.Synthesis.SpeechSynthesizer

Once you have that in place you can use it like this: git clone https://github.com/torvalds/linux.git; $say.Speak("clone complete")

VSCode - Too many open files

Submitted by Robert MacLean on Thu, 05/31/2018 - 22:56

If you are getting the too many open files error with MacOS, it could be VSCode trying to open too many files (by default, that means more than 10240 files).

You can confirm that with the following:

lsof |  awk '{ print $2 " " $1; }' | sort -rn | uniq -c | sort -rn | head -20

So, what can you do about it? If the files are not important, say it is your output folder, then you can use VSCode settings to exclude them. In the example below, I configure VSCode to ignore build folders. I would encourage this as a workspace setting, so everyone in the team gets it.

// Configure glob patterns for excluding files and folders.
  "files.exclude": {
    "**/build": true
  },

  "files.watcherExclude": {
    "**/build": true
  },

  "search.exclude": {
    "**/build": true
  },

MacOS utilities I find useful

Submitted by Robert MacLean on Thu, 05/31/2018 - 12:20

After using a MacBook Pro for two years I thought it was time to share what utilities I found really useful to have. These are obviously weighted towards being a software developer, so your mileage might vary.

Brew

It is the missing package manager for MacOS. As with NPM, Chocolatey, or Composer, you can install what you need via the command line.

It may seem weird; what is wrong with just downloading and installing what you need?! The advantage is that you can write this stuff down, so that if you need to reinstall it is easier (and it is also easier to share, to help others get up and running).

A second advantage is updating: it takes one command to update all the tools I use.

More info

VSCode

More than an IDE, this is my go-to tool for anything text: editing config ✅ taking notes ✅ anything really.

Install with Brew: brew install homebrew/cask/visual-studio-code

An important tweak for VSCode is to make sure it can be launched from the Terminal; thankfully it is really easy.

More info

Aerial

The AppleTV has the best screensaver I've ever seen, and some smart person ported it to MacOS with the name Aerial.

A word of warning: these videos are massive and will destroy your bandwidth. One tip to solve that is that under the settings there is a Cache section; make sure you have Cache Aerials As They Play checked, as this will help. If you are on an uncapped connection, then there is also a download now option, which is a must to use.

Screen shot of screensaver settings

Install with Brew: brew install caskroom/cask/aerial

More info

Fish

Bash is nice, Fish is nicer. It just feels like what you expect in a modern world.

Install with Brew: brew install fish

More info

Fish Node Manager

Part of my job has involved working with multiple projects, and that means multiple versions of Node, and that was a pain. Thankfully there is a Node Manager for Fish that lets you easily change what version of Node you are using.

Unfortunately, this isn't as easy to set up: to install it you first need Fisherman, which is like Brew but for Fish. That leads to this 3 step process to install and configure it.

curl -Lo ~/.config/fish/functions/fisher.fish --create-dirs https://git.io/fisher
fisher fnm
fnm use latest

More info

Amphetamine

Amphetamine is a massively useful tool for MacOS, especially in a DevOps culture where you might get up in the night and just need your machine to behave the exact way you want it. Its core use is to not let your Mac go to sleep and you can control what triggers that, automatically or manually.

Get it from the Store

Status Clock

Another very useful tool is Status Clock which can show a second time on the menu which is exceptionally useful if you need to work across countries.

Get it from the Store

Settings Tweaks

Beyond useful tools, there are some useful tweaks to the standard MacOS settings:

SFTPK: Binary Tree

Submitted by Robert MacLean on Wed, 07/20/2016 - 18:25

This post is one in a series about stuff formally trained programmers know – the rest of the series can be found here.

Binary Tree

In the previous post we looked at the tree pattern, which is a theoretical way of structuring data with many advantages. A tree is just a theory though, so what does an actual implementation of it look like? A common data structure implementation is a binary tree.

The name binary tree gives us a hint as to how it is structured: each node can have at most 2 child nodes.

Example of annotated binary tree

Classifications

As a binary tree has some flexibility in it, a number of classifications have come up to give a consistent way to discuss a binary tree. Common classifications are:

  • Full binary tree: Each node in a binary tree can have zero, one or two child nodes. In a full binary tree each node can only have zero or two child nodes.
  • Perfect binary tree: This is a full binary tree with the additional condition that all leaf nodes (i.e. nodes with no children) are at the same level/depth.
  • Complete binary tree: The complete binary tree is where every level, except possibly the last, is completely filled and each leaf node on the last level is as far left as possible.
  • Balanced binary tree: A balanced binary tree is a tree where the height of the tree is as small a number as possible for the number of nodes it holds.
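To make the first two classifications concrete, here is a minimal sketch in JavaScript. The names node, isFull, height, countNodes and isPerfect are made up for this example and are not from any library. The perfect check leans on the fact that a binary tree of height h can hold at most 2^h - 1 nodes, and only a perfect tree reaches that count.

```javascript
// A node holds a value and at most two children (left and right).
// All names here are hypothetical helpers for illustration.
function node(value, left = null, right = null) {
    return { value, left, right };
}

// Full: every node has either zero or two children, never exactly one.
function isFull(tree) {
    if (tree === null) return true;
    const childCount = (tree.left ? 1 : 0) + (tree.right ? 1 : 0);
    if (childCount === 1) return false;
    return isFull(tree.left) && isFull(tree.right);
}

// Height counted in nodes: an empty tree is 0, a single node is 1.
function height(tree) {
    if (tree === null) return 0;
    return 1 + Math.max(height(tree.left), height(tree.right));
}

// Count every node in the tree.
function countNodes(tree) {
    if (tree === null) return 0;
    return 1 + countNodes(tree.left) + countNodes(tree.right);
}

// Perfect: a tree of height h is perfect exactly when it holds 2^h - 1 nodes.
function isPerfect(tree) {
    return countNodes(tree) === Math.pow(2, height(tree)) - 1;
}

const perfect = node(1, node(2), node(3));
const fullOnly = node(1, node(2, node(4), node(5)), node(3));
const neither = node(1, node(2));

console.log(isFull(perfect), isPerfect(perfect));   // true true
console.log(isFull(fullOnly), isPerfect(fullOnly)); // true false
console.log(isFull(neither), isPerfect(neither));   // false false
```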

Implementations

While a binary tree is more than just a pattern, there are no out of the box implementations in C#, Java or JavaScript for it. The reason is that it is a very simple data structure and so if you need just the data structure you could implement it yourself but more importantly, you likely want more than the simple structure - you want a structure that optimises for traversal or data management.

References

Wikipedia: Binary Tree

SFTPK: Tree

Submitted by Robert MacLean on Sat, 06/04/2016 - 12:28

This post is one in a series about stuff formally trained programmers know – the rest of the series can be found here.

Trees

This post will look at the mighty tree, which is more a pattern than a specific data structure. The reason to understand the pattern is that so many of the data structures we will look at in the future use it that a good understanding of it provides a strong basis to work from.

As a computer user though, you already have seen and used a tree structure - you may have just not known it. The most common form of it is the file system, where you have a root (i.e. / or C:\) and that has various folders under it. Each folder itself can have folders, until you end at an empty folder or a file.

File system

This is the way a tree structure works too: you start with a root, then move to nodes and finally end with leaves.

Generic Tree

In the basic concept of a tree there are no rules on the nodes and the values they contain, so a node may contain zero, one, two, three or a hundred other nodes.

What makes a tree really powerful is that it really is a collection of trees, i.e. if you take any node it is in itself a tree, and so the algorithms used to work with a tree work with each node too. This enables you to work with a powerful computer science concept: recursion.
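As a small sketch of the "every node is itself a tree" idea, the hypothetical countNodes function below works the same whether you hand it the root or any node further down. The shape of the node objects (a value plus a children array) is an assumption for this example only.

```javascript
// A generic tree node: a value plus any number of child nodes.
// The object shape is an assumption made for this example.
const tree = {
    value: "root",
    children: [
        { value: "a", children: [
            { value: "a1", children: [] },
            { value: "a2", children: [] }
        ] },
        { value: "b", children: [] }
    ]
};

// Counting nodes: one for this node, plus the count of each child subtree.
function countNodes(node) {
    let total = 1;
    for (const child of node.children) {
        total += countNodes(child); // each child is treated as its own tree
    }
    return total;
}

console.log(countNodes(tree));             // 5
console.log(countNodes(tree.children[0])); // 3
```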

Recursion

Recursion is a concept that lacks a real world equivalent and so can be difficult to grasp initially. At its simplest for these posts, it is a method or function which calls itself, until instructed to stop. For example, you might write a function called getFiles which takes in a path to a folder and returns an array of filenames. Inside getFiles it loops over all the files in the folder and adds them to a variable to return. Then it loops over all the folders in that folder and for each folder it finds, it calls getFiles again.

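const fs = require("fs");
const nodePath = require("path"); // named nodePath so it doesn't clash with the path parameter

function getFiles(path){
    var result = [];
    // read every entry in the folder using Node, along with whether it is a file or a folder.
    var entries = fs.readdirSync(path, { withFileTypes: true });
    // push the name of each file to the result array.
    entries.filter(entry => entry.isFile()).forEach(file => result.push(file.name));
    var directories = entries.filter(entry => entry.isDirectory());
    directories.forEach(directory =>{
        var files = getFiles(nodePath.join(path, directory.name)); // calling itself.
        files.forEach(file => result.push(file));
    });

    return result;
}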

// requires: using System.Collections.Generic; using System.IO;
static IEnumerable<string> GetAllFiles(string path) // changed to GetAllFiles so it doesn't get too confusing with the built in GetFiles
{
    var result = new List<string>();
    result.AddRange(Directory.GetFiles(path));
    foreach (var directory in Directory.GetDirectories(path))
    {
        result.AddRange(GetAllFiles(directory)); // recursively calling itself
    }

    return result;
}

Implementations

It doesn't make sense to talk about coding implementations at this point, since this is more a pattern than a structure and we would need a lot more information on what we want to achieve to actually go through a code implementation. That said, it is interesting to see where trees are used:

  • File systems
  • Document Object Models (like HTML or XML)

SFTPK: Linked List

Submitted by Robert MacLean on Sat, 06/04/2016 - 10:57

This post is one in a series about stuff formally trained programmers know – the rest of the series can be found here.

Linked List

In the previous post on Array, we saw that all read operations are Θ(1), which is awesome. An important reality of programming is that everything is a trade off, so while you get fast reads with Array, adding items when you don't know the collection size is expensive.

Array Growth Issue Example

Let's say you create an array of ints, named X, and set the length to 5 (currently that is using 20 bytes). Now we want to add a 6th item, so the solution is to create a second array, named Y, with a larger length. If we just want to handle one more, it means Y is now taking up 24 bytes of memory. Then we need to do a bunch of copy operations as we copy items from X to Y, which is really slow. By the end of the process, just adding one item was really expensive.
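Here is a rough sketch of that X and Y example. It assumes a JavaScript Int32Array as a stand-in for a fixed-length array of ints, since a typed array cannot change length after it is created. The copies counter is only there to show how much work adding one item took.

```javascript
// X: a fixed-length array of 5 ints, 4 bytes each.
const x = new Int32Array(5);
x.set([10, 20, 30, 40, 50]);
console.log(x.byteLength);   // 20

// X cannot grow, so adding a 6th item needs a second, larger array Y.
const y = new Int32Array(x.length + 1);
console.log(y.byteLength);   // 24

// Copy every existing item across, one by one, then add the new item.
let copies = 0;
for (let i = 0; i < x.length; i++) {
    y[i] = x[i];
    copies++;
}
y[x.length] = 60;

console.log(copies);         // 5
console.log(Array.from(y));  // [10, 20, 30, 40, 50, 60]
```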

Linked List to the rescue

The solution is to change the way we store the data structure in memory. With a Linked List, each value is wrapped with metadata and stored separately in memory (compared to an Array, which stores all values in a single continuous block of memory). The reason each item is wrapped is that it then gets a pointer to the next item in the collection, so that you can still navigate through the collection.

Linked List

Pros and Cons

The big advantage to Linked List is that, since the values can go anywhere in memory, the collection can be expanded until you run out of memory for very little cost per add, either Θ(n) or Θ(1). The difference is whether the collection implementation keeps a pointer to the final item or not. If it does not, then it needs to navigate through each item to find the end, Θ(n); if it knows the location of the last item, then it just needs to go directly to it and set its pointer to the new item, Θ(1).

Removing and reordering items is also much faster than an array since you just need to find the items before/after and change where their pointers point to.

What is the downside then? Navigation through the collection is slower than with an array. For example, if we create an integer array and want to access the fifth item, that can be done with simple math: (start of array in memory) + (int size in memory * offset). That will give us the location of the integer value we want to read, basically a Θ(1) operation.

With Linked List though, I need to ask the first item where the second is; then ask the second where the third is; then ask the third where the fourth is; then ask the fourth where the fifth is. So it is a Θ(n) operation for reading.

Linked lists also use more memory, since we aren't just storing values; we are storing the values and one or two pointers with each value. The difference is marginal when storing types without a constant size, like a class, since an array then also needs to store pointers to the values, but it is worth remembering.

Structures

The interesting thing about a linked list compared to an array is that it is very flexible in its implementation. The simplest version is to just have a pointer to the first item, and each item in the collection needs to point to the next item. This is known as a singly linked list, as each item is linked to one other.

Linked List

The linked list may also store a pointer to the last item to make adding faster.

Linked List
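A minimal sketch of a singly linked list that keeps pointers to both the first and last items might look like the following. The class and its add and get methods are hypothetical, written only to show why adding is Θ(1) with a tail pointer while reading by position stays Θ(n).

```javascript
// Hypothetical singly linked list, for illustration only.
class LinkedList {
    constructor() {
        this.head = null; // first item
        this.tail = null; // last item, kept so that adding does not need a walk
        this.length = 0;
    }

    // Θ(1): go straight to the last item and point it at the new one.
    add(value) {
        const item = { value, next: null };
        if (this.tail === null) {
            this.head = item;
        } else {
            this.tail.next = item;
        }
        this.tail = item;
        this.length++;
    }

    // Θ(n): follow the pointers from the head, one item at a time.
    get(index) {
        if (index < 0) return undefined;
        let current = this.head;
        let steps = 0;
        while (current !== null && steps < index) {
            current = current.next;
            steps++;
        }
        return current === null ? undefined : current.value;
    }
}

const list = new LinkedList();
[10, 20, 30, 40, 50].forEach(value => list.add(value));
console.log(list.get(4)); // 50, reached after following four pointers
console.log(list.get(9)); // undefined, the list is not that long
```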

Doubly Linked

Most common implementations though use a doubly linked list, where each item in the collection not only points to the next item in the collection, but also points to the previous item in the collection. At the trade off of memory (for the extra pointer) and potentially more expensive operations (like an insert now impacting two items and not just one), you gain the ability to navigate forwards and in reverse.

Linked List
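To show the extra bookkeeping, here is a small sketch of a doubly linked list. The DoublyLinkedList class and its method names are made up for this example. Notice that insertAfter has to update pointers on both neighbours, and that the previous pointers are what make walking backwards possible.

```javascript
// Hypothetical doubly linked list, for illustration only.
class DoublyLinkedList {
    constructor() {
        this.head = null;
        this.tail = null;
    }

    // Add to the end and return the new item so callers can insert after it.
    add(value) {
        const item = { value, previous: this.tail, next: null };
        if (this.tail === null) {
            this.head = item;
        } else {
            this.tail.next = item;
        }
        this.tail = item;
        return item;
    }

    // Inserting touches two neighbours: the item before and the item after.
    insertAfter(existing, value) {
        const item = { value, previous: existing, next: existing.next };
        if (existing.next === null) {
            this.tail = item;
        } else {
            existing.next.previous = item;
        }
        existing.next = item;
        return item;
    }

    // Walk from the head using the next pointers.
    forwards() {
        const values = [];
        for (let item = this.head; item !== null; item = item.next) values.push(item.value);
        return values;
    }

    // Walk from the tail using the previous pointers.
    backwards() {
        const values = [];
        for (let item = this.tail; item !== null; item = item.previous) values.push(item.value);
        return values;
    }
}

const list = new DoublyLinkedList();
const first = list.add("a");
list.add("c");
list.insertAfter(first, "b");
console.log(list.forwards());  // ["a", "b", "c"]
console.log(list.backwards()); // ["c", "b", "a"]
```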

Implementations

Java has a doubly linked list implementation with LinkedList and .NET also has a doubly linked list implementation with LinkedList. JavaScript has no native implementation of it; however, there are plenty of articles on how to implement it.

SFTPK: Array

Submitted by Robert MacLean on Mon, 05/23/2016 - 09:54

This post is one in a series about stuff formally trained programmers know – the rest of the series can be found here.

Array

This is the first in the data structure reviews and likely the simplest: the humble array. The first issue is the term Array; its meaning differs depending on who uses it, but we will get to that a bit later.

Generally I think of an array like this:

An array is a container object that holds a fixed number of values of a single type. The length of an array is established when the array is created. After creation, its length is fixed. (Oracle Java Documentation)

Seems simple enough. There are two limits placed on our container: single type & fixed length and both relate to how the array is handled in memory. When an array is created it looks at the type & length and uses that to calculate how much memory is needed to store all of that. For example if we had an array of 8 items we would get a block of memory allocated for the array like this:

image

In some systems arrays can just grow by allocating more memory at the end; these are called dynamic arrays. However, many systems do not allow this because, with the way memory is handled, there might not be any free space after the last item to grow into. Thus the array length is fixed, as no memory beyond the original block is allocated for that array instance.

This has a major advantage for read performance, since I can quickly calculate where the item will be in memory, thus skipping having to read/navigate all other items. For example:

If my array's values start at position 100 in memory and I want the item at index 4 in an int[], it would be 4 (for the index) multiplied by 4 (for the int size) + 100 (for the start address) = 116 & boom, value!

This makes every read an O(1) operation!
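The address arithmetic can be sketched in JavaScript by treating an ArrayBuffer as the block of memory. This is only an illustration under that assumption: the read function and INT_SIZE constant are made up for the example, and the block here starts at offset 0 rather than 100.

```javascript
const INT_SIZE = 4;                           // bytes per 32-bit int
const buffer = new ArrayBuffer(8 * INT_SIZE); // one block of memory for 8 ints
const memory = new DataView(buffer);

// Store the values 0, 10, 20, ... 70 back to back in the block.
for (let index = 0; index < 8; index++) {
    memory.setInt32(index * INT_SIZE, index * 10, true);
}

// Reading is one multiplication and one addition, whatever the index.
function read(start, index) {
    const address = start + index * INT_SIZE;
    return memory.getInt32(address, true);
}

console.log(read(0, 4));         // 40
console.log(100 + 4 * INT_SIZE); // 116, the 4 * 4 + 100 calculation from the text
```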

Object[]

What happens when we can’t know the size of the items in the array, for example if we created an object[] which can hold anything?

In this scenario, when the array is created, rather than allocating memory based on length multiplied by type size, it allocates length multiplied by the size of a pointer and rather than storing the values themselves in the array memory, it stores pointers to other locations in memory where the value is.

Obviously this has a slightly worse performance than an array where we can have the values in it – but it is slight. Below is some output from BenchmarkDotNet comparing sequential reads of an int[] vs. object[] (code here) and it is close:

                     Method |     Median |    StdDev |
--------------------------- |----------- |---------- |
    IntArraySequentialReads | 52.2905 us | 4.9374 us |
 ObjectArraySequentialReads | 58.3718 us | 5.4106 us |

Associative Arrays/Dictionary

As mentioned above, not every array is an array – some languages (PHP & JavaScript for example) do not allocate a block of memory like described above. These languages use what is called an associative array, also known as a map (PHP likes to refer to it this way) or a dictionary.

Basically these all have a key and a value associated to them and you can lookup the value by using the key. Implementation details differ though from platform to platform.

For example, in C# a Dictionary<TKey,TValue> is handled with an array under the covers; in JavaScript, however, an array is a normal object. When an item is added to the array in JavaScript, it merely adds a new property to the object, and the name of that property is the index in the array.
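You can see the "array is an object with index properties" behaviour from JavaScript itself. This small sketch only shows what is observable from code; how a particular JavaScript engine stores the array internally is a separate matter and may be optimised.

```javascript
const items = ["a", "b"];
console.log(typeof items);        // "object"
console.log(Object.keys(items));  // ["0", "1"]

// Adding an item at index 5 just adds a property named "5".
items[5] = "f";
console.log(Object.keys(items));  // ["0", "1", "5"]
console.log(items.length);        // 6
console.log(3 in items);          // false, nothing was ever stored at index 3
```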

Associative arrays do take up more memory than a traditional array (good example here of PHP where it was 18 times larger).

Multi-dimensional arrays

Multi-dimensional arrays also differ platform to platform. The Java version of it is an array of arrays, which achieves the same goal and is basically implemented the same way as object[] was described above. In C# these are known as jagged arrays.

C# and other languages have proper multi-dimensional arrays which work differently – they actually take all the dimensions, multiply them together and use that for the length of an array. The dimensions just give different offsets.

Example:

image
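As a rough sketch of how the dimensions turn into offsets, the snippet below stores a 3 by 4 grid in a single flat array. It assumes a row-major layout (each row stored one after the other), and the offset function is a made-up helper for this example.

```javascript
const rows = 3;
const columns = 4;

// One flat block whose length is the dimensions multiplied together.
const grid = new Int32Array(rows * columns);

// Assumes row-major order: skip whole rows first, then move along the row.
function offset(row, column) {
    return row * columns + column;
}

grid[offset(1, 2)] = 42;

console.log(grid.length);   // 12
console.log(offset(1, 2));  // 6
console.log(grid[6]);       // 42
```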

Jagged arrays do have one benefit over a multi-dimensional array: since each internal array is independent, they can be different sizes, whereas in a multi-dimensional array every row of a dimension must be the same size.

C# – List<T>

If you are working in C#, you might be asking yourself what List<T> is and how it relates to Array, since it can grow forever! List<T> is just a wrapper around an array that, by default, gets a capacity of 4 when the first item is added! When you call .Add to add a 5th item, it then does the following:

  1. Create a second array where the length is double that of the current array
  2. Copy all items from first array to second array
  3. Use second array now

This is SUPER expensive and also why there is an optional constructor where you can override the initial size which helps a lot. Once again using BenchmarkDotNet you can see that it makes a nice difference (code):

                  Method |      Median |     StdDev |
------------------------ |------------ |----------- |
  DefaultConstructorUsed | 701.7312 us | 38.5573 us |
 ConstructorWithSizeUsed | 548.5436 us | 13.1122 us |
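The three growth steps above can be sketched in JavaScript. This is not the List<T> source, just a hypothetical GrowableList that follows the same doubling idea and counts how many item copies the growth costs, to show why passing a starting size helps.

```javascript
// Hypothetical growable list, for illustration only (not the List<T> source).
class GrowableList {
    constructor(capacity = 4) {
        this.items = new Array(capacity); // backing array
        this.count = 0;                   // how many slots are actually used
        this.copies = 0;                  // item copies caused by growing
    }

    add(value) {
        if (this.count === this.items.length) {
            // 1. create a second array of double the length
            const bigger = new Array(this.items.length * 2);
            // 2. copy all items from the first array to the second
            for (let i = 0; i < this.count; i++) {
                bigger[i] = this.items[i];
                this.copies++;
            }
            // 3. use the second array from now on
            this.items = bigger;
        }
        this.items[this.count] = value;
        this.count++;
    }
}

function fill(list, howMany) {
    for (let i = 0; i < howMany; i++) list.add(i);
    return list;
}

const byDefault = fill(new GrowableList(), 1000);
const presized = fill(new GrowableList(1000), 1000);
console.log(byDefault.copies); // 1020
console.log(presized.copies);  // 0
```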

JavaScript Arrays

As mentioned above, the standard JavaScript array is an associative array. However, JavaScript (from ES2015) does contain support for typed arrays. The supported methods differ, so this isn't an easy replacement, and it only supports a limited number of numeric types. It might make a lot of sense to use these for performance reasons, since they are implemented as actual arrays by the JavaScript runtimes that support them.