This post is one in a series of stuff formally trained programmers know – the rest of the series can be found in the series index.
Binary Search Tree
In the previous post, we covered a Binary Tree, which is about the shape of storing the data. The Binary Search Tree (BST) is a further enhancement to that structure.
The first important change is that the data we are storing needs a key; if we have a basic type like a string or number then the value itself can be the key and if we have a more complex class, then we need to define a key in that structure or we need to build a unique key for each item.
The second change is a way to compare those keys which is crucial for the performance of the data structure. Numbers are easiest since we can easily compare which is larger and smaller.
The third and final change is the way we store the items; the left node's key will always be smaller than the parent nodes key and the right node's key will be larger than the parent node.
As an example, here is a BST using just numbers as keys:
Note that all nodes to the left are smaller than their parent and all parents above that.
So, why should we care about a BST? We should care because searching is really performant in it as each time you move a depth down, you eliminate approximately 50% of the potential nodes.
So, for example, if we wanted to find the item in our example with the key 66, we could start at the root (50) and move right. At that point, we have eliminated 8 possible nodes immediately. The next is to the left from the node with the 70 (total possible nodes removed 12). Next is to the right of the node with the value of 65, and then to 66 to the left of 67. So we found the node with 5 steps.
Going to Big O Notation, this means we achieved a performance of close to O(log n). It is possible to have a worst case of O(n), when the tree is not Optimal or Unbalanced.
Balanced versus Unbalanced
In the above example we have a binary search tree which is Optimal, i.e. it has the lowest depth needed. Below we can see a totally valid BST; Each child node is to the right of the parent because it is bigger than the parent.
This, however, will result in a O(n) search performance which is not ideal. The solution is to rebalance the BST and for our example above we can end up with multiple end states (I'll show two below). The key takeaway is that we go from a depth of 5 to a depth of 3.