In this tutorial, we’ll discuss the time complexity of operations on the binary search tree data structure.
2. The Main Property of a Binary Tree
Knuth defines binary trees as follows: “A binary tree is a finite set of nodes which either is empty or consists of a root and two disjoint binary trees called the left and the right subtrees of the root.”
Let’s start with a generic structure of a binary tree:
There are, of course, non-binary trees. However, it is important to note that a binary tree is not a special case of a tree but a different concept. For example, consider the following trees:
Viewed as ordinary trees, they are identical. Viewed as binary trees, they are different, because a binary tree distinguishes between the left and the right subtree of a node.
In a binary search tree, each node is identified by a key, which is stored respecting the following property: Let $x$ be a node of a binary search tree. If $y$ is a node in the left subtree of $x$, then $y.key \le x.key$. If $y$ is a node in the right subtree of $x$, then $y.key \ge x.key$.
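The property above can be checked mechanically. The following is a minimal sketch, assuming integer keys; the names `Node` and `is_bst` are illustrative and not part of the original text. Each subtree is checked against the range of keys that its ancestors allow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node layout: a key plus optional left and right children.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, low=None, high=None):
    """Return True if every key in the subtree lies within [low, high]
    and the binary search tree property holds recursively."""
    if node is None:
        return True  # an empty tree trivially satisfies the property
    if low is not None and node.key < low:
        return False
    if high is not None and node.key > high:
        return False
    # Keys on the left must be <= node.key, keys on the right >= node.key.
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)


valid = Node(8, Node(3, Node(1), Node(6)), Node(10))
invalid = Node(8, Node(3, Node(1), Node(9)), Node(10))  # 9 lies in the left subtree of 8

print(is_bst(valid), is_bst(invalid))
```

Note that comparing a node only with its direct children would not be enough: in the second tree, 9 is correctly placed relative to its parent 3 but violates the property with respect to the root 8.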
3. Elementary Operations in Binary Search Trees
Suppose we have a set of data, for example, a database $D$, which contains information in ASCII format. Each row or record in the database is made up of a series of distinct fields identified by a key. Let $n$ be the number of records in the database, each consisting of $m$ fields.
We’ll then have a key field and $m - 1$ fields containing the associated information. Suppose that the key is unique for each record. It is possible to store $D$ organized as a binary search tree based on the property mentioned above.
Elementary or primitive operations in binary search trees are search, minimum, maximum, predecessor, successor, insert, and delete. Their computational complexity depends on the height of the tree $h$, which we can informally define as the number of levels in the tree. For example, the binary tree from the first figure has 5 levels (including the root).
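As an illustration of the height as defined here (number of levels, root included), the sketch below builds a small tree by repeated insertion and counts its levels. The helper names `insert` and `height` and the sample keys are assumptions made for the example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key iteratively and return the (possibly new) root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(node):
    """Number of levels, counting the root as level 1; an empty tree has 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


root = None
for k in [8, 3, 10, 1, 6, 14, 4]:
    root = insert(root, k)

print(height(root))  # the longest path is 8 -> 3 -> 6 -> 4, so this prints 4
```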
4. Time Complexity of a Search in a Binary Tree
Suppose we have a key $k$, and we want to retrieve the fields of the database $D$ associated with $k$. The problem is formulated as the identification of the node $x$ such that $x.key = k$. So, we move into the tree, starting from the root node, comparing our key with the keys of the nodes we visit. Note that each move involves the descent of a level in the tree.
If the key is unique, the number of nodes visited during the search is at most equal to the height $h$ of the tree, and the search can be done in time $O(h)$. This behavior is also satisfied by the other primitive operations, so we have the following important and intuitive result: all operations in a binary search tree of height $h$ can be performed in time $O(h)$.
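The descent described above can be sketched as follows. This is an illustrative implementation, not a reference one; the function `search` and the counter it returns are assumptions added to make the number of visited nodes visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(root, key):
    """Return (node, visited): the node holding key (or None if absent)
    and the number of nodes visited on the way down."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node, visited
        # Each comparison moves exactly one level down the tree.
        node = node.left if key < node.key else node.right
    return None, visited


# A tree with 4 levels: 8 / (3, 10) / (1, 6, 14) / (4)
root = Node(8, Node(3, Node(1), Node(6, Node(4))), Node(10, None, Node(14)))

found, steps = search(root, 4)
print(found.key, steps)    # 4 is on the deepest level, so 4 nodes are visited
missing, steps = search(root, 7)
print(missing, steps)      # 7 is absent; the search stops after visiting 8, 3, 6
```

In both cases, the number of visited nodes does not exceed the number of levels of the tree.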
5. The Problem of Optimizing the Search
Not all binary search trees are equally efficient when performing a primitive operation. The key to improving efficiency is given by the fact that computational complexity depends on the height $h$ of the tree and not directly on the number of records $n$.
The way the elements are arranged in the binary tree affects its height. In general, we can state the problem of optimal construction as the search for the arrangement of the nodes that leads to the tree with the minimum height.
The worst-case scenario is a database already sorted by key. In this case, if we build a binary tree through insertions of the records in the original order, we will get a tree that contains only left or right subtrees, depending on whether the order of the keys is respectively descending or ascending:
In this case, $h = n$, and by the discussion of the previous section, a primitive operation takes time $O(n)$. This case is equivalent to a linked list.
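The degenerate case is easy to reproduce. The sketch below, with illustrative helper names and an arbitrary choice of 50 keys, inserts keys in ascending order and confirms that the tree has as many levels as keys and that no node has a left child.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain (unbalanced) iterative insertion; returns the root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(node):
    """Number of levels, counting the root as level 1."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


n = 50
root = None
for k in range(1, n + 1):  # keys arrive already sorted in ascending order
    root = insert(root, k)

# Walk down the right spine and check that no node has a left child.
only_right = True
node = root
while node is not None:
    only_right = only_right and node.left is None
    node = node.right

print(height(root), only_right)  # height equals n and the tree is a right-leaning chain
```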
6. Search in Balanced Trees
If the keys of the database $D$ are not ordered, building a binary tree through successive insert operations generally produces a structure with height $h < n$. When the heights of the left and right subtrees of every node differ by at most 1, the tree is said to be balanced. For trees built from keys in random order, the following result can be demonstrated:
The average height of a randomly constructed binary search tree with $n$ distinct keys is $O(\log n)$.
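A quick experiment can make this result more tangible. The sketch below inserts 1023 distinct keys in a shuffled order and compares the resulting number of levels with the smallest possible one for that many nodes. The seed, the number of keys, and the helper names are arbitrary choices for illustration; a single run is an observation, not a proof of the average-case bound.

```python
import math
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain (unbalanced) iterative insertion; returns the root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(node):
    """Number of levels, counting the root as level 1."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


n = 1023
keys = list(range(n))
random.Random(0).shuffle(keys)  # fixed seed so the run is repeatable

root = None
for k in keys:
    root = insert(root, k)

h = height(root)
min_h = math.ceil(math.log2(n + 1))  # levels of a perfectly balanced tree with n nodes
print(f"n = {n}, minimum possible levels = {min_h}, observed levels = {h}")
```

The observed value typically lands well below $n$ and within a small constant factor of the minimum, which is what the $O(\log n)$ average suggests.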
From the previous results, we conclude that the search for a key and, in general, any primitive operation performed on a binary search tree, takes time $O(n)$ in the worst case and $O(\log n)$ in the average case. The construction of a tree based on the insertion of the $n$ records of the database therefore requires time $O(n^2)$ in the worst case and $O(n \log n)$ in the average case.
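The construction costs follow from summing the cost of the individual insertions. The derivation below is a sketch under the assumptions that the $i$-th insertion descends a tree of $i - 1$ nodes with height $h_{i-1}$, and that $c$ is a hypothetical constant bounding the work per visited node.

```latex
% Total cost of n insertions; the +1 accounts for attaching the new node.
\[
T(n) \;\le\; c \sum_{i=1}^{n} \bigl(h_{i-1} + 1\bigr)
\]

% Worst case (sorted input): the tree is a chain, so h_{i-1} = i - 1.
\[
T(n) \;\le\; c \sum_{i=1}^{n} i \;=\; c\,\frac{n(n+1)}{2} \;=\; O(n^2)
\]

% Average case (random insertion order): E[h_{i-1}] = O(\log i).
\[
\mathbb{E}[T(n)] \;=\; O\!\Bigl(\sum_{i=1}^{n} (1 + \log i)\Bigr)
\;=\; O\bigl(n + \log n!\bigr) \;=\; O(n \log n)
\]
```

The last step uses $\log n! \le n \log n$.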
7. Practical Problems and Variants in Binary Search Trees
Binary search trees are used in many computational procedures. However, the basic theory illustrated in this tutorial is not without problems.
In real applications, binary search trees are not necessarily balanced. Keeping a binary tree perfectly balanced after every insertion or deletion is an expensive procedure, and without it, the balancing conditions are gradually lost and overall performance degrades.
There are variants that address these drawbacks: self-balancing binary search trees, such as AVL trees and red-black trees (RB-trees).
RB-trees are used within many database engines. Compared to standard binary trees, each node also contains an additional binary field called color. Precise rules for coloring the nodes guarantee that no path from the root to a leaf is more than twice as long as any other.
All these variants of binary trees pursue the same objective: keeping the tree balanced so that its height stays close to the minimum.
In this tutorial, we have given an overview of the basic theory of binary search trees. We have focused on the computational cost of primitive operations, in particular the search operation.
Along the way, we have suggested some ideas for further study, in particular the possible balancing techniques.