All notes
Algorithms

# Trees

## Binary Search Tree

BST.

### Time complexity

Search, Insert and Delete in average O(log n) and worst O(n).

## B-tree

• B-tree is a self-balancing tree that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time O(log n).
• B-tree is a generalization of a binary search tree in that a node can have more than two children (within some pre-defined range).
• E.g. 2-3 B-tree, each internal node may have only 2 or 3 child nodes.
• Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full.
• It is commonly used in databases and filesystems.

### Structure

#### Keys

Each internal node of a B-tree will contain a number of keys. The keys act as separation values which divide its subtrees.

The following is a 3-5 B-tree.The numbers here are the "keys", and the dot denotes the branching point.

### Etymology

Rudolf Bayer and Ed McCreight invented the B-tree while working at Boeing Research Labs in 1971.

Donald Knuth speculated that the "B" may have originated from Boeing or from Bayer's name.

## Red-Black tree

### Intro

A red–black tree is a kind of self-balancing binary search tree.

Each node of the binary tree has an extra bit, and that bit is often interpreted as the color (red or black) of the node.
These color bits are used to ensure the tree remains approximately balanced during insertions and deletions.

When the tree is modified, the new tree is subsequently rearranged and repainted to restore the coloring properties.
The properties are designed in such a way that this rearranging and recoloring can be performed efficiently.

The balancing of the tree is not perfect, but it is good enough to allow it to guarantee the following operations in O(log n) time:

• searching.
• insertion
• deletion
• the tree rearrangement and recoloring

Red–black trees, like all binary search trees, allow efficient in-order traversal.

Space complexity: O(n).

### History

1. In a 1978 paper, "A Dichromatic Framework for Balanced Trees", Leonidas J. Guibas and Robert Sedgewick derived the red-black tree from the symmetric binary B-tree.
2. In 1993, Andersson introduced the idea of right leaning tree to simplify insert and delete operations.
3. Sedgewick originally allowed nodes whose two children are red making his trees more like 2-3-4 trees but later this restriction was added making new trees more like 2-3 trees.

### Detail

#### Leaf nodes

The leaf nodes of red–black trees do not contain data.
Usually null child pointer can encode the fact that this child is a leaf. When the leaves really need to be explicit nodes, to save memory, a single sentinel node performs the role of all leaf nodes; all references from internal nodes to leaf nodes then point to the sentinel node.

#### Properties

• A node is either red or black.
• The root is black.
• All leaves (NIL) are black.
• (Prop 4) If a node is red, then both its children are black.
• (Prop 5) Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes.
• Requirements imposed on a binary search tree.

A critical property of red–black trees: the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf. The result is that the tree is roughly height-balanced.

##### Proof of critical property

It is guanranteed by properties 4 and 5 together.

1. For a red–black tree T, let the shortest possible path from the root of T to any leaf consist of B black nodes.
2. Longer possible paths may be constructed by inserting red nodes. However, property 4 makes it impossible to insert more than one consecutive red node. Therefore, the longest possible path consists of 2*B nodes, alternating black and red (this is the worst case).

The shortest possible path has all black nodes, and the longest possible path alternates between red and black nodes.

#### Definitions

The number of black nodes from the root to a node is the node's black depth;
The uniform number of black nodes in all paths from root to the leaves is called the black-height of the red–black tree.

# Hashing

## Consistent hashing

Consistent hashing is a special kind of hashing such that when a hash table is resized, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots.
In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped.
Consistent hashing achieves some of the same goals as Rendezvous hashing (also called HRW Hashing).

The term "consistent hashing" was introduced by Karger et al. at MIT for use in distributed caching.

# Recommendations algorithms

Examples include:

• Youtube's like/dislike and video recommendations.
• Email's spam filter.