Segment trees

Segment tree is a data structure that allows efficient range query processing and updates on arrays.

In this section everything is assumed to be 1-indexed unless said otherwise.

What Is a Segment Tree

It is a binary tree where each node represents the value of query on a segment (subarray) of the original array. The tree is built with the following rules:

  • Leaf nodes represent a segment of length \(1\) .
  • Each internal nodes represents a combination (sum, min, max, etc.) of it’s two children.

Let’s say that we have the following array [2, 4, 6, 8]. A sum segment tree built on this array looks as follows:

graph TD
    A["20 (1-4)"]
    B["6 (1-2)"]
    C["14 (3-4)"]
    D["2 (1-1)"]
    E["4 (2-2)"]
    F["6 (3-3)"]
    G["8 (4-4)"]

    A --> B
    A --> C
    B --> D
    B --> E
    C --> F
    C --> G

Note that the leaf nodes represent only one element and the other nodes merge it’s children by taking the sum of their values.

Node Count and Padding

Let N be the number of leaf nodes in a segment tree, and n be the length of the original array.

Nodes Count

  • In a segment tree, the number of nodes at each level forms the sequence [1, 2, 4, …], i.e., powers of two.
  • Each power of 2 is eqaul to the sum of all previous powers of 2 plus 1.
  • Therefore, the total number of nodes in a segment tree is 2N - 1.

Padding

  • If the original array has n elements and n is not a power of two, we pad the array with additional elements so that the number of leaf nodes N becomes the next power of two, ensuring that the extra elements do not affect the values of the other nodes.
  • In the worst case, we may need to add up to n elements, because the next power of two is less than 2n.

Example

If we change the previous example to [2, 4, 6], we need to add an element to make the length a power of two without affecting the other values. In this case, the best value to add is 0, since it does not change the sums of other nodes. The resulting segment tree will look like the following:

graph TD
    A["12 (1-4)"]
    B["6 (1-2)"]
    C["6 (3-4)"]
    D["2 (1-1)"]
    E["4 (2-2)"]
    F["6 (3-3)"]
    G["0 (4-4) padded"]

    A --> B
    A --> C
    B --> D
    B --> E
    C --> F
    C --> G

Representation

We will represent our segment tree as an array giving indices to each node. So the previous segment tree will be given indices as follows.

graph TD
    A["1: sum = 12 (1-4)"]
    B["2: sum = 6 (1-2)"]
    C["3: sum = 6 (3-4)"]
    D["4: sum = 2 (1-1)"]
    E["5: sum = 4 (2-2)"]
    F["6: sum = 6 (3-3)"]
    G["7: sum = 0 (4-4) padded"]

    A --> B
    A --> C
    B --> D
    B --> E
    C --> F
    C --> G

Each node i has children at positions 2 * i and 2 * i + 1, and its parent is at position i / 2.

In this example, the number of leaf nodes N is 4, which is also the index of the first leaf node. Therefore, a leaf node at position i corresponds to index i - N + 1 in the original array.

Querying

Let’s define a function query(id, l, r, ql, qr) that returns the answer to a query from ql to qr, where:

  • id is the index of the current node.
  • l is the beginning of the interval covered by the current node.
  • r is the end of the interval covered by the current node.
  • ql is the beginning of the query range.
  • qr is the end of the query range.

The function works as follows:

  1. If [ql, qr] and [l, r] are completely disjoint, discard the current node, as it does not contribute to the query.
  2. If [ql, qr] completely covers [l, r], return the value stored at the current node.
  3. Otherwise, split the range at the midpoint m = (l + r) / 2 and recurse on the children:
    • query(2 * id, l, m, ql, qr)
    • query(2 * id + 1, m + 1, r, ql, qr)
      Then merge the results from both children to obtain the answer for the current node.

To solve a query [ql, qr], call query(1, 1, N, ql, qr).

The time complexity of this algorithm is \(O(\log n)\).

Point Update

Some problems will ask to update index j to value v. To do this we’ll update the leaf node at position j + N - 1 and recompute all it’s ancestors values.

The number of ancestors of leaf node j + N - 1 is logn. Therefore the complexity of point update is \(O(\log n)\)

Implementation

When building a segment tree we only have two things: the original array a(0-indexed), and its size n.

Tree Size

First we need to compute the number of leafs N. It can be computed in two ways:

Either with an equation:

int N = 1 << (32 - __builtin_clz(n - 1));

Or with a loop:

int N;
for (N = 1; N < n; N *= 2); // the ; is crucial

The size of the tree will be 2*N.

int seg[2*N]

Building The Tree

We’ll write the code to build a sum segment tree.

Leafs Construction

To build the tree we first need to put the leafs’ value.

for (int i = 0 ; i < n; i++) {
    seg[i + N] = a[i];
}

Then we pad new elements to complete the leafs.

for (int i = n + N; i < 2*N; i++){
    seg[i] = 0; // padded value depends on the segment tree
}

Internal Nodes Construction

To build Internal nodes we’ll loop over them in reverse order and merge their children.

for (int i = N - 1; i > 0 ; i--) {
    seg[i] = seg[i * 2] + seg[i * 2 + 1]; // merging children's values
}

Queries

int query(int id, int l, int r, int ql, int qr) {
    // check if disjoint
    if (ql > r || l > qr) {
        return 0; 
    }
    // check if [ql, qr] fully covers [l, r]
    if (ql <= l && r <= qr) {
        return seg[id];
    }
    // recurse and merge
    int m = (l + r) / 2;
    return query(id * 2, l, m, ql, qr) + query(id * 2 + 1, m + 1, r, ql, qr);
}

Point Updates

void pointUpdate(int j, int v){
    a[j] = v;

    int idx = j + N - 1;
    seg[idx] = v;

    // reconstruct all ancestors
    for (int i = idx / 2; i > 0; i /= 2) {
        seg[i] = seg[i * 2] + seg[i * 2 + 1];
    }
}

Example Problem

Dynamic Range Minimum Queries

Dynamic Range Minimum Queries
CSES easy

Given array \(x\) of \(n\) elements and \(q\) queries. Each query is one of two types

  • “1 \(k\) \(u\)”: update the value at position \(k\) to \(u\).
  • “2 \(a\) \(b\)”: what is the minimum value in range [\(a\),\(b\)]?

Solution

We’ll simply make a segment tree for minimum and apply the queries and point updates directly in it. Since the tree is a minimum sgment tree there will be some differences from the previous snippets because the

#include <iostream>
#include <climits>
using namespace std;

const int MAXN = 200000;
const int N = (1 << (32 - __builtin_clz(MAXN - 1)));

int seg[N * 2];
int a[MAXN];

int query(int id, int l, int r, int ql, int qr) {
    // check if disjoint
    if (ql > r || l > qr) {
        return INT_MAX; 
    }
    // check if [ql, qr] fully covers [l, r]
    if (ql <= l && r <= qr) {
        return seg[id];
    }
    // recurse and merge
    int m = (l + r)/2;
    return min(query(id * 2, l, m, ql, qr) , query(id * 2 + 1, m + 1, r, ql, qr));
}

void pointUpdate(int j, int v) {
    a[j - 1] = v;

    int idx = j + N - 1;
    seg[idx] = v;

    // reconstruct all ancestors
    for (int i = idx/2; i > 0; i /= 2) {
        seg[i] = min(seg[i * 2] , seg[i * 2 + 1]);
    }
}

int main() {
    int n, q;
    cin >> n >> q;

    for (int i = 0; i < n; i++) {
        cin >> a[i];
    }

    for (int i = 0 ; i < n; i++) {
        seg[i + N] = a[i];
    }

    for (int i = n + N; i < 2*N; i++) {
        seg[i] = INT_MAX; // padded value depends on the segment tree
    }

    for (int i = N - 1; i > 0; i--) {
        seg[i] = min(seg[i * 2] , seg[i * 2 + 1]);
    }

    while (q--) {
        int type;
        cin >> type;
        
        if (type == 1) {
            int k, u;
            cin >> k >> u;
            pointUpdate(k, u);
        }
        else {
            int ql, qr;
            cin >> ql >> qr;

            cout << query(1, 1, N, ql, qr) << "\n";
        }
    }

}