Indexed Merkle Tree

Table of Contents

The Indexed Merkle Tree (IMT), detailed in this paper, is a committed data structure that modifies the standard Merkle tree data structure to represent a key-value mapping with efficient exclusion proofs. It achieves this by adding additional metadata to each leaf node to enable exclusion proofs via standard Merkle proofs into the IMT. We will use IMTs to commit to a mapping between 32-byte keys and variable length values.

Structure of the IMT

An IMT is a binary Merkle tree over two types of nodes:

  • Leaf Nodes: These represent leaves of the tree and contain key-value pairs as well as additional metadata which enables efficient exclusion proofs.
  • Inner Nodes: These are internal nodes of the tree and consist of the hashes of the two child nodes as usual.

The number of leaf nodes in the tree is known as the capacity, which is initialized at 1 and can be dynamically increased upon insertion. A capacity 2 ** C IMT contains 2 ** C leaf nodes, which are either considered active, meaning they represent a key-value pair, or inactive, meaning they are free to be used for future insertions.

IMT Leaf Node

Leaf nodes of an IMT are serialized as fixed length 99-byte arrays with the following layout:

[active (1 byte), prefix (1 byte), key (32 bytes), next_prefix (1 byte), next_key (32 bytes), value_hash (32 bytes)]

where the fields are as follows:

  • active - A boolean flag indicating whether the leaf node holds an active tuple. If active == true, then key and value are the stored key-value tuple at this leaf node, and next_key is the next largest key in the key-value mapping. Otherwise, if active == false, then the leaf node must be the zero byte array [0u8; 99].
  • prefix - A boolean flag indicating whether the key is a dummy key. It is 1 if key is not a dummy key, and 0 otherwise, in which case key must equal [0u8; 32].
  • key - The key associated with this leaf node.
  • next_prefix - A boolean flag indicating whether the next_key is a dummy key. It is 1 if next_key is not a dummy key, and 0 otherwise, in which case next_key must equal [0u8; 32].
  • next_key - The next key in the IMT that is larger than this key.
  • value_hash - The keccak256 hash of the value associated with the key.

The hash of a leaf node is computed as keccak256(abi.encodePacked(active, prefix, key, next_prefix, next_key, value_hash)). We use DUMMY to represent the 33-byte dummy prefix and key [0u8; 33].

The adjacent keys key and next_key must satisfy the IMT invariant that one of the following must be true:

  • The leaf node is inactive, i.e. active == false.
  • Both prefix == 1 and next_prefix == 1, in which case key < next_key and there are no active nodes with key < node_key < next_key.
  • We have prefix == 1 and next_prefix == 0, in which case there are no active nodes with key < node_key.
  • We have prefix == 0 and next_prefix == 1, in which case there are no active nodes with node_key < next_key.
  • We have prefix == 0 and next_prefix == 0, in which case key and next_key must be [0u8; 32].

Every IMT will have at least one active leaf node.

IMT Inner Node

Inner nodes of an IMT are serialized as fixed length 64-byte arrays with the following layout:

[left_hash (32 bytes), right_hash (32 bytes)]

where left_hash and right_hash are the hashes of the left and right children of the inner node, respectively. The hash of an inner node is computed as keccak256(abi.encodePacked(left_hash, right_hash)).

Operations on the IMT

Initialization

An empty IMT is initialized with capacity 1 and a single dummy leaf node given by [true, DUMMY, DUMMY, keccak256([])]. The root hash of the empty IMT is the hash of this dummy leaf.

Value Update

To update the value associated with an existing key, we simply update the value_hash field of the corresponding leaf node.

Key-Value Insertion

Insertion of a new key-value pair [new_key, new_value] into an IMT with capacity 2 ** C involves the following steps:

  1. If all leaves are active, the tree's capacity is doubled to 2 ** (C + 1) by appending 2 ** C inactive leaf nodes to the tree and updating the root hash by hashing the existing root hash with a root hash of 2 ** C inactive leaf nodes.
  2. Find an active leaf node with prefix, key, next_prefix, next_key such that one of the following holds:
  • prefix == 1 and next_prefix == 1 and key < new_key < next_key
  • prefix == 1 and next_prefix == 0 and key < new_key
  • prefix == 0 and next_prefix == 1 and new_key < next_key
  • prefix == 0 and next_prefix == 0
  1. For the leaf node found in Step 2, update next_key to new_key and next_prefix to 1.
  2. Replace the inactive leaf node with the active leaf node [true, new_key, next_key, keccak256(new_value)] and update the IMT root hash accordingly.

This procedure maintains the IMT invariant. To verify the insertion is valid, a prover must provide:

  • If all leaves of the tree were active in Step 1, a proof of the root hash update for capacity doubling.
  • An exclusion proof for new_key by proving inclusion of the leaf node found in Step 2.
  • Merkle proofs to verify the update to the leaf nodes in Steps 3 and 4.

Inclusion and Exclusion Proofs

Inclusion proofs for a key-value pair [key, value] are given by standard Merkle proofs. A special feature of the IMT is the ability to cheaply provide an exclusion proof that a key target_key is not present. This is done by proving inclusion of an active leaf node with prefix, key, next_prefix, next_key satisfying one of the following conditions:

  • prefix == 1 and next_prefix == 1 and key < target_key < next_key
  • prefix == 1 and next_prefix == 0 and key < target_key
  • prefix == 0 and next_prefix == 1 and target_key < next_key
  • prefix == 0 and next_prefix == 0

By the IMT invariant, if such a leaf node exists in the IMT, then target_key is not present in key-value mapping.

Key Siloing

To commit to multiple key-value mappings in a single IMT, we silo keys by replacing them with their prefixed hashes with 2 siloing bytes, meaning that we store the key-value tuple [original_key, original_value] at the siloed key

key = keccak256(concat([silo_bytes, original_key]))

where silo_bytes is a 2-byte array indicating the siloing namespace for the key-value mapping.