Indexed Merkle Tree
Table of Contents
The Indexed Merkle Tree (IMT), detailed in this paper, is a committed data structure that modifies the standard Merkle tree data structure to represent a key-value mapping with efficient exclusion proofs. It achieves this by adding additional metadata to each leaf node to enable exclusion proofs via standard Merkle proofs into the IMT. We will use IMTs to commit to a mapping between 32-byte keys and variable length values.
Structure of the IMT
An IMT is a binary Merkle tree over two types of nodes:
- Leaf Nodes: These represent leaves of the tree and contain key-value pairs as well as additional metadata which enables efficient exclusion proofs.
- Inner Nodes: These are internal nodes of the tree and consist of the hashes of the two child nodes as usual.
The number of leaf nodes in the tree is known as the capacity, which is initialized at 1 and can be dynamically increased upon insertion. A capacity 2 ** C
IMT contains 2 ** C
leaf nodes, which are either considered active, meaning they represent a key-value pair, or inactive, meaning they are free to be used for future insertions.
IMT Leaf Node
Leaf nodes of an IMT are serialized as fixed length 99-byte arrays with the following layout:
[active (1 byte), prefix (1 byte), key (32 bytes), next_prefix (1 byte), next_key (32 bytes), value_hash (32 bytes)]
where the fields are as follows:
active
- A boolean flag indicating whether the leaf node holds an active tuple. Ifactive == true
, thenkey
andvalue
are the stored key-value tuple at this leaf node, andnext_key
is the next largest key in the key-value mapping. Otherwise, ifactive == false
, then the leaf node must be the zero byte array[0u8; 99]
.prefix
- A boolean flag indicating whether thekey
is a dummy key. It is1
ifkey
is not a dummy key, and0
otherwise, in which casekey
must equal[0u8; 32]
.key
- The key associated with this leaf node.next_prefix
- A boolean flag indicating whether thenext_key
is a dummy key. It is1
ifnext_key
is not a dummy key, and0
otherwise, in which casenext_key
must equal[0u8; 32]
.next_key
- The next key in the IMT that is larger than thiskey
.value_hash
- The keccak256 hash of the value associated with the key.
The hash of a leaf node is computed as keccak256(abi.encodePacked(active, prefix, key, next_prefix, next_key, value_hash))
. We use DUMMY
to represent the 33-byte dummy prefix and key [0u8; 33]
.
The adjacent keys key
and next_key
must satisfy the IMT invariant that one of the following must be true:
- The leaf node is inactive, i.e.
active == false
. - Both
prefix == 1
andnext_prefix == 1
, in which casekey < next_key
and there are no active nodes withkey < node_key < next_key
. - We have
prefix == 1
andnext_prefix == 0
, in which case there are no active nodes withkey < node_key
. - We have
prefix == 0
andnext_prefix == 1
, in which case there are no active nodes withnode_key < next_key
. - We have
prefix == 0
andnext_prefix == 0
, in which casekey
andnext_key
must be[0u8; 32]
.
Every IMT will have at least one active leaf node.
IMT Inner Node
Inner nodes of an IMT are serialized as fixed length 64-byte arrays with the following layout:
[left_hash (32 bytes), right_hash (32 bytes)]
where left_hash
and right_hash
are the hashes of the left and right children of the inner node, respectively. The hash of an inner node is computed as keccak256(abi.encodePacked(left_hash, right_hash))
.
Operations on the IMT
Initialization
An empty IMT is initialized with capacity 1
and a single dummy leaf node given by [true, DUMMY, DUMMY, keccak256([])]
. The root hash of the empty IMT is the hash of this dummy leaf.
Value Update
To update the value associated with an existing key, we simply update the value_hash
field of the corresponding leaf node.
Key-Value Insertion
Insertion of a new key-value pair [new_key, new_value]
into an IMT with capacity 2 ** C
involves the following steps:
- If all leaves are active, the tree's capacity is doubled to
2 ** (C + 1)
by appending2 ** C
inactive leaf nodes to the tree and updating the root hash by hashing the existing root hash with a root hash of2 ** C
inactive leaf nodes. - Find an active leaf node with
prefix
,key
,next_prefix
,next_key
such that one of the following holds:
prefix == 1
andnext_prefix == 1
andkey < new_key < next_key
prefix == 1
andnext_prefix == 0
andkey < new_key
prefix == 0
andnext_prefix == 1
andnew_key < next_key
prefix == 0
andnext_prefix == 0
- For the leaf node found in Step 2, update
next_key
tonew_key
andnext_prefix
to1
. - Replace the inactive leaf node with the active leaf node
[true, new_key, next_key, keccak256(new_value)]
and update the IMT root hash accordingly.
This procedure maintains the IMT invariant. To verify the insertion is valid, a prover must provide:
- If all leaves of the tree were active in Step 1, a proof of the root hash update for capacity doubling.
- An exclusion proof for
new_key
by proving inclusion of the leaf node found in Step 2. - Merkle proofs to verify the update to the leaf nodes in Steps 3 and 4.
Inclusion and Exclusion Proofs
Inclusion proofs for a key-value pair [key, value]
are given by standard Merkle proofs. A special feature of the IMT is the ability to cheaply provide an exclusion proof that a key target_key
is not present. This is done by proving inclusion of an active leaf node with prefix
, key
, next_prefix
, next_key
satisfying one of the following conditions:
prefix == 1
andnext_prefix == 1
andkey < target_key < next_key
prefix == 1
andnext_prefix == 0
andkey < target_key
prefix == 0
andnext_prefix == 1
andtarget_key < next_key
prefix == 0
andnext_prefix == 0
By the IMT invariant, if such a leaf node exists in the IMT, then target_key
is not present in key-value mapping.
Key Siloing
To commit to multiple key-value mappings in a single IMT, we silo keys by replacing them with their prefixed hashes with 2 siloing bytes, meaning that we store the key-value tuple [original_key, original_value]
at the siloed key
key = keccak256(concat([silo_bytes, original_key]))
where silo_bytes
is a 2-byte array indicating the siloing namespace for the key-value mapping.