Verifiable data structures

Verifiable log

Trillian uses a Merkle tree to implement a verifiable log.

Every record inserted into a verifiable log is added as a new leaf in the Merkle tree. In the diagram above, the four records are shown along the bottom.

The squares in the diagram are cryptographic hash values. The bottom level of hashes - A, B, D and F - are calculated by hashing records 1, 2, 3 and 4.

The next level - C and G are calculated by hashing pairs of hashes below. For instance, C is the hash of A and B.

This carries on up until there's a single hash - the tree head hash H.

Tree head hash

The single hash at the top level - in this case H - is called the tree head hash.

If any record is modified, its hash changes. This cascades up all the levels, resulting in a different tree head.

In other words, a record can't be modified, added or deleted without resulting in a completely new tree head hash.

The single tree head hash acts as a snapshot of all the records in the log. By recording the tree head hash at a point in time, you can re-calculate all the hashes at a later date and verify that nothing was changed - all from one single hash.

Also known as: Tree head, Root hash, Merkle tree hash.

A Merkle tree with 4 records

Signed tree head

Trillian always signs tree head hashes with a private key and this is called the signed tree head.

Before trusting a tree head hash from Trillian, clients verify it using a public key that's typically published separately, depending on the application.

This serves two purposes:

  • protects against tampering with the communication between Trillian and the client.
  • acts as a commitment to each tree head hash, making misbehaviour undeniable.

Inclusion proof

When a client gets a record from Trillian, it mustn't simply trust the record. Instead, it requests an inclusion proof from the log and verifies that the proof is valid. This links the content of that record back to a tree head hash.

The client calculates the hash of the record and the chain of hashes up the tree to the tree head hash.

In the diagram above, calculating an inclusion proof of record 2 would mean:

  • calculating the hash of record 2, to get B
  • asking the log for missing hashes A and D
  • calculating the hash of A + B to get C
  • calculating the hash of C and D to get E
  • comparing E to the tree head hash advertised by the log

In this case, the inclusion proof verifies that the exact content of record 2 is present in the log that has a tree head E.

A Merkle tree with 3 records

Consistency proof

A consistency proof verifies the consistency between a log at two points in time.

It verifies that the later version includes everything in the earlier version, in the same order, and all new entries come after the entries in the older version.

Consistency proofs tell us nothing was tampered with

Suppose you have a log with three records called Tree 3.

At some point in time you inspected records 1, 2 and 3 and were confident they were correct.

You kept a copy of the tree head hash E as a snapshot of all the records. By re-calculating the hashes you can use E to check whether any records are modified.

This means you won't have to inspect those records all over again.

Tree 3

When records are added, how to check previous records?

Later, another record 4 is added - let's call it Tree 4.

With this new tree, how do you know that records 13 haven't been modified? Our snapshot E is no longer the tree head hash.

The Merkle tree allows us to calculate a consistency proof using our saved E to verify that records 13 didn't change.

Specifically, a consistency proof between Tree 3 and Tree 4 answers:

  • Does Tree 4 fully contain Tree 3? (If Tree 4 was only 3 records long, would its tree head hash work out as E?)
  • If I took Tree 3 and added record 4, would I get Tree 4?

If both questions are positive, you know that records 13 haven't been tampered since you took snapshot E.

Clients must store a copy of tree head hashes

In order to check that records haven't been retrospectively altered, or tampered, clients must keep their own separate record of tree head hashes.

By keeping a record of a tree head you saw today, you can confirm that future versions of the log still contain today's log precisely, with no modifications.

In the example above, storing E allowed us to verify that Tree 4 was an extension of Tree 3, and contained unmodified records 13.

Also known as: Merkle Audit Proof

Tree 4

Full audit

A malicious log operator could quietly modify a record while neglecting to update its hash and all the other hashes in the tree.

If you requested that record but didn't calculate an inclusion proof, you wouldn't know it had been modified.

A tampered log can be detected by re-calculating every hash from scratch and comparing the tree head hash to a known historical value. By recalculating the whole tree it should always result in the same tree head hash. This is called a full audit.

The role of auditors

An external auditor has two main roles:

  1. confirm that a log's tree is being honestly maintained by performing full audits.
  2. check the actual information inside the log for good behaviour, which depends on the application.

To carry out a full audit, an auditor needs access to a historical record of tree head hashes to ensure the log hasn't been tampered with. Without this, the whole tree could be simply re-calculated to include a tampered record and there'd be no way to detect it.

Operational considerations

This demonstrates some important operational considerations:

  1. Clients must calculate inclusion proofs for every record they use.
  2. Honest log operators should publish and share widely their tree head hashes, allowing a future full audit.
  3. Clients should store copies of historical tree head hashes, assist with future audits (as well as carry out their own consistency proofs)

Verifiable map

While Trillian primarily acts as a verifiable log, it also has an experimental map mode. This allows storing key-value pairs in a verifiable way.

In map mode, Trillian arranges the Merkle tree differently from log mode. Rather than dynamically growing the tree as new log entries are added, the tree is a fixed size.

A hash function is agreed in advance, for example SHA256, which is applied to any key entered into the map.

The tree is arranged so every leaf of the tree represents a particular value of the hash function. That means a given key always results in exactly the same leaf location in the tree.

To find the leaf, you calculate the hash of the key, giving 256 bits (each can be 1 or 0). Starting at the top, you descend the tree and for each bit of the hash value you turn either "left" or "right" (1 or 0). This leads directly to the leaf that contains the public key.

In this example, suppose you want to store data against the key

"hello@example.com"

You calculate the SHA256 hash of "hello@example.com", giving 256 binary bits:

1011101010011101111011011001101101000001001110001101001111000010110001000011111011101101111111011100100100110000101100100111100101111011111000110101010001000111101100000100111000000011111111111000001000000000111000101010101110010100101010101001000000110

You descend the tree one level per bit, turning right or left for 0 and 1.

After 256 levels you reach the record that represents the value associated with the key "hello@example.com"

As with a log, the single map head hash is like a snapshot of the entire map. As with a log, you can perform an inclusion proof against a map head hash, verifying that the value associated with the key is really in the map.

Clearly, 2256 possible keys is far too many keys to actually store. For more details, including how that works, refer to the whitepaper.

Log-backed map

A verifiable map has limitations which can be addressed by combining a verifiable map with a verifiable log. Any changes to the verifiable map are entered into a verifiable log, and only when present in the log are they applied to the verifiable map. This allows proving the correct operation over time.

This is the data structure currently used by Key Transparency.

The Verifiable Data Structures whitepaper discusses the limitations of a verifiable map, and how a log-backed map addresses these.

Resources