Generating data availability roots from shard block roots

In a beacon block, we have N data roots `r[1] ... r[n]` (in the worst case, N = 256), each of which represent data `d[1] ... d[n]` of different sizes: on average 128 kB, in the worst case 512 kB. We want to combine these roots into a combined root `R = f(r[1] ... r[n])` which is a Merkle root of some kind of concatenation of `d[1] ...d[n]`.

The main desiderata are:

* `R` can be computed from `r[1] ... r[n]` either directly or without adding much auxiliary data
* Minimal "wasted space" in `R` (consider that if we force all `r[i]` to have the same height, on average there will be 75% wasted space)

Possibilities I see:

1. Allow `r[i]` to have different heights depending on its size, so there's on average ~30% maximum 49% wasted space. Then create R by combining all `r[i]` values of each size, adding a bit of padding if needed and then combining the different sizes. Main weakness is that variable heights are not SSZ-friendly.
2. Expose a maximum of 4 data sub-roots where `r[i] = root(r[i][0], r[i][1], r[i][2], r[i][3])`. Reduces wasted space even further, and is cleaner (including being SSZ compliant), but increases number of block hashes from 1 to average 1.5 max 4, increasing beacon chain load by 1.2x average-case and 2.33x worst case.
3. Expose 3 data sub-roots where `r[i][0]` is the root of the first half of the data if the data is more than half the max size and otherwise zero, `r[i][1]` is the root of the first quarter of the data not covered by `r[i][0]`, and `r[i][2]` is the root of the first eighth of the data not covered by `r[i][1]`. Reduces wasted space heavily (maybe ~10% wasted), but increases beacon chain load ~1.89x, and is more complicated (including non-SSZ-friendliness from (1)).

For (1) here is an example algorithm:

```python
def combine_roots(roots: List[Hash], sizes: List[int]) -> Hash:
    roots = []
    cur_size = 1
    ZERO_HASH = b'\x00' * 32
    while cur_size < max(sizes) * 2 or len(roots) > 1:
        if len(roots) % 2:
            roots.append(ZERO_HASH)
        roots = [hash(roots[i] + roots[i+1]) for i in range(0, len(roots), 2)]
        ZERO_HASH = hash(ZERO_HASH + ZERO_HASH)
        roots.extend([roots[i] for i in range(len(roots)) if cur_size//2 < sizes[i] <= cur_size])
    return roots[0]
```

For (2), here are concrete estimates of data overhead. With 64 shards, and a MAX_CATCHUP_RATIO of 4, and 4x slack for different attestations, there are a maximum of 1024 data roots in a block, plus 1024 state roots and 1024 lengths (that's 32 + 32 + 8 = 72 kB). Proposal (2) would increase the former to 1536 data roots hence 48 + 32 + 8 = 88 kB average case and 160 + 32 + 8 = 168 kB worst case. Note that average-case figures are expected to be ~8x less than these worst-case figures.

I am currently favoring (2) but open to other ideas.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generating data availability roots from shard block roots #1438

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Generating data availability roots from shard block roots #1438

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions