Consider possible improvements to the SSZ spec before phase0 is launched #1916

One of the design goals of SSZ is to make it easier for other blockchains to work with Merkle proofs referencing Eth2 consensus objects. Once phase0 is launched, we can expect various official SSZ records to start appearing in third-party databases. This would significantly increase the difficulty of coordinating upgrades to the SSZ spec: because SSZ has limited forward-compatibility provisions, many applications may break in the process. For this reason, I think we should consider introducing some final refinements and optimisations to the SSZ spec before phase0 is launched:

### 1) Reduce the size of every variable-size container by 4 bytes.

Every variable-size container (i.e. a record with fields) starts with a fixed-size section that holds the fixed-size fields and the offsets of the variable-size fields.

The offset of the first such field currently has only one valid value: it must be equal to the length of the fixed-size section. [Implementations are expected to check for this](https://github.com/status-im/nim-beacon-chain/pull/1088), because otherwise there might be unused bytes in the SSZ representation, which is considered an invalid encoding.

The motivation for not allowing unused bytes is that they would break the property `deserialize(serialize(x)) == x`, which is quite useful for fuzzing. For completeness, I should mention that if unused bytes were allowed, a very limited form of forward compatibility would be present: it would be possible to add a new field at the end of a record without breaking older readers. Since SSZ upgrades require coordination and all long-term storage applications should also feature an out-of-band version tag, this limited form of forward compatibility was considered unnecessary.

In other words, since the first offset has only one valid value, and that value is completely derived from the type schema, the offset carries no information and can be omitted from the representation. As a result, every variable-size container will be 4 bytes shorter. Admittedly, 4 bytes are not much, but if we consider the long expected life of the SSZ spec and the great multitude of places where SSZ records might appear, a quick back-of-the-envelope calculation puts the total savings in bandwidth and storage at roughly 1 gazillion bytes :P
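
To make the redundancy concrete, here is a minimal Python sketch of the container layout described above. It is not taken from any client. The function `serialize_container`, its `omit_first_offset` flag and the example field values are hypothetical names chosen for illustration, and the fields are passed in as already-serialized bytes.

```python
import struct

OFFSET_SIZE = 4  # SSZ offsets are little-endian uint32 values


def serialize_container(fields, omit_first_offset=False):
    """Serialize a container from pre-serialized fields (hypothetical helper).

    `fields` is a list of ("fixed", bytes) or ("var", bytes) pairs in schema
    order. With `omit_first_offset=True` the offset of the first
    variable-size field is left out, as proposed in this issue.
    """
    has_var = any(kind == "var" for kind, _ in fields)
    # Length of the fixed-size section: fixed fields inline, one offset per
    # variable-size field.
    fixed_len = sum(len(data) if kind == "fixed" else OFFSET_SIZE
                    for kind, data in fields)
    if omit_first_offset and has_var:
        fixed_len -= OFFSET_SIZE

    fixed_part, var_part = b"", b""
    seen_var = False
    for kind, data in fields:
        if kind == "fixed":
            fixed_part += data
            continue
        skip = omit_first_offset and not seen_var
        if not skip:
            # An offset points at the start of the field's data, counted
            # from the beginning of the container.
            fixed_part += struct.pack("<I", fixed_len + len(var_part))
        seen_var = True
        var_part += data
    return fixed_part + var_part


# Example record: one uint64 followed by two variable-size byte lists.
example = [("fixed", struct.pack("<Q", 7)), ("var", b"abc"), ("var", b"de")]
current = serialize_container(example)
proposed = serialize_container(example, omit_first_offset=True)

# In the current encoding the first offset always equals the length of the
# fixed-size section (8 + 4 + 4 = 16 here), so it carries no information.
first_offset = struct.unpack("<I", current[8:12])[0]
```

In this example a reader of the `proposed` bytes can recompute the start of the first variable-size field from the schema alone (8 bytes of fixed data plus one remaining offset), which is exactly the value that `first_offset` stores explicitly in the `current` bytes.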

### 2) Null-value optimisation (a.k.a. better support for pointer types and `Option[T]`)

The SSZ spec defines union types that can discriminate between `null` and a possible value. Let's call such types `Nullable`. Since `Nullable` types have variable size, their length in bytes can be zero (just as zero-length lists are encoded with two consecutive offsets of the same value). I propose the addition of the following two special rules:

* The `null` value of a `Nullable` union is encoded as zero bytes.
* A union with just one non-null branch is encoded without a `serialized_type_index`.
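
The two rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, `T` is fixed to `uint64` so that a present value never serializes to zero bytes, and the 4-byte width used for the type index in the comparison encoding (`SELECTOR_SIZE`) is an assumption of this sketch rather than a quotation from the spec.

```python
import struct
from typing import Optional

SELECTOR_SIZE = 4  # assumed width of `serialized_type_index` (illustrative)


def encode_option_current_style(value: Optional[int]) -> bytes:
    """Union[null, uint64] with an explicit type index (comparison only)."""
    if value is None:
        return (0).to_bytes(SELECTOR_SIZE, "little")
    return (1).to_bytes(SELECTOR_SIZE, "little") + struct.pack("<Q", value)


def encode_option_proposed(value: Optional[int]) -> bytes:
    """Proposed rules: null is zero bytes, the single branch has no index."""
    if value is None:
        return b""
    return struct.pack("<Q", value)


def decode_option_proposed(data: bytes) -> Optional[int]:
    """Inverse of `encode_option_proposed` for a uint64 payload."""
    if len(data) == 0:
        return None  # an empty span means `null`
    if len(data) != 8:
        raise ValueError("expected 0 or 8 bytes for Option[uint64]")
    return struct.unpack("<Q", data)[0]
```

Inside a container, the length of the field comes from the surrounding offsets, so a decoder can tell `null` from a present value by checking whether the span is empty, in the same way it recognises an empty list.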

Please note that in most programming languages, the unions described above can be mapped to frequently used types such as `Option[T]` or a pointer type. During the development of the `blocks_by_range` protocol, an earlier version suggested that missing blocks should be indicated in the response as a `default(T)` encoding of the `BeaconBlock` type. This was semantically equivalent to using an `Option[T]` type, but it would have been considerably less efficient. The design of the protocol was refined in later versions to not require this form of response, but I think that if one of the very first protocols came that close to using and benefiting from the `Option[T]` type, we can expect more protocols to appear in the future that will benefit as well.

### 3) Resolve a contradiction in the SSZ List limit type

The SSZ spec doesn't specify the type of the list size limit. This leads to what can be described as a slight contradiction in the current specs:

The size limit of the validator registry is set to 1099511627776 (2^40). On the other hand, the maximum size in practice is limited in the encoding to the difference of two offset values. Since the offset values are encoded as `uint32`, the maximum size in practice cannot be larger than 2^32. Perhaps the intention is that the size limit should only affect the Merkle hash computation, but the spec would do well to clarify this.
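
The mismatch can be checked with a few lines of arithmetic. The sketch below is illustrative only. The 121-byte serialized size of a single validator record (`VALIDATOR_SIZE`) is an assumption made here for the sake of the example, and the variable names are hypothetical.

```python
# Declared list limit for the validator registry, as quoted above.
VALIDATOR_REGISTRY_LIMIT = 2**40

# Offsets are uint32, so no byte position beyond this can be addressed.
MAX_OFFSET = 2**32 - 1

# Assumed serialized size of one validator record, in bytes (illustrative).
VALIDATOR_SIZE = 121

# Upper bound on how many validators fit in a span delimited by two offsets.
max_encodable_validators = MAX_OFFSET // VALIDATOR_SIZE

# If the limit only matters for merkleization, it would fix the depth of the
# tree over the list elements (one leaf position per possible element).
merkle_depth = (VALIDATOR_REGISTRY_LIMIT - 1).bit_length()

print(max_encodable_validators < VALIDATOR_REGISTRY_LIMIT)
print(merkle_depth)
```

Under these assumptions the number of encodable elements falls several orders of magnitude short of the declared limit, which is consistent with reading the limit as a property of the hash tree rather than of the serialized form.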


