One of the design goals of SSZ is to make it easier for other blockchains to work with Merkle proofs referencing Eth2 consensus objects. Once phase0 is launched, we can expect various official SSZ records to start appearing in third-party databases. This would significantly increase the difficulty of coordinating upgrades to the SSZ spec (due to the limited forward-compatibility provisions in SSZ, a lot of applications may break in the process). Because of this, I think we should consider introducing some final refinements and optimisations to the SSZ spec before phase0 is launched:
1) Reduce the size of every variable-size container by 4 bytes.
Every variable-size container (i.e. record with fields) starts with a fixed-size section that stores the fixed-size fields and the offsets of the variable-size fields, followed by the data of the variable-size fields.
The offset of the first such field currently has only one valid value: it must be equal to the length of the fixed-size section. Implementations are expected to check for this, because otherwise there might be some unused bytes in the SSZ representation, which is considered an invalid encoding.
The motivation for not allowing unused bytes is that they would break the property deserialize(serialize(x)) == x, which is quite useful for fuzzing. For completeness, I would mention that if unused bytes were allowed, a very limited form of forward compatibility would be present: it would be possible to add a new field at the end of a record without breaking older readers. Since SSZ upgrades require coordination and all long-term storage applications should also feature an out-of-band version tag, this limited form of forward compatibility was considered unnecessary.
In other words, since the first offset has only one valid value, which is completely derived from the type schema, the offset carries no information and can be omitted from the representation. As a result, every variable-size container will be 4 bytes shorter. Admittedly, 4 bytes is not much, but if we consider the long expected life of the SSZ spec and the great multitude of places where SSZ records might appear, a quick back-of-the-envelope calculation puts the total savings in bandwidth and storage at roughly 1 gazillion bytes :P
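As a rough illustration of the saving, here is a minimal Python sketch of container serialization with and without the first offset. The function name serialize_container, the omit_first_offset flag and the (is_variable, payload) field representation are hypothetical and not part of the SSZ spec; the only things taken from SSZ are the 4-byte little-endian offsets and the layout of a fixed-size section followed by the variable-size data.

```python
import struct

OFFSET_SIZE = 4  # SSZ offsets are uint32, little-endian


def serialize_container(fields, omit_first_offset=False):
    """Serialize already-encoded fields, given in schema order.

    `fields` is a list of (is_variable, payload_bytes) pairs (a hypothetical
    representation used only for this sketch). Fixed-size payloads are written
    inline; each variable-size payload is represented in the fixed-size
    section by an offset and appended after that section.
    """
    n_var = sum(1 for is_var, _ in fields if is_var)
    n_offsets = n_var - 1 if (omit_first_offset and n_var > 0) else n_var
    fixed_len = (
        sum(len(p) for is_var, p in fields if not is_var)
        + OFFSET_SIZE * n_offsets
    )

    fixed, variable = b"", b""
    skip_next_offset = omit_first_offset
    for is_var, payload in fields:
        if not is_var:
            fixed += payload
            continue
        if skip_next_offset:
            # Proposed rule: the first variable-size field always starts
            # right after the fixed-size section, so no offset is written.
            skip_next_offset = False
        else:
            fixed += struct.pack("<I", fixed_len + len(variable))
        variable += payload
    return fixed + variable


# One uint64 field followed by two variable-size fields.
fields = [(False, (7).to_bytes(8, "little")), (True, b"ab"), (True, b"cde")]
current = serialize_container(fields)
proposed = serialize_container(fields, omit_first_offset=True)
print(len(current) - len(proposed))  # 4
```

In the current encoding the first offset in this example is 16, which is exactly the length of the fixed-size section. A reader of the shorter form would compute that length from the schema instead of reading it from the data.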
2) Null-value optimisation (a.k.a. better support for pointer types and Option[T])
The SSZ spec defines union types that can discriminate between null and a possible value. Let's call such types Nullable. Since the Nullable types have variable size, their length in bytes can be zero (just like how we encode zero-length lists with two consecutive offsets with the same value). I propose the addition of the following two special rules:
- The null value of a Nullable union is encoded as zero bytes.
- A union with just one non-null branch is encoded without a serialized_type_index.
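To make the two rules above concrete, here is a minimal Python sketch comparing an encoding with an explicit type index against the proposed one. The function names are hypothetical, the value of T is assumed to be already serialized, and the 4-byte width of the type index is my assumption about the current union encoding rather than something stated here.

```python
from typing import Optional

TYPE_INDEX_SIZE = 4  # assumption: the type index has the width of an offset


def encode_nullable_current(value: Optional[bytes]) -> bytes:
    """Union[null, T] with an explicit type index: 0 for null, 1 for T."""
    if value is None:
        return (0).to_bytes(TYPE_INDEX_SIZE, "little")
    return (1).to_bytes(TYPE_INDEX_SIZE, "little") + value


def encode_nullable_proposed(value: Optional[bytes]) -> bytes:
    """Proposed rules: null is zero bytes, the single branch has no index."""
    return b"" if value is None else value


def decode_nullable_proposed(data: bytes) -> Optional[bytes]:
    """A zero-length span means null; anything else is the encoding of T."""
    return None if len(data) == 0 else data
```

Note that this sketch assumes the serialized form of T is never empty; otherwise a null and an empty value would be indistinguishable.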
Please note that in most programming languages, the unions described above can be mapped to frequently used types such as Option[T] or a pointer type. During the development of the blocks_by_range protocol, an earlier version suggested that missing blocks should be indicated in the response as a default(T) encoding of the BeaconBlock type. This was semantically equivalent to using an Option[T] type, but it would have been considerably less efficient. The design of the protocol was refined in later versions to not require this form of response, but I think that if one of the very first protocols came that close to using and benefiting from the Option[T] type, we can expect more protocols to appear in the future that will benefit as well.
3) Resolve a contradiction in the SSZ List limit type
The SSZ spec doesn't specify the type of the list size limit. This leads to something that can be described as a slight contradiction in the current specs:
The size limit of the validator registry is set to 1099511627776 (2^40). On the other hand, the maximum size in practice is limited in the encoding to the difference of two offset values. Since the offset values are encoded as uint32, the maximum serialized size in practice cannot be larger than 2^32 bytes. Perhaps the intention for the size limit is that it should only affect the Merkle hash computation, but the spec would do well to clarify this.
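The mismatch can be checked with a few lines of Python. This is only a sketch of the arithmetic: max_encodable_elements and its element_size parameter are hypothetical names, and the bound applies to a list of fixed-size elements stored as a field of a container, where its byte length is the difference of two uint32 offsets.

```python
VALIDATOR_REGISTRY_LIMIT = 2**40  # list limit from the spec
MAX_OFFSET = 2**32 - 1            # largest value a uint32 offset can hold


def max_encodable_elements(element_size: int) -> int:
    """Upper bound on fixed-size elements that fit below the largest offset."""
    return MAX_OFFSET // element_size


# Even with 1-byte elements, the encoding cannot reach the declared limit.
print(max_encodable_elements(1) < VALIDATOR_REGISTRY_LIMIT)  # True
```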