Currently, the spec and the implementation differ with regards to how data is arranged into shares.
The implementation simply fills up shares with padding. This potentially bloats the erasure coded version of the block with a lot of parity bytes. or, to quote @adlerjohn:
We could make all txs start at a new share, but it's really non-ideal. Regular txs are on the order of ~80-100 bytes, so you'd be doubling the footprint of txs. And this would have to be metered since block size is limited.
The specification fills up shares with other Txs, or parts of Txs which saves block space. Currently, this looks like this:

and after celestiaorg/celestia-specs#145 / celestiaorg/celestia-specs#146 this will look sth like:

To save space, the implementation should align with the specification.
Note that this contains essentially defining an encoding to go:
- from types.Data -> slice of arranged namespaced shares
- and back from slice of arranged namespaced shares to types.Data
I'd propose to export the private method computeShares on Data:
https://github.com/lazyledger/lazyledger-core/blob/e0b492d2f92ee87b6221307e4c18058b73bb9867/types/block.go#L1339
it covers the first step above but needs to change to according what is (or rather will be) in the specs.
For the second part we'd need a method:
func DataFromShares(shares [][]byte) (*Data, error) {}
shares could also use the type alias NamespacedShares but note that all the type aliasing might require us to convert back and forth which could lead to a decrease in performance. But whatever is readable and works is good :-) The name of that method could also be ParseDataFromShares or just ParseData or ParseDataFromEDS ... something along these lines.
At least there should be unit tests for asserting roundtrips yield the same results:
types.Data -> slice of arranged namespaced shares -> types.Data
Ideally, there would also be fuzzing tests.
Currently, the spec and the implementation differ with regards to how data is arranged into shares.
The implementation simply fills up shares with padding. This potentially bloats the erasure coded version of the block with a lot of parity bytes. or, to quote @adlerjohn:
The specification fills up shares with other Txs, or parts of Txs which saves block space. Currently, this looks like this:


and after celestiaorg/celestia-specs#145 / celestiaorg/celestia-specs#146 this will look sth like:
To save space, the implementation should align with the specification.
Note that this contains essentially defining an encoding to go:
I'd propose to export the private method
computeSharesonData:https://github.com/lazyledger/lazyledger-core/blob/e0b492d2f92ee87b6221307e4c18058b73bb9867/types/block.go#L1339
it covers the first step above but needs to change to according what is (or rather will be) in the specs.
For the second part we'd need a method:
func DataFromShares(shares [][]byte) (*Data, error) {}sharescould also use the type aliasNamespacedSharesbut note that all the type aliasing might require us to convert back and forth which could lead to a decrease in performance. But whatever is readable and works is good :-) The name of that method could also beParseDataFromSharesor justParseDataorParseDataFromEDS... something along these lines.At least there should be unit tests for asserting roundtrips yield the same results:
types.Data -> slice of arranged namespaced shares -> types.Data
Ideally, there would also be fuzzing tests.