The idea is to have a notion of "primary" component not only when querying, but also when inserting.
When inserting, this "primary" component (or rather, "clustering key", in more Cassandra-like parlance) controls how the data is sorted within a row (e.g. by instance id) before insertion.
This opens the way to optimizations when joining results in the query processor.
It's also our opportunity to return an error if:
- the columns being inserted have different lengths
- the payload being inserted doesn't contain the sorting key
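A minimal sketch of what that insert-time validation could look like. The `insert` helper and its signature are illustrative assumptions, not the datastore's actual API:

```python
def insert(row: dict[str, list], clustering_key: str) -> dict[str, list]:
    """Validate a row and sort its columns by the clustering key.

    `row` maps column names to equally-sized lists of values; the names
    here are hypothetical, for illustration only.
    """
    # Reject payloads that don't contain the sorting key.
    if clustering_key not in row:
        raise ValueError(f"payload is missing clustering key {clustering_key!r}")

    # Reject columns with mismatched lengths.
    lengths = {name: len(col) for name, col in row.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have mismatched lengths: {lengths}")

    # Sort every column by the clustering key before insertion.
    order = sorted(range(len(row[clustering_key])),
                   key=row[clustering_key].__getitem__)
    return {name: [col[i] for i in order] for name, col in row.items()}
```

For example, `insert({"instance_id": [2, 0, 1], "pos": ["b", "c", "a"]}, "instance_id")` would reorder both columns so that `instance_id` comes out as `[0, 1, 2]`.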
This drastically reduces how permissive the datastore is, which in turn makes it much easier to reason about and test (e.g. you can always convert all query results into dataframes and join them on the clustering key, since it is guaranteed to be there).
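To illustrate the join guarantee, here is a sketch using pandas; the column names are made up for the example. Because every query result is guaranteed to carry the clustering key, any two results can always be merged on it:

```python
import pandas as pd

# Two hypothetical query results; both necessarily contain the
# clustering key column ("instance_id" here).
positions = pd.DataFrame({"instance_id": [0, 1, 2], "pos": ["a", "b", "c"]})
colors = pd.DataFrame({"instance_id": [1, 2], "color": ["red", "blue"]})

# The join on the clustering key is always well-defined.
joined = positions.merge(colors, on="instance_id", how="left")
```

Instances without a color simply come out with a null in the `color` column, which is exactly the kind of uniform behavior the stricter insertion rules buy us.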