-
Notifications
You must be signed in to change notification settings - Fork 4.1k
sql: intentionally randomize the order of results when no ORDER BY is specified #24223
Description
Currently, CockroachDB queries that intuitively would scan in some natural order don't guarantee to hold to that order. This is consistent with the SQL standard, which does not require that queries return results in order unless an ORDER BY is specified. Most single node databases do return results in order, as does cockroach when the results are on a single node. This is confounded by the fact that the EXPLAIN plans for a query with an ORDER BY looks identical to a query that happens to return in that order, so developers often look at the plan to ensure it's optimized, and then neglect to add a final ORDER BY, since the plans are "identical".
Consider an example, selecting by a table's primary key column:
root@:26257/tpcc> explain select * from district order by d_w_id limit 10;
+-----------+-------+------------------+
| Tree | Field | Description |
+-----------+-------+------------------+
| limit | | |
| └── scan | | |
| | table | district@primary |
| | spans | ALL |
| | limit | 10 |
+-----------+-------+------------------+
(5 rows)
root@:26257/tpcc> explain select * from district limit 10;
+-----------+-------+------------------+
| Tree | Field | Description |
+-----------+-------+------------------+
| limit | | |
| └── scan | | |
| | table | district@primary |
| | spans | ALL |
| | limit | 10 |
+-----------+-------+------------------+
(5 rows)
The real problem is that without an explicit ORDER BY, when the table is distributed, cockroachdb optimizes results delivery by returning the first values it receives, regardless of order. Thus, users often encounter this artifact long after the fact, as it "always works" in testing on small data sizes, and only fails at scale.
One solution is to throw in a little bit of randomization when no ORDER BY is specified. Care should be taken to not do so at the cost of performance (perhaps only do this for a small fraction of rows). This is what golang does, and it helps build the intuition early that results won't be ordered unless you explictly specify that they are.
I would like to do the same in CockroachDB, unless I get strong pushback that this is an anti-feature. One useful data point is that this wrong ordering assumption has already bitten users (including some engineers who work on CockroachDB such as myself and @mjibson).
Metadata
Metadata
Assignees
Labels
Type
Projects
Status