kv: Scans with limit acquire excessively large latches #9521
Description
A query like SELECT * FROM t WHERE indexed_column > $1 LIMIT 1 has a very large key Span (from key $1 to the end of the table or the end of the range, whichever comes first), but really only depends on a small amount of data (from $1 to the first key greater than that value). The command queue only sees the former, so this query must wait behind any updates to any other rows in the table, not just the rows that it will eventually return.
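To make the gap between the declared span and the data actually read concrete, here is a minimal sketch in Go. The `Span` type, the `narrowSpan` helper, and the string keys are hypothetical simplifications for illustration, not the real kv types, and the start key is treated as inclusive for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a half-open key interval [Start, End). Hypothetical, simplified type.
type Span struct{ Start, End string }

// Overlaps reports whether two half-open spans share at least one key.
func (s Span) Overlaps(o Span) bool { return s.Start < o.End && o.Start < s.End }

// narrowSpan returns the part of declared that a forward scan with the given
// limit actually reads from the sorted keys. If the scan runs out of keys
// before reaching the limit, it depends on the whole declared span.
func narrowSpan(keys []string, declared Span, limit int) Span {
	i := sort.SearchStrings(keys, declared.Start)
	n := 0
	last := ""
	for ; i < len(keys) && keys[i] < declared.End && n < limit; i++ {
		last = keys[i]
		n++
	}
	if n == 0 || n < limit {
		return declared
	}
	// last + "\x00" is the smallest key that sorts after last.
	return Span{declared.Start, last + "\x00"}
}

func main() {
	keys := []string{"a", "c", "m", "x"}
	declared := Span{"b", "z"} // stands in for "from $1 to the end of the table"
	narrowed := narrowSpan(keys, declared, 1)
	update := Span{"m", "m\x00"} // an update to an unrelated row
	// prints: true false
	fmt.Println(declared.Overlaps(update), narrowed.Overlaps(update))
}
```

In this toy model the update to `m` conflicts with the declared span but not with the narrowed one, which is the contention described above.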
We could minimize this contention at the expense of throughput as follows. For read-only commands with (small) limits, execute the command first, before putting it in the command queue. If it reaches its limit, narrow the span based on the keys that were actually touched. Put it in the command queue under the narrowed span. After waiting on the command queue, execute it again. If it doesn't hit the limit while staying inside the narrowed span, something has changed out from under us and we have to re-queue with a broader span.
There is probably a way to be clever and avoid the double execution in the common case, e.g. if the command queue allows the narrowed span to execute immediately we can use the results of the first execution.
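The proposal, including the shortcut that skips the second execution, could be sketched roughly as follows. Everything here is a hypothetical toy model and not the actual command queue API: `Queue.Wait` simulates blocking by applying the overlapping pending writes, and the model assumes writes only become visible through `Wait`, which is what makes reusing the first result safe.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a half-open key interval [Start, End).
type Span struct{ Start, End string }

func (s Span) Overlaps(o Span) bool { return s.Start < o.End && o.Start < s.End }

// write is a pending single-key mutation: a put, or a delete when del is set.
type write struct {
	key string
	del bool
}

// Store is a toy sorted key set standing in for the data of one range.
type Store struct{ keys []string }

func (s *Store) apply(w write) {
	kept := s.keys[:0]
	for _, k := range s.keys {
		if k != w.key {
			kept = append(kept, k)
		}
	}
	if !w.del {
		kept = append(kept, w.key)
	}
	sort.Strings(kept)
	s.keys = kept
}

// Scan returns up to limit keys inside sp, in key order.
func (s *Store) Scan(sp Span, limit int) []string {
	var out []string
	i := sort.SearchStrings(s.keys, sp.Start)
	for ; i < len(s.keys) && s.keys[i] < sp.End && len(out) < limit; i++ {
		out = append(out, s.keys[i])
	}
	return out
}

// Queue is a toy command queue holding writes that have not yet applied.
type Queue struct {
	store   *Store
	pending []write
}

// Wait simulates blocking behind every pending write that overlaps sp by
// applying those writes. It reports whether there was anything to wait for.
func (q *Queue) Wait(sp Span) bool {
	waited := false
	var rest []write
	for _, w := range q.pending {
		if sp.Overlaps(Span{w.key, w.key + "\x00"}) {
			q.store.apply(w)
			waited = true
		} else {
			rest = append(rest, w)
		}
	}
	q.pending = rest
	return waited
}

// LimitedScan returns the scan result and how many times the scan executed.
func LimitedScan(q *Queue, declared Span, limit int) ([]string, int) {
	if limit <= 0 {
		return nil, 0
	}
	first := q.store.Scan(declared, limit) // optimistic, before queueing
	if len(first) == limit {
		narrowed := Span{declared.Start, first[limit-1] + "\x00"}
		if !q.Wait(narrowed) {
			return first, 1 // nothing overlapped: reuse the first result
		}
		if second := q.store.Scan(narrowed, limit); len(second) == limit {
			return second, 2
		}
		// The narrowed span no longer holds enough keys: re-queue broadly.
		q.Wait(declared)
		return q.store.Scan(declared, limit), 3
	}
	// The limit was not reached, so there is nothing to narrow.
	q.Wait(declared)
	return q.store.Scan(declared, limit), 2
}

func main() {
	store := &Store{keys: []string{"a", "c", "m", "x"}}
	q := &Queue{store: store, pending: []write{{key: "q"}}}
	keys, runs := LimitedScan(q, Span{"b", "z"}, 1)
	fmt.Println(keys, runs)
}
```

With a pending write to `q`, the scan neither waits nor re-executes. A pending delete of `c` would instead force the re-queue path with three executions, which illustrates the throughput cost mentioned above.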
This is a major cause of slowness for photos (#9247). For example, this trace spends 100ms in the command queue on its first scan. There is probably a related problem with the timestamp cache, but I haven't confirmed its existence or impact yet.