Add support for Fractional limits and offsets #88755

nihalzp merged 121 commits into ClickHouse:master from …
Conversation
…or limit and offset - Used `DataTypeDecimal32(2,1)` for type conversion for now because for some reason it doesn't recognise `DataTypeFloat32()`.
I noticed that during the transforms, the algorithm first caches all chunks; only after all pulling is done does it start processing them. This is slow, has high memory usage, and would not work at all if the entire table is bigger than the available memory.
For example:
SELECT *
FROM numbers(1e9)
OFFSET 0.99999999

10 rows in set. Elapsed: 3.646 sec. Processed 990.88 million rows, 7.93 GB (271.76 million rows/s., 2.17 GB/s.)
Peak memory usage: 7.45 GiB.

Here we are caching the entire table of 1e9 rows, and that's why memory usage is high. If we used a bigger table whose size exceeds the available RAM, the query would fail.
But if we keep evicting front chunks which will never be part of the final result, we can get very low memory footprint:
SELECT *
FROM numbers(1e9)
LIMIT -10

10 rows in set. Elapsed: 0.328 sec. Processed 919.59 million rows, 7.36 GB (2.80 billion rows/s., 22.41 GB/s.)
Peak memory usage: 17.27 KiB.

Offset version:
SELECT *
FROM numbers(1e9)
OFFSET 1e9 - 10

10 rows in set. Elapsed: 0.320 sec. Processed 943.98 million rows, 7.55 GB (2.95 billion rows/s., 23.61 GB/s.)
Peak memory usage: 9.23 KiB.

For context, we can do some inference about the already cached chunks and either discard them or push them onward to free up memory. For example, suppose we have a query with OFFSET 0.5. At some point, before we have pulled all the data, the cached chunks hold 100 rows in total. Even though we do not yet know the total number of rows in the table, we can safely infer that at least the first 50 rows will be skipped, so any cached complete chunks that fall entirely within the first 50 rows can be discarded.
Similarly for LIMIT: instead of discarding, we would push those chunks to the output ports.
This will require some extra bookkeeping but it can make the feature really high quality and blazingly fast!
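As a minimal sketch of the eviction idea for a fractional OFFSET (this is not ClickHouse's C++ transform — the names and the chunks-as-lists representation are illustrative only): after pulling `seen` rows, at least `floor(seen * fraction)` rows are guaranteed to be skipped regardless of how many rows are still to come, so any cached chunk ending before that provable cutoff can be discarded immediately.

```python
from collections import deque

def stream_fractional_offset(chunks, offset_fraction):
    """Yield the rows remaining after skipping offset_fraction of the
    (initially unknown) total. Front chunks are evicted as soon as they
    provably fall inside the skipped prefix, keeping memory low."""
    cached = deque()   # (start_row, chunk) pairs we may still need
    seen = 0           # total rows pulled so far
    for chunk in chunks:          # chunk is a plain list of rows here
        cached.append((seen, chunk))
        seen += len(chunk)
        # Lower bound on rows to skip, valid whatever the final total is.
        skip_at_least = int(seen * offset_fraction)
        # Discard complete front chunks that end before the provable cutoff.
        while cached and cached[0][0] + len(cached[0][1]) <= skip_at_least:
            cached.popleft()
    # All input consumed: the exact cutoff is now known.
    cutoff = int(seen * offset_fraction)
    for start, chunk in cached:
        for i, row in enumerate(chunk):
            if start + i >= cutoff:
                yield row
```

The LIMIT variant would be symmetric: instead of dropping provably skipped front chunks, it would forward provably kept front chunks downstream at the same point.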
It could be very tricky for … However, I think optimizing for … Let me know what you think.
…ractional limit/offset.
- add an additional `getLimimtLengthAndOffset()` parsing step to ensure the fractional limit and offset are zero before adding a prelimit.
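The intent behind that extra check can be sketched as follows (a hypothetical simplification, not the actual ClickHouse code: the function name and the split into integer/fractional parts are illustrative). A preliminary per-stream limit of `limit + offset` rows is only safe when both values are plain integers; a fractional part cannot be resolved to a row count until the total number of rows is known, so no prelimit can be pushed down.

```python
def preliminary_limit(limit, offset, limit_fraction=0.0, offset_fraction=0.0):
    """Return the row cap each stream can apply early, or None.

    With integer limit/offset, no stream ever needs to produce more than
    limit + offset rows. If either value has a nonzero fractional part,
    the absolute counts depend on the not-yet-known total, so an early
    prelimit would be incorrect.
    """
    if limit_fraction != 0 or offset_fraction != 0:
        return None
    return limit + offset
```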
I am not sure if optimizing … is worth it. I would prefer working on these, plus some more optimizations like limit pushdown, etc., in a separate PR. This one already got too big.
I really enjoyed working on this. You are a very good reviewer, thanks. I was planning to implement …
I see what you mean. I underestimated the complexity. Okay, let's go with your plan then!
Anything that interests you would be very much welcome :)
nihalzp left a comment
Looks good to me!
Just a few small doc additions that I thought would make things more precise.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add support for fractional `LIMIT` and `OFFSET` for selecting a fraction of a table. Closes #81892

Documentation entry for user-facing changes
Details
Add support for fractional limits and offsets to retrieve percentages of a table, for both the new and old analyzers.
WITH TIES is also supported.
Examples:
Closes #81892
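As a rough sketch of the semantics described above (an assumption for illustration, not ClickHouse's implementation: the function name, the "values in (0, 1) are fractions" rule, and the truncation to `int` are all hypothetical), a fractional limit/offset maps to absolute row counts only once the total row count is known:

```python
def resolve_fractional(total_rows, limit=None, offset=None):
    """Map fractional LIMIT/OFFSET values to absolute row counts.

    Assumption for this sketch: values strictly between 0 and 1 are
    fractions of total_rows; other values are absolute row counts.
    """
    def resolve(v):
        if v is None:
            return None
        return int(total_rows * v) if 0 < v < 1 else int(v)
    return resolve(limit), resolve(offset)
```

For example, under these assumptions, `resolve_fractional(100, limit=0.5, offset=0.25)` gives a limit of 50 rows and an offset of 25 rows; this dependence on `total_rows` is also why a fractional value cannot be applied until the whole input has been counted.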