colexecdisk: make sure to release resources in all cases#81419
colexecdisk: make sure to release resources in all cases#81419craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
Previously, it was possible to not release some resources when external
distinct or external hash aggregator short circuit their execution
(either because of an error or because of the LIMIT on the query) in
some cases (namely, when a sort is planned on top of the external
operation to restore the desired ordering). This was the case because
the hash-based partitioner (which abstracts away the disk-backed
algorithm) wasn't added to `OpWithMetaInfo.ToClose` slice since there is
a sort on top of it nor was it closed by that sort.
Here is an example diagram for all the infra that is set up for the
disk-backed distinct when ordering needs to be maintained:
```
diskSpillerBase (disk-backed distinct)
| |
UnorderedDistinct diskSpillerBase [1] (disk-backed sort)
| | |
in-mem sorter external sorter hash-based partitioner [2]
```
In this diagram, `hash-based partitioner [2]` is the external distinct
that is the input to the `diskSpillerBase [1]`. In the happy path (when
`[2]` is exhausted), it is `Close`d automatically. However, if its
execution is short-circuited, `[2]` will never be closed because:
- due to the way the infra was created, it was never added to the
`ToClose` slice (so it will not be closed on the flow cleanup)
- `diskSpillerBase [1]` doesn't close its inputs
- `external sorter` nor the in-memory sorter end up closing `[2]` either
because there are other utility operators that don't implement
`Closer` interface between the sorters and `[2]`.
As a result of not closing `[2]`, some disk resources might be leaked.
This commit fixes the issue by making `diskSpillerBase` close all of its
inputs (which is a single input in the case of the disk-backed distinct
and disk-backed hash aggregator). `Close` is allowed to be called
multiple time, so it is ok if there happen to be other codepaths
calling it.
Release note: None
jbowens
left a comment
There was a problem hiding this comment.
Thanks for the quick follow up!
Reviewed 1 of 5 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @jbowens)
michae2
left a comment
There was a problem hiding this comment.
Reviewed 2 of 5 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @yuzefovich)
pkg/sql/colexec/colexecdisk/disk_spiller.go line 246 at r1 (raw file):
if c, ok := input.(colexecop.Closer); ok { if err := c.Close(ctx); err != nil { retErr = err
Why not use use errors.CombineErrors here? (And in the other Close methods.)
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @michae2)
pkg/sql/colexec/colexecdisk/disk_spiller.go line 246 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Why not use use
errors.CombineErrorshere? (And in the other Close methods.)
Probably for no strong reason - this error is eventually just logged (see CloseAndLogOnErr in colexecop/operator.go), the errors here are extremely unlikely to occur in the first place, maybe the method was introduced when errors.CombineErrors was not used widely.
Still, since I want this to be backported, I'm inclined to keep returning the last error for consistency with other implementations.
michae2
left a comment
There was a problem hiding this comment.
Reviewed 3 of 5 files at r1.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @cucaroach)
pkg/sql/colexec/colexecdisk/disk_spiller.go line 246 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Probably for no strong reason - this error is eventually just logged (see
CloseAndLogOnErrincolexecop/operator.go), the errors here are extremely unlikely to occur in the first place, maybe the method was introduced whenerrors.CombineErrorswas not used widely.Still, since I want this to be backported, I'm inclined to keep returning the last error for consistency with other implementations.
Ok, that's fine.
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 7710cb4 to blathers/backport-release-21.2-81419: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 21.2.x failed. See errors above. error creating merge commit from 7710cb4 to blathers/backport-release-22.1-81419: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Previously, it was possible to not release some resources when external
distinct or external hash aggregator short circuit their execution
(either because of an error or because of the LIMIT on the query) in
some cases (namely, when a sort is planned on top of the external
operation to restore the desired ordering). This was the case because
the hash-based partitioner (which abstracts away the disk-backed
algorithm) wasn't added to
OpWithMetaInfo.ToCloseslice since there isa sort on top of it nor was it closed by that sort.
Here is an example diagram for all the infra that is set up for the
disk-backed distinct when ordering needs to be maintained:
In this diagram,
hash-based partitioner [2]is the external distinctthat is the input to the
diskSpillerBase [1]. In the happy path (when[2]is exhausted), it isClosed automatically. However, if itsexecution is short-circuited,
[2]will never be closed because:ToCloseslice (so it will not be closed on the flow cleanup)diskSpillerBase [1]doesn't close its inputsexternal sorternor the in-memory sorter end up closing[2]eitherbecause there are other utility operators that don't implement
Closerinterface between the sorters and[2].As a result of not closing
[2], some disk resources might be leaked.This commit fixes the issue by making
diskSpillerBaseclose all of itsinputs (which is a single input in the case of the disk-backed distinct
and disk-backed hash aggregator).
Closeis allowed to be calledmultiple time, so it is ok if there happen to be other codepaths
calling it.
Fixes: #81413.
Release note: None