Skip to content

colexecdisk: make sure to release resources in all cases#81419

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
yuzefovich:fix-closers
May 18, 2022
Merged

colexecdisk: make sure to release resources in all cases#81419
craig[bot] merged 1 commit intocockroachdb:masterfrom
yuzefovich:fix-closers

Conversation

@yuzefovich
Copy link
Copy Markdown
Member

@yuzefovich yuzefovich commented May 18, 2022

Previously, it was possible to not release some resources when external
distinct or external hash aggregator short circuit their execution
(either because of an error or because of the LIMIT on the query) in
some cases (namely, when a sort is planned on top of the external
operation to restore the desired ordering). This was the case because
the hash-based partitioner (which abstracts away the disk-backed
algorithm) wasn't added to OpWithMetaInfo.ToClose slice since there is
a sort on top of it nor was it closed by that sort.

Here is an example diagram for all the infra that is set up for the
disk-backed distinct when ordering needs to be maintained:

         diskSpillerBase (disk-backed distinct)
           |                      |
UnorderedDistinct     diskSpillerBase [1] (disk-backed sort)
                      |              |                  |
                in-mem sorter   external sorter   hash-based partitioner [2]

In this diagram, hash-based partitioner [2] is the external distinct
that is the input to the diskSpillerBase [1]. In the happy path (when
[2] is exhausted), it is Closed automatically. However, if its
execution is short-circuited, [2] will never be closed because:

  • due to the way the infra was created, it was never added to the
    ToClose slice (so it will not be closed on the flow cleanup)
  • diskSpillerBase [1] doesn't close its inputs
  • external sorter nor the in-memory sorter end up closing [2] either
    because there are other utility operators that don't implement
    Closer interface between the sorters and [2].

As a result of not closing [2], some disk resources might be leaked.
This commit fixes the issue by making diskSpillerBase close all of its
inputs (which is a single input in the case of the disk-backed distinct
and disk-backed hash aggregator). Close is allowed to be called
multiple time, so it is ok if there happen to be other codepaths
calling it.

Fixes: #81413.

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Previously, it was possible to not release some resources when external
distinct or external hash aggregator short circuit their execution
(either because of an error or because of the LIMIT on the query) in
some cases (namely, when a sort is planned on top of the external
operation to restore the desired ordering). This was the case because
the hash-based partitioner (which abstracts away the disk-backed
algorithm) wasn't added to `OpWithMetaInfo.ToClose` slice since there is
a sort on top of it nor was it closed by that sort.

Here is an example diagram for all the infra that is set up for the
disk-backed distinct when ordering needs to be maintained:
```
         diskSpillerBase (disk-backed distinct)
           |                      |
UnorderedDistinct     diskSpillerBase [1] (disk-backed sort)
                      |              |                  |
                in-mem sorter   external sorter   hash-based partitioner [2]
```
In this diagram, `hash-based partitioner [2]` is the external distinct
that is the input to the `diskSpillerBase [1]`. In the happy path (when
`[2]` is exhausted), it is `Close`d automatically. However, if its
execution is short-circuited, `[2]` will never be closed because:
- due to the way the infra was created, it was never added to the
`ToClose` slice (so it will not be closed on the flow cleanup)
- `diskSpillerBase [1]` doesn't close its inputs
- `external sorter` nor the in-memory sorter end up closing `[2]` either
because there are other utility operators that don't implement
`Closer` interface between the sorters and `[2]`.

As a result of not closing `[2]`, some disk resources might be leaked.
This commit fixes the issue by making `diskSpillerBase` close all of its
inputs (which is a single input in the case of the disk-backed distinct
and disk-backed hash aggregator). `Close` is allowed to be called
multiple time, so it is ok if there happen to be other codepaths
calling it.

Release note: None
Copy link
Copy Markdown
Contributor

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Thanks for the quick follow up!

Reviewed 1 of 5 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jbowens)

@yuzefovich yuzefovich marked this pull request as ready for review May 18, 2022 16:14
@yuzefovich yuzefovich requested a review from a team as a code owner May 18, 2022 16:14
@yuzefovich yuzefovich requested review from cucaroach and michae2 May 18, 2022 16:15
Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @yuzefovich)


pkg/sql/colexec/colexecdisk/disk_spiller.go line 246 at r1 (raw file):

		if c, ok := input.(colexecop.Closer); ok {
			if err := c.Close(ctx); err != nil {
				retErr = err

Why not use use errors.CombineErrors here? (And in the other Close methods.)

Copy link
Copy Markdown
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @michae2)


pkg/sql/colexec/colexecdisk/disk_spiller.go line 246 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Why not use use errors.CombineErrors here? (And in the other Close methods.)

Probably for no strong reason - this error is eventually just logged (see CloseAndLogOnErr in colexecop/operator.go), the errors here are extremely unlikely to occur in the first place, maybe the method was introduced when errors.CombineErrors was not used widely.

Still, since I want this to be backported, I'm inclined to keep returning the last error for consistency with other implementations.

Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 5 files at r1.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @cucaroach)


pkg/sql/colexec/colexecdisk/disk_spiller.go line 246 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Probably for no strong reason - this error is eventually just logged (see CloseAndLogOnErr in colexecop/operator.go), the errors here are extremely unlikely to occur in the first place, maybe the method was introduced when errors.CombineErrors was not used widely.

Still, since I want this to be backported, I'm inclined to keep returning the last error for consistency with other implementations.

Ok, that's fine.

Copy link
Copy Markdown
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

bors r+

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @cucaroach)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 18, 2022

Build succeeded:

@craig craig bot merged commit 1e1ff14 into cockroachdb:master May 18, 2022
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented May 18, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 7710cb4 to blathers/backport-release-21.2-81419: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


error creating merge commit from 7710cb4 to blathers/backport-release-22.1-81419: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql/colcontainer: TestExternalDistinct leaks Queues, file descriptors

4 participants