Skip to content

Conversation

@jdrouet
Copy link
Contributor

@jdrouet jdrouet commented May 26, 2025

Which issue does this PR close?

Rationale for this change

Proposing a builder pattern for the disk managers.

What changes are included in this PR?

In order not to have a breaking change, I decided not to update the current config structure, but instead, create a builder that would be an alternative and would allow to customize the disk manager, while being easily extendible.

Are these changes tested?

It's been used in the existing tests and the CLI

Are there any user-facing changes?

Introducing a builder structure.

Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>
@github-actions github-actions bot added the execution Related to the execution crate label May 26, 2025
jdrouet added 2 commits May 26, 2025 12:01
Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>
Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>
@github-actions github-actions bot added the core Core DataFusion crate label May 26, 2025
jdrouet and others added 3 commits May 26, 2025 13:32
Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>
Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>
@jdrouet jdrouet marked this pull request as ready for review May 26, 2025 12:08
@jdrouet
Copy link
Contributor Author

jdrouet commented May 26, 2025

Correct me if I'm wrong, but the failing test doesn't seem related.

Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jdrouet -- looks great to me

I do think we should deprecate DiskManagerConfig -- as described on https://datafusion.apache.org/contributor-guide/api-health.html if possible, but we can do that as a follow on PR

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you. I also think it's great to deprecating the old APIs.

jdrouet added 2 commits May 28, 2025 10:37
Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>
Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>
"Created local dirs {local_dirs:?} as DataFusion working directory"
);
Ok(DiskManager {
local_dirs: Mutex::new(Some(local_dirs)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jdrouet for the work, LGTM, and minor question, do we have each dir max limit config when we have multi dirs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I just moved the existing to a builder. So right now, we have the same behavior as before, meaning that each dir has the same limit.
For more details, you'd have to go back to the current_file_disk_usage computation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks , got it!

Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com>
@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label May 28, 2025
jdrouet and others added 2 commits May 28, 2025 14:53
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @jdrouet

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, the change will match our codebase more, most of places are using Builder style

@xudong963 xudong963 merged commit 5944e8b into apache:main May 29, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate execution Related to the execution crate physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make DiskManagerBuilder to construct DiskManagers

5 participants