Skip to content

Conversation

@Weijun-H
Copy link
Member

@Weijun-H Weijun-H commented Oct 31, 2025

Which issue does this PR close?

Rationale for this change

Without a SQL-level reset, clients that SET DataFusion options have to rebuild the session to recover defaults

What changes are included in this PR?

  • Extended the config macros/traits so every namespace knows how to restore default values
  • Added the ResetVariable

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, SQL clients (including the CLI) can issue RESET

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Oct 31, 2025
Copy link
Contributor

@Omega359 Omega359 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for submitting this PR, I've left a few comments

self.return_empty_dataframe()
}

async fn reset_variable(&self, stmt: ResetVariable) -> Result<DataFrame> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems odd to always return an empty dataframe in this function. Is that really necessary vs Result<()> ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

/// Reset a configuration option back to its default value
pub fn reset(&mut self, key: &str) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overall it might be better to add a reset fn to ConfigField and update the config_namespace! macro rather than implement it like this. Just my opinion..

@Omega359
Copy link
Contributor

One additional thing - I didn't see anything in this PR for the unparser. It's possible that the set/unset isn't supported (and/or shouldn't be supported) there but could you take a look and see?

@Weijun-H
Copy link
Member Author

One additional thing - I didn't see anything in this PR for the unparser. It's possible that the set/unset isn't supported (and/or shouldn't be supported) there but could you take a look and see?

@Omega359 Thanks for flagging it. I double-checked the unparser: today it simply funnels LogicalPlan::Statement through the query path, so SET/RESET plans aren’t rendered back to SQL. Supporting them would also need upstream sqlparser changes, as the AST has no native RESET node. Leaving it out of this PR was intentional given those constraints.

@Weijun-H Weijun-H force-pushed the 18384-reset-config branch 3 times, most recently from 645e40b to 38a27d5 Compare October 31, 2025 23:04
@Weijun-H Weijun-H requested a review from Omega359 November 1, 2025 18:40
@Omega359
Copy link
Contributor

Omega359 commented Nov 1, 2025

Thanks for continuing to work on this - I'll try and do an in-depth review tomorrow for this @Weijun-H.

Copy link
Contributor

@Omega359 Omega359 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good after a quick review. My biggest concern is the ability to reset whole sections of config - I am uncertain we want to support that.

}
_ => _config_err!(
"Config value \"{}\" not found on ConfigFileEncryptionProperties",
"Config value \"{}\" not found on ConfigFileDecryptionProperties",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

SET datafusion.runtime.memory_limit = '1M'

statement ok
RESET datafusion.runtime.memory_limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the show to verify it was indeed reset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the SHOW statement does not display runtime settings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file #18452

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a hack way to test, we can ensure the below query succeed after resetting.

> set datafusion.runtime.memory_limit = '1K';
0 row(s) fetched.
Elapsed 0.001 seconds.

> explain analyze select * from generate_series(1000) as t1(v1) order by v1;
Not enough memory to continue external sort. Consider increasing the memory limit, or decreasing sort_spill_reservation_bytes
caused by
Resources exhausted: Additional allocation failed for ExternalSorterMerge[0] with top memory consumers (across reservations) as:
  DataFusion-Cli#89(can spill: false) consumed 0.0 B, peak 0.0 B,
  ExternalSorter[0]#90(can spill: true) consumed 0.0 B, peak 0.0 B,
  ExternalSorterMerge[0]#91(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 10.0 MB for ExternalSorterMerge[0] with 0.0 B already allocated for this reservation - 1024.0 B remain available for the total pool

match section {
"catalog" => {
if rem.is_empty() {
self.catalog = CatalogOptions::default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am uncertain that this should be supported to be honest. I think that a full configuration key should be required for reset.

}

fn reset(&mut self, key: &str) -> $crate::error::Result<()> {
if key.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems off. Why error if there is a key but don't if there isn't? Seems backwards?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For scalar config fields the caller already strips off the key for the field before invoking reset. At that point key only contains a nested suffix (if any).

  • key.is_empty() → we’re exactly at the scalar value, so resetting means “restore to default”, which we do.
  • non-empty key → someone tried to reset something below a scalar (e.g. foo.bar where foo is a scalar); that’s invalid, so we emit the helpful error.

Ok(())
} else {
$crate::error::_config_err!(
"Config value \"{}\" not found on {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message doesn't make sense in this context. We also should have a test to cover this error

@2010YOUY01 2010YOUY01 self-requested a review November 3, 2025 04:19
}
"max_temp_directory_size" => {
// default is 100 GB
const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100 * 1024 * 1024 * 1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this const in disk_manager.rs, could we not just use that? Might need to change to pub(crate) though...

"metadata_cache_limit" => {
// default is 50 MB
const DEFAULT_METADATA_CACHE_LIMIT: usize = 50 * 1024 * 1024;
builder = builder.with_metadata_cache_limit(DEFAULT_METADATA_CACHE_LIMIT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here in cache_manager.rs

@Omega359
Copy link
Contributor

Omega359 commented Nov 3, 2025

Small suggestions, but otherwise LGTM, thanks!

@Omega359 Omega359 mentioned this pull request Nov 3, 2025
37 tasks
Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on it! I haven't review the implementations yet, I'll continue soon.

SET datafusion.runtime.memory_limit = '1M'

statement ok
RESET datafusion.runtime.memory_limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a hack way to test, we can ensure the below query succeed after resetting.

> set datafusion.runtime.memory_limit = '1K';
0 row(s) fetched.
Elapsed 0.001 seconds.

> explain analyze select * from generate_series(1000) as t1(v1) order by v1;
Not enough memory to continue external sort. Consider increasing the memory limit, or decreasing sort_spill_reservation_bytes
caused by
Resources exhausted: Additional allocation failed for ExternalSorterMerge[0] with top memory consumers (across reservations) as:
  DataFusion-Cli#89(can spill: false) consumed 0.0 B, peak 0.0 B,
  ExternalSorter[0]#90(can spill: true) consumed 0.0 B, peak 0.0 B,
  ExternalSorterMerge[0]#91(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 10.0 MB for ExternalSorterMerge[0] with 0.0 B already allocated for this reservation - 1024.0 B remain available for the total pool


statement error Arrow error: Parser error: Invalid timezone "Asia/Taipei2": failed to parse timezone
SELECT '2000-01-01T00:00:00'::TIMESTAMP::TIMESTAMPTZ

Copy link
Contributor

@2010YOUY01 2010YOUY01 Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some test coverage for invalid resets like
reset dataexplosion.execution.batch_size, reset datafusion.exec.batch_size, reset datafusion.execution.batches_size and datafusion.execution.batch_size.bar

@github-actions github-actions bot added the execution Related to the execution crate label Nov 4, 2025
@Omega359
Copy link
Contributor

Omega359 commented Nov 5, 2025

Looks like some tests are failing in CI

state.register_udf(udf)?;
}

drop(state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you explicitly dropping state here? I can see no reason to do this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it because it is not needed.

@Weijun-H Weijun-H force-pushed the 18384-reset-config branch 2 times, most recently from 76551fb to b8ca1cb Compare November 9, 2025 17:49
@Weijun-H Weijun-H requested a review from Omega359 November 11, 2025 21:17
@Weijun-H Weijun-H force-pushed the 18384-reset-config branch 2 times, most recently from 22f83c9 to 66d1b48 Compare November 15, 2025 21:41
Copy link
Contributor

@Omega359 Omega359 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to go, thanks for all the work on this!

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a really nice feature, thank you.

let limit = Self::parse_memory_limit(value)?;
builder.with_metadata_cache_limit(limit)
}
_ => return plan_err!("Unknown runtime configuration: {variable}"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ => return plan_err!("Unknown runtime configuration: {variable}"),
// Remember to update `reset_runtime_variable()` when adding new options

}

/// Parse a SQL `RESET`
pub fn parse_reset(&mut self) -> Result<Statement, DataFusionError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious -- why for certain statements we decide to implement in-house parsing, instead of using sqlparser dependency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqlparser only supports ALTER ROLE ... RESET now, so we have to implement from datafusion side.

let variable = object_name_to_string(&variable);
let mut variable_lower = variable.to_lowercase();

if variable_lower == "timezone" || variable_lower == "time.zone" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some comment to explain this special case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@Weijun-H Weijun-H added this pull request to the merge queue Nov 17, 2025
Merged via the queue into apache:main with commit 71fcd03 Nov 17, 2025
32 checks passed
@Weijun-H
Copy link
Member Author

Thanks @Omega359 and @2010YOUY01 for reviewing !

logan-keede pushed a commit to logan-keede/datafusion that referenced this pull request Nov 23, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

- Closes apache#18384

## Rationale for this change
Without a SQL-level reset, clients that SET DataFusion options have to
rebuild the session to recover defaults

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?
- Extended the config macros/traits so every namespace knows how to
restore default values
- Added the `ResetVariable`

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?
Yes
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
Yes, SQL clients (including the CLI) can issue RESET

---------

Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate execution Related to the execution crate logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support resetting config options to defaults

3 participants