-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: add RESET statement for configuration variabless #18408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Omega359
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting this PR, I've left a few comments
| self.return_empty_dataframe() | ||
| } | ||
|
|
||
| async fn reset_variable(&self, stmt: ResetVariable) -> Result<DataFrame> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems odd to always return an empty dataframe in this function. Is that really necessary vs Result<()> ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
datafusion/common/src/config.rs
Outdated
| } | ||
|
|
||
| /// Reset a configuration option back to its default value | ||
| pub fn reset(&mut self, key: &str) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall it might be better to add a reset fn to ConfigField and update the config_namespace! macro rather than implement it like this. Just my opinion..
|
One additional thing - I didn't see anything in this PR for the unparser. It's possible that the set/unset isn't supported (and/or shouldn't be supported) there but could you take a look and see? |
9a18356 to
9cbfb17
Compare
@Omega359 Thanks for flagging it. I double-checked the unparser: today it simply funnels |
645e40b to
38a27d5
Compare
|
Thanks for continuing to work on this - I'll try and do an in-depth review tomorrow for this @Weijun-H. |
Omega359
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good after a quick review. My biggest concern is the ability to reset whole sections of config - I am uncertain we want to support that.
| } | ||
| _ => _config_err!( | ||
| "Config value \"{}\" not found on ConfigFileEncryptionProperties", | ||
| "Config value \"{}\" not found on ConfigFileDecryptionProperties", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
| SET datafusion.runtime.memory_limit = '1M' | ||
|
|
||
| statement ok | ||
| RESET datafusion.runtime.memory_limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the show to verify it was indeed reset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the SHOW statement does not display runtime settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file #18452
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a hack way to test, we can ensure the below query succeed after resetting.
> set datafusion.runtime.memory_limit = '1K';
0 row(s) fetched.
Elapsed 0.001 seconds.
> explain analyze select * from generate_series(1000) as t1(v1) order by v1;
Not enough memory to continue external sort. Consider increasing the memory limit, or decreasing sort_spill_reservation_bytes
caused by
Resources exhausted: Additional allocation failed for ExternalSorterMerge[0] with top memory consumers (across reservations) as:
DataFusion-Cli#89(can spill: false) consumed 0.0 B, peak 0.0 B,
ExternalSorter[0]#90(can spill: true) consumed 0.0 B, peak 0.0 B,
ExternalSorterMerge[0]#91(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 10.0 MB for ExternalSorterMerge[0] with 0.0 B already allocated for this reservation - 1024.0 B remain available for the total pool
datafusion/common/src/config.rs
Outdated
| match section { | ||
| "catalog" => { | ||
| if rem.is_empty() { | ||
| self.catalog = CatalogOptions::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am uncertain that this should be supported to be honest. I think that a full configuration key should be required for reset.
| } | ||
|
|
||
| fn reset(&mut self, key: &str) -> $crate::error::Result<()> { | ||
| if key.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems off. Why error if there is a key but don't if there isn't? Seems backwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For scalar config fields the caller already strips off the key for the field before invoking reset. At that point key only contains a nested suffix (if any).
- key.is_empty() → we’re exactly at the scalar value, so resetting means “restore to default”, which we do.
- non-empty key → someone tried to reset something below a scalar (e.g. foo.bar where foo is a scalar); that’s invalid, so we emit the helpful error.
datafusion/common/src/config.rs
Outdated
| Ok(()) | ||
| } else { | ||
| $crate::error::_config_err!( | ||
| "Config value \"{}\" not found on {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message doesn't make sense in this context. We also should have a test to cover this error
| } | ||
| "max_temp_directory_size" => { | ||
| // default is 100 GB | ||
| const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 = 100 * 1024 * 1024 * 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this const in disk_manager.rs, could we not just use that? Might need to change to pub(crate) though...
| "metadata_cache_limit" => { | ||
| // default is 50 MB | ||
| const DEFAULT_METADATA_CACHE_LIMIT: usize = 50 * 1024 * 1024; | ||
| builder = builder.with_metadata_cache_limit(DEFAULT_METADATA_CACHE_LIMIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here in cache_manager.rs
|
Small suggestions, but otherwise LGTM, thanks! |
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on it! I haven't review the implementations yet, I'll continue soon.
| SET datafusion.runtime.memory_limit = '1M' | ||
|
|
||
| statement ok | ||
| RESET datafusion.runtime.memory_limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a hack way to test, we can ensure the below query succeed after resetting.
> set datafusion.runtime.memory_limit = '1K';
0 row(s) fetched.
Elapsed 0.001 seconds.
> explain analyze select * from generate_series(1000) as t1(v1) order by v1;
Not enough memory to continue external sort. Consider increasing the memory limit, or decreasing sort_spill_reservation_bytes
caused by
Resources exhausted: Additional allocation failed for ExternalSorterMerge[0] with top memory consumers (across reservations) as:
DataFusion-Cli#89(can spill: false) consumed 0.0 B, peak 0.0 B,
ExternalSorter[0]#90(can spill: true) consumed 0.0 B, peak 0.0 B,
ExternalSorterMerge[0]#91(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 10.0 MB for ExternalSorterMerge[0] with 0.0 B already allocated for this reservation - 1024.0 B remain available for the total pool
|
|
||
| statement error Arrow error: Parser error: Invalid timezone "Asia/Taipei2": failed to parse timezone | ||
| SELECT '2000-01-01T00:00:00'::TIMESTAMP::TIMESTAMPTZ | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some test coverage for invalid resets like
reset dataexplosion.execution.batch_size, reset datafusion.exec.batch_size, reset datafusion.execution.batches_size and datafusion.execution.batch_size.bar
|
Looks like some tests are failing in CI |
8945228 to
4469cfc
Compare
| state.register_udf(udf)?; | ||
| } | ||
|
|
||
| drop(state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you explicitly dropping state here? I can see no reason to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it because it is not needed.
76551fb to
b8ca1cb
Compare
b8ca1cb to
dab3922
Compare
22f83c9 to
66d1b48
Compare
Omega359
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready to go, thanks for all the work on this!
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a really nice feature, thank you.
| let limit = Self::parse_memory_limit(value)?; | ||
| builder.with_metadata_cache_limit(limit) | ||
| } | ||
| _ => return plan_err!("Unknown runtime configuration: {variable}"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| _ => return plan_err!("Unknown runtime configuration: {variable}"), | |
| // Remember to update `reset_runtime_variable()` when adding new options |
| } | ||
|
|
||
| /// Parse a SQL `RESET` | ||
| pub fn parse_reset(&mut self) -> Result<Statement, DataFusionError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious -- why for certain statements we decide to implement in-house parsing, instead of using sqlparser dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqlparser only supports ALTER ROLE ... RESET now, so we have to implement from datafusion side.
| let variable = object_name_to_string(&variable); | ||
| let mut variable_lower = variable.to_lowercase(); | ||
|
|
||
| if variable_lower == "timezone" || variable_lower == "time.zone" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some comment to explain this special case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com>
66d1b48 to
2bfb876
Compare
|
Thanks @Omega359 and @2010YOUY01 for reviewing ! |
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes apache#18384 ## Rationale for this change Without a SQL-level reset, clients that SET DataFusion options have to rebuild the session to recover defaults <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? - Extended the config macros/traits so every namespace knows how to restore default values - Added the `ResetVariable` <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? Yes <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> Yes, SQL clients (including the CLI) can issue RESET --------- Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com>
Which issue does this PR close?
Rationale for this change
Without a SQL-level reset, clients that SET DataFusion options have to rebuild the session to recover defaults
What changes are included in this PR?
ResetVariableAre these changes tested?
Yes
Are there any user-facing changes?
Yes, SQL clients (including the CLI) can issue RESET