feat: Expose shutdown options as RunConfig #186
All contributors have signed the DCO ✍️ ✅
Pull request overview
This PR exposes auto-shutdown configuration options to the DataDesigner.create() method, allowing users to control or disable the early shutdown feature that terminates dataset generation when error rates exceed a threshold. Previously, these parameters were hardcoded and could cause large-scale jobs to fail due to temporary error spikes.
- Adds three new parameters to `create()`: `enable_early_shutdown`, `shutdown_error_rate`, and `shutdown_error_window`
- Threads these parameters through `_create_dataset_builder()` to `ColumnWiseDatasetBuilder`
- Implements conditional logic to disable early shutdown by setting `shutdown_error_rate=1.0` when disabled
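The disable-by-threshold idea in the last bullet can be illustrated with a small, self-contained sketch. This is not the code from the PR: `ShutdownSettings`, `ErrorWindow`, the default values, and the strict `>` comparison are all assumptions made for illustration. Only the three parameter names and the `1.0` override come from the overview above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ShutdownSettings:
    """Hypothetical container; field names mirror the parameters listed above."""

    enable_early_shutdown: bool = True
    shutdown_error_rate: float = 0.5  # assumed default, not taken from the PR
    shutdown_error_window: int = 10   # assumed default, not taken from the PR

    def effective_error_rate(self) -> float:
        # When early shutdown is disabled, fall back to a threshold of 1.0.
        return self.shutdown_error_rate if self.enable_early_shutdown else 1.0


class ErrorWindow:
    """Hypothetical sliding window over the most recent task outcomes."""

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold
        self.results = deque(maxlen=window)

    def record(self, ok: bool) -> bool:
        """Record one task outcome; return True if shutdown should trigger."""
        self.results.append(ok)
        if len(self.results) < self.results.maxlen:
            return False  # wait until the window is full before judging
        error_rate = self.results.count(False) / len(self.results)
        # Assumed strict comparison: a rate can never exceed 1.0,
        # so a threshold of 1.0 effectively disables the check.
        return error_rate > self.threshold
```

Under these assumptions, a threshold of `1.0` can never be exceeded, which is why overriding the rate is enough to switch the feature off. If the real check used `>=`, a window consisting entirely of failures would still trigger shutdown.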
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/data_designer/interface/data_designer.py | Adds three new parameters to create() method and passes them through to dataset builder initialization |
| src/data_designer/engine/dataset_builders/column_wise_builder.py | Accepts and stores shutdown parameters, applies them when creating concurrent thread executor |
I have read the DCO document and I hereby sign the DCO.
johnnygreco left a comment
Thanks for exposing these parameters @eric-tramel!
One thing I'm not sure about, though, is whether we want to put these on create just yet. Let's check with @mikeknep on whether there are potential issues with the MS-side, which we want to mirror as much as possible.
If we want to hold off on the create method, we can still expose them with a helper method like we do with the buffer size.
nabinchha left a comment
One small nit: do we also need to expose it for validation generators?
Actually, I like @johnnygreco's suggestion to expose public setters for DataDesigner and keep the `create(...)`
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Resolves merge conflicts by combining:
- run_config support from this branch (early shutdown control)
- seed reader refactoring from main (SeedSource, SeedReaderRegistry)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Okay, I've made some significant changes to the PR to try to address concerns. The
This is now solved in the PR update.
Description
This feature exposes early shutdown settings via a new `RunConfig` class. Previously, these parameters were hard-coded, so there was no way to adjust them on a case-by-case basis.
Users can now configure `RunConfig` and apply it via `set_run_config()`. Settings apply to both column generation and validation tasks.
Why?
For large-scale generation tasks, streaks of malformed inputs, intermittent backpressure from servers, or bad luck can cause momentarily high error rates. Given the tight hardcoded default window, large jobs can be blocked by unpredictable short runs of errors. Users should have the ability to turn off this feature and accept dropped records without killing entire jobs.
Usage
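The original usage snippet did not survive on this page. As a rough illustration of the configure-then-apply flow described above, here is a sketch built on stand-in classes. The `RunConfig` field names and defaults below are assumptions borrowed from the parameter names in the first review round, and the `DataDesigner` stand-in models only the `set_run_config()` setter. Neither should be read as the library's actual signatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Stand-in for the new config class; field names and defaults are assumed."""

    enable_early_shutdown: bool = True
    shutdown_error_rate: float = 0.5  # assumed default
    shutdown_error_window: int = 10   # assumed default


class DataDesigner:
    """Stand-in for the interface class; only the setter is modeled here."""

    def __init__(self) -> None:
        self._run_config = RunConfig()

    def set_run_config(self, run_config: RunConfig) -> None:
        # Store the settings so that later generation and validation runs
        # can read them, instead of passing them through create().
        self._run_config = run_config


designer = DataDesigner()
# Turn early shutdown off for a large job and accept dropped records instead.
designer.set_run_config(RunConfig(enable_early_shutdown=False))
```

Keeping the settings on a separate config object, rather than on `create()`, matches the direction the reviewers suggested in the conversation.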
Closes #185