feat: move buffer size control to RunConfig #209

Merged
eric-tramel merged 1 commit into main from ewt/run-config-buffer
Jan 13, 2026
@eric-tramel

Summary

This change moves dataset generation buffer sizing into RunConfig, making runtime execution settings explicit and centralized.

  • Add RunConfig.buffer_size (default: 1000, validated via Field(gt=0)).
  • Make ColumnWiseDatasetBuilder.build(...) read buffer_size from ResourceProvider.run_config (removes the explicit buffer_size argument).
  • Remove DataDesigner’s buffer configuration helper (set_buffer_size) and its internal _buffer_size state.
  • Update unit tests to configure buffer sizing via RunConfig.
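The shape of this change can be sketched roughly as follows. This is a dependency-free approximation, not the project's code: the PR validates `buffer_size` with `Field(gt=0)`, whereas this stand-in uses a plain dataclass with a manual check, and `ResourceProvider` and `ColumnWiseDatasetBuilder.build` are reduced to the minimum needed to show where the value is read from. The `build` signature and return value here are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RunConfig:
    """Stand-in for the real RunConfig; only buffer_size is modeled."""

    buffer_size: int = 1000

    def __post_init__(self) -> None:
        # Approximates the gt=0 constraint described in the summary.
        if self.buffer_size <= 0:
            raise ValueError("buffer_size must be greater than 0")


@dataclass
class ResourceProvider:
    """Simplified provider that only carries the run configuration."""

    run_config: RunConfig = field(default_factory=RunConfig)


class ColumnWiseDatasetBuilder:
    """Simplified builder: no buffer_size argument, it reads the provider."""

    def __init__(self, resource_provider: ResourceProvider) -> None:
        self._resource_provider = resource_provider

    def build(self, *, num_records: int) -> int:
        # The single source of truth is the provider's run_config.
        buffer_size = self._resource_provider.run_config.buffer_size
        # Returned only so the example can show which value was picked up.
        return buffer_size


provider = ResourceProvider(run_config=RunConfig(buffer_size=250))
builder = ColumnWiseDatasetBuilder(provider)
print(builder.build(num_records=10_000))
```

Because the builder no longer accepts its own `buffer_size`, the interface and the engine cannot disagree about the value in this arrangement.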

Motivation

Previously, buffer sizing lived on DataDesigner via a separate helper method, while other runtime/execution controls (early shutdown, retry limits) lived on RunConfig. That split made it harder for users to discover, reason about, and persist “how generation runs” settings.

By consolidating buffer_size into RunConfig, all runtime knobs that affect execution behavior now live in one place, and the engine consistently reads them from a single source (ResourceProvider.run_config). This clarifies dataset generation settings, makes configs easier to share/reuse, and reduces the chance of configuration drift between interface and engine.

Behavior / impact

  • Default behavior is unchanged: if users do nothing, RunConfig().buffer_size == 1000.
  • buffer_size controls end-to-end batch size during persisted generation: one batch is generated, post-batch processors run, and the batch is written to artifact storage before the next batch begins.
  • Breaking change: DataDesigner.set_buffer_size(...) was removed. Buffer sizing is now configured only via RunConfig.
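The end-to-end batching described above can be illustrated with a minimal sketch. The `generate`, `process`, and `write` callables and both function names are hypothetical placeholders, not the engine's API. The handling of a final partial batch (a smaller last batch) is also an assumption, since the description does not specify it.

```python
from typing import Callable, Iterator, List

Batch = List[dict]


def iter_batch_sizes(num_records: int, buffer_size: int) -> Iterator[int]:
    """Yield the size of each batch; the last one may be smaller (assumed)."""
    remaining = num_records
    while remaining > 0:
        size = min(buffer_size, remaining)
        yield size
        remaining -= size


def run_batches(
    num_records: int,
    buffer_size: int,
    generate: Callable[[int], Batch],
    process: Callable[[Batch], Batch],
    write: Callable[[Batch], None],
) -> int:
    """Run generate -> process -> write per batch; return batches written."""
    batches_written = 0
    for size in iter_batch_sizes(num_records, buffer_size):
        batch = generate(size)   # produce one batch of records
        batch = process(batch)   # post-batch processors run on that batch
        write(batch)             # persist before the next batch begins
        batches_written += 1
    return batches_written


written: List[Batch] = []
count = run_batches(
    num_records=10_000,
    buffer_size=250,
    generate=lambda n: [{"id": i} for i in range(n)],
    process=lambda batch: batch,
    write=written.append,
)
print(count)  # 40 batches of 250 records each
```

Under these assumptions, a smaller `buffer_size` means more frequent writes and less work held in memory per batch, at the cost of more batches overall.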

Usage example

from data_designer.essentials import DataDesigner, RunConfig

dd = DataDesigner(artifact_path="./artifacts")

# Configure runtime execution settings in one place
run_config = RunConfig(
    buffer_size=250,  # records per end-to-end batch (generate -> process -> write)
    shutdown_error_rate=0.3,
    shutdown_error_window=25,
)

dd.set_run_config(run_config)

# config_builder is assumed to be defined earlier (not shown in this example)
results = dd.create(config_builder, num_records=10_000)

Tests

  • Updated interface and engine builder tests to validate both default and configured buffer_size behavior and to reflect the removed helper API.

@eric-tramel eric-tramel self-assigned this Jan 13, 2026
@eric-tramel eric-tramel added the enhancement New feature or request label Jan 13, 2026
@eric-tramel eric-tramel merged commit 2830fb2 into main Jan 13, 2026
21 checks passed
