Skip to content

feat: Expose on_batch_complete via create method#663

Merged
mikeknep merged 3 commits into
mainfrom
662-on-batch-complete/mknepper
May 19, 2026
Merged

feat: Expose on_batch_complete via create method#663
mikeknep merged 3 commits into
mainfrom
662-on-batch-complete/mknepper

Conversation

@mikeknep

Copy link
Copy Markdown
Contributor

📋 Summary

Exposes on_batch_complete via the DataDesigner.create method so that users can configure the callback without having to reach into engine internals.

🔗 Related Issue

Closes #662 #662

🔄 Changes

  • Adds an optional argument to DataDesigner.create

🧪 Testing

  • make test passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable) N/A

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable) N/A

@mikeknep mikeknep requested a review from a team as a code owner May 15, 2026 14:11
@greptile-apps

greptile-apps Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR exposes on_batch_complete as an optional parameter on DataDesigner.create, allowing callers to register a per-batch callback without accessing engine internals directly. The change is a thin pass-through: the new parameter is collected at the create boundary and forwarded unchanged to builder.build().

  • data_designer.py: Adds on_batch_complete: Callable[[Path], None] | None = None to create, updates the docstring, and inserts the argument into the builder.build() call.
  • test_data_designer.py: Adds a focused unit test that mocks the builder and asserts the callback is forwarded by identity to builder.build().

Confidence Score: 5/5

The change is a straightforward parameter pass-through with no new logic introduced; safe to merge.

The new parameter is optional, defaults to None, and is forwarded verbatim to an existing builder.build() argument that already handles the callback. No existing call sites are affected, and the unit test confirms end-to-end forwarding by identity.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer/src/data_designer/interface/data_designer.py Adds on_batch_complete as an optional keyword argument to DataDesigner.create, forwarding it directly to builder.build(); import of Callable added, docstring updated.
packages/data-designer/tests/interface/test_data_designer.py Adds test_create_forwards_on_batch_complete_callback which mocks internal builder methods and verifies the callback is passed through to builder.build() via keyword argument.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant DataDesigner
    participant DatasetBuilder

    Caller->>DataDesigner: "create(config_builder, on_batch_complete=cb)"
    DataDesigner->>DataDesigner: _create_resource_provider(...)
    DataDesigner->>DataDesigner: _create_dataset_builder(...)
    DataDesigner->>DatasetBuilder: "build(num_records=N, on_batch_complete=cb, resume=mode)"
    loop Each batch
        DatasetBuilder-->>Caller: cb(batch_artifact_path)
    end
    DatasetBuilder-->>DataDesigner: (complete)
    DataDesigner-->>Caller: DatasetCreationResults
Loading

Reviews (4): Last reviewed commit: "Add note about exceptions to docstring" | Re-trigger Greptile

@github-actions

Copy link
Copy Markdown
Contributor

PR #663 Review — Expose on_batch_complete via DataDesigner.create

Summary

Threads an existing engine-level on_batch_complete: Callable[[Path], None] | None parameter
out through the public DataDesigner.create method on the interface package, so users can
register a per-batch callback without reaching into engine internals. Closes #662.

The change is minimal (47/-1) and surgical:

  • packages/data-designer/src/data_designer/interface/data_designer.py: add the kwarg and
    forward it to builder.build(...).
  • packages/data-designer/tests/interface/test_data_designer.py: a unit test asserting the
    callback is forwarded with identity to the underlying builder.

Findings

Correctness

  • Signature matches the engine: dataset_builder.py:223 declares
    on_batch_complete: Callable[[Path], None] | None = None, identical to the new
    interface kwarg. The forwarding call uses keyword args, so positional drift is
    not a risk.
  • Default None preserves existing behavior — a pure additive change with no
    caller impact.
  • Callable is imported from collections.abc (the project convention) rather
    than typing, consistent with from __future__ import annotations already in
    use in this file.

Project conventions

  • Follows the layering rules in AGENTS.md: interface forwards into engine; no
    reverse imports. ✅
  • Type annotations and modern syntax (Callable[[Path], None] | None) align with
    STYLEGUIDE expectations. ✅
  • Docstring entry added in the Args: section, matching the existing style of
    surrounding parameters.

Test coverage

  • test_create_forwards_on_batch_complete_callback asserts identity (is) on the
    forwarded callback and the num_records kwarg — a tight, focused check. Good.
  • The test only exercises the wiring; behavior of the callback (invocation per
    batch, path correctness) remains covered by engine-level tests. That separation
    of concerns is appropriate for an interface-layer change.
  • Minor nit: the dummy callback uses del path to silence "unused argument"
    lint, which is fine but def on_batch_complete(_path: Path) -> None: pass
    would be more idiomatic. Not blocking.

Risk / blast radius

  • Backward compatible: new kwarg with a default of None.
  • No public-API break; no changes to engine, config, or CLI.
  • No performance implications — the callback is only invoked on batch completion
    in the engine; this PR doesn't add any per-record overhead.

Documentation

  • Public-facing docstring is updated. Consider whether the Fern docs site
    (fern/) has an example or API reference that should mention
    on_batch_complete on create(). If create() is auto-generated from the
    signature, no action needed; otherwise a short note + tiny example would help
    users discover the feature. Not blocking.

Verdict

LGTM. Small, well-scoped, test-covered, backward compatible, and consistent with
the layering invariants in AGENTS.md. The only optional follow-up is verifying
the docs site surfaces the new parameter; everything else is good as-is.

Comment thread packages/data-designer/src/data_designer/interface/data_designer.py Outdated
Comment thread packages/data-designer/src/data_designer/interface/data_designer.py
johnnygreco
johnnygreco previously approved these changes May 19, 2026
@mikeknep mikeknep force-pushed the 662-on-batch-complete/mknepper branch from 3e0ab73 to bfe365a Compare May 19, 2026 14:04
@mikeknep mikeknep merged commit 498e627 into main May 19, 2026
49 checks passed
@mikeknep mikeknep deleted the 662-on-batch-complete/mknepper branch May 19, 2026 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose on_batch_complete through high-level interface

3 participants