Skip to content

fix(interface): don't leak YAML default provider into user-supplied list#591

Merged
nabinchha merged 2 commits into
mainfrom
nmulepati/fix/588-default-provider-leak
Apr 30, 2026
Merged

fix(interface): don't leak YAML default provider into user-supplied list#591
nabinchha merged 2 commits into
mainfrom
nmulepati/fix/588-default-provider-leak

Conversation

@nabinchha

Copy link
Copy Markdown
Contributor

Summary

Fix for #588.

`DataDesigner.init` was always reading the `default:` key from `~/.data-designer/model_providers.yaml` and applying it to the runtime `ModelProviderRegistry` — even when the caller supplied their own `model_providers` list. That caused:

  1. Hard failure — when the YAML default named a provider absent from the supplied list, construction raised `ValidationError: Specified default 'X' not found in providers list`.
  2. Silent override — when the YAML default matched a non-first user-supplied provider, the documented "first wins" ordering was silently overridden.

This PR gates the YAML lookup on `model_providers is None` so a user-supplied list owns its own default. It also exposes `model_provider_registry` and `run_config` as public read-only properties on `DataDesigner`, paired with the existing `secret_resolver` property and `set_run_config` setter.

Changes

  • `DataDesigner.init` only consults `get_default_provider_name()` when falling back to the YAML's `providers:` list.
  • New `model_provider_registry` `@property` returning the resolved `ModelProviderRegistry`.
  • New `run_config` `@property` returning the active `RunConfig`.
  • Two new regression tests:
    • `test_init_user_supplied_providers_ignore_unrelated_yaml_default` (pins repro 1).
    • `test_init_user_supplied_providers_preserve_first_wins_over_yaml_default` (pins repro 2).
  • Two existing jinja-rendering tests had a `patch.object(dd_mod, "get_default_provider_name", ...)` workaround that existed only because of this bug. Workarounds removed; tests still pass.
  • The pre-existing `test_run_config_*` tests now read `data_designer.run_config` instead of `data_designer._run_config`.

No public-API removal or breaking changes — only the two new `@property` additions.

Test plan

  • `make check-all-fix` clean.
  • `make test` — 648 passed.
  • Targeted: `pytest packages/data-designer/tests/interface/test_data_designer.py -k "run_config or init or default_provider or jinja_rendering"` — 12 passed.
  • New tests fail on `main` (verified by reading the bug repros in the issue).

Made with Cursor

When a caller passes ``model_providers`` to ``DataDesigner.__init__``, the
YAML's ``default:`` key from ``~/.data-designer/model_providers.yaml`` was
still being applied to the resulting ``ModelProviderRegistry``. This caused
two related problems:

1. Hard failure: if the YAML default named a provider absent from the
   user-supplied list, construction raised
   ``ValidationError: Specified default 'X' not found in providers list``.
2. Silent override: if the YAML default matched a non-first user-supplied
   provider, the documented "first wins" behavior was silently overridden.

Gate the YAML lookup on ``model_providers is None`` so that user-supplied
lists own their own default. Also expose ``model_provider_registry`` and
``run_config`` as public read-only properties on ``DataDesigner``, paired
with the existing ``secret_resolver`` property and ``set_run_config``
setter; tests now use these instead of the underscore-prefixed attributes.

Closes #588.

Made-with: Cursor
@nabinchha nabinchha requested a review from a team as a code owner April 30, 2026 21:26
@greptile-apps

greptile-apps Bot commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a bug where DataDesigner.__init__ always consulted get_default_provider_name() from the YAML config — even when the caller supplied their own model_providers list — causing either a hard ValidationError or a silent override of the documented "first wins" ordering. The fix gates the YAML default lookup on model_providers is None, adds public model_provider_registry and run_config properties, removes a now-unnecessary test workaround, and adds three focused regression tests.

Confidence Score: 5/5

Safe to merge — fix is minimal, correctly scoped, and backed by targeted regression tests.

The change is a single well-understood conditional in __init__ with no new external dependencies. Both failure modes from the bug report are covered by regression tests, the YAML-fallback path is also pinned, and all existing tests were updated to use the new public property without any logic changes.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer/src/data_designer/interface/data_designer.py Core fix: gates YAML default provider lookup on model_providers is None; adds model_provider_registry and run_config public properties — logic is correct and well-commented.
packages/data-designer/tests/interface/test_data_designer.py Three new regression tests pin both failure modes from #588 and the unchanged YAML-fallback path; existing tests updated to use public run_config property instead of _run_config.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DataDesigner.__init__ called] --> B{model_providers is None?}
    B -- Yes --> C[resolve from YAML providers list]
    C --> D[get_default_provider_name from YAML]
    D --> E[resolve_model_provider_registry with YAML default]
    B -- No --> F[use caller-supplied providers list]
    F --> G[default_provider_name = None, first wins]
    G --> H[resolve_model_provider_registry, default=None]
    E --> I[ModelProviderRegistry ready]
    H --> I
Loading

Reviews (2): Last reviewed commit: "fix(interface): tighten model_provider_r..." | Re-trigger Greptile

@github-actions

Copy link
Copy Markdown
Contributor

Code Review: PR #591fix(interface): don't leak YAML default provider into user-supplied list

Summary

The PR fixes issue #588: DataDesigner.__init__ was always consulting the default: key from ~/.data-designer/model_providers.yaml and passing it to resolve_model_provider_registry, even when the caller supplied their own model_providers list. That produced two failure modes:

  1. Hard fail — when the YAML default: named a provider not in the user's list, pydantic validation raised ValidationError.
  2. Silent override — when the YAML default: matched a non-first user-supplied provider, the documented "first wins" ordering was silently flipped.

The fix gates get_default_provider_name() on model_providers is None so the YAML default only applies when also falling back to the YAML providers: list. resolve_model_provider_registry already handles default_provider_name=None by falling back to model_providers[0].name (packages/data-designer-engine/src/data_designer/engine/model_provider.py:77), so the first-wins contract is preserved for user-supplied lists.

The PR also promotes _model_provider_registry and _run_config to public read-only @propertys, and updates existing tests to consume the public properties rather than the underscore-prefixed attributes.

Findings

Correctness — ✅ fix is sound

  • The branching in __init__ correctly isolates the two cases: YAML fallback gets the YAML default; user-supplied list gets None, which resolve_model_provider_registry converts into model_providers[0].name. Matches the PR description and the engine-side helper.
  • The two new regression tests (test_init_user_supplied_providers_ignore_unrelated_yaml_default, test_init_user_supplied_providers_preserve_first_wins_over_yaml_default) pin both repros from bug: default model provider leaks from YAML into DataDesigner constructor #588 directly and assert against model_provider_registry.get_default_provider_name() — a faithful reproduction of the bug conditions.
  • Removing the patch.object(dd_mod, "get_default_provider_name", ...) workaround from the two jinja-rendering tests is correct: those patches only existed because an unrelated YAML default on the test host could cause construction to fail, which is exactly what this PR fixes. Removing them is appropriate cleanup, not test-scope creep.

Test coverage — ✅ adequate

  • Both bug modes are covered (unrelated-name → hard fail; matching-non-first-name → silent override). The first-wins assertion in the second test is the important part.
  • The test_run_config_* churn is a pure rename (._run_config.run_config) and flows naturally from exposing the new property.
  • Not covered but arguably could be: a negative test that confirms the YAML default: does still apply when model_providers is None (the happy fallback path). Not a blocker — that path is exercised implicitly elsewhere, and the PR is scoped narrowly.

Public API surface — minor consideration

Exposing model_provider_registry: ModelProviderRegistry makes the engine-layer type ModelProviderRegistry part of the interface package's public contract. The interface already re-exports engine types in other places (e.g., the model facade), so this is consistent with existing patterns, but worth flagging as an intentional widening of the public surface. The docstring is accurate about how default is resolved.

The run_config property is symmetric with the existing set_run_config setter, so exposing it as a read-only property is the right shape. The docstring correctly notes that RunConfig normalizes some fields on construction (e.g., shutdown_error_rate1.0 when disable_early_shutdown=True), which matches the behavior exercised in test_run_config_normalizes_error_rate_when_disabled.

Style / project conventions — ✅ clean

  • Absolute imports, from __future__ import annotations in effect, modern type syntax (list[ModelProvider] | None) — all consistent with AGENTS.md / STYLEGUIDE.md.
  • The comment inside __init__ explaining why the branch exists, with a link to bug: default model provider leaks from YAML into DataDesigner constructor #588, is exactly the kind of non-obvious-why comment the project encourages.
  • No relative imports introduced; no heavy third-party import added; the new ModelProviderRegistry import is already part of the engine dependency flow (interface → engine).

Minor nits (non-blocking)

  • _resolve_model_providers signature is now slightly awkward. It accepts list[ModelProvider] | None, but the caller in __init__ now branches on None before the call and passes either None or the actual list. The inner method still has the if model_providers is None: branch and a stray return model_providers or [] at the end that also handles None. The logic still works, but the None-handling is now split across caller and callee. A tiny follow-up could push the branch entirely into __init__ (since the caller already knows which case it's in) and tighten _resolve_model_providers to list[ModelProvider] | None → list[ModelProvider] with a cleaner single path. Not in scope for this PR.
  • The old resolve_model_provider_registry(..., get_default_provider_name()) one-liner was re-formatted into a two-step assignment. The new form is clearer given the branching; no concern.
  • Commit/PR description is thorough and calls out that new tests fail on main — good discipline.

Risks

  • Low. The change is narrowly scoped, the bug is well-characterized, and the regression tests directly pin both repros. No behavior change for the YAML-fallback path (the only path that still reads get_default_provider_name()).
  • No security implications — no new network, filesystem, or deserialization surface. The YAML file is still read only under the existing fallback path.
  • No performance implications.

Verdict

Approve with no required changes. The fix is small, correct, well-tested, and addresses both failure modes described in #588. The API additions (model_provider_registry, run_config properties) are reasonable and symmetric with existing members. The removal of the two patch.object workarounds is appropriate now that the underlying bug is gone.

The only optional follow-up worth considering is tightening _resolve_model_providers so its None-handling isn't split across the caller and callee — but that's cosmetic and belongs in a separate change.

@johnnygreco

Copy link
Copy Markdown
Contributor

Thanks for the thorough write-up on this one, @nabinchha — the issue analysis in #588 made the fix easy to verify against intent.

Summary

The PR gates the YAML default: lookup on model_providers is None, so a user-supplied list owns its own first-wins default and never trips ModelProviderRegistry's default ∈ providers validator with an unrelated name from ~/.data-designer/model_providers.yaml. It also exposes model_provider_registry and run_config as read-only properties so tests (and external callers) don't have to reach into private attributes. The implementation matches the suggested fix in #588 exactly.

Findings

Suggestions — Take it or leave it

packages/data-designer/src/data_designer/interface/data_designer.py:436-446model_provider_registry docstring is slightly imprecise

  • What: The "Returns" section says the default comes "from the YAML's default: key (when falling back to the on-disk providers list)". When the YAML has no default: key, the registry actually falls back to providers[0].name of the YAML list (via resolve_model_provider_registry), not the YAML default.
  • Why: Minor — the fallback chain is "user-supplied first → user-supplied[0] → YAML default → YAML[0]", and the current wording elides the last step.
  • Suggestion: Tighten to e.g. "from the YAML's default: key (when set), otherwise the first provider in the list."

packages/data-designer/src/data_designer/interface/data_designer.py:159-166 — branches could collapse

  • What: Both branches call self._resolve_model_providers(model_providers) (passing None works fine in the user-supplied branch since _resolve_model_providers already handles None).
  • Why: Mild duplication; not a real cost — KISS arguably favors the explicit form you have.
  • Suggestion (optional):
    default_provider_name = get_default_provider_name() if model_providers is None else None
    self._model_providers = self._resolve_model_providers(model_providers)
    Take it or leave it — the current form reads clearly with the inline comment.

packages/data-designer/tests/interface/test_data_designer.py — no test pins the YAML-default-applies path

  • What: The two new tests both cover the user-supplied branch. There's no regression test that asserts the intended behavior in the fallback branch — i.e., when model_providers is None and the YAML has default: X, the registry's default is X.
  • Why: The fix preserves that path correctly today, but a future refactor could regress it silently. A small parallel test (mock the loader to return a 2-provider list with default pointing at the second) would lock that contract in.
  • Suggestion: Optional — add a third test like test_init_no_user_providers_uses_yaml_default. Skip if you'd rather keep the PR minimal.

packages/data-designer/src/data_designer/interface/data_designer.py:449-459 — asymmetric run_config API

  • What: We now have a run_config getter property and a set_run_config(...) method. The Pythonic pairing would be @run_config.setter, but that would change the public API.
  • Why: Not for this PR, but worth flagging so a future cleanup can land alongside the #589/#590 work the issue mentions.
  • Suggestion: Consider tracking the cleanup as a follow-up issue rather than touching it here.

What Looks Good

  • The diagnosis trail is exemplary. The issue narrows the bug to specific source lines in three packages, lays out two concrete repros, and even points out the existing test-side patches that exist because of the bug — the PR then deletes those workarounds, which is a satisfying signal that the fix matches the cause.
  • Minimal, surgical change. Two small edits to __init__, one inline comment that explains the why, two regression tests, two workaround removals. No drive-by refactors.
  • Both failure modes are pinned. test_init_user_supplied_providers_ignore_unrelated_yaml_default covers the hard-fail path and test_init_user_supplied_providers_preserve_first_wins_over_yaml_default covers the silent-override path. The patches on get_default_provider_name are necessary (without them, the tests would be flaky regression guards on a clean machine where YAML has no default).
  • Bonus robustness: by gating on model_providers is None, callers who supply their own providers no longer depend on the YAML file existing or being well-formed — the _get_default_providers_file_content FileNotFoundError path can no longer bite them.

Verdict

Ship it (with nits). Fix is correct, minimal, and well-tested. The suggestions above are all optional polish.


This review was generated by an AI assistant.

johnnygreco
johnnygreco previously approved these changes Apr 30, 2026
…allback path

Address review feedback on PR #591:
- Clarify ``model_provider_registry`` docstring so it reflects the full
  fallback chain: user-supplied first → YAML default (when set) → first
  provider in the YAML list.
- Add ``test_init_no_user_providers_uses_yaml_default`` to lock the
  YAML-fallback contract that the #588 fix preserved but didn't pin.

Made-with: Cursor
@nabinchha

Copy link
Copy Markdown
Contributor Author

Thanks for the review @johnnygreco — pushed 5a9fc27 addressing the two suggestions I took:

  • Docstring tightening (suggestion 1) — `model_provider_registry` now spells out the full fallback chain: user-supplied first → YAML `default:` (when set) → first provider in the YAML list.
  • YAML-fallback pin test (suggestion 3) — added `test_init_no_user_providers_uses_yaml_default` so the unchanged-by-this-PR fallback contract is explicitly locked in. Future refactors of `init` can't silently regress it.

Skipped:

@nabinchha nabinchha requested a review from johnnygreco April 30, 2026 21:48
@nabinchha nabinchha merged commit 5ec0e89 into main Apr 30, 2026
49 checks passed
@nabinchha nabinchha deleted the nmulepati/fix/588-default-provider-leak branch April 30, 2026 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants