Skip to content

fix: repair NGC persona downloads#720

Merged
johnnygreco merged 3 commits into
mainfrom
johnny/fix/ngc-persona-download-config
Jun 1, 2026
Merged

fix: repair NGC persona downloads#720
johnnygreco merged 3 commits into
mainfrom
johnny/fix/ngc-persona-download-config

Conversation

@johnnygreco

Copy link
Copy Markdown
Contributor

📋 Summary

This PR fixes Nemotron-Persona dataset downloads with NGC CLI versions that reject the --org argument. It removes the unsupported flag, requires users to create the default NGC CLI config file, and keeps the setup docs aligned with the new behavior.

🔗 Related Issue

Closes #708

cc @3mei

🔄 Changes

  • Remove --org nvidia from the ngc registry resource download-version command.
  • Add an explicit missing-config error that tells users to run ngc config set.
  • Preflight the default NGC config before prompting for persona dataset downloads.
  • Update persona sampling docs in both Fern and legacy docs to describe the ngc config set setup path.
  • Add tests for the updated NGC command shape and missing-config behavior.

🧪 Testing

  • make test passes
  • Unit tests added/updated
  • E2E tests added/updated (N/A — no E2E coverage for live NGC CLI downloads)

Focused checks run:

uv run --group dev pytest packages/data-designer/tests/cli/services/test_download_service.py packages/data-designer/tests/cli/controllers/test_download_controller.py
uv run --group dev ruff check packages/data-designer/src/data_designer/cli/controllers/download_controller.py packages/data-designer/src/data_designer/cli/services/download_service.py packages/data-designer/tests/cli/controllers/test_download_controller.py packages/data-designer/tests/cli/services/test_download_service.py
uv run --group dev ruff format --check packages/data-designer/src/data_designer/cli/controllers/download_controller.py packages/data-designer/src/data_designer/cli/services/download_service.py packages/data-designer/tests/cli/controllers/test_download_controller.py packages/data-designer/tests/cli/services/test_download_service.py

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (N/A — no architecture changes)

Remove the unsupported NGC --org argument from persona dataset downloads and require the default NGC CLI config before shelling out.

Add coverage for the missing-config path and the updated command shape.

Signed-off-by: Johnny Greco <jogreco@nvidia.com>
Document ngc config set as the setup path for persona downloads so the docs match the CLI's default config requirement.

Signed-off-by: Johnny Greco <jogreco@nvidia.com>
@johnnygreco johnnygreco requested a review from a team as a code owner June 1, 2026 14:43
@johnnygreco johnnygreco requested a review from 3mei June 1, 2026 14:43
@johnnygreco johnnygreco requested a review from mikeknep June 1, 2026 14:43
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

MkDocs preview: https://50603f85.dd-docs-preview.pages.dev

Fern preview: https://nvidia-preview-pr-720.docs.buildwithfern.com/nemo/datadesigner

Fern previews include the docs-website version archive with PR changes synced into latest. Notebook tutorials are rendered without execution outputs in previews.

@greptile-apps

greptile-apps Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes NGC persona dataset downloads by removing the --org nvidia flag (rejected by newer NGC CLI versions), requiring users to create a config via ngc config set, and adding a preflight check that surfaces a clear error before any interactive UI is shown.

  • download_service.py: Drops --org nvidia from the NGC CLI invocation; adds NgcConfigError and ensure_ngc_config_exists(), which checks ~/.ngc/config before any subprocess call. The ngc_config_path is injectable for testing.
  • download_controller.py: Inserts the config preflight immediately after the NGC CLI binary check and well before _determine_locales(), so a user without a config file is told to run ngc config set before the locale-selection prompt appears — not after.
  • Tests & docs: New tests cover the missing-config path at both the service and controller layers (including with locales=None to exercise the interactive branch); both doc files are updated to replace the export NGC_API_KEY step with ngc config set.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to dropping one unsupported CLI flag and adding a fast, filesystem-only preflight check with no side effects.

The config preflight is correctly inserted before _determine_locales(), so the missing config error fires before any interactive prompt. The new test with locales=None explicitly confirms this ordering. The --org nvidia removal is straightforward, and the belt-and-suspenders re-check inside download_persona_dataset is harmless. No logic regressions are visible.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer/src/data_designer/cli/services/download_service.py Removes --org nvidia flag from NGC command, adds NgcConfigError exception and ensure_ngc_config_exists() preflight; ngc_config_path injectable for testing.
packages/data-designer/src/data_designer/cli/controllers/download_controller.py Config preflight (ensure_ngc_config_exists) correctly inserted before _determine_locales(), so the missing config error fires before the interactive locale-selection UI; NgcConfigError also caught in _download_locale for belt-and-suspenders coverage.
packages/data-designer/tests/cli/controllers/test_download_controller.py New test_run_personas_ngc_config_not_available (with locales=None) proves config check fires before select_multiple_with_arrows; test_download_locale_ngc_config_error covers the _download_locale catch path.
packages/data-designer/tests/cli/services/test_download_service.py Adds ngc_config_path fixture so all tests get a valid config by default; new test_download_persona_dataset_missing_ngc_config verifies subprocess is never called when config absent; expected command updated to drop --org nvidia.
docs/concepts/person_sampling.md Replaces export NGC_API_KEY with ngc config set and updates step descriptions to match the new config-file-based flow.
fern/versions/latest/pages/concepts/person_sampling.mdx Mirror of docs/concepts/person_sampling.md changes for the Fern-rendered docs; both files are in sync.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run_personas called] --> B{dry_run?}
    B -- yes --> D[_determine_locales]
    B -- no --> C{NGC CLI available?\ncheck_ngc_cli_with_instructions}
    C -- no --> Z1[print error & return]
    C -- yes --> E{~/.ngc/config exists?\nensure_ngc_config_exists}
    E -- no --> Z2[print NgcConfigError & return]
    E -- yes --> D
    D --> F{locales specified?}
    F -- yes --> G[validate locales]
    F -- no / interactive --> H[select_multiple_with_arrows]
    G --> I[confirm & download loop]
    H --> I
    I --> J{_download_locale}
    J --> K[service.download_persona_dataset]
    K --> L{ensure_ngc_config_exists redundant guard}
    L -- missing --> Z3[NgcConfigError → return False]
    L -- ok --> M[ngc registry resource download-version]
    M -- success --> N[move parquet files to managed-assets]
    M -- CalledProcessError --> Z4[return False]
Loading

Reviews (2): Last reviewed commit: "fix persona download preflight order" | Re-trigger Greptile

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Code Review: PR #720 — fix: repair NGC persona downloads

Summary

This PR repairs the data-designer download personas workflow against current versions of the NGC CLI, which reject the --org flag. The fix has three parts:

  1. Drop the unsupported flag — the ngc registry resource download-version invocation no longer passes --org nvidia.
  2. Require an NGC config file — both the controller (preflight) and the service (defense-in-depth) verify ~/.ngc/config exists before invoking the CLI, raising a new NgcConfigError with an actionable message pointing users to ngc config set.
  3. Update setup docs — both the legacy docs/ site and the Fern docs are revised so Step 1 is "create the default NGC config" instead of "export NGC_API_KEY".

Test coverage is added for the new command shape, the missing-config preflight, the controller-level error path, and the service-level error path.

Findings

Correctness

  • --org removal is the right fix. The remaining command (ngc registry resource download-version <name> --dest <dir>) matches current NGC CLI usage; the resource path itself encodes the org segment (nvidia/...).
  • Config check uses is_file() (download_service.py:45), which correctly rejects directories or symlinks-to-missing-files. Good choice over exists().
  • Default path uses Path.home() / ".ngc" / "config" which matches the documented NGC CLI behavior on Linux/macOS. On Windows the NGC CLI also writes to %USERPROFILE%\.ngc\config, so Path.home() is correct cross-platform.
  • Exception ordering in _download_locale (download_controller.py:190-212) is fine: subprocess.CalledProcessError and NgcConfigError are unrelated branches of the exception tree, and the bare Exception catches anything else.

Design / UX

  • Preflight ordering — minor UX nit. In run_personas, ensure_ngc_config_exists() is called after _determine_locales() (line 71-82). When the user has no --locales/--all flag, _determine_locales triggers an interactive multi-select. A user who spends time picking locales only to be told "NGC config missing" gets frustrated. Moving the preflight to right after the check_ngc_cli_with_instructions() block (line 68) would fail faster. Not blocking, but worth a small move.
  • Redundant check is fine. ensure_ngc_config_exists() runs in both the controller and the service — a deliberate belt-and-suspenders pattern. The service-level call protects callers who use DownloadService directly (e.g., future programmatic usage), while the controller-level call gives a friendlier UX (no traceback). Keep both.
  • Class-level annotations in DownloadService (download_service.py:22-25) match the style guide's "annotate class attributes" rule, and the modern Path | None default is correct (Path("") would be falsy under a naive or expression, so the explicit is not None check is right).

Tests

  • Good coverage. The new tests exercise:
    • Service __init__ records ngc_config_path (test_download_service.py:test_init).
    • Service raises NgcConfigError and does not invoke subprocess or create directories when config is missing (test_download_persona_dataset_missing_ngc_config). The assert not service.managed_assets_dir.exists() is a nice check that the preflight runs before mkdir.
    • Command shape no longer contains --org (test_download_persona_dataset_success).
    • Controller short-circuits before selection/confirmation when config is missing (test_run_personas_ngc_config_not_available).
    • Controller's _download_locale returns False on NgcConfigError (test_download_locale_ngc_config_error).
  • One weak assertion. test_run_personas_ngc_config_not_available asserts mock_select.assert_not_called(), but the test passes locales=["en_US"] so the selector wouldn't be called regardless of the config check. Either drop that assertion or restructure the test to call without locales to actually verify the early-exit prevents interactive selection. Minor.
  • Untested seam. The redundant service-level ensure_ngc_config_exists() raised mid-download (after the controller preflight passed — e.g., user deleted config between selection and confirmation) is theoretically reachable but not tested. Acceptable to skip; the failure mode is benign (clean error message, no partial state).

Documentation

  • Both docs/concepts/person_sampling.md and fern/versions/latest/pages/concepts/person_sampling.mdx are kept in sync — good.
  • The new wording is clear and points users at the same ~/.ngc/config path the code checks. Closing the loop between docs and behavior is helpful.
  • check_ngc_cli_with_instructions() step 3 was rewritten from "Following the install instructions to set up the NGC CLI" (which had a grammatical error) to "Run 'ngc config set' to create a default NGC CLI config file" — both clearer and grammatically correct.

Style / Conventions

  • from __future__ import annotations added to the new test files — compliant with the project rule.
  • Absolute imports throughout — compliant with TID.
  • Type annotations on all new functions/attributes — compliant.
  • No relative imports, no leaked third-party exceptions across module boundaries — compliant. NgcConfigError is the canonical project error wrapping the missing-config condition.

Security

  • No secrets are logged. The error message references the config path but never reads the file contents; API keys stored there are not surfaced.
  • subprocess.run(cmd, check=True) is invoked with a list (no shell), so the locale_obj.dataset_name interpolation cannot inject shell metacharacters. Locale objects come from a controlled PersonaRepository, so this is safe.

Verdict

Approve with minor suggestions. The fix is correct, well-tested, and the docs are updated in both locations. Consider:

  1. Move ensure_ngc_config_exists() above _determine_locales so users don't go through interactive selection before learning their config is missing.
  2. Tighten or remove the mock_select.assert_not_called() assertion in test_run_personas_ngc_config_not_available — it's currently testing a path the test setup precludes anyway.

Neither is blocking. Ship it.

Check for the default NGC config before interactive locale selection so users see setup guidance before choosing datasets.

Signed-off-by: Johnny Greco <jogreco@nvidia.com>
@johnnygreco

Copy link
Copy Markdown
Contributor Author

Thanks @greptile-apps, good catch. Addressed in 88be724 by moving the NGC config preflight ahead of locale determination for non-dry-run flows, so the interactive selector is skipped when ~/.ngc/config is missing. I also updated the controller test to use the no-locale interactive path and assert the selector is not called.\n\nRe-ran:\n- uv run --group dev pytest packages/data-designer/tests/cli/services/test_download_service.py packages/data-designer/tests/cli/controllers/test_download_controller.py\n- uv run --group dev ruff check packages/data-designer/src/data_designer/cli/controllers/download_controller.py packages/data-designer/src/data_designer/cli/services/download_service.py packages/data-designer/tests/cli/controllers/test_download_controller.py packages/data-designer/tests/cli/services/test_download_service.py\n- uv run --group dev ruff format --check packages/data-designer/src/data_designer/cli/controllers/download_controller.py packages/data-designer/src/data_designer/cli/services/download_service.py packages/data-designer/tests/cli/controllers/test_download_controller.py packages/data-designer/tests/cli/services/test_download_service.py

@greptile-apps

greptile-apps Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

The fix looks correct. ensure_ngc_config_exists() is now called before _determine_locales, so the interactive selector is never reached when ~/.ngc/config is missing, and test_run_personas_ngc_config_not_available validates that with mock_select.assert_not_called().

One minor note: the NgcConfigError catch in _download_locale is now unreachable for the missing-config case via run_personas (the preflight exits first), but it's still a valid defensive guard if download_persona_dataset is called directly — no change needed there.

LGTM, thanks for the quick turnaround on this.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

@3mei 3mei left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnnygreco do you think we should add more guidance for which values to use when running ngc config set? In the past, I found the ngc setup a bit confusing. Otherwise, this looks good to me

@johnnygreco johnnygreco merged commit c928015 into main Jun 1, 2026
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

data-designer download personas fails on NGC CLI 3.x

3 participants