Comparing changes

* docs: add TL;DR tips to dev note and update install instructions
Add a tip block with four key lessons for building agent skills to the "Data Designer Got Skills" dev note. Remove the Claude Code marketplace install option from both the blog and README, keeping only the skills.sh method. Update skill mode descriptions and clarify Claude Code testing scope.
* docs: add sentence about CLI delivering curated context
* docs: fix stray asterisk in README install instructions
* docs: remove claude-plugin marketplace directory

The marketplace plugin structure is no longer needed.

…456)
* chore: rename ColumnWiseDatasetBuilder, wire async preview, unify row-group lifecycle
- Rename ColumnWiseDatasetBuilder to DatasetBuilder and column_wise_builder.py to dataset_builder.py, update all references
- Extract _prepare_async_run() factory shared by build and preview paths
- Add _build_async_preview() for async preview with no disk checkpoints
- Replace on_row_group_complete/on_checkpoint_complete with single on_finalize_row_group callback; caller handles checkpointing
- Add free_row_group() on RowGroupBufferManager for discard-without-write
- Free fully-dropped row groups instead of finalizing them
- Add consolidated AsyncProgressReporter for async generation logging
Closes #437, closes #442, closes #444
* feat: add consolidated async progress reporting with row group context
- Add AsyncProgressReporter: groups per-column progress into a single log block emitted at configurable intervals (default 5s)
- Add quiet mode to ProgressTracker to suppress per-tracker logging when used with the consolidated reporter
- Add ContextVar-based row group tagging (RG1, RG2, ...) for log messages emitted inside async tasks (samplers, expressions, seeds)
- Add progress_interval to RunConfig for user-configurable reporting
- Remove log_start_async from ProgressTracker (superseded by reporter)
Closes #443
* fix async preview and progress reporting
* feat: add opt-in sticky ANSI progress bars for generation
Add RunConfig.progress_bar setting that replaces periodic log-line progress with sticky terminal bars that stay at the bottom while logs scroll above. Pure ANSI escape codes, no new dependencies. Disabled by default - existing log-based output unchanged.
* docs: add missing progress_interval docstring in RunConfig
* fix: update progress bar on every completion in async path
Skip the time gate when the progress bar is active so the bar redraws on every record instead of every progress_interval seconds.
* fix: resolve row-group semaphore deadlock when all tasks are deferred
When all tasks for admitted row groups fail with transient errors, the row-group semaphore never releases, blocking admission of new row groups. Fix by salvaging stalled row groups inline - retrying deferred tasks immediately so row groups can checkpoint and free their semaphore slots. Also updates row group log format to (x/X) with leading zeros.
* fix: eagerly salvage stalled row groups to avoid wasting semaphore slots
Run inline salvage after every checkpoint pass instead of only when globally stalled. Row groups with 0 in-flight and only deferred tasks are salvaged immediately, freeing their semaphore slot for new work.
* fix: address review findings from greptile and codex
- Use `with` statement for progress bar context (safe __exit__ on error)
- Check bar.is_active instead of bar is not None (non-TTY fallback)
- Record failures (not skips) for tasks that exhaust salvage retries
- Record skipped tasks when pre-batch filtering drops rows
* fix: stable progress bar width and accurate failure counts
- Pre-compute fixed stats width at bar creation to prevent bar resizing when failed count appears
- Cap displayed completed at total to avoid >100% on retries
- Exclude already-failed columns from skip recording to prevent double-counting in progress reporter
* fix: address Nabin's review - exclude_columns, dead code, docstring
- Add exclude_columns={task.column} on non-retryable batch/from_scratch drop path to prevent double-counting (same pattern as cell path)
- Simplify salvage drop to per-task exclude (is_dropped guard handles multi-column case)
- Remove dead _in_flight_for_rg method
- Fix context.py docstring to match actual (x/X) format
* fix: _drain_frontier exits before dispatching ready salvage tasks
After salvage discards a cell task from dispatched (making it available in the frontier), _drain_frontier broke immediately because nothing was in-flight yet. The task and its downstream were never re-dispatched, leaving the row group incomplete. Fix: only break when both ready and in-flight are empty.
* fix: salvage edge cases found by code review
- _drain_frontier: only break when both ready and in-flight are empty
- _salvage_rounds: re-mark sibling columns as dispatched after from_scratch retry to prevent duplicate dispatch
- _salvage_stalled_row_groups: separate exhausted tasks from new drain failures to avoid treating non-stalled tasks as permanent
- _checkpoint_completed_row_groups: clean up deferred tasks for checkpointed row groups
- Early shutdown: salvage stalled row groups before exiting
* fix: skip record_failure for already-dropped rows in salvage
When multiple columns fail for the same row, the first drop records a skip for the other column. Without this guard, record_failure fires again for the second column, double-counting it.
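The ContextVar-based row group tagging described above can be illustrated with a small sketch. The names `current_row_group`, `format_tag`, and `process_row_group` are hypothetical, and the exact tag layout is an assumption based only on the "(x/X) with leading zeros" wording; the project's actual implementation may differ.

```python
import asyncio
import contextvars

# Hypothetical context variable; each asyncio task gets its own copy of the context,
# so a value set inside one task is not visible to its siblings.
current_row_group: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_row_group", default=""
)


def format_tag(index: int, total: int) -> str:
    """Format a row group tag as (x/X), zero-padding x to the width of X (assumed layout)."""
    width = len(str(total))
    return f"({index:0{width}d}/{total})"


async def process_row_group(index: int, total: int, out: list[str]) -> None:
    # Tag everything emitted from this task with its row group.
    current_row_group.set(format_tag(index, total))
    await asyncio.sleep(0)  # yield control so the tasks interleave
    out.append(f"{current_row_group.get()} sampler done")


async def main() -> list[str]:
    out: list[str] = []
    # gather() wraps each coroutine in a task with its own context copy.
    await asyncio.gather(*(process_row_group(i, 12, out) for i in (1, 2, 3)))
    return out


messages = asyncio.run(main())
```

Because every task runs in a copy of the caller's context, code deep inside a sampler or expression can read the tag without having it passed as an argument, and the value outside the tasks stays at its default.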

#454)
* docs: restructure agent and contributor documentation
Restructure AGENTS.md from ~627 lines to ~55 lines of high-signal architectural invariants. Extract code style into STYLEGUIDE.md and development workflow into DEVELOPMENT.md. Overhaul CONTRIBUTING.md to reflect agent-assisted development as the primary workflow. Move skills and sub-agents from .claude/ to .agents/ as the tool-agnostic home, with symlinks back for Claude Code compatibility. Add architecture/ skeleton with 10 stub files for incremental population. Implements PR 1 of #427.
Made-with: Cursor
* remove obsolete new-sdg skill
The new-sdg skill is superseded by skills/data-designer/, which is the proper usage skill for building datasets. Update .agents/README.md to reference the usage skill's actual location.
Made-with: Cursor
* docs: expand style guide and refine development docs
Add docstring conventions (Google style), Pydantic/dataclass guidance, error handling patterns, and f-string preference to STYLEGUIDE.md. Clarify per-package test targets, flat test style, e2e API key requirement, notebook regeneration commands, and import perf threshold in DEVELOPMENT.md. Point dataset-building agents to the data-designer skill in AGENTS.md and clarify dependency direction arrows.
Made-with: Cursor
* docs: link AGENTS.md to architecture/ directory
Made-with: Cursor
* docs: refine CONTRIBUTING.md contribution workflow
Add plan document step, self-review with multi-model passes, automated CI review expectations, and comment resolution protocol.
Made-with: Cursor
* docs: add architecture/ to PR 2 scope and link from AGENTS.md
Move architecture doc population from deferred/incremental to PR 2 since the subsystems already exist. Update plan delivery strategy, execution order, and out-of-scope sections accordingly.
Made-with: Cursor
* docs: address PR review comments on style guide, dev guide, and contributing
Replace pd.DataFrame with list[dict[str, str]] in naming example to avoid contradicting lazy-import guidance in the same file. Soften "enforced by SIM" to note SIM rules are not yet enabled in CI. Fix upstream sync instructions for fork-based contributors. Update copyright year in CONTRIBUTING.md from 2025 to 2026 to match STYLEGUIDE.md.

#475)
* fix: address nspect vulnerability report for requests and cryptography
Bump requests lower bound to >=2.33 to exclude vulnerable 2.32.x and update lockfile to pull cryptography 46.0.6 and requests 2.33.0.
* fix: bump pygments lower bound to >=2.20 to address CVE-2026-4539
ReDoS vulnerability in the Archetype lexer fixed in Pygments 2.20.0.

The ModelFacade constructor was refactored in #373 to accept a ModelClient instead of a SecretResolver. The health_checks.py script was not updated, causing a TypeError (exit code 2) on every run since March 9.

* ci: upgrade GitHub Actions for Node.js 24 compatibility
Upgrades actions to versions compatible with the Node.js 24 runtime:
- actions/checkout: → v6
- actions/upload-artifact: → v6
- actions/download-artifact: → v7
- actions/github-script: → v8
- actions/setup-python: → v6
Mirrors: NVIDIA/Megatron-LM@1d5e68b
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: also upgrade actions/cache and astral-sh/setup-uv to node24-compatible versions
- actions/cache: v4 → v5 in build-notebooks.yml
- astral-sh/setup-uv: v5/v6 → v7 in ci.yml, check-colab-notebooks.yml, health-checks.yml, build-docs.yml, build-notebooks.yml
Addresses: #450 (comment)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Andre Manoel <165937436+andreatgretel@users.noreply.github.com>

…#423)
* Reduce Greptile review noise from defensive coding suggestions
Set strictness to 3 (critical only), filter to logic-only comments, and add instructions to skip defensive coding patterns like error handling, null checks, and input validation unless there's an actual bug.
* Collapse sequence diagram section by default
---------
Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>

… rollout sources (#481) Add comprehensive documentation for DirectorySeedSource, FileContentsSeedSource, and AgentRolloutSeedSource to the seed datasets concept page. Add FileSystemSeedReader plugin authoring guide and Markdown section seed reader recipe. Supersedes #425 and #452. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ty (#482)

* feat: add fr_FR locale to nemotron personas datasets
Register the France locale (fr_FR, 2.71 GB) in NEMOTRON_PERSONAS_DATASET_SIZES and add 7 France-specific PII fields: first_name_heritage, name_heritage, is_first_gen_immigrant, household_type, monthly_income_eur, commune, departement.
* fix: update download controller and service tests for fr_FR locale
Update hardcoded locale counts from 7 to 8 and add fr_FR assertions in download controller and download service tests.
* fix: generate CLI locale help text dynamically from constants
The --locale help text was hardcoded and already stale (missing en_SG, pt_BR, fr_FR). Build it from LOCALES_WITH_MANAGED_DATASETS so it stays in sync automatically.
* refactor: add LOCALES_WITH_MANAGED_DATASETS_STR constant
Centralise the comma-joined locale list so it is defined once in constants and reused in the CLI help text, PersonSamplerParams field description, and locale validation error message.
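A minimal sketch of the single-source locale string described in the last two commits, assuming the constant is a plain list. Only the three locales named in the commit message are shown (the real constant is said to cover eight), and `build_parser`, `validate_locale`, and the program name are hypothetical stand-ins for the actual CLI and validation code.

```python
import argparse

# Illustrative subset: only the locales named in the commit message.
LOCALES_WITH_MANAGED_DATASETS = ["en_SG", "pt_BR", "fr_FR"]

# Defined once, reused by the help text and the validation error below.
LOCALES_WITH_MANAGED_DATASETS_STR = ", ".join(LOCALES_WITH_MANAGED_DATASETS)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --locale help text is derived from the constant."""
    parser = argparse.ArgumentParser(prog="download-personas")  # hypothetical name
    parser.add_argument(
        "--locale",
        help=f"Locale to download. Supported: {LOCALES_WITH_MANAGED_DATASETS_STR}",
    )
    return parser


def validate_locale(locale: str) -> str:
    """Reject unknown locales with an error message built from the same string."""
    if locale not in LOCALES_WITH_MANAGED_DATASETS:
        raise ValueError(
            f"Unsupported locale {locale!r}. "
            f"Supported locales: {LOCALES_WITH_MANAGED_DATASETS_STR}"
        )
    return locale
```

Adding a locale to the list then updates the help text and the error message together, which is the staleness problem the fix commit describes.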

* add images
* re-ran slopguard
* update dev notes
* address greptile comments
* update example model name
* add info on throttlemanager
* address pr feedback
* Add link to model aliases
* address pr feedback
* update key resources
* update key resources
* crop image for better fit
* Fix max_parallel_requests
* refine concluding paragraph

* fix: respect max_parallel_requests in HTTP connection pool size
Pass a pre-configured HTTPTransport/AsyncHTTPTransport with the correct limits into RetryTransport instead of letting it create its own pool with httpx defaults (100 connections). Previously, the limits calculated from max_parallel_requests were passed to httpx.Client(limits=...) which silently ignores them when a custom transport is provided.
Fixes #459
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Przemysław <przemekboruta@interia.pl>
* fix: document transport param and annotate private attr chains in tests
Address Greptile P2 review comments on PR #460:
- Add docstring entry for the new `transport` parameter in `create_retry_transport` explaining accepted types and None default
- Add inline comments in pool-size regression tests explaining the private attribute chain (_sync/_async_transport → _pool → _max_connections)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Przemysław <przemekboruta@interia.pl>
* refactor: address review feedback - public limits property, clean tests
- Drop three tests from test_retry.py that reached into private attributes of third-party RetryTransport (_sync_transport, _async_transport); the end-to-end contract is covered by the pool-size regression test
- Expose a public `limits` property on HttpModelClient so tests and diagnostic code can assert the pool configuration without walking private attribute chains across three libraries
- Replace two private-chain pool assertions with a single `client.limits.max_connections == 600` check against the new property
- Trim "inner" from the transport docstring entry (nabinchha suggestion)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Przemysław <przemekboruta@interia.pl>
---------
Signed-off-by: Przemysław <przemekboruta@interia.pl>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Nabin Mulepati <nmulepati@nvidia.com>

Wrap non-hero images in text-align:center divs so they display centered on wide monitors instead of left-aligned. Made-with: Cursor

…467)
* docs: update architecture-and-performance.md to reflect AIMD concurrency control (#466)
Account for the AIMD throttle manager throughout the architecture doc: concurrency formula, max_parallel_requests guidance, new ThrottleConfig section, common problems table, and tuning workflow. Add sync-engine caveats noting AIMD is fully active on the async path.
Made-with: Cursor
* updates
* docs: address PR #467 review feedback on AIMD doc
- Clarify that only the first 429 in a burst reduces the concurrency limit; subsequent in-flight cascade 429s hold it steady
- Soften sync engine caveats: AIMD engages as a fallback when transport retries exhaust, not "no effect"
- Note salvage queue difference between async and sync paths when recommending aggressive max_parallel_requests values
Made-with: Cursor
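The AIMD behaviour described above (additive increase on success, multiplicative decrease on the first 429 of a burst, cascade 429s holding the limit steady) can be sketched as follows. This is not the project's throttle manager: the class name, the parameters, and the rule that a burst ends at the next success are all assumptions made for illustration.

```python
class AIMDLimiter:
    """Toy additive-increase / multiplicative-decrease concurrency limit."""

    def __init__(
        self,
        max_limit: int,
        min_limit: int = 1,
        decrease_factor: float = 0.5,
        increase_step: int = 1,
    ) -> None:
        self.max_limit = max_limit
        self.min_limit = min_limit
        self.decrease_factor = decrease_factor
        self.increase_step = increase_step
        self.limit = max_limit
        # True between the first 429 of a burst and the next success (assumed rule).
        self._burst_active = False

    def on_success(self) -> None:
        """Additive increase: raise the limit by one step, capped at max_limit."""
        self._burst_active = False
        self.limit = min(self.max_limit, self.limit + self.increase_step)

    def on_rate_limited(self) -> None:
        """Multiplicative decrease, applied only to the first 429 of a burst."""
        if self._burst_active:
            return  # cascade 429 from requests already in flight: hold steady
        self._burst_active = True
        self.limit = max(self.min_limit, int(self.limit * self.decrease_factor))


limiter = AIMDLimiter(max_limit=8)
limiter.on_rate_limited()  # first 429 of the burst: 8 -> 4
limiter.on_rate_limited()  # cascade 429: stays at 4
limiter.on_success()       # recovery: 4 -> 5
```

Ignoring cascade 429s avoids collapsing the limit to its minimum when many requests that were already in flight are rejected at once.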

* chore: update review-code skill output and tone
- Remove Overview metadata table from review output
- Add cordial tone guidelines and thank-you opening
- Tie verdict strictly to highest-severity finding
- Soften severity section headers
- Add Step 7 to post review as GitHub PR comment
- Omit empty severity sections from output
Closes #476
Made-with: Cursor
* address greptile comments
* add signature portion

…473)
* docs: add agentic CI plan for automated PR reviews and daily maintenance
Closes #472
* docs: add API configuration and auth modes to agentic CI plan
* docs: add PoC lessons and operational details to agentic CI plan
* docs: add runner label targeting to agentic CI plan
* docs: add re-review label and workflow_dispatch triggers to PR review
* docs: rename runner label to agentic-ci
* docs: add check run as gate for PR review, output stays as comment
* ci: add agentic CI health probe workflow and recipe scaffold
- Health probe: pings inference API, checks latency, verifies Claude CLI
- Runs every 6h on self-hosted agentic-ci runner, plus manual dispatch
- Dual auth mode: custom endpoint (secret) or OAuth fallback
- Recipe scaffold: _runner.md shared context, health-probe recipe
- Update .agents/README.md to include recipes directory
* docs: address Greptile review feedback on agentic CI plan
- Add checks: write to recipe frontmatter example
- Add concurrency group to daily maintenance workflow spec
- Clarify fork PRs are out of scope (pull_request event only)
- Document workflow_dispatch callers as trusted (accepted risk)
* fix: skip API curl in OAuth mode, add branch protection note
- Health probe: skip the direct API ping step in OAuth mode (no API key available for curl; Claude CLI step is the sole health signal)
- Guard latency threshold check on custom auth mode
- Plan: note that contents:write on daily suites requires branch protection rules to prevent agent self-merging
* fix: address Nabin's second review feedback
- Health probe: fix latency threshold string comparison with fromJSON()
- Health probe: add permissions: contents: read
- Health probe: fail fast if AGENTIC_CI_MODEL variable is not set
- Runner context: add prompt-injection defense and output sanitization
- Plan: update Phase 2 deliverable to match cache-based memory approach
- Plan: reference STYLEGUIDE.md in code-quality suite
- README: note that recipes don't need a .claude/ symlink
* docs: sync plan with implementation decisions
- Health probe uses workflow failure, not issue open/close
- Pre-flight checks should fail fast on missing config
- Add GHA string comparison gotcha to PoC lessons
- Add explicit permissions block recommendation to PoC lessons
- Bump max_turns from 20 to 30 in recipe example
* docs: address PR review feedback on agentic CI plan
- Review docs PRs with lighter recipe instead of skipping by file type
- Switch runner memory from committed branch to GH Actions cache
- Add import perf check to test-health suite
- Add nuance on dependency pinning strictness vs DX
- Add Follow-up: Weekend Agents section (perf, AI-QA, repo triage)
- Add cost guardrails open question
- Add status field to frontmatter

* test: add transport-wiring regression tests for #459
The existing test_client_limits_respect_max_parallel_requests only checks the limits property, which was always computed correctly even before the fix. Add mock-capture tests that patch HTTPTransport and AsyncHTTPTransport constructors, trigger lazy init via completion() / acompletion(), and assert the constructors received the correct limits. These tests fail on the pre-fix code (assert_called_once fails because the old code never explicitly constructed the transport).
Made-with: Cursor
* test: address review - parametrize over AnthropicClient, use helpers
Parametrize all three #459 regression tests over both OpenAICompatibleClient and AnthropicClient (wiring lives in HttpModelClient, so both subclasses need coverage). Use the existing _make_openai_client / _make_anthropic_client helpers with **kwargs instead of constructing clients directly. Move transport patch constants up with the other module-level constants.
Made-with: Cursor
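The mock-capture idea in these tests can be sketched with the standard library alone. `FakeTransport`, `LazyClient`, and `BuggyClient` below are hypothetical stand-ins, not the project's HttpModelClient or httpx transports, and the constructor is patched through a class attribute rather than a module path to keep the sketch self-contained.

```python
from unittest.mock import patch


class FakeTransport:
    """Stand-in for a real transport class; it only stores the limits it was given."""

    def __init__(self, limits=None):
        self.limits = limits


class LazyClient:
    """Hypothetical client that builds its transport on the first completion() call."""

    transport_cls = FakeTransport

    def __init__(self, max_parallel_requests: int) -> None:
        self.max_parallel_requests = max_parallel_requests
        self._transport = None

    @property
    def limits(self) -> dict:
        # Assumed formula, chosen only to keep the example simple.
        return {"max_connections": self.max_parallel_requests}

    def completion(self) -> str:
        if self._transport is None:  # lazy init
            self._transport = self.transport_cls(limits=self.limits)
        return "ok"


class BuggyClient(LazyClient):
    """Computes limits correctly but never passes them to the transport."""

    def completion(self) -> str:
        if self._transport is None:
            self._transport = self.transport_cls()
        return "ok"


def transport_received_limits(client_cls, max_parallel_requests: int = 300) -> bool:
    """Patch the transport constructor, trigger lazy init, and check the captured call."""
    client = client_cls(max_parallel_requests)
    with patch.object(client_cls, "transport_cls") as mock_transport:
        client.completion()
    try:
        mock_transport.assert_called_once_with(limits=client.limits)
    except AssertionError:
        return False
    return True
```

A check on the `limits` property alone passes for both classes, while the mock-capture check distinguishes them, which mirrors the gap the commit message describes in the earlier test.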


Commits on Mar 25, 2026

Commits on Mar 30, 2026

Commits on Mar 31, 2026

Commits on Apr 1, 2026
