
docs: remove internal GitLab URL from launching-evals skill#920

Merged
piojanu merged 1 commit into main from pjanuszewski/fix-launching-evals-gitlab-url on Apr 20, 2026
Conversation

@piojanu (Contributor) commented Apr 20, 2026

Summary

Replaces the internal gitlab-master.nvidia.com example container in the launching-evals skill with a public nvcr.io container already used in the repo docs.

Testing

  • Not run; docs-only change

Resolves EVAL-1311
Linear issue: https://linear.app/nvidia/issue/EVAL-1311/skills-remove-internal-gitlab-url-from-launching-evals-skill

Signed-off-by: Piotr Januszewski <pjanuszewski@nvidia.com>
@piojanu piojanu requested review from a team as code owners April 20, 2026 12:00
@piojanu piojanu added the docs-only With great power comes great responsibility. label Apr 20, 2026
@piojanu piojanu enabled auto-merge (squash) April 20, 2026 12:01
@piojanu piojanu merged commit 6ae85ff into main Apr 20, 2026
48 checks passed
@piojanu piojanu deleted the pjanuszewski/fix-launching-evals-gitlab-url branch April 20, 2026 12:18
Edwardf0t1 added a commit to NVIDIA/Model-Optimizer that referenced this pull request Apr 27, 2026
…flow testing, vendor two Claude skills from NeMo Evaluator (#1239)

### What does this PR do?

Type of change: Documentation / Skills

Polishes evaluation and common skills based on end-to-end experience
quantizing + deploying + evaluating LLMs. Vendors the two upstream
Claude skills from NeMo Evaluator, splits shared credential setup into
its own doc, and applies reviewer feedback.

**Status:** ✅ Approved by @mxinO; all CI passing.

### Changes

**New files**

- `.claude/skills/launching-evals/` — vendored verbatim from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @
commit `8fa16b2` (latest). Covers run / check / debug / analyze flows
for NEL evaluations.
- `.claude/skills/accessing-mlflow/SKILL.md` — vendored verbatim from
the same upstream. Queries MLflow runs via the `mlflow-mcp` MCP server.
- `.claude/scripts/sync-upstream-skills.sh` — re-vendors the two skills
above at a pinned SHA. Idempotent; re-applies our provenance frontmatter
on each run.
- `.claude/skills/common/credentials.md` — shared HF / NGC / Docker
credential setup, referenced from slurm-setup.md. Generic (not
NVIDIA-internal) — public NEL SLURM-executor users rely on the same
NGC/HF setup. Includes a "check what's already set first" detection
section so the agent skips already-configured credentials.
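The "check what's already set first" detection described in the `credentials.md` bullet can be sketched as a small shell pass. This is a sketch, not the file's actual contents: the probed locations (`HF_TOKEN`, `~/.cache/huggingface/token`, Docker config) are the ones named above, but the function names and messages are illustrative.

```shell
# Sketch of a credential-detection pass: probe for already-configured
# HF / Docker credentials before writing anything, so setup can skip
# credentials that are already in place.
have_hf_token() {
  # Either the env var is set or a cached token file is non-empty.
  [ -n "${HF_TOKEN:-}" ] || [ -s "${HOME}/.cache/huggingface/token" ]
}

have_docker_creds() {
  # A Docker config with an "auths" section implies a prior docker login.
  [ -s "${HOME}/.docker/config.json" ] \
    && grep -q '"auths"' "${HOME}/.docker/config.json"
}

if have_hf_token; then
  echo "HF credentials: already configured, skipping"
else
  echo "HF credentials: missing (run 'hf auth login', or export HF_TOKEN in scripts/CI)"
fi

if have_docker_creds; then
  echo "Docker credentials: already configured, skipping"
else
  echo "Docker credentials: missing"
fi
```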

**Updated files**

- `.claude/skills/common/slurm-setup.md` — NGC credential block
collapsed to a one-paragraph pointer to `credentials.md`.
- `.claude/skills/common/remote-execution.md` — reframed "Checkpoint and
storage availability" as "Staging checkpoints from your workstation".
Drops the misleading login-vs-compute framing and the dlcluster-specific
row.
- `.claude/skills/common/workspace-management.md` — drops stale pointer
to the deleted e2e doc.
- `.claude/skills/evaluation/SKILL.md` — workspace-integration intro
trimmed; NEL CI section removed (content moved to
Model-Optimizer-Internal). Monitoring block replaced via #1252 merge
with joint trim text that routes to `monitor`, `launching-evals`, and
`accessing-mlflow`.
- `.claude/skills/ptq/SKILL.md` — "Next steps" block refined.
- `.markdownlint-cli2.yaml` — excludes vendored upstream skills from
markdownlint so they stay byte-identical to upstream.

**Deleted files**

- `.claude/skills/common/end-to-end-workflow.md` — per @kaix-nv and
@mxinO: redundant with skill descriptions that already handle
cross-skill routing.
- `.claude/skills/evaluation/references/nel-ci-guide.md` — per
@shengliangxu: NVIDIA-internal. Moved to
`Model-Optimizer-Internal:agent/evaluation_guide.md` (MR !57).

### Review status

| Reviewer | Concern | How addressed |
|---|---|---|
| @shengliangxu | `nel-ci-guide.md` contains NVIDIA-internal content | Moved to Model-Optimizer-Internal MR !57 (renamed `evaluation_guide.md` since it covers both NEL SLURM executor and NEL CI). Refreshed against current upstream NEL / NEL-CI (current cluster list, Sybil/non-Sybil distinction, prerequisites checklist, `nel-ci-cli` preferred trigger). |
| @kaix-nv, @mxinO | e2e workflow doc unnecessary | Deleted. Skill descriptions already route between ptq / deployment / evaluation. |
| @kaix-nv (#1252) | Overlap on `evaluation/SKILL.md` Monitoring section | Coordinated via comment on #1252. @kaix-nv incorporated the joint trim text referencing `monitor` + `launching-evals` + `accessing-mlflow`. Landed via #1252 merge. |
| @mxinO | `remote-execution.md` compute-node framing is misleading | Reframed as workstation→cluster staging. |
| @mxinO, CodeRabbit, Copilot | slurm-setup NGC creds aren't SLURM-specific; `$oauthtoken` literal clarification; overwrite safety | New `credentials.md` with NGC / HF / Docker setup. Append-if-missing pattern (`grep -q … \|\| echo … >>`). Explicitly calls out `$oauthtoken` as literal, kept unexpanded via single quotes. |
| @mxinO | `credentials.md`: check what's already set first; document `hf auth login` | Added detection section at the top covering `HF_TOKEN`, `~/.cache/huggingface/token`, Docker config, enroot creds. HF section now documents `hf auth login` as the recommended interactive path; env-var path kept as option 2 for scripts/CI. |
| @kevalmorabia97 | Internal `gitlab-master.nvidia.com` container URL in `launching-evals/SKILL.md` | Auto-fixed by bumping pinned SHA from `01899f8` to `8fa16b2` — upstream PR [NVIDIA-NeMo/Evaluator#920](NVIDIA-NeMo/Evaluator#920) already replaced it with the public `nvcr.io/nvidia/eval-factory/simple-evals:26.03`. SHA bump also picked up upstream's new `nemo-evaluator-launcher resume` command and a tighter "MANDATORY monitor after every `nel run`" directive. |
| @kevalmorabia97 | Internal `PPP` terminology + `/lustre/fsw/portfolios/coreai/...` path | Still in upstream — vendoring verbatim means we can't sanitize locally without breaking the "verbatim" property. Filed upstream as [NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938) to genericize. Next sync-script SHA bump will pick up the fix automatically. |
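The append-if-missing overwrite-safety pattern cited in the table (`grep -q … || echo … >>`) works like this. A self-contained sketch: the netrc-style line and the `NGC_API_KEY` placeholder are illustrative, not the exact text of `credentials.md`; note the single quotes that keep `$oauthtoken` literal.

```shell
# Append-if-missing: the grep guard makes repeated runs idempotent,
# so setup never duplicates or clobbers an existing entry.
CONF="$(mktemp)"                           # stand-in for the real config file
NGC_API_KEY="${NGC_API_KEY:-example-key}"  # illustrative placeholder

# '$oauthtoken' is a literal username expected by nvcr.io; the single
# quotes keep the shell from expanding it.
LINE='machine nvcr.io login $oauthtoken password '"$NGC_API_KEY"

grep -qF 'machine nvcr.io' "$CONF" || echo "$LINE" >> "$CONF"
grep -qF 'machine nvcr.io' "$CONF" || echo "$LINE" >> "$CONF"  # second run: no-op

echo "entries: $(grep -c 'machine nvcr.io' "$CONF")"  # prints: entries: 1
```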

### Related

- **Depends on:** #1236 (`deployment/references/unsupported-models.md`,
merged)
- **Coordinated with:** #1252 (monitor skill, merged — joint trim text
incorporated)
- **Internal counterpart:** [Model-Optimizer-Internal MR
!57](https://gitlab-master.nvidia.com/omniml/Model-Optimizer-Internal/-/merge_requests/57)
— `agent/evaluation_guide.md`
- **Upstream coordination:** vendored skills synced from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @
`8fa16b2`. Follow-up issue:
[NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938).

### Motivation

Learnings from running end-to-end PTQ → Deploy → Eval on
Devstral-Small-2-24B (FP8 VLM → NVFP4 MLP-only) on dlcluster B100, plus
prior NEL CI experience on oci-hsg.

### Testing

Validated end-to-end: PTQ (6 min) → vLLM deployment (3 debug iterations)
→ NEL evaluation (MMLU 77.4%, GSM8K 80%, GPQA 40% on
`limit_samples=10`).

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅ (documentation / skills only)
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
- `.claude/skills/launching-evals/` and
`.claude/skills/accessing-mlflow/` are vendored verbatim from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator)
(Apache-2.0). Provenance SHA pinned in each `SKILL.md` frontmatter and
in `.claude/scripts/sync-upstream-skills.sh`.
- Did you write any new necessary tests?: N/A (skill documentation)
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅

---------

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
grzegorz-k-karch pushed a commit to NVIDIA/Model-Optimizer that referenced this pull request Apr 28, 2026
…flow testing, vendor two Claude skills from NeMo Evaluator (#1239)

### What does this PR do?

Type of change: Documentation / Skills

Polishes evaluation and common skills based on end-to-end experience
quantizing + deploying + evaluating LLMs. Vendors the two upstream
Claude skills from NeMo Evaluator, splits shared credential setup into
its own doc, and applies reviewer feedback.

**Status:** ✅ Approved by @mxinO; all CI passing.

### Changes

**New files**

- `.claude/skills/launching-evals/` — vendored verbatim from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @
commit `8fa16b2` (latest). Covers run / check / debug / analyze flows
for NEL evaluations.
- `.claude/skills/accessing-mlflow/SKILL.md` — vendored verbatim from
the same upstream. Queries MLflow runs via the `mlflow-mcp` MCP server.
- `.claude/scripts/sync-upstream-skills.sh` — re-vendors the two skills
above at a pinned SHA. Idempotent; re-applies our provenance frontmatter
on each run.
- `.claude/skills/common/credentials.md` — shared HF / NGC / Docker
credential setup, referenced from slurm-setup.md. Generic (not
NVIDIA-internal) — public NEL SLURM-executor users rely on the same
NGC/HF setup. Includes a "check what's already set first" detection
section so the agent skips already-configured credentials.

**Updated files**

- `.claude/skills/common/slurm-setup.md` — NGC credential block
collapsed to a one-paragraph pointer at `credentials.md`.
- `.claude/skills/common/remote-execution.md` — reframed "Checkpoint and
storage availability" as "Staging checkpoints from your workstation".
Drops the misleading login-vs-compute framing and the dlcluster-specific
row.
- `.claude/skills/common/workspace-management.md` — drops stale pointer
to the deleted e2e doc.
- `.claude/skills/evaluation/SKILL.md` — workspace-integration intro
trimmed; NEL CI section removed (content moved to
Model-Optimizer-Internal). Monitoring block replaced via #1252 merge
with joint trim text that routes to `monitor`, `launching-evals`, and
`accessing-mlflow`.
- `.claude/skills/ptq/SKILL.md` — "Next steps" block refined.
- `.markdownlint-cli2.yaml` — excludes vendored upstream skills from
markdownlint so they stay byte-identical to upstream.

**Deleted files**

- `.claude/skills/common/end-to-end-workflow.md` — per @kaix-nv and
@mxinO: redundant with skill descriptions that already handle
cross-skill routing.
- `.claude/skills/evaluation/references/nel-ci-guide.md` — per
@shengliangxu: NVIDIA-internal. Moved to
`Model-Optimizer-Internal:agent/evaluation_guide.md` (MR !57).

### Review status

| Reviewer | Concern | How addressed |
|---|---|---|
| @shengliangxu | `nel-ci-guide.md` contains NVIDIA-internal content |
Moved to Model-Optimizer-Internal MR !57 (renamed `evaluation_guide.md`
since it covers both NEL SLURM executor and NEL CI). Refreshed against
current upstream NEL / NEL-CI (current cluster list, Sybil/non-Sybil
distinction, prerequisites checklist, `nel-ci-cli` preferred trigger). |
| @kaix-nv, @mxinO | e2e workflow doc unnecessary | Deleted. Skill
descriptions already route between ptq / deployment / evaluation. |
| @kaix-nv (#1252) | Overlap on `evaluation/SKILL.md` Monitoring section
| Coordinated via comment on #1252. @kaix-nv incorporated the joint trim
text referencing `monitor` + `launching-evals` + `accessing-mlflow`.
Landed via #1252 merge. |
| @mxinO | `remote-execution.md` compute-node framing is misleading |
Reframed as workstation→cluster staging. |
| @mxinO, CodeRabbit, Copilot | slurm-setup NGC creds aren't
SLURM-specific; `$oauthtoken` literal clarification; overwrite safety |
New `credentials.md` with NGC / HF / Docker setup. Append-if-missing
pattern (`grep -q … \|\| echo … >>`). Explicitly calls out `$oauthtoken`
as literal, kept unexpanded via single quotes. |
| @mxinO | `credentials.md`: check what's already set first; document
`hf auth login` | Added detection section at the top covering
`HF_TOKEN`, `~/.cache/huggingface/token`, Docker config, enroot creds.
HF section now documents `hf auth login` as the recommended interactive
path; env-var path kept as option 2 for scripts/CI. |
| @kevalmorabia97 | Internal `gitlab-master.nvidia.com` container URL in
`launching-evals/SKILL.md` | Auto-fixed by bumping pinned SHA from
`01899f8` to `8fa16b2` — upstream PR
[NVIDIA-NeMo/Evaluator#920](NVIDIA-NeMo/Evaluator#920)
already replaced it with the public
`nvcr.io/nvidia/eval-factory/simple-evals:26.03`. SHA bump also picked
up upstream's new `nemo-evaluator-launcher resume` command and a tighter
"MANDATORY monitor after every `nel run`" directive. |
| @kevalmorabia97 | Internal `PPP` terminology +
`/lustre/fsw/portfolios/coreai/...` path | Still in upstream — vendoring
verbatim means we can't sanitize locally without breaking the "verbatim"
property. Filed upstream as
[NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938)
to genericize. Next sync-script SHA bump will pick up the fix
automatically. |

### Related

- **Depends on:** #1236 (`deployment/references/unsupported-models.md`,
merged)
- **Coordinated with:** #1252 (monitor skill, merged — joint trim text
incorporated)
- **Internal counterpart:** [Model-Optimizer-Internal MR
!57](https://gitlab-master.nvidia.com/omniml/Model-Optimizer-Internal/-/merge_requests/57)
— `agent/evaluation_guide.md`
- **Upstream coordination:** vendored skills synced from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @
`8fa16b2`. Follow-up issue:
[NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938).

### Motivation

Learnings from running end-to-end PTQ → Deploy → Eval on
Devstral-Small-2-24B (FP8 VLM → NVFP4 MLP-only) on dlcluster B100, plus
prior NEL CI experience on oci-hsg.

### Testing

Validated end-to-end: PTQ (6 min) → vLLM deployment (3 debug iterations)
→ NEL evaluation (MMLU 77.4%, GSM8K 80%, GPQA 40% on
`limit_samples=10`).

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅ (documentation / skills only)
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
- `.claude/skills/launching-evals/` and
`.claude/skills/accessing-mlflow/` are vendored verbatim from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator)
(Apache-2.0). Provenance SHA pinned in each `SKILL.md` frontmatter and
in `.claude/scripts/sync-upstream-skills.sh`.
- Did you write any new necessary tests?: N/A (skill documentation)
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅

---------

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
