
docs: remove internal GitLab URL from launching-evals skill#920

Merged
piojanu merged 1 commit into main from pjanuszewski/fix-launching-evals-gitlab-url on Apr 20, 2026
Conversation

@piojanu (Contributor) commented Apr 20, 2026

Summary

Replaces the internal gitlab-master.nvidia.com example container in the launching-evals skill with a public nvcr.io container already used in the repo docs.

Testing

  • Not run; docs-only change

Resolves EVAL-1311
Linear issue: https://linear.app/nvidia/issue/EVAL-1311/skills-remove-internal-gitlab-url-from-launching-evals-skill

Signed-off-by: Piotr Januszewski <pjanuszewski@nvidia.com>
@piojanu piojanu requested review from a team as code owners April 20, 2026 12:00
@piojanu piojanu added the docs-only With great power comes great responsibility. label Apr 20, 2026
@piojanu piojanu enabled auto-merge (squash) April 20, 2026 12:01
@piojanu piojanu merged commit 6ae85ff into main Apr 20, 2026
48 checks passed
@piojanu piojanu deleted the pjanuszewski/fix-launching-evals-gitlab-url branch April 20, 2026 12:18
Edwardf0t1 added a commit to NVIDIA/Model-Optimizer that referenced this pull request Apr 27, 2026
…flow testing, vendor two Claude skills from NeMo Evaluator (#1239)

### What does this PR do?

Type of change: Documentation / Skills

Polishes evaluation and common skills based on end-to-end experience
quantizing + deploying + evaluating LLMs. Vendors the two upstream
Claude skills from NeMo Evaluator, splits shared credential setup into
its own doc, and applies reviewer feedback.

**Status:** ✅ Approved by @mxinO; all CI passing.

### Changes

**New files**

- `.claude/skills/launching-evals/` — vendored verbatim from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @
commit `8fa16b2` (latest). Covers run / check / debug / analyze flows
for NEL evaluations.
- `.claude/skills/accessing-mlflow/SKILL.md` — vendored verbatim from
the same upstream. Queries MLflow runs via the `mlflow-mcp` MCP server.
- `.claude/scripts/sync-upstream-skills.sh` — re-vendors the two skills
above at a pinned SHA. Idempotent; re-applies our provenance frontmatter
on each run.
- `.claude/skills/common/credentials.md` — shared HF / NGC / Docker
credential setup, referenced from slurm-setup.md. Generic (not
NVIDIA-internal) — public NEL SLURM-executor users rely on the same
NGC/HF setup. Includes a "check what's already set first" detection
section so the agent skips already-configured credentials.
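The "check what's already set first" detection described in the `credentials.md` bullet can be sketched as a small shell pass. This is a sketch, not the file's actual contents: the probed locations (`HF_TOKEN`, `~/.cache/huggingface/token`, Docker config) are the ones named above, but the function names and messages are illustrative.

```shell
# Sketch of a credential-detection pass: probe for already-configured
# HF / Docker credentials before writing anything, so setup can skip
# credentials that are already in place.
have_hf_token() {
  # Either the env var is set or a cached token file is non-empty.
  [ -n "${HF_TOKEN:-}" ] || [ -s "${HOME}/.cache/huggingface/token" ]
}

have_docker_creds() {
  # A Docker config with an "auths" section implies a prior docker login.
  [ -s "${HOME}/.docker/config.json" ] \
    && grep -q '"auths"' "${HOME}/.docker/config.json"
}

if have_hf_token; then
  echo "HF credentials: already configured, skipping"
else
  echo "HF credentials: missing (run 'hf auth login', or export HF_TOKEN in scripts/CI)"
fi

if have_docker_creds; then
  echo "Docker credentials: already configured, skipping"
else
  echo "Docker credentials: missing"
fi
```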

**Updated files**

- `.claude/skills/common/slurm-setup.md` — NGC credential block
collapsed to a one-paragraph pointer to `credentials.md`.
- `.claude/skills/common/remote-execution.md` — reframed "Checkpoint and
storage availability" as "Staging checkpoints from your workstation".
Drops the misleading login-vs-compute framing and the dlcluster-specific
row.
- `.claude/skills/common/workspace-management.md` — drops stale pointer
to the deleted e2e doc.
- `.claude/skills/evaluation/SKILL.md` — workspace-integration intro
trimmed; NEL CI section removed (content moved to
Model-Optimizer-Internal). Monitoring block replaced via #1252 merge
with joint trim text that routes to `monitor`, `launching-evals`, and
`accessing-mlflow`.
- `.claude/skills/ptq/SKILL.md` — "Next steps" block refined.
- `.markdownlint-cli2.yaml` — excludes vendored upstream skills from
markdownlint so they stay byte-identical to upstream.

**Deleted files**

- `.claude/skills/common/end-to-end-workflow.md` — per @kaix-nv and
@mxinO: redundant with skill descriptions that already handle
cross-skill routing.
- `.claude/skills/evaluation/references/nel-ci-guide.md` — per
@shengliangxu: NVIDIA-internal. Moved to
`Model-Optimizer-Internal:agent/evaluation_guide.md` (MR !57).

### Review status

| Reviewer | Concern | How addressed |
|---|---|---|
| @shengliangxu | `nel-ci-guide.md` contains NVIDIA-internal content | Moved to Model-Optimizer-Internal MR !57 (renamed `evaluation_guide.md` since it covers both NEL SLURM executor and NEL CI). Refreshed against current upstream NEL / NEL-CI (current cluster list, Sybil/non-Sybil distinction, prerequisites checklist, `nel-ci-cli` preferred trigger). |
| @kaix-nv, @mxinO | e2e workflow doc unnecessary | Deleted. Skill descriptions already route between ptq / deployment / evaluation. |
| @kaix-nv (#1252) | Overlap on `evaluation/SKILL.md` Monitoring section | Coordinated via comment on #1252. @kaix-nv incorporated the joint trim text referencing `monitor` + `launching-evals` + `accessing-mlflow`. Landed via #1252 merge. |
| @mxinO | `remote-execution.md` compute-node framing is misleading | Reframed as workstation→cluster staging. |
| @mxinO, CodeRabbit, Copilot | slurm-setup NGC creds aren't SLURM-specific; `$oauthtoken` literal clarification; overwrite safety | New `credentials.md` with NGC / HF / Docker setup. Append-if-missing pattern (`grep -q … \|\| echo … >>`). Explicitly calls out `$oauthtoken` as literal, kept unexpanded via single quotes. |
| @mxinO | `credentials.md`: check what's already set first; document `hf auth login` | Added detection section at the top covering `HF_TOKEN`, `~/.cache/huggingface/token`, Docker config, enroot creds. HF section now documents `hf auth login` as the recommended interactive path; env-var path kept as option 2 for scripts/CI. |
| @kevalmorabia97 | Internal `gitlab-master.nvidia.com` container URL in `launching-evals/SKILL.md` | Auto-fixed by bumping pinned SHA from `01899f8` to `8fa16b2` — upstream PR [NVIDIA-NeMo/Evaluator#920](NVIDIA-NeMo/Evaluator#920) already replaced it with the public `nvcr.io/nvidia/eval-factory/simple-evals:26.03`. SHA bump also picked up upstream's new `nemo-evaluator-launcher resume` command and a tighter "MANDATORY monitor after every `nel run`" directive. |
| @kevalmorabia97 | Internal `PPP` terminology + `/lustre/fsw/portfolios/coreai/...` path | Still in upstream — vendoring verbatim means we can't sanitize locally without breaking the "verbatim" property. Filed upstream as [NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938) to genericize. Next sync-script SHA bump will pick up the fix automatically. |
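The append-if-missing overwrite-safety pattern cited in the table (`grep -q … || echo … >>`) works like this. A self-contained sketch: the netrc-style line and the `NGC_API_KEY` placeholder are illustrative, not the exact text of `credentials.md`; note the single quotes that keep `$oauthtoken` literal.

```shell
# Append-if-missing: the grep guard makes repeated runs idempotent,
# so setup never duplicates or clobbers an existing entry.
CONF="$(mktemp)"                           # stand-in for the real config file
NGC_API_KEY="${NGC_API_KEY:-example-key}"  # illustrative placeholder

# '$oauthtoken' is a literal username expected by nvcr.io; the single
# quotes keep the shell from expanding it.
LINE='machine nvcr.io login $oauthtoken password '"$NGC_API_KEY"

grep -qF 'machine nvcr.io' "$CONF" || echo "$LINE" >> "$CONF"
grep -qF 'machine nvcr.io' "$CONF" || echo "$LINE" >> "$CONF"  # second run: no-op

echo "entries: $(grep -c 'machine nvcr.io' "$CONF")"  # prints: entries: 1
```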

### Related

- **Depends on:** #1236 (`deployment/references/unsupported-models.md`,
merged)
- **Coordinated with:** #1252 (monitor skill, merged — joint trim text
incorporated)
- **Internal counterpart:** [Model-Optimizer-Internal MR
!57](https://gitlab-master.nvidia.com/omniml/Model-Optimizer-Internal/-/merge_requests/57)
— `agent/evaluation_guide.md`
- **Upstream coordination:** vendored skills synced from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @
`8fa16b2`. Follow-up issue:
[NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938).

### Motivation

Learnings from running end-to-end PTQ → Deploy → Eval on
Devstral-Small-2-24B (FP8 VLM → NVFP4 MLP-only) on dlcluster B100, plus
prior NEL CI experience on oci-hsg.

### Testing

Validated end-to-end: PTQ (6 min) → vLLM deployment (3 debug iterations)
→ NEL evaluation (MMLU 77.4%, GSM8K 80%, GPQA 40% on
`limit_samples=10`).

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅ (documentation / skills only)
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
- `.claude/skills/launching-evals/` and
`.claude/skills/accessing-mlflow/` are vendored verbatim from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator)
(Apache-2.0). Provenance SHA pinned in each `SKILL.md` frontmatter and
in `.claude/scripts/sync-upstream-skills.sh`.
- Did you write any new necessary tests?: N/A (skill documentation)
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅

---------

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
grzegorz-k-karch pushed a commit to NVIDIA/Model-Optimizer that referenced this pull request Apr 28, 2026
…flow testing, vendor two Claude skills from NeMo Evaluator (#1239)

### What does this PR do?

Type of change: Documentation / Skills

Polishes evaluation and common skills based on end-to-end experience
quantizing + deploying + evaluating LLMs. Vendors the two upstream
Claude skills from NeMo Evaluator, splits shared credential setup into
its own doc, and applies reviewer feedback.

**Status:** ✅ Approved by @mxinO; all CI passing.

### Changes

**New files**

- `.claude/skills/launching-evals/` — vendored verbatim from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @
commit `8fa16b2` (latest). Covers run / check / debug / analyze flows
for NEL evaluations.
- `.claude/skills/accessing-mlflow/SKILL.md` — vendored verbatim from
the same upstream. Queries MLflow runs via the `mlflow-mcp` MCP server.
- `.claude/scripts/sync-upstream-skills.sh` — re-vendors the two skills
above at a pinned SHA. Idempotent; re-applies our provenance frontmatter
on each run.
- `.claude/skills/common/credentials.md` — shared HF / NGC / Docker
credential setup, referenced from slurm-setup.md. Generic (not
NVIDIA-internal) — public NEL SLURM-executor users rely on the same
NGC/HF setup. Includes a "check what's already set first" detection
section so the agent skips already-configured credentials.

**Updated files**

- `.claude/skills/common/slurm-setup.md` — NGC credential block
collapsed to a one-paragraph pointer at `credentials.md`.
- `.claude/skills/common/remote-execution.md` — reframed "Checkpoint and
storage availability" as "Staging checkpoints from your workstation".
Drops the misleading login-vs-compute framing and the dlcluster-specific
row.
- `.claude/skills/common/workspace-management.md` — drops stale pointer
to the deleted e2e doc.
- `.claude/skills/evaluation/SKILL.md` — workspace-integration intro
trimmed; NEL CI section removed (content moved to
Model-Optimizer-Internal). Monitoring block replaced via #1252 merge
with joint trim text that routes to `monitor`, `launching-evals`, and
`accessing-mlflow`.
- `.claude/skills/ptq/SKILL.md` — "Next steps" block refined.
- `.markdownlint-cli2.yaml` — excludes vendored upstream skills from
markdownlint so they stay byte-identical to upstream.

**Deleted files**

- `.claude/skills/common/end-to-end-workflow.md` — per @kaix-nv and
@mxinO: redundant with skill descriptions that already handle
cross-skill routing.
- `.claude/skills/evaluation/references/nel-ci-guide.md` — per
@shengliangxu: NVIDIA-internal. Moved to
`Model-Optimizer-Internal:agent/evaluation_guide.md` (MR !57).

### Review status

| Reviewer | Concern | How addressed |
|---|---|---|
| @shengliangxu | `nel-ci-guide.md` contains NVIDIA-internal content |
Moved to Model-Optimizer-Internal MR !57 (renamed `evaluation_guide.md`
since it covers both NEL SLURM executor and NEL CI). Refreshed against
current upstream NEL / NEL-CI (current cluster list, Sybil/non-Sybil
distinction, prerequisites checklist, `nel-ci-cli` preferred trigger). |
| @kaix-nv, @mxinO | e2e workflow doc unnecessary | Deleted. Skill
descriptions already route between ptq / deployment / evaluation. |
| @kaix-nv (#1252) | Overlap on `evaluation/SKILL.md` Monitoring section
| Coordinated via comment on #1252. @kaix-nv incorporated the joint trim
text referencing `monitor` + `launching-evals` + `accessing-mlflow`.
Landed via #1252 merge. |
| @mxinO | `remote-execution.md` compute-node framing is misleading |
Reframed as workstation→cluster staging. |
| @mxinO, CodeRabbit, Copilot | slurm-setup NGC creds aren't
SLURM-specific; `$oauthtoken` literal clarification; overwrite safety |
New `credentials.md` with NGC / HF / Docker setup. Append-if-missing
pattern (`grep -q … \|\| echo … >>`). Explicitly calls out `$oauthtoken`
as literal, kept unexpanded via single quotes. |
| @mxinO | `credentials.md`: check what's already set first; document
`hf auth login` | Added detection section at the top covering
`HF_TOKEN`, `~/.cache/huggingface/token`, Docker config, enroot creds.
HF section now documents `hf auth login` as the recommended interactive
path; env-var path kept as option 2 for scripts/CI. |
| @kevalmorabia97 | Internal `gitlab-master.nvidia.com` container URL in
`launching-evals/SKILL.md` | Auto-fixed by bumping pinned SHA from
`01899f8` to `8fa16b2` — upstream PR
[NVIDIA-NeMo/Evaluator#920](NVIDIA-NeMo/Evaluator#920)
already replaced it with the public
`nvcr.io/nvidia/eval-factory/simple-evals:26.03`. SHA bump also picked
up upstream's new `nemo-evaluator-launcher resume` command and a tighter
"MANDATORY monitor after every `nel run`" directive. |
| @kevalmorabia97 | Internal `PPP` terminology +
`/lustre/fsw/portfolios/coreai/...` path | Still in upstream — vendoring
verbatim means we can't sanitize locally without breaking the "verbatim"
property. Filed upstream as
[NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938)
to genericize. Next sync-script SHA bump will pick up the fix
automatically. |

### Related

- **Depends on:** #1236 (`deployment/references/unsupported-models.md`,
merged)
- **Coordinated with:** #1252 (monitor skill, merged — joint trim text
incorporated)
- **Internal counterpart:** [Model-Optimizer-Internal MR
!57](https://gitlab-master.nvidia.com/omniml/Model-Optimizer-Internal/-/merge_requests/57)
— `agent/evaluation_guide.md`
- **Upstream coordination:** vendored skills synced from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @
`8fa16b2`. Follow-up issue:
[NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938).

### Motivation

Learnings from running end-to-end PTQ → Deploy → Eval on
Devstral-Small-2-24B (FP8 VLM → NVFP4 MLP-only) on dlcluster B100, plus
prior NEL CI experience on oci-hsg.

### Testing

Validated end-to-end: PTQ (6 min) → vLLM deployment (3 debug iterations)
→ NEL evaluation (MMLU 77.4%, GSM8K 80%, GPQA 40% on
`limit_samples=10`).

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅ (documentation / skills only)
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
- `.claude/skills/launching-evals/` and
`.claude/skills/accessing-mlflow/` are vendored verbatim from
[NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator)
(Apache-2.0). Provenance SHA pinned in each `SKILL.md` frontmatter and
in `.claude/scripts/sync-upstream-skills.sh`.
- Did you write any new necessary tests?: N/A (skill documentation)
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅

---------

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
