docs: remove internal GitLab URL from launching-evals skill #920
Merged
Signed-off-by: Piotr Januszewski <pjanuszewski@nvidia.com>
marta-sd approved these changes on Apr 20, 2026
Edwardf0t1 added a commit to NVIDIA/Model-Optimizer that referenced this pull request on Apr 27, 2026
…flow testing, vendor two Claude skills from NeMo Evaluator (#1239)

### What does this PR do?

Type of change: Documentation / Skills

Polishes evaluation and common skills based on end-to-end experience quantizing + deploying + evaluating LLMs. Vendors the two upstream Claude skills from NeMo Evaluator, splits shared credential setup into its own doc, and applies reviewer feedback.

**Status:** ✅ Approved by @mxinO; all CI passing.

### Changes

**New files**

- `.claude/skills/launching-evals/`: vendored verbatim from [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @ commit `8fa16b2` (latest). Covers run / check / debug / analyze flows for NEL evaluations.
- `.claude/skills/accessing-mlflow/SKILL.md`: vendored verbatim from the same upstream. Queries MLflow runs via the `mlflow-mcp` MCP server.
- `.claude/scripts/sync-upstream-skills.sh`: re-vendors the two skills above at a pinned SHA. Idempotent; re-applies our provenance frontmatter on each run.
- `.claude/skills/common/credentials.md`: shared HF / NGC / Docker credential setup, referenced from slurm-setup.md. Generic (not NVIDIA-internal); public NEL SLURM-executor users rely on the same NGC/HF setup. Includes a "check what's already set first" detection section so the agent skips already-configured credentials.

**Updated files**

- `.claude/skills/common/slurm-setup.md`: NGC credential block collapsed to a one-paragraph pointer at `credentials.md`.
- `.claude/skills/common/remote-execution.md`: reframed "Checkpoint and storage availability" as "Staging checkpoints from your workstation". Drops the misleading login-vs-compute framing and the dlcluster-specific row.
- `.claude/skills/common/workspace-management.md`: drops stale pointer to the deleted e2e doc.
- `.claude/skills/evaluation/SKILL.md`: workspace-integration intro trimmed; NEL CI section removed (content moved to Model-Optimizer-Internal). Monitoring block replaced via #1252 merge with joint trim text that routes to `monitor`, `launching-evals`, and `accessing-mlflow`.
- `.claude/skills/ptq/SKILL.md`: "Next steps" block refined.
- `.markdownlint-cli2.yaml`: excludes vendored upstream skills from markdownlint so they stay byte-identical to upstream.

**Deleted files**

- `.claude/skills/common/end-to-end-workflow.md`: per @kaix-nv and @mxinO, redundant with skill descriptions that already handle cross-skill routing.
- `.claude/skills/evaluation/references/nel-ci-guide.md`: per @shengliangxu, NVIDIA-internal. Moved to `Model-Optimizer-Internal:agent/evaluation_guide.md` (MR !57).

### Review status

| Reviewer | Concern | How addressed |
|---|---|---|
| @shengliangxu | `nel-ci-guide.md` contains NVIDIA-internal content | Moved to Model-Optimizer-Internal MR !57 (renamed `evaluation_guide.md` since it covers both NEL SLURM executor and NEL CI). Refreshed against current upstream NEL / NEL-CI (current cluster list, Sybil/non-Sybil distinction, prerequisites checklist, `nel-ci-cli` preferred trigger). |
| @kaix-nv, @mxinO | e2e workflow doc unnecessary | Deleted. Skill descriptions already route between ptq / deployment / evaluation. |
| @kaix-nv (#1252) | Overlap on `evaluation/SKILL.md` Monitoring section | Coordinated via comment on #1252. @kaix-nv incorporated the joint trim text referencing `monitor` + `launching-evals` + `accessing-mlflow`. Landed via #1252 merge. |
| @mxinO | `remote-execution.md` compute-node framing is misleading | Reframed as workstation→cluster staging. |
| @mxinO, CodeRabbit, Copilot | slurm-setup NGC creds aren't SLURM-specific; `$oauthtoken` literal clarification; overwrite safety | New `credentials.md` with NGC / HF / Docker setup. Append-if-missing pattern (`grep -q … \|\| echo … >>`). Explicitly calls out `$oauthtoken` as literal, kept unexpanded via single quotes. |
| @mxinO | `credentials.md`: check what's already set first; document `hf auth login` | Added detection section at the top covering `HF_TOKEN`, `~/.cache/huggingface/token`, Docker config, enroot creds. HF section now documents `hf auth login` as the recommended interactive path; env-var path kept as option 2 for scripts/CI. |
| @kevalmorabia97 | Internal `gitlab-master.nvidia.com` container URL in `launching-evals/SKILL.md` | Auto-fixed by bumping pinned SHA from `01899f8` to `8fa16b2`; upstream PR [NVIDIA-NeMo/Evaluator#920](NVIDIA-NeMo/Evaluator#920) already replaced it with the public `nvcr.io/nvidia/eval-factory/simple-evals:26.03`. SHA bump also picked up upstream's new `nemo-evaluator-launcher resume` command and a tighter "MANDATORY monitor after every `nel run`" directive. |
| @kevalmorabia97 | Internal `PPP` terminology + `/lustre/fsw/portfolios/coreai/...` path | Still in upstream; vendoring verbatim means we can't sanitize locally without breaking the "verbatim" property. Filed upstream as [NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938) to genericize. Next sync-script SHA bump will pick up the fix automatically. |

### Related

- **Depends on:** #1236 (`deployment/references/unsupported-models.md`, merged)
- **Coordinated with:** #1252 (monitor skill, merged; joint trim text incorporated)
- **Internal counterpart:** [Model-Optimizer-Internal MR !57](https://gitlab-master.nvidia.com/omniml/Model-Optimizer-Internal/-/merge_requests/57) (`agent/evaluation_guide.md`)
- **Upstream coordination:** vendored skills synced from [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) @ `8fa16b2`. Follow-up issue: [NVIDIA-NeMo/Evaluator#938](NVIDIA-NeMo/Evaluator#938).

### Motivation

Learnings from running end-to-end PTQ → Deploy → Eval on Devstral-Small-2-24B (FP8 VLM → NVFP4 MLP-only) on dlcluster B100, plus prior NEL CI experience on oci-hsg.

### Testing

Validated end-to-end: PTQ (6 min) → vLLM deployment (3 debug iterations) → NEL evaluation (MMLU 77.4%, GSM8K 80%, GPQA 40% on `limit_samples=10`).

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅ (documentation / skills only)
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: ✅
  - `.claude/skills/launching-evals/` and `.claude/skills/accessing-mlflow/` are vendored verbatim from [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) (Apache-2.0). Provenance SHA pinned in each `SKILL.md` frontmatter and in `.claude/scripts/sync-upstream-skills.sh`.
- Did you write any new necessary tests?: N/A (skill documentation)
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅

---------

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
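
The append-if-missing pattern mentioned in the review table (`grep -q … || echo … >>`) can be illustrated with a minimal Python sketch. This is not the content of `credentials.md`: the helper name `append_if_missing`, the temporary file path, and the placeholder entry are all hypothetical, chosen only to show why re-running such a setup step does not duplicate or overwrite existing credentials. In Python the string `$oauthtoken` is never expanded; in a shell the same literal has to be protected with single quotes, as the commit message notes.

```python
import tempfile
from pathlib import Path


def append_if_missing(path: Path, line: str) -> bool:
    """Append `line` to `path` only if no identical line is already present.

    Returns True when the file was modified, False when the line existed.
    Hypothetical helper mirroring the shell idiom `grep -q ... || echo ... >>`.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # already configured, leave the file untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as fh:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(line + "\n")
    return True


# Illustrative entry only: `$oauthtoken` is a literal username, and
# <NGC_API_KEY> is a placeholder, not a real key.
entry = "machine nvcr.io login $oauthtoken password <NGC_API_KEY>"

with tempfile.TemporaryDirectory() as tmp:
    creds = Path(tmp) / "credentials"  # stand-in for a real credentials file
    first = append_if_missing(creds, entry)   # file is created, entry written
    second = append_if_missing(creds, entry)  # second run is a no-op
    contents = creds.read_text()
```

Because the second call changes nothing, a setup step written this way can be re-run safely, which is the overwrite-safety concern the reviewers raised.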
grzegorz-k-karch pushed a commit to NVIDIA/Model-Optimizer that referenced this pull request on Apr 28, 2026
…flow testing, vendor two Claude skills from NeMo Evaluator (#1239)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Summary
Replaces the internal `gitlab-master.nvidia.com` example container in the `launching-evals` skill with a public `nvcr.io` container already used in the repo docs.
Testing
Resolves EVAL-1311
Linear issue: https://linear.app/nvidia/issue/EVAL-1311/skills-remove-internal-gitlab-url-from-launching-evals-skill