Skip to content

fix(skills/comfyui): bug fixes, cloud parity, expanded coverage, examples, tests#17682

Closed
SHL0MS wants to merge 2 commits into
NousResearch:mainfrom
SHL0MS:audit-comfyui-skill
Closed

fix(skills/comfyui): bug fixes, cloud parity, expanded coverage, examples, tests#17682
SHL0MS wants to merge 2 commits into
NousResearch:mainfrom
SHL0MS:audit-comfyui-skill

Conversation

@SHL0MS

@SHL0MS SHL0MS commented Apr 30, 2026

Copy link
Copy Markdown
Collaborator

Summary

A comprehensive audit of the v4.1 ComfyUI skill surfaced ~70 issues across the
five scripts and three reference docs. v5.0.0 fixes them and fills the gaps
that previously forced users to write their own glue around the skill.

This is a large PR — apologies. Most of the issues are coupled (a single
_common.py for HTTP/cloud-routing/catalogs eliminates duplication, but it
ripples through every script that imported them). All changes live in
skills/creative/comfyui/; nothing else is touched.

Every fix is verified end-to-end against the live Comfy Cloud API (via a
free-tier key) — the cloud endpoint differences turned out to be the biggest
hidden bug class in the original.

Why now

The v4.1 skill works for happy-path local SD1.5 / SDXL submission, but breaks
in real production scenarios:

  • Path traversal + silent overwrites: server-supplied filenames written
    directly to output_dir / filename. Two outputs with the same filename in
    different subfolders collided. A malicious workflow could ../../etc/....
  • Status misclassification: entry["status"]["completed"] is True even
    on errored runs; the original checked completed first, so failed runs were
    reported as success.
  • API key leak on Cloud /api/view: cloud returns 302 → signed S3 URL.
    The original session hook tried to strip X-API-Key on cross-host redirect,
    but requests doesn't re-read session headers when building a redirect
    request, so the key was still being sent to the storage backend.
  • Cloud endpoint mismatches: /api/object_info, /api/queue, /api/upload,
    /api/prompt, /api/view are all paid-only on free tier (return 403).
    /api/history was renamed to /api/history_v2. /api/models/<f> was renamed
    to /api/experiment/models/<f>. unet/ and clip/ folders no longer exist
    on cloud — moved to diffusion_models/ and text_encoders/. The original
    skill's check_deps reported every Flux model as missing on cloud as a result.
  • Asymmetric duplicate-name resolution in extract_schema: first occurrence
    of a duplicated friendly name kept the bare name, second got _node_id
    suffix, third also got _node_id (potentially colliding). Now symmetric.
  • Embedding refs in prompt strings (embedding:goodvibes:1.2) were never
    detected as model dependencies — workflows would silently fail with
    "embedding not found" at runtime.

Critical fixes

Issue Fix
poll_status reports failed runs as success Check status_str==\"error\" BEFORE completed:true
download_output path traversal New safe_path_join() containment guard
Subfolder collisions silently overwrite Preserve server subfolder structure (or --flat-output opt-out); auto-suffix _1/_2 on collision
X-API-Key leak to S3 on redirect _StripSensitiveOnRedirectSession subclasses Session.rebuild_auth to drop X-API-Key/Authorization/Cookie on cross-host hop. Tested.
Cloud detection asymmetry between scripts Single is_cloud_host() in _common.py (handles *.comfy.org subdomains)
Cloud endpoint renames not handled resolve_url() maps /history → /history_v2, /models/<f> → /experiment/models/<f>
Folder rename unet → diffusion_models FOLDER_ALIASES consulted by check_deps so Flux deps resolve on cloud
404 conflated with empty folder Distinguished via code: folder_not_found body marker
extra_data.api_key_comfy_org polluted with auth key Only set when --partner-key explicitly passed
Whole-body buffered downloads (OOM on video) Streaming via 64 KiB chunks
No retry on transient errors Exponential backoff + jitter on 408/429/5xx + connection errors
Negative-prompt heuristic title-only Connection tracing through Reroute / Primitive nodes
Apple gen defaulted to 1 (false M1 tag) Defaults to None; Rosetta detection added
Single-GPU detection only Multi-GPU pick-best-VRAM
WSL2 / disk-space / PowerShell Detected; `wmic` deprecated path replaced
pip install comfy-cli pollutes system Python Setup script prefers pipx → uvx → `pip --user` (with PEP-668 fallback)
Setup not idempotent Skips relaunch if server already up; configurable port/workspace; SIGINT trap; persistent log
Editor-format detection misses wrappers unwrap_workflow() handles {\"prompt\": {...}} shape

New scripts

Script Purpose
_common.py Shared HTTP transport, cloud routing, node catalogs (eliminated drift)
run_batch.py --count N --randomize-seed or --sweep '{...}'; parallel up to tier limit
ws_monitor.py Real-time WebSocket viewer; saves preview frames to disk
auto_fix_deps.py Runs comfy node install / comfy model download for whatever check_deps reports missing (with --dry-run)
health_check.py Single command runs the verification checklist; optional --smoke-test submits a tiny workflow then cancels (no compute waste)
fetch_logs.py Pull traceback / status messages for a given prompt_id

Coverage expansion

  • Param patterns: Flux (BasicScheduler/Guider/RandomNoise/ModelSamplingFlux),
    SD3, Wan / Hunyuan / LTX video, IPAdapter, rgthree, easy-use, AnimateDiff
  • Embedding refs in prompt strings extracted as model deps
  • Model selectors (ckpt_name, vae_name, lora_name, unet_name, etc.)
    now controllable so workflows can be retargeted per run via --args

Examples

workflows/: sd15_txt2img.json, sdxl_txt2img.json, flux_dev_txt2img.json,
sdxl_img2img.json, sdxl_inpaint.json, upscale_4x.json,
animatediff_video.json, wan_video_t2v.json + a README.

Tests

117 passed in 2.19s — 105 unit + 8 cloud integration + 4 cross-host security.

Cloud integration tests auto-skip without COMFY_CLOUD_API_KEY. They were run
against the live Comfy Cloud API during development, including verification
that the rename map and folder aliases produce 0 missing deps for the example
workflows that target cloud-resident models.

$ python3 -m pytest tests/ -c tests/pytest.ini -o addopts=\"-p no:xdist\"
117 passed in 2.19s

Backwards compatibility

All existing CLI flags and behavior preserved. New behavior is opt-in:
--ws, --input-image, --randomize-seed, --flat-output, --partner-key,
--api-key (env var fallback), --submit-only. Any script invocation that
worked under v4.1 continues to work unchanged.

What I deliberately deferred

  • /object_info-driven runtime param validation (would require object_info on
    every run; cloud free-tier returns 403)
  • Workflow URL imports from civitai / comfyworkflows (out of scope)
  • LLM-driven prompt enhancement (not core to the skill)
  • extra_model_paths.yaml awareness (server-side concern; check_deps trusts
    what the server enumerates)

Happy to split into smaller commits, drop any of the new scripts, or scope
back if any of this is too much. The bug fixes in run_workflow.py and
_common.py (path traversal, status misclassification, X-API-Key leak,
streaming downloads, cloud routing) are the safety-critical core if you want
to take only those.

…ples, tests

The audit of v4.1 surfaced ~70 issues across the five scripts and three
reference docs — most user-visible (silent file overwrites, status-error
misclassified as success, X-API-Key leaked to S3 on /api/view redirect,
Cloud endpoints that 404 because they were renamed). v5.0.0 fixes those
and fills the gaps that previously forced users to write their own glue
(WebSocket monitoring, batch/sweep, img2img upload helper, dep auto-fix,
log fetch, health check, example workflows).

Critical fixes
- run_workflow.py: poll_status now checks status_str==error BEFORE
  completed:true, so a failed run no longer reports success
- run_workflow.py: download_output streams to disk via safe_path_join,
  preserves server subfolder structure (no silent overwrites), and
  retries with exponential backoff
- run_workflow.py: refuses to overwrite a link with a literal in
  inject_params (would silently break wiring)
- _common.py: _StripSensitiveOnRedirectSession (subclasses
  requests.Session.rebuild_auth) drops X-API-Key/Cookie on cross-host
  redirects — fixes a real key-leak path through Cloud's signed-URL
  download flow. Tested
- Cloud routing (verified live): /history → /history_v2,
  /models/<f> → /experiment/models/<f>, plus folder aliases for the
  unet ↔ diffusion_models and clip ↔ text_encoders rename
- check_deps.py: distinguishes 200/empty vs 404 folder_not_found vs
  403 free-tier; emits concrete fix_command per missing dep
- extract_schema.py: prompt vs negative_prompt determined by tracing
  KSampler.{positive,negative} connections (incl. through Reroute /
  Primitive nodes) instead of meta-title heuristic; symmetric
  duplicate-name resolution; cycle-safe trace_to_node
- hardware_check.py: multi-GPU pick-best, Apple variant detection,
  Rosetta detection, WSL2, ROCm --json, disk-space check, optional
  PyTorch probe; powershell preferred over deprecated wmic
- comfyui_setup.sh: prefers pipx → uvx → pip --user (with PEP-668
  fallback); idempotent — skips relaunch if server already up;
  configurable port/workspace; persistent log; SIGINT trap

New scripts
- run_batch.py — count or sweep (cartesian product), parallel up to
  cloud tier limit
- ws_monitor.py — real-time WebSocket viewer; saves preview frames
- auto_fix_deps.py — runs comfy node install / model download for
  whatever check_deps reports missing (with --dry-run)
- health_check.py — single command that runs the verification checklist
  (comfy-cli + server + checkpoints + optional smoke test that cancels
  itself to avoid burning compute)
- fetch_logs.py — pull traceback / status messages for a prompt_id

Coverage expansion
- Param patterns now cover Flux (BasicScheduler, BasicGuider,
  RandomNoise, ModelSamplingFlux), SD3, Wan/Hunyuan/LTX video,
  IPAdapter, rgthree, easy-use, AnimateDiff
- Embedding refs in CLIPTextEncode strings extracted as model deps
- ckpt_name / vae_name / lora_name / unet_name now controllable so
  workflows can be retargeted per run

Examples
- workflows/{sd15,sdxl,flux_dev}_txt2img.json
- workflows/sdxl_{img2img,inpaint}.json
- workflows/upscale_4x.json
- workflows/{animatediff_video,wan_video_t2v}.json + README

Tests
- 117 tests (105 unit + 8 cloud integration + 4 cross-host security)
- Cloud tests auto-skip without COMFY_CLOUD_API_KEY; verified end-to-end
  against live cloud API

Backwards compatibility
- All existing CLI flags continue to work; new behavior is opt-in
  (--ws, --input-image, --randomize-seed, --flat-output, etc.)
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists tool/skills Skills system (list, view, manage) labels Apr 30, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

@christian-byrne Tagging you on this ComfyUI item.

1 similar comment
@alt-glitch

Copy link
Copy Markdown
Collaborator

@christian-byrne Tagging you on this ComfyUI item.

Self-review caught several errors in the previous commit:

Frontmatter
- Replace non-standard `requires_runtime` / `requires_tooling` fields with
  the documented `compatibility:` field (parsed by tools/skills_tool.py).
- Drop the `audit-v5` author tag I added unnecessarily.

MODEL_LOADERS catalog
- Remove `IPAdapterUnifiedLoader` (input `preset` is an enum, not a file).
- Remove `IPAdapterInsightFaceLoader` and `InsightFaceLoader` (input
  `provider` is a GPU backend selector, not a model file). These would have
  flagged enum values like "STANDARD" or "CUDA" as missing model files.
- Add "NB:" comment explaining `BasicGuider` has no `cfg` input
  (the original PARAM_PATTERNS entry would never have matched).
- Remove `SamplerCustomAdvanced.noise_seed` from PARAM_PATTERNS — that
  node takes a NOISE input from RandomNoise, not a seed field directly.

NODE_TO_PACKAGE registry slugs
- Verified all 18 packages against api.comfy.org and fixed:
  - `comfyui-essentials` → `comfyui_essentials` (underscore, not hyphen)
  - `comfyui-gguf` → `ComfyUI-GGUF` (case-sensitive)
  - `comfyui-photomaker-plus` → `ComfyUI-PhotoMaker-Plus`
  - `comfyui-wanvideowrapper` → `ComfyUI-WanVideoWrapper`
- ComfyUI-HunyuanVideoWrapper isn't on the registry; surface a git-URL
  install hint via new NODE_TO_GIT_URL fallback so the user can install
  via ComfyUI-Manager's /manager/queue/install endpoint.

Wrong class names
- `Canny` → `CannyEdgePreprocessor` (controlnet-aux registers the latter,
  the former never appears in /object_info).
- Add `Zoe_DepthAnythingPreprocessor` and `AnimalPosePreprocessor` while
  fixing controlnet-aux.
- Remove `Reroute (rgthree)` (rgthree's Reroute is JS-only — no Python
  class, never appears in /object_info).
- Add `Display Int (rgthree)` (sibling of Display Any).
- Move `UltralyticsDetectorProvider` from `comfyui-impact-pack` to
  `comfyui-impact-subpack` (separate package, registered there).

Tests
- Update test_packages_are_safe_for_shell to accept case-mixed slugs (the
  registry uses both ComfyUI- and comfyui_ prefixes inconsistently). Replaced
  the lowercase-only assertion with a shell-safe regex check.
- 117 tests still pass (105 unit + 8 cloud + 4 cross-host).

Attribution
- Add `SHL0MS@users.noreply.github.com` mapping to scripts/release.py
  AUTHOR_MAP so check-attribution CI passes.
@teknium1

Copy link
Copy Markdown
Contributor

Merged via #17734 — your two commits were cherry-picked onto current main with your authorship preserved in git log (a7780fe, 51b44b6). Thanks for the thorough audit and the end-to-end Cloud verification — the X-API-Key leak and status-misclassification fixes are especially appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P2 Medium — degraded but workaround exists tool/skills Skills system (list, view, manage) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants