Custom Node Testing
Expected Behavior
GGUF Qwen models (e.g., Q4_K_M) should run with the --fast argument and not crash.
Actual Behavior
Even smaller GGUF Qwen models (e.g., Q4_K_M) that ran fine previously now produce the following error when launched with the --fast or --fast pinned_memory argument:
KSampler
CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I'm aware the --fast argument "enables some untested and potentially quality deteriorating optimizations". The culprit appears to be the pinned_memory optimization.
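For context, pinned (page-locked) host memory is what makes fast, asynchronous host-to-device copies possible; a minimal sketch of the kind of transfer a pinned_memory optimization performs (generic PyTorch, not ComfyUI's actual implementation — the helper name and flag are illustrative):

```python
import torch

# Illustrative only: what a "pinned_memory" fast path typically does.
# Pinning a host tensor page-locks its memory, which lets CUDA copy it
# to the GPU asynchronously (non_blocking=True). An "invalid argument"
# CUDA error can surface if a copy path assumes a pinned/contiguous
# source that the quantized GGUF tensors don't actually provide.
def to_device(t: torch.Tensor, device: str, use_pinned: bool) -> torch.Tensor:
    if use_pinned and device.startswith("cuda"):
        t = t.pin_memory()                       # page-locked host allocation
        return t.to(device, non_blocking=True)   # async copy needs a pinned source
    return t.to(device)                          # plain synchronous copy
```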
Steps to Reproduce
Launch ComfyUI with the --fast or --fast pinned_memory argument. Run a simple workflow that includes a GGUF Unet loader node. Observe the (likely) CUDA crash.
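Per the error log's own advice, setting CUDA_LAUNCH_BLOCKING=1 forces synchronous kernel launches so the stack trace points at the failing call rather than a later API call. A small sketch that assembles the reproduction command (assuming ComfyUI's usual main.py entry point; adjust the path for your install):

```python
import os

# Build the launch command for reproducing the crash with synchronous
# CUDA launches, so the reported stack trace is trustworthy.
def repro_command(pinned: bool) -> list[str]:
    args = ["python", "main.py", "--fast"]  # main.py: assumed ComfyUI entry point
    if pinned:
        args.append("pinned_memory")        # the suspected culprit optimization
    return args

# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous for debugging.
env = {**os.environ, "CUDA_LAUNCH_BLOCKING": "1"}
# subprocess.run(repro_command(True), env=env)  # run from the ComfyUI directory
```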
Debug Logs
Other
No response