Summary
apr serve run <model.apr> --gpu hangs during model loading for large Q4K APR files (17 GB, 18,867 tensors). The same model loads in 12 seconds via realizar serve --model <path> --gpu.
Root Cause
apr serve run → start_apr_server_gpu → OwnedQuantizedModel::from_apr() uses the generic mmap+conversion path, which doesn't have the ALB-098 pool allocator optimization that realizar serve uses.
realizar serve --gpu detects Q4K APR and uses the dedicated GPU Q4K executor with pool allocation (single cuMemAlloc for all 18,673 quantized tensors), giving 12s load time and 15 tok/s decode.
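The pool allocation idea above can be illustrated with a small sketch. This is not realizar's actual ALB-098 code: `Slot`, `align_up`, `plan_pool`, and the 256-byte alignment are hypothetical names and values chosen for illustration. The sketch only shows the planning step, where each tensor is assigned an aligned offset inside one large buffer so that a single device allocation can back every tensor.

```rust
/// Hypothetical placement of one tensor inside a shared pool.
#[derive(Debug, PartialEq)]
struct Slot {
    offset: usize, // byte offset from the start of the pool
    len: usize,    // tensor size in bytes
}

/// Round `n` up to the next multiple of `align` (`align` must be a power of two).
fn align_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

/// Plan one pool for all tensors: returns the per-tensor slots and the
/// total pool size, so the caller can make a single allocation.
fn plan_pool(sizes: &[usize], align: usize) -> (Vec<Slot>, usize) {
    let mut slots = Vec::with_capacity(sizes.len());
    let mut cursor = 0usize;
    for &len in sizes {
        let offset = align_up(cursor, align);
        slots.push(Slot { offset, len });
        cursor = offset + len;
    }
    (slots, align_up(cursor, align))
}
```

With a plan like this, the loader would issue one allocation of the total size and copy each tensor to its offset, rather than allocating once per tensor.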
Reproduction
# Hangs (apr serve run):
apr serve run /mnt/nvme-raid0/models/qwen3-coder-30b-q4k.apr --gpu --port 8091
# Works (realizar serve):
realizar serve --model /mnt/nvme-raid0/models/qwen3-coder-30b-q4k.apr --gpu --port 8091
Impact
Blocks apr distill --stage generate (GH-455) from using apr serve run internally. The current workaround is to run realizar serve as a subprocess.
Expected Fix
Route Q4K APR models through realizar's GPU Q4K executor (same path as realizar serve --gpu) instead of the generic OwnedQuantizedModel::from_apr() path.
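A minimal sketch of the proposed routing decision follows. The `Quant` and `LoadPath` enums and the `select_load_path` function are hypothetical and exist only to illustrate the dispatch; the real change would live in start_apr_server_gpu and call into realizar's existing executor.

```rust
/// Hypothetical quantization tag read from the APR file.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Quant {
    Q4K,
    Other,
}

/// Hypothetical label for the two load paths described in this issue.
#[derive(Debug, PartialEq)]
enum LoadPath {
    /// Dedicated GPU Q4K executor with pool allocation.
    GpuQ4kExecutor,
    /// Generic mmap+conversion path (OwnedQuantizedModel::from_apr()).
    GenericFromApr,
}

/// Pick the load path: Q4K models on GPU go to the dedicated executor,
/// everything else keeps the existing generic path.
fn select_load_path(quant: Quant, gpu: bool) -> LoadPath {
    match (quant, gpu) {
        (Quant::Q4K, true) => LoadPath::GpuQ4kExecutor,
        _ => LoadPath::GenericFromApr,
    }
}
```

Keeping the generic path as the fallback means models in other formats and CPU-only runs behave as they do today.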
Refs #455