feat(stablediffusion-ggml): LTX-2 support + LTX-2.3 GGUF gallery entries #9980
Merged
stable-diffusion.cpp gained LTX-2 video generation, which requires an audio VAE and an embeddings_connectors safetensors file in addition to the usual diffusion model, VAE, and LLM text encoder. The pinned commit exposes audio_vae_path and embeddings_connectors_path on sd_ctx_params_t; wire both through the option parser so gallery entries can point at the LTX-specific assets.

Ship six LTX-2.3 GGUF gallery entries (dev + distilled, UD-Q4_K_M / Q4_K_M / Q8_0 each) backed by a new ltx-ggml.yaml template that defaults to euler / cfg_scale 6.0 / vae_decode_only:false / diffusion_flash_attn / offload_params_to_cpu, matching the upstream LTX-2 CLI recipe. Each entry pulls the model GGUF plus the QAT gemma-3-12b-it text encoder, video VAE, audio VAE, and embeddings connectors needed for T2V / I2V / FLF2V.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude-Code]
Summary
- Wire audio_vae_path and embeddings_connectors_path through backend/go/stablediffusion-ggml/cpp/gosd.cpp so the upstream LTX-2 fields on sd_ctx_params_t (added in the currently pinned commit) are reachable from gallery entries.
- Add a gallery/ltx-ggml.yaml template config matching the upstream LTX-2 CLI recipe: stablediffusion-ggml backend, sampler=euler, cfg_scale=6.0, step=30, vae_decode_only:false, diffusion_flash_attn:true, offload_params_to_cpu:true, diffusion_model flag.
- Add six LTX-2.3 GGUF gallery entries (dev + distilled, UD-Q4_K_M / Q4_K_M / Q8_0 each). Each pulls the model GGUF plus the gemma-3-12b-it-qat-UD-Q4_K_XL.gguf text encoder + video VAE + audio VAE + embeddings_connectors safetensors. SHA256s fetched via the HF x-linked-etag method.

Upstream LTX-2 stable-diffusion.cpp doc: https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/ltx2.md
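To illustrate the option wiring described in the summary, here is a minimal sketch of how name:value options could be mapped onto the two new paths. It is not the code from gosd.cpp: the LtxAssetPaths struct, the helper names, the file names, and the rule that relative names are resolved against the model directory are all assumptions made for the example. Only the option names audio_vae_path and embeddings_connectors_path come from the PR.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical holder for the two LTX-2 asset paths. In the real backend the
// corresponding fields live on sd_ctx_params_t in stable-diffusion.cpp.
struct LtxAssetPaths {
    std::string audio_vae_path;
    std::string embeddings_connectors_path;
};

// Split "name:value" at the first ':'. Returns nullopt for bare flags
// (no ':') and for options with an empty value.
static std::optional<std::pair<std::string, std::string>>
split_option(const std::string& opt) {
    const auto pos = opt.find(':');
    if (pos == std::string::npos || pos + 1 >= opt.size()) return std::nullopt;
    return std::make_pair(opt.substr(0, pos), opt.substr(pos + 1));
}

// Assumed behaviour: relative asset names are resolved against the model
// directory, absolute paths are passed through unchanged.
static std::string join_path(const std::string& dir, const std::string& file) {
    if (dir.empty() || (!file.empty() && file.front() == '/')) return file;
    return dir.back() == '/' ? dir + file : dir + "/" + file;
}

// Walk the option list and pick out the two LTX-specific paths.
LtxAssetPaths parse_ltx_options(const std::vector<std::string>& options,
                                const std::string& model_dir) {
    LtxAssetPaths out;
    for (const auto& opt : options) {
        const auto kv = split_option(opt);
        if (!kv) continue;  // bare flags such as "diffusion_model" are skipped here
        if (kv->first == "audio_vae_path") {
            out.audio_vae_path = join_path(model_dir, kv->second);
        } else if (kv->first == "embeddings_connectors_path") {
            out.embeddings_connectors_path = join_path(model_dir, kv->second);
        }
    }
    return out;
}

// Small self-check using made-up file names.
void example() {
    const auto p = parse_ltx_options(
        {"diffusion_model", "audio_vae_path:audio_vae.safetensors"}, "/models");
    assert(p.audio_vae_path == "/models/audio_vae.safetensors");
    assert(p.embeddings_connectors_path.empty());
}
```

Options that are not LTX-specific fall through untouched in this sketch, so it could sit alongside an existing parser without changing how other keys are handled.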
Test plan
- Build the stablediffusion-ggml backend locally and confirm the new options are parsed (look for Found audio_vae_path / Found embeddings_connectors_path style log lines once a gallery entry is installed)
- Install ltx-2.3-22b-distilled-ggml from the gallery (lowest-footprint variant) and run a short T2V request: width=1280, height=720, video_frames=33, fps=24
- Repeat with init_image to exercise the I2V path
- Repeat with init_image + end_image for the FLF2V path
- Confirm gallery/index.yaml round-trips through the gallery loader (994 entries; LTX-2.3 GGUF entries visible in the React UI gallery)