Prerequisites
Feature Description
Z.ai released GLM-4.7-Flash, a new 30B MoE model whose architecture is defined as Glm4MoeLiteForCausalLM.
Motivation
As a smaller alternative to Z.ai's flagship model, it would be great to have GGUF support for this model as well!
Possible Implementation
Reference implementations:
Transformers PR
vLLM PR