Prerequisites
Feature Description
Running llama.cpp build b7530.
https://huggingface.co/Maincode/Maincoder-1B
https://www.reddit.com/r/LocalLLaMA/comments/1puf614/new_1b_parameter_opensource_coding_model_getting/
As an enhancement, I would like llama.cpp to support the Maincoder architecture so that Maincoder models can be run with it.
Motivation
I believe support for the Maincoder model architecture would be a very helpful addition to llama.cpp:
- The current model has 1B parameters and scores very well on benchmarks for its size.
- It is ideal for ultra-low-latency Fill-In-The-Middle (FIM) and local IDE completion on any hardware.
- It offers high-quality QA and coding assistance (for its size) while running smoothly on CPUs and mobile devices.
- It can run locally or on constrained hardware.
Possible Implementation
https://huggingface.co/Maincode/Maincoder-1B
Maincoder uses a modern transformer decoder architecture with:
- Rotary Position Embeddings: with a theta of 1,000,000.
- RMSNorm: pre-normalization for stable training.
- Grouped Query Attention: a 4:1 ratio of query heads to key-value heads.
- QK Normalization: RMSNorm applied to attention queries and keys.
- SwiGLU MLP: gated linear units with SiLU activation.
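To make the listed components concrete for whoever picks this up, here is a minimal pure-Python sketch of each one for a single attention head. Only the RoPE theta (1,000,000) and the 4:1 query-to-KV head ratio come from the model card. Everything else is an assumption to check against Maincoder's config.json and reference modeling code: the RMSNorm eps, the "rotate-half" (NeoX-style) RoPE pairing, applying QK-norm before RoPE, and all function names, which are hypothetical and not taken from Maincoder or llama.cpp.

```python
import math

# From the model card: RoPE theta and the 4:1 query-to-KV head ratio.
ROPE_THETA = 1_000_000.0
GQA_RATIO = 4
# Assumed value for illustration; read the real eps from config.json.
RMS_EPS = 1e-6


def rms_norm(x, weight, eps=RMS_EPS):
    """RMSNorm: divide by the root-mean-square of x, then apply a per-channel scale."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope(x, pos, theta=ROPE_THETA):
    """Rotary embedding with the assumed rotate-half pairing:
    element i is rotated together with element i + d/2."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        angle = pos * theta ** (-2.0 * i / d)  # lower frequencies for higher i
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out


def kv_head_for(q_head, ratio=GQA_RATIO):
    """Grouped Query Attention: each run of `ratio` query heads shares one KV head."""
    return q_head // ratio


def attention_score(q, k, q_norm_w, k_norm_w, pos_q, pos_k):
    """QK-norm (assumed to precede RoPE), then RoPE, then a scaled dot product."""
    q = rope(rms_norm(q, q_norm_w), pos_q)
    k = rope(rms_norm(k, k_norm_w), pos_k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


def silu(v):
    """SiLU(v) = v * sigmoid(v); the tanh form of sigmoid avoids exp overflow."""
    return v * 0.5 * (1.0 + math.tanh(0.5 * v))


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP: down(silu(gate(x)) * up(x))."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    return matvec(w_down, [silu(g) * u for g, u in zip(gate, up)])
```

For example, `kv_head_for(7)` returns 1, so query heads 4 through 7 all read from KV head 1. Because RoPE encodes position as a rotation, `attention_score` depends only on the offset `pos_q - pos_k`, not on the absolute positions.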