Context
The cookbook architecture-demos spec tracks tiny_starcoder_py as status: blocked because GPTBigCodeForCausalLM has no upstream loader. GPTBigCode is the StarCoder-1 architecture: multi-query attention, where a single K/V head is shared across all query heads. It is distinct from StarCoder2, which uses grouped-query attention and is tracked separately.
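To make the attention distinction concrete, the sketch below compares per-token KV cache size for multi-head, grouped-query, and multi-query attention. The function name and every dimension used (layer count, head count, head size, group count) are illustrative assumptions, not the actual configuration of any BigCode checkpoint.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_elem: int = 2) -> int:
    """Bytes of K and V cached per generated token.

    Each layer stores one key vector and one value vector per KV head.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
N_LAYERS, N_QUERY_HEADS, HEAD_DIM = 24, 16, 64

mha = kv_cache_bytes_per_token(N_LAYERS, N_QUERY_HEADS, HEAD_DIM)  # one KV head per query head
gqa = kv_cache_bytes_per_token(N_LAYERS, 4, HEAD_DIM)              # 4 KV groups (assumed)
mqa = kv_cache_bytes_per_token(N_LAYERS, 1, HEAD_DIM)              # single shared KV head

print(f"MHA: {mha} B/token, GQA: {gqa} B/token, MQA: {mqa} B/token")
```

With these assumed dimensions, the multi-query cache is smaller than the multi-head cache by a factor equal to the number of query heads, which is the main reason a loader has to treat the K/V projection shapes differently from a standard GPT-2 layout.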
Family
- Name: tiny_starcoder_py (canonical loader: gptbigcode)
- Vendor: BigCode
- HF architectures: GPTBigCodeForCausalLM
- HF pattern: bigcode/tiny_starcoder_py, bigcode/starcoder (StarCoder-1)
- Reference checkpoints: bigcode/tiny_starcoder_py (164M, fits in CI smoke), bigcode/starcoderbase-1b
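As a rough sketch of how a loader might recognise this family from a checkpoint's config.json, the functions below inspect the `architectures` list and the `multi_query` flag. The function names `classify_family` and `uses_multi_query` and the returned family labels are hypothetical. The `architectures` and `multi_query` keys, and the choice to treat a missing `multi_query` as true, are assumptions about the Hugging Face config layout rather than anything this issue specifies.

```python
from typing import Any, Mapping


def classify_family(config: Mapping[str, Any]) -> str:
    """Map a parsed config.json dict to a (hypothetical) family label."""
    archs = config.get("architectures") or []
    if "GPTBigCodeForCausalLM" in archs:
        return "gptbigcode"
    if "GPT2LMHeadModel" in archs:
        return "gpt2"
    return "unknown"


def uses_multi_query(config: Mapping[str, Any]) -> bool:
    """True only for GPTBigCode configs with multi-query attention enabled.

    Assumption: a missing `multi_query` key is treated as enabled.
    """
    return classify_family(config) == "gptbigcode" and bool(config.get("multi_query", True))


# Minimal example configs (only the keys this sketch reads).
starcoder_cfg = {"architectures": ["GPTBigCodeForCausalLM"], "multi_query": True}
gpt2_cfg = {"architectures": ["GPT2LMHeadModel"]}

print(classify_family(starcoder_cfg), uses_multi_query(starcoder_cfg))
print(classify_family(gpt2_cfg), uses_multi_query(gpt2_cfg))
```

Keeping the family check separate from the attention check means a GPT-2 config is never routed through the multi-query path, even though the two architectures are otherwise closely related.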
Acceptance criteria
- contracts/model-families/gptbigcode.yaml exists (or tiny_starcoder_py.yaml if scoped narrower)
- … multi_query: true)
- … multi_query: true, GPT-2 has standard MHA)
- … tiny_starcoder_py (164M is small enough for fast CI)
Unblock impact
- Cookbook manifest flips from blocked to certified
- The 164M variant is the smallest causal LM in the supported set, useful for fast smoke tests
Cookbook reference
- … name: tiny_starcoder_py block