feat: add GPTBigCode (GPTBigCodeForCausalLM) loader to aprender::rosetta — covers tiny_starcoder_py #1594

@noahgift

Context

The cookbook architecture-demos spec tracks tiny_starcoder_py as status: blocked because there is no upstream loader for GPTBigCodeForCausalLM. GPTBigCode is the StarCoder-1 architecture: multi-query attention (MQA), in which every query head shares a single K/V head. It is distinct from StarCoder2, which uses grouped-query attention and is tracked separately.
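As an illustration of what "a single shared K/V across all heads" means for a loader, here is a minimal sketch of causal multi-query attention for one sequence. Each query head is sliced out of Q, but K and V are never sliced per head. The function names and row-major shapes are hypothetical and chosen for readability; this is not aprender's implementation. The second helper shows the main practical consequence: MQA caches one K/V head per token per layer instead of n_head.

```rust
/// Naive causal multi-query attention (MQA) for a single sequence.
/// Hypothetical shapes (row-major, no batch):
///   q: [seq][n_head * head_dim]   (one slice per query head)
///   k, v: [seq][head_dim]         (ONE head, shared by all query heads)
/// Returns [seq][n_head * head_dim].
fn mqa_attention(
    q: &[Vec<f32>],
    k: &[Vec<f32>],
    v: &[Vec<f32>],
    n_head: usize,
    head_dim: usize,
) -> Vec<Vec<f32>> {
    let seq = q.len();
    let scale = 1.0 / (head_dim as f32).sqrt();
    let mut out = vec![vec![0.0f32; n_head * head_dim]; seq];
    for t in 0..seq {
        for h in 0..n_head {
            // This head's query slice; K and V are indexed without any head offset.
            let qh = &q[t][h * head_dim..(h + 1) * head_dim];
            // Scaled dot-product scores against visible keys (causal: s <= t).
            let scores: Vec<f32> = (0..=t)
                .map(|s| qh.iter().zip(&k[s]).map(|(a, b)| a * b).sum::<f32>() * scale)
                .collect();
            // Numerically stable softmax over the visible positions.
            let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = scores.iter().map(|x| (x - max).exp()).collect();
            let denom: f32 = exps.iter().sum();
            // Weighted sum of the shared V rows into this head's output slice.
            for (s, e) in exps.iter().enumerate() {
                let w = e / denom;
                for d in 0..head_dim {
                    out[t][h * head_dim + d] += w * v[s][d];
                }
            }
        }
    }
    out
}

/// Floats cached per token per layer: K and V for each cached KV head.
/// MQA keeps 1 KV head; standard multi-head attention keeps n_head.
fn kv_cache_floats_per_token(n_head: usize, head_dim: usize, multi_query: bool) -> usize {
    let kv_heads = if multi_query { 1 } else { n_head };
    2 * kv_heads * head_dim
}

fn main() {
    // Toy sizes for illustration only (not tiny_starcoder_py's real config).
    let (n_head, head_dim) = (2, 2);
    let q = vec![vec![1.0, 0.0, 0.0, 1.0], vec![0.5, 0.5, 0.0, 1.0]];
    let k = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let v = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let out = mqa_attention(&q, &k, &v, n_head, head_dim);
    println!("{:?}", out);
    println!(
        "KV floats/token/layer: MQA {}, MHA {}",
        kv_cache_floats_per_token(12, 64, true),
        kv_cache_floats_per_token(12, 64, false)
    );
}
```

At position 0 every head can only see key 0, so every head returns the shared v[0]; heads diverge only once there are several keys to weight differently.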

Family

  • Name: tiny_starcoder_py (canonical loader: gptbigcode)
  • Vendor: BigCode
  • HF architectures: GPTBigCodeForCausalLM
  • HF pattern: bigcode/tiny_starcoder_py, bigcode/starcoder (StarCoder-1)
  • Reference checkpoints: bigcode/tiny_starcoder_py (164M, small enough for a CI smoke test), bigcode/starcoderbase-1b
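One possible shape for the family discriminator is sketched below. It assumes the loader reads three keys from the Hugging Face config.json (architectures, model_type, multi_query); the struct, enum, and function names are hypothetical, and a real loader would deserialize the file (for example with serde_json) rather than build the struct by hand. As a design choice, the sketch keys on architectures/model_type and records multi_query as a property, because GPT-2 configs simply omit multi_query. In transformers, GPTBigCodeConfig defaults multi_query to true when the key is absent, and the sketch mirrors that.

```rust
/// Subset of an HF config.json needed for family detection (hypothetical type).
struct HfConfigSubset<'a> {
    architectures: &'a [&'a str],
    model_type: Option<&'a str>,
    multi_query: Option<bool>,
}

#[derive(Debug, PartialEq)]
enum Family {
    GptBigCode { multi_query: bool },
    Gpt2,
    Unknown, // e.g. Starcoder2ForCausalLM, which is tracked separately
}

fn detect_family(cfg: &HfConfigSubset) -> Family {
    let arch_is = |name: &str| cfg.architectures.iter().any(|a| *a == name);
    if arch_is("GPTBigCodeForCausalLM") || cfg.model_type == Some("gpt_bigcode") {
        // Mirror GPTBigCodeConfig's default of multi_query = true.
        Family::GptBigCode { multi_query: cfg.multi_query.unwrap_or(true) }
    } else if arch_is("GPT2LMHeadModel") || cfg.model_type == Some("gpt2") {
        // GPT-2 uses standard multi-head attention and has no multi_query key.
        Family::Gpt2
    } else {
        Family::Unknown
    }
}

fn main() {
    let starcoder1 = HfConfigSubset {
        architectures: &["GPTBigCodeForCausalLM"],
        model_type: Some("gpt_bigcode"),
        multi_query: Some(true),
    };
    let gpt2 = HfConfigSubset {
        architectures: &["GPT2LMHeadModel"],
        model_type: Some("gpt2"),
        multi_query: None,
    };
    println!("{:?}", detect_family(&starcoder1));
    println!("{:?}", detect_family(&gpt2));
}
```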

Acceptance criteria

  • contracts/model-families/gptbigcode.yaml exists (or tiny_starcoder_py.yaml if scoped narrower)
  • Loader handles GPTBigCode's MQA: single shared K/V across all attention heads (multi_query: true)
  • Discriminator distinguishes GPTBigCode from GPT-2 (GPTBigCode configs set multi_query: true; GPT-2 uses standard multi-head attention)
  • Inference smoke test passes against tiny_starcoder_py (at 164M parameters it is small enough for fast CI)
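For the MQA criterion, the loader has to split the fused attention projection. The sketch below assumes the layout used by the Hugging Face transformers GPTBigCode implementation as we understand it: c_attn is an nn.Linear whose weight is stored as [embed + 2*head_dim, embed] (out x in, row-major), with rows ordered Q, then K, then V. GPT-2 instead uses a Conv1D c_attn of shape [embed, 3*embed], so the weight shape is also a usable cross-check for the discriminator. All names here are hypothetical, and the layout should be verified against real tiny_starcoder_py tensors before relying on it.

```rust
/// Q/K/V weights split out of a fused MQA c_attn matrix (hypothetical type).
#[derive(Debug)]
struct QkvWeights {
    q: Vec<f32>, // [embed, embed]
    k: Vec<f32>, // [head_dim, embed], single shared K head
    v: Vec<f32>, // [head_dim, embed], single shared V head
}

/// Split a fused weight of shape [embed + 2*head_dim, embed] (row-major).
fn split_mqa_c_attn(w: &[f32], embed: usize, n_head: usize) -> Result<QkvWeights, String> {
    if n_head == 0 || embed % n_head != 0 {
        return Err(format!("embed {} not divisible by n_head {}", embed, n_head));
    }
    let head_dim = embed / n_head;
    let rows = embed + 2 * head_dim;
    if w.len() != rows * embed {
        return Err(format!(
            "expected {} elements for [{}, {}], got {}",
            rows * embed, rows, embed, w.len()
        ));
    }
    // Rows are contiguous, so each block of rows is one contiguous slice.
    let q_end = embed * embed;
    let k_end = q_end + head_dim * embed;
    Ok(QkvWeights {
        q: w[..q_end].to_vec(),
        k: w[q_end..k_end].to_vec(),
        v: w[k_end..].to_vec(),
    })
}

/// True if a c_attn weight shape matches the MQA layout rather than
/// GPT-2's Conv1D layout of [embed, 3 * embed].
fn looks_like_mqa_c_attn(shape: [usize; 2], embed: usize, n_head: usize) -> bool {
    shape == [embed + 2 * (embed / n_head), embed]
}

fn main() {
    // Toy sizes: embed = 4, n_head = 2, so head_dim = 2 and rows = 8.
    let w: Vec<f32> = (0..32).map(|x| x as f32).collect();
    let qkv = split_mqa_c_attn(&w, 4, 2).unwrap();
    println!("q {} k {} v {}", qkv.q.len(), qkv.k.len(), qkv.v.len());
    println!("MQA shape? {}", looks_like_mqa_c_attn([8, 4], 4, 2));
}
```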

Unblock impact

  • The cookbook manifest entry for tiny_starcoder_py flips from blocked to certified
  • The 164M variant is the smallest causal LM in the supported set, which makes it useful for fast smoke tests

Cookbook reference
