feat: add StarCoder2 (Starcoder2ForCausalLM) loader to aprender::rosetta #1593

@noahgift

Context

The cookbook architecture-demos spec tracks StarCoder2 as status: blocked. Issue #311 ("qualify: SafeTensors non-LLaMA architecture support (GPT-2, GPT-NeoX, OPT, StarCoder, BERT)", closed 2026-02-27) added SafeTensors handling for several non-LLaMA architectures but didn't ship a StarCoder2-specific loader YAML.

Family

  • Name: starcoder2
  • Vendor: BigCode
  • HF architectures: Starcoder2ForCausalLM
  • HF pattern: bigcode/starcoder2-*
  • Reference checkpoints: bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b
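The family fields above suggest a simple detection rule. The following is a minimal sketch, not aprender's actual API: `FamilyRule`, `STARCODER2`, and `detect_family` are hypothetical names, and the choice to treat the architecture string as authoritative (with the repo pattern only as a fallback) is an assumption.

```rust
/// Hypothetical family descriptor. Field names mirror the "Family" list in
/// this issue, not aprender's real contract schema.
pub struct FamilyRule {
    pub name: &'static str,
    pub architecture: &'static str,
    pub repo_prefix: &'static str,
}

/// Values copied from the issue; the glob `bigcode/starcoder2-*` is modelled
/// here as a plain prefix match.
pub const STARCODER2: FamilyRule = FamilyRule {
    name: "starcoder2",
    architecture: "Starcoder2ForCausalLM",
    repo_prefix: "bigcode/starcoder2-",
};

/// Returns the family name when a checkpoint matches.
/// Assumption: the `architectures` list from config.json wins; the repo id
/// is consulted only when that list is empty.
pub fn detect_family(architectures: &[&str], repo_id: &str) -> Option<&'static str> {
    if architectures.iter().any(|a| *a == STARCODER2.architecture) {
        return Some(STARCODER2.name);
    }
    if architectures.is_empty() && repo_id.starts_with(STARCODER2.repo_prefix) {
        return Some(STARCODER2.name);
    }
    None
}
```

Under these assumptions, `detect_family(&["Starcoder2ForCausalLM"], "bigcode/starcoder2-3b")` yields `Some("starcoder2")`, while a checkpoint whose config declares a different architecture is rejected even if its repo id matches the pattern.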

Acceptance criteria

  • contracts/model-families/starcoder2.yaml exists with size_variants for 3b, 7b, 15b
  • Loader handles StarCoder2's grouped-query attention (n_kv_heads is smaller than n_heads; read both from each checkpoint's config.json rather than assuming a fixed ratio such as 1/4)
  • Loader handles RoPE with rope_theta=1000000 (similar to Llama-3 long-context)
  • Discriminator field documented (Starcoder2ForCausalLM architecture string + GQA pattern)
  • Inference smoke test passes against starcoder2-3b (smallest variant)
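The two numeric criteria above (grouped-query attention and RoPE) can be sketched as follows. This is an illustration under stated assumptions, not the loader's implementation: `kv_head_for_query_heads` and `rope_inv_freq` are hypothetical helper names, the head mapping assumes the common layout where consecutive query heads share one key/value head, and the frequency formula is the standard RoPE definition.

```rust
/// Maps each query head index to the key/value head it shares under
/// grouped-query attention. Rejects inconsistent head counts instead of
/// assuming a fixed ratio.
pub fn kv_head_for_query_heads(n_heads: usize, n_kv_heads: usize) -> Result<Vec<usize>, String> {
    if n_heads == 0 || n_kv_heads == 0 {
        return Err("head counts must be non-zero".to_string());
    }
    if n_heads % n_kv_heads != 0 {
        return Err(format!(
            "n_heads ({}) is not a multiple of n_kv_heads ({})",
            n_heads, n_kv_heads
        ));
    }
    // Each group of `group` consecutive query heads reads the same KV head.
    let group = n_heads / n_kv_heads;
    Ok((0..n_heads).map(|q| q / group).collect())
}

/// Standard RoPE inverse frequencies: rope_theta^(-2i / head_dim)
/// for i in 0..head_dim / 2. `rope_theta` should come from the checkpoint's
/// config rather than being hard-coded.
pub fn rope_inv_freq(head_dim: usize, rope_theta: f64) -> Vec<f64> {
    (0..head_dim / 2)
        .map(|i| rope_theta.powf(-((2 * i) as f64) / head_dim as f64))
        .collect()
}
```

For example, with 8 query heads and 2 key/value heads, query heads 0 to 3 map to KV head 0 and heads 4 to 7 map to KV head 1. Returning an error when the counts do not divide evenly lets a smoke test fail early on a malformed config rather than producing silently wrong attention.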

Unblock impact

Cookbook reference

  • manifest.yaml entry: name: starcoder2 block
  • Predecessor: aprender#311 (added SafeTensors but not the loader)
