feat: add AllenAI OLMo (OlmoForCausalLM, Olmo2ForCausalLM) loader to aprender::rosetta #1591

@noahgift

Context

The cookbook architecture-demos spec tracks OLMo as status: blocked, and no prior aprender issue covers this loader. OLMo's distinguishing feature is full reproducibility: AllenAI ships the training data and intermediate checkpoints, which makes it valuable as a teaching and research target.

Family

  • Name: olmo
  • Vendor: AllenAI
  • HF architectures: OlmoForCausalLM, Olmo2ForCausalLM
  • HF pattern: allenai/OLMo-*
  • Reference checkpoints: allenai/OLMo-1B-hf, allenai/OLMo-7B-hf, allenai/OLMo-2-1124-7B

Acceptance criteria

  • contracts/model-families/olmo.yaml exists with size_variants for 1B and 7B (covers both the OLMo-1 and OLMo-2 architectures via Olmo2ForCausalLM)
  • Loader handles OLMo-2's QK normalization (qk_norm boolean field)
  • Discriminator distinguishes OLMo from Llama (OLMo has non_parametric_layer_norm or qk_norm)
  • Inference smoke test passes against allenai/OLMo-1B-hf (smallest variant)
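
The discriminator criterion above could be sketched roughly as follows. This is a minimal illustration, not the aprender::rosetta API: `ConfigHints`, `Family`, and `discriminate` are hypothetical names, and it assumes the loader has already parsed the `architectures` list and the two boolean flags named in this issue (`non_parametric_layer_norm`, `qk_norm`) out of the model config. The precedence between the architecture string and the flags is also an assumption.

```rust
/// Hypothetical, simplified view of the config fields a discriminator needs.
/// Field names follow this issue; the real contract schema may differ.
#[derive(Debug, Default)]
pub struct ConfigHints {
    pub architectures: Vec<String>,
    pub non_parametric_layer_norm: bool,
    pub qk_norm: bool,
}

/// Hypothetical family tag returned by the discriminator.
#[derive(Debug, PartialEq, Eq)]
pub enum Family {
    Olmo,
    Llama,
    Unknown,
}

/// Classify a config as OLMo or Llama.
/// Assumed precedence: an explicit OLMo architecture string wins, then the
/// OLMo-specific flags, then an explicit Llama architecture string.
pub fn discriminate(cfg: &ConfigHints) -> Family {
    let has = |name: &str| cfg.architectures.iter().any(|a| a == name);

    if has("OlmoForCausalLM") || has("Olmo2ForCausalLM") {
        return Family::Olmo;
    }
    // Per the acceptance criterion, either flag marks a Llama-shaped
    // config as OLMo.
    if cfg.non_parametric_layer_norm || cfg.qk_norm {
        return Family::Olmo;
    }
    if has("LlamaForCausalLM") {
        return Family::Llama;
    }
    Family::Unknown
}
```

Under these assumptions, a config that lists `Olmo2ForCausalLM` or sets `qk_norm` is classified as `Family::Olmo`, while a plain `LlamaForCausalLM` config with both flags unset stays `Family::Llama`.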

Unblock impact

  • Cookbook manifest flips from blocked to certified
  • Adds a fully-reproducible-training family to the cookbook for research demos
