qualify: SafeTensors non-LLaMA architecture support (GPT-2, GPT-NeoX, OPT, StarCoder, BERT) #311

@noahgift

Description

SafeTensors models using non-LLaMA architectures fail the Check gate because the tensor name mapping only recognizes LLaMA-style weight names.

Affected Models (all 10/11 or worse)

  • BERT/encoder: BAAI/bge-small-en-v1.5, sentence-transformers/all-MiniLM-L6-v2
  • GPT-2: openai-community/gpt2, openai-community/gpt2-medium (missing config.json)
  • GPT-Neo / GPT-NeoX: EleutherAI/gpt-neo-125m (GPT-Neo), EleutherAI/pythia-410m-deduped (GPT-NeoX)
  • OPT: facebook/galactica-125m
  • PhiForCausalLM: microsoft/phi-1.5
  • StarCoder: bigcode/tiny-starcoder_py, bigcode/starcoder2-3b

Error Patterns

  • Tensor not found with names: 'model.embed_tokens.weight', 'token_embd.weight', or 'embed_tokens.weight'
  • config.json not found (required for SafeTensors inference)
  • config.json missing num_attention_heads
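
The `num_attention_heads` error is at least partly a naming issue: GPT-2 style configs store the head count as `n_head` (with `n_layer` and `n_embd` alongside), and GPT-Neo uses `num_heads` and `num_layers`. A minimal sketch of key normalization, assuming the loader wants LLaMA-style keys. The `CONFIG_ALIASES` table and `normalize_config` are hypothetical names for illustration, not part of realizar:

```python
# Hypothetical alias table: canonical (LLaMA-style) key -> alternative keys
# used by other Hugging Face config.json layouts (GPT-2, GPT-Neo).
CONFIG_ALIASES = {
    "num_attention_heads": ("n_head", "num_heads"),
    "num_hidden_layers": ("n_layer", "num_layers"),
    "hidden_size": ("n_embd",),
}


def normalize_config(config: dict) -> dict:
    """Return a copy of config with canonical keys filled in from aliases."""
    normalized = dict(config)
    for canonical, aliases in CONFIG_ALIASES.items():
        if canonical in normalized:
            continue  # a canonical key already present always wins
        for alias in aliases:
            if alias in config:
                normalized[canonical] = config[alias]
                break
    return normalized


gpt2_config = {"model_type": "gpt2", "n_head": 12, "n_layer": 12, "n_embd": 768}
print(normalize_config(gpt2_config)["num_attention_heads"])  # prints 12
```

This does not help when config.json is absent altogether; that case still needs a separate fix or a clearer error.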

Root Cause

The SafeTensors loader in realizar only maps LLaMA-style tensor names. Each architecture family uses different naming:

  • GPT-2: transformer.h.N.attn.c_attn.weight
  • GPT-NeoX: gpt_neox.layers.N.attention.query_key_value.weight
  • OPT: model.decoder.layers.N.self_attn.k_proj.weight
  • BERT: encoder.layer.N.attention.self.query.weight
  • StarCoder2: model.layers.N.self_attn.o_proj.bias (LLaMA-style layer names, but with bias tensors)
  • Phi: model.layers.N.self_attn.dense.weight
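
The prefixes listed above are distinctive enough that a first-pass family guess can be made from tensor names alone. A rough sketch, assuming substring markers are sufficient. `FAMILY_MARKERS` and `detect_family` are hypothetical names, and the family labels are illustrative:

```python
# Hypothetical marker table, checked in order. More specific markers come
# first because Phi and StarCoder2 share the "model.layers." prefix with LLaMA.
FAMILY_MARKERS = [
    ("gpt_neox", "gpt_neox.layers."),
    ("opt", "model.decoder.layers."),
    ("bert", "encoder.layer."),
    ("gpt2", "transformer.h."),
    ("phi", ".self_attn.dense."),
    ("llama_like", "model.layers."),
]


def detect_family(tensor_names):
    """Return the first family whose marker occurs in any tensor name, else None."""
    for family, marker in FAMILY_MARKERS:
        if any(marker in name for name in tensor_names):
            return family
    return None


print(detect_family(["gpt_neox.layers.0.attention.query_key_value.weight"]))
```

Names alone are ambiguous in places: several families share the `model.layers.` prefix, and others besides GPT-2 use `transformer.h.`, so a real implementation would probably also consult `model_type` or `architectures` in config.json when it is available.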

Expected Behavior

Add architecture detection and weight-name mapping for the major model families listed above, so that their SafeTensors checkpoints can pass the Check gate.
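
One possible shape for the weight-name mapping is a per-family table of regex rewrite rules that translate source names into the LLaMA-style names the loader already understands. This is only a sketch: `RENAME_RULES` and `to_canonical` are hypothetical, the rules cover a few tensors for illustration, and the OPT embedding name is an assumption based on the prefix shown above:

```python
import re

# Hypothetical rewrite rules: family -> list of (pattern, replacement).
# Only a handful of tensors are covered; a real table would be much longer.
RENAME_RULES = {
    "opt": [
        (r"^model\.decoder\.embed_tokens\.weight$", "model.embed_tokens.weight"),
        (
            r"^model\.decoder\.layers\.(\d+)\.self_attn\.(q|k|v)_proj\.(weight|bias)$",
            r"model.layers.\1.self_attn.\2_proj.\3",
        ),
    ],
    "bert": [
        (
            r"^(?:bert\.)?encoder\.layer\.(\d+)\.attention\.self\.query\.(weight|bias)$",
            r"model.layers.\1.self_attn.q_proj.\2",
        ),
    ],
}


def to_canonical(family, name):
    """Rewrite name using the first matching rule; return None if no rule applies."""
    for pattern, replacement in RENAME_RULES.get(family, []):
        new_name, count = re.subn(pattern, replacement, name)
        if count:
            return new_name
    return None


print(to_canonical("opt", "model.decoder.layers.3.self_attn.k_proj.weight"))
# prints model.layers.3.self_attn.k_proj.weight
```

Renaming is not enough for every family. Fused projections such as GPT-2's `c_attn` and GPT-NeoX's `query_key_value` hold Q, K, and V in one tensor, so they would need to be split (and possibly transposed) rather than just renamed.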

Metadata

Labels: enhancement (New feature or request)
