SafeTensors inference produces garbage: layer count misdetection (14 vs 24)

## Summary

SafeTensors inference on Qwen2.5-Coder-0.5B-Instruct produces garbage output, while GGUF inference on the same model produces correct output. The root cause appears to be **incorrect layer count detection** in the SafeTensors loader.

**apr version:** 0.2.12 (commit f7f7ca8b)
**Model:** Qwen/Qwen2.5-Coder-0.5B-Instruct
**QA Gate:** F-QUAL-001 (Garbage Output Detection)

## Reproduction

```bash
# SafeTensors - GARBAGE OUTPUT
apr run --prompt "What is 2+2?" --max-tokens 10 \
  /path/to/safetensors/model.safetensors --verbose

# Output:
# Architecture: SafeTensors (14 layers, vocab_size=151936)  # <-- WRONG: should be 24
# Output: çī¹åĪ«æĺ¯åľ¨âĢĶevenâĢĶevenâĢĶevenâĢĶallthoughâĢĶeveninders-associated/dis

# GGUF - CORRECT OUTPUT
apr run --prompt "What is 2+2?" --max-tokens 10 \
  /path/to/gguf/model.gguf --verbose

# Output:
# Architecture: Qwen2 [GGUF: qwen2] (24 layers, vocab_size=151936)  # <-- CORRECT
# Output: 2 + 2 equals 4.
```

## Root Cause Analysis

The SafeTensors loader detects **14 layers** but the model actually has **24 layers**.

### Evidence: Verbose Output Comparison

| Format | Detected Layers | Actual Layers | Output Quality |
|--------|-----------------|---------------|----------------|
| SafeTensors | 14 | 24 | Garbage |
| GGUF | 24 | 24 | Correct |

### SafeTensors Verbose Output
```
Source: /home/noah/.cache/pacha/models/qwen2-5-coder-0-5b-instruct/safetensors/model.safetensors
Using mmap for 942MB model
Loading SafeTensors model: ...
Architecture: SafeTensors (14 layers, vocab_size=151936)   # BUG: Wrong layer count
Config: hidden_size=896, context_length=32768, quant=F16/BF16, threads=1 (GPU)
Model loaded in 2029.7ms
Backend: GPU (NVIDIA GeForce RTX 4090, 24045 MB VRAM)
```

### GGUF Verbose Output
```
Source: /home/noah/.cache/pacha/models/qwen2-5-coder-0-5b-instruct/gguf/model.gguf
Using mmap for 468MB model
Loading model: ...
Architecture: Qwen2 [GGUF: qwen2] (24 layers, vocab_size=151936)  # Correct
Config: hidden_size=896, context_length=32768, quant=Q8_0, threads=48
Model loaded in 545.4ms
Backend: CPU (Q4_0 format - GPU Q4_K kernels incompatible)
```

### Tensor Verification

Both files have the same layer structure:

```bash
# SafeTensors layer indices (from apr tensors output, filtered to model.layers and de-duplicated)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23  # 24 layers

# GGUF layers (from rosetta inspect metadata)
n_layers: 24
```
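The tensor-based count can be cross-checked independently of `apr`. The following is a minimal sketch, assuming the standard SafeTensors layout (an 8-byte little-endian header length followed by a JSON header) and the `model.layers.N.` naming shown above. The function names are hypothetical and are not part of `apr`.

```python
import json
import re
import struct

# Matches tensor names such as "model.layers.17.self_attn.q_proj.weight".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")

def read_tensor_names(path):
    """Return tensor names from a .safetensors header (u64 LE length + JSON)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]

def layer_indices(names):
    """Collect the distinct integer layer indices found in tensor names."""
    matches = (LAYER_RE.match(name) for name in names)
    return sorted({int(m.group(1)) for m in matches if m})

def count_layers(path):
    """Layer count derived from tensor names: highest index + 1, or 0 if none."""
    indices = layer_indices(read_tensor_names(path))
    return indices[-1] + 1 if indices else 0
```

Calling `count_layers("/path/to/safetensors/model.safetensors")` on the file from this report should return 24 if the listing above is accurate. Indices are parsed as integers so that `10` sorts after `2`.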

## Hypothesis

The SafeTensors loader is incorrectly calculating the layer count. Possible causes:

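1. **Early exit in layer counting loop** - stops at layer 13 instead of 23
2. **Hardcoded assumption** - deriving the layer count from a heuristic instead of reading it from the model
3. **Metadata parsing bug** - misreading the config.json `num_hidden_layers` field, for example by picking up a different field (the published Qwen2.5-0.5B config lists `num_attention_heads` as 14, which matches the wrong count; this should be verified against the local config.json)
4. **Tensor name pattern mismatch** - not recognizing the naming convention of layers 14-23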
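One way to narrow these hypotheses is to check whether the wrong count coincides with some other integer field in config.json. The sketch below is a hypothetical diagnostic and is not `apr` code. Only `hidden_size=896` in the example dict comes from the verbose output above; the other values are assumptions to be replaced with the contents of the local config.json.

```python
def config_fields_equal_to(config, value):
    """Names of top-level integer config fields whose value equals `value`."""
    return sorted(
        key for key, val in config.items()
        if isinstance(val, int) and not isinstance(val, bool) and val == value
    )

def diagnose(config, tensor_layer_count, reported_layers):
    """Return human-readable findings about a layer count mismatch."""
    findings = []
    expected = config.get("num_hidden_layers")
    if expected != tensor_layer_count:
        findings.append(
            f"config num_hidden_layers={expected} disagrees with tensors ({tensor_layer_count})"
        )
    if reported_layers != tensor_layer_count:
        suspects = [
            key for key in config_fields_equal_to(config, reported_layers)
            if key != "num_hidden_layers"
        ]
        if suspects:
            findings.append("reported count matches other config field(s): " + ", ".join(suspects))
        else:
            findings.append("reported count matches no config field; suspect counting logic")
    return findings

# Example values: assumptions for illustration, except hidden_size (from the log).
example_config = {
    "hidden_size": 896,
    "num_hidden_layers": 24,
    "num_attention_heads": 14,
    "num_key_value_heads": 2,
}
print(diagnose(example_config, tensor_layer_count=24, reported_layers=14))
```

If a field other than `num_hidden_layers` matches the reported count, hypothesis 3 becomes the most likely candidate. If nothing matches, the counting loop and the name pattern (hypotheses 1 and 4) deserve attention first.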

## Impact

- **Qwen2.5-Coder-0.5B MVP certification BLOCKED** at MQS 270 (was targeting 800+)
- All 6 SafeTensors inference tests fail (3 modalities × 2 backends)
- GGUF and APR inference pass correctly

## Model File Details

```
SafeTensors:
  File Size: 988097824 bytes (942 MB)
  Total Parameters: 494032768
  Tensors: 290
  Data Type: BF16

GGUF:
  File Size: 491400064 bytes (468 MB)
  Total Parameters: 630167424
  Tensors: 291
  Quantization: Q8_0
  Architecture: qwen2
  n_layers: 24
```
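The two files differ by one tensor and by about 136M parameters. The arithmetic below checks whether that gap equals one `vocab_size × hidden_size` matrix, using only numbers reported in this issue. If it does, the difference would be consistent with the GGUF file storing a separate output projection while the SafeTensors file shares it with the embedding. That interpretation is an assumption and has not been verified against the files.

```python
# Figures copied from the "Model File Details" and verbose output in this report.
SAFETENSORS_PARAMS = 494_032_768
GGUF_PARAMS = 630_167_424
VOCAB_SIZE = 151_936
HIDDEN_SIZE = 896

extra_params = GGUF_PARAMS - SAFETENSORS_PARAMS
embedding_sized = VOCAB_SIZE * HIDDEN_SIZE

print(extra_params)                     # 136134656
print(embedding_sized)                  # 136134656
print(extra_params == embedding_sized)  # True

# File sizes in MiB (truncated), matching the "Using mmap for ..." log lines.
print(988_097_824 // 2**20)  # 942
print(491_400_064 // 2**20)  # 468
```

Since the gap is exactly one embedding-sized matrix, the parameter count difference is unlikely to be related to the layer count bug.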

## Environment

- GPU: NVIDIA GeForce RTX 4090 (24GB VRAM)
- OS: Linux 6.8.0-90-generic
- apr version: 0.2.12

## Related

- GH-196: Qwen2.5-Coder-0.5B MVP certification blocked
- PMAT-094: SafeTensors inference produces garbage output

