GPU parity FAILED: cosine=-0.005 for Qwen2.5-7B GQA (hidden=3584, heads=28, kv=4) on sm_121 #559

## Root Cause (Five-Whys)

**CORRECTNESS-011 layer trace output:**
```
PARITY-GATE FAILED: GPU computes a DIFFERENT function than CPU.
Cosine similarity: -0.005182 (required: ≥0.98)
CPU argmax: 334 | GPU argmax: 8127
Max absolute logit difference: 19.5080
```
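The trace reports three quantities: cosine similarity against a 0.98 threshold, the argmax token on each side, and the maximum absolute logit difference. A minimal sketch of such a gate is below, assuming plain lists of logits. The names `parity_gate`, `cosine_similarity`, and `argmax` are hypothetical and are not the CORRECTNESS-011 implementation. As in the log, only the cosine is treated as the pass criterion.

```python
import math

COSINE_THRESHOLD = 0.98  # threshold quoted in the CORRECTNESS-011 output


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length logit vectors."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as uncorrelated
    return dot / (norm_a * norm_b)


def argmax(v):
    """Index of the largest element (first one on ties)."""
    return max(range(len(v)), key=v.__getitem__)


def parity_gate(cpu_logits, gpu_logits, threshold=COSINE_THRESHOLD):
    """Hypothetical parity gate: passes only if cosine >= threshold."""
    cos = cosine_similarity(cpu_logits, gpu_logits)
    return {
        "cosine": cos,
        "cpu_argmax": argmax(cpu_logits),
        "gpu_argmax": argmax(gpu_logits),
        "max_abs_diff": max(abs(x - y) for x, y in zip(cpu_logits, gpu_logits)),
        "passed": cos >= threshold,
    }
```

With the values in the trace above, such a gate fails by a wide margin: -0.005 is nowhere near 0.98, and the argmax tokens (334 vs 8127) disagree as well.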

1. Why does GPU ≠ CPU? Cosine similarity is -0.005: the outputs are uncorrelated, which FP rounding cannot explain.
2. Why are they uncorrelated? The GPU forward pass computes an entirely different function.
3. Why a different function? This model uses GQA with `hidden=3584, heads=28, kv_heads=4`.
4. Which kernel? GQA attention with a 28:4 head ratio (7 query heads per KV head).
5. Root cause: the GQA CUDA kernel handles non-power-of-2 head ratios incorrectly.
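In GQA, query head `h` reads KV head `h // (num_heads / num_kv_heads)`. One common way a kernel can be correct for power-of-2 ratios and wrong for a ratio of 7 is replacing that division with a right shift. The sketch below contrasts the two mappings. The shift variant is a **hypothetical** illustration of this failure class only; it is not taken from trueno-gpu, whose kernel source has not been inspected here, and all function names are made up.

```python
def kv_head_for(q_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Correct GQA mapping: each KV head serves `group` consecutive query heads."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    group = num_heads // num_kv_heads
    return q_head // group  # plain integer division works for any group size


def kv_head_pow2_shift(q_head: int, num_heads: int, num_kv_heads: int) -> int:
    """HYPOTHETICAL buggy mapping: replaces the division with a right shift by
    floor(log2(group)). Exact only when group is a power of two."""
    group = num_heads // num_kv_heads
    shift = group.bit_length() - 1  # floor(log2(group)): 2 when group == 7
    return q_head >> shift


def mismatched_heads(num_heads: int, num_kv_heads: int) -> list:
    """Query heads whose KV head differs between the two mappings."""
    return [
        h
        for h in range(num_heads)
        if kv_head_for(h, num_heads, num_kv_heads)
        != kv_head_pow2_shift(h, num_heads, num_kv_heads)
    ]
```

For a power-of-2 configuration such as 32 query heads and 8 KV heads, `mismatched_heads` is empty. For 28/4 it flags 23 of the 28 query heads, and heads 16 to 27 would index KV heads 4 to 6, which do not exist. A bug of this shape would produce output uncorrelated with the CPU path rather than a small numerical drift.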

## Diagnosis

- This is NOT FP rounding: rounding error alone would still give a cosine of ~0.999.
- This is NOT a driver issue: driver 590 gives the same result as driver 580.
- This is a **logic bug** in the GQA attention kernel for head_ratio=7.
- A cosine of -0.005 means the GPU output is essentially uncorrelated with the CPU output.
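The "not FP rounding" argument can be checked with a small stdlib-only experiment. The sketch assumes random Gaussian logits and emulates bf16-like precision by rounding each value to an 8-bit significand. This is a simplification: real rounding error accumulates through every layer, so it only illustrates the scale of the effect. Rounding keeps the cosine extremely close to 1, while an unrelated vector lands near 0, which is where the GPU result sits.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def round_to_bits(x, mantissa_bits):
    """Round x to `mantissa_bits` bits of significand (crude reduced-precision model)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)


rng = random.Random(0)
cpu = [rng.gauss(0.0, 4.0) for _ in range(4096)]
rounded = [round_to_bits(x, 8) for x in cpu]  # bf16-like significand width
unrelated = [rng.gauss(0.0, 4.0) for _ in range(4096)]

print(cosine(cpu, rounded))    # very close to 1.0
print(cosine(cpu, unrelated))  # close to 0.0
```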

## Model Dimensions

- hidden_dim: 3584
- num_heads: 28
- num_kv_heads: 4
- head_dim: 128 (3584/28)
- head_ratio: 7 (28/4) — non-power-of-2
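The derived values above follow directly from the three config numbers. A small sketch that recomputes them and rejects inconsistent configs is below; `GqaDims` and `is_power_of_two` are hypothetical helpers, not part of trueno-gpu. The K/V projection width (`num_kv_heads * head_dim`) is included because it is the stride a GQA kernel uses when indexing the KV cache.

```python
from dataclasses import dataclass


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for 7."""
    return n > 0 and (n & (n - 1)) == 0


@dataclass(frozen=True)
class GqaDims:
    hidden_dim: int
    num_heads: int
    num_kv_heads: int

    def __post_init__(self):
        # Reject configs where heads or KV groups would not divide evenly.
        if self.hidden_dim % self.num_heads:
            raise ValueError("hidden_dim must be divisible by num_heads")
        if self.num_heads % self.num_kv_heads:
            raise ValueError("num_heads must be divisible by num_kv_heads")

    @property
    def head_dim(self) -> int:
        return self.hidden_dim // self.num_heads  # e.g. 3584 / 28 = 128

    @property
    def head_ratio(self) -> int:
        return self.num_heads // self.num_kv_heads  # e.g. 28 / 4 = 7

    @property
    def kv_dim(self) -> int:
        # Per-token width of the K (and V) projection.
        return self.num_kv_heads * self.head_dim  # e.g. 4 * 128 = 512


qwen = GqaDims(hidden_dim=3584, num_heads=28, num_kv_heads=4)
```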

## Hardware

NVIDIA GB10 (Blackwell sm_121), driver 590.48.01, CUDA 13.1

## Provable Contract

`ptx-target-parity-v1.yaml` equation `target_parity` is violated.
`gpu-context-health-v1.yaml` also applies, but this goes beyond context poisoning: the GPU produces wrong results.

## Fix Required

Fix the GQA attention kernel in trueno-gpu for head_ratio=7 on sm_121.
Verify with CORRECTNESS-011 layer trace: cosine must be ≥0.98.
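Before rerunning the full layer trace, it can help to compare the kernel against a tiny CPU reference for the attention step alone. The sketch below assumes a single decode token, no RoPE, no batching, attention over every cached position, and a head-major layout (heads concatenated, `head_dim` values each). `gqa_attention` is a hypothetical name, and the layout is an assumption that may not match trueno-gpu's buffers.

```python
import math


def gqa_attention(q, k_cache, v_cache, num_heads, num_kv_heads, head_dim):
    """Single-token GQA attention reference.

    q: num_heads * head_dim floats (one query token).
    k_cache, v_cache: one row per cached position, each num_kv_heads * head_dim floats.
    Returns num_heads * head_dim floats.
    """
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    group = num_heads // num_kv_heads
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for h in range(num_heads):
        kv = h // group  # correct for any group size, including 7
        qh = q[h * head_dim:(h + 1) * head_dim]
        scores = [
            scale * sum(a * b for a, b in zip(qh, row[kv * head_dim:(kv + 1) * head_dim]))
            for row in k_cache
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        head_out = [0.0] * head_dim
        for w, row in zip(weights, v_cache):
            vh = row[kv * head_dim:(kv + 1) * head_dim]
            for d in range(head_dim):
                head_out[d] += (w / total) * vh[d]
        out.extend(head_out)
    return out
```

A useful first case is a single cached position: the softmax weight is then exactly 1, so query head `h` must return the V slice of KV head `h // 7`. Any GPU head that returns a different KV head's values points directly at the head-to-KV mapping.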
