QA: Granite-3.1-2B has 3-test failure pattern (89.3% pass rate)

## Summary

IBM Granite-3.1-2B-Instruct achieves **89.3% pass rate** (25/28 tests) during MVP certification. Close to the 90% threshold but 3 specific test matrix cells fail consistently.

## Evidence

| Model | Pass Rate | Tests | Source |
|-------|-----------|-------|--------|
| Granite-3.1-2B-Instruct | 89.3% | 25/28 | `apr-model-qa-playbook/certifications/granite-3.1-2b-instruct/evidence.json` |

## Context

Granite uses a LLaMA-derivative architecture. The 89.3% pass rate suggests most of the stack works but a few specific format/modality combinations have issues. Investigating the 3 failing evidence entries should reveal a targeted fix.

## Reproduction

```bash
cd ../apr-model-qa-playbook
cargo run --release --bin apr-qa -- run playbooks/models/granite-3.1-2b-mvp.playbook.yaml \
    --output certifications/granite-3.1-2b-instruct --no-integrity-check --skip-conversion-tests \
    --no-differential --no-trace-payload --timeout 120000
```

## Expected Behavior

Should achieve >=90% pass rate. The 3 failing tests likely share a common root cause.

## Labels

Architecture support, Granite, P2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QA: Granite-3.1-2B has 3-test failure pattern (89.3% pass rate) #230

Summary

Evidence

Context

Reproduction

Expected Behavior

Labels

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

QA: Granite-3.1-2B has 3-test failure pattern (89.3% pass rate) #230

Description

Summary

Evidence

Context

Reproduction

Expected Behavior

Labels

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions