Skip to content

QA: Granite-3.1-2B has 3-test failure pattern (89.3% pass rate) #230

@noahgift

Description

@noahgift

Summary

IBM Granite-3.1-2B-Instruct achieves 89.3% pass rate (25/28 tests) during MVP certification. Close to the 90% threshold but 3 specific test matrix cells fail consistently.

Evidence

Model Pass Rate Tests Source
Granite-3.1-2B-Instruct 89.3% 25/28 apr-model-qa-playbook/certifications/granite-3.1-2b-instruct/evidence.json

Context

Granite uses a LLaMA-derivative architecture. The 89.3% pass rate suggests most of the stack works but a few specific format/modality combinations have issues. Investigating the 3 failing evidence entries should reveal a targeted fix.

Reproduction

cd ../apr-model-qa-playbook
cargo run --release --bin apr-qa -- run playbooks/models/granite-3.1-2b-mvp.playbook.yaml \
    --output certifications/granite-3.1-2b-instruct --no-integrity-check --skip-conversion-tests \
    --no-differential --no-trace-payload --timeout 120000

Expected Behavior

Should achieve >=90% pass rate. The 3 failing tests likely share a common root cause.

Labels

Architecture support, Granite, P2

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions