Summary
IBM Granite-3.1-2B-Instruct achieves 89.3% pass rate (25/28 tests) during MVP certification. Close to the 90% threshold but 3 specific test matrix cells fail consistently.
Evidence
| Model |
Pass Rate |
Tests |
Source |
| Granite-3.1-2B-Instruct |
89.3% |
25/28 |
apr-model-qa-playbook/certifications/granite-3.1-2b-instruct/evidence.json |
Context
Granite uses a LLaMA-derivative architecture. The 89.3% pass rate suggests most of the stack works but a few specific format/modality combinations have issues. Investigating the 3 failing evidence entries should reveal a targeted fix.
Reproduction
cd ../apr-model-qa-playbook
cargo run --release --bin apr-qa -- run playbooks/models/granite-3.1-2b-mvp.playbook.yaml \
--output certifications/granite-3.1-2b-instruct --no-integrity-check --skip-conversion-tests \
--no-differential --no-trace-payload --timeout 120000
Expected Behavior
Should achieve >=90% pass rate. The 3 failing tests likely share a common root cause.
Labels
Architecture support, Granite, P2
Summary
IBM Granite-3.1-2B-Instruct achieves 89.3% pass rate (25/28 tests) during MVP certification. Close to the 90% threshold but 3 specific test matrix cells fail consistently.
Evidence
apr-model-qa-playbook/certifications/granite-3.1-2b-instruct/evidence.jsonContext
Granite uses a LLaMA-derivative architecture. The 89.3% pass rate suggests most of the stack works but a few specific format/modality combinations have issues. Investigating the 3 failing evidence entries should reveal a targeted fix.
Reproduction
cd ../apr-model-qa-playbook cargo run --release --bin apr-qa -- run playbooks/models/granite-3.1-2b-mvp.playbook.yaml \ --output certifications/granite-3.1-2b-instruct --no-integrity-check --skip-conversion-tests \ --no-differential --no-trace-payload --timeout 120000Expected Behavior
Should achieve >=90% pass rate. The 3 failing tests likely share a common root cause.
Labels
Architecture support, Granite, P2