Add scripts/plot_slide.py — v0.0.7 talk-slide renderer at 16:9 by aallan · Pull Request #71 · aallan/vera-bench

aallan · 2026-05-22T10:23:39Z

Summary

Adds scripts/plot_slide.py — a specialised slide renderer for talk presentation of the v0.0.7 result panels. Three slide types, 16:9 landscape (2880×1620 px), slide-readable typography, four light-theme background choices (paper / white / cream / light-grey).

Driven by an actual talk being written — option 3 from the post-render discussion (commit the script, leave PNGs uncommitted) so the slides are reproducible without pinning ephemeral artefacts to the repo.

What's here

| File | Change |
| --- | --- |
| `scripts/plot_slide.py` | New: the slide renderer |
| `scripts/README.md` | New section documenting the script, its specialised scope, background choices, and the v0.0.7 lineup pin |
| `.gitignore` | Extends the `assets/.png` chart-variant ignore pattern to cover `assets/vera-bench_slide_*.png` |

Specialised, not general

The v0.0.7 model lineup (Claude Opus 4 / GPT-4.1 / Kimi K2.5 in flagship; Claude Sonnet 4 / GPT-4o / Kimi K2 Turbo in sonnet) is hard-coded in MODELS_V_0_0_7, because plot_results.MODELS has since been updated to reflect the post-K2.6 migration (PR #69). The script reuses palette, typography constants, and extract_data() from plot_results.py so the slide numbers match the README chart by construction.

The README section flags this explicitly so future-me doesn't try to use it for v0.0.10 / v0.0.11 / v0.0.12 results without realising the lineup is pinned to v0.0.7.

Three slide types

- delta: the "Does Vera beat Python / TypeScript?" horizontal-bar chart (the headline storytelling slide; Vera-wins read as green positive bars)
- tiers: Flagship and Sonnet tier comparisons side-by-side, mirroring the top row of the documentation chart
- all-modes: all 6 models × 4 modes (Vera, Vera NL, Python, TypeScript) in a single grouped-bar panel
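
As a rough illustration of the sign convention behind the delta slide, the sketch below computes Vera-minus-baseline differences and picks a bar colour from the sign, so a Vera win comes out positive. The function name `delta_bars`, the colour values, the red colour for negative bars, and the sample pass rates are all hypothetical placeholders; none of them are taken from the script or from the v0.0.7 results.

```python
# Hypothetical sketch of the delta-slide sign convention (not the script's code).
GREEN = "#2E8B57"  # placeholder colour for a Vera win (positive delta)
RED = "#C0392B"    # placeholder colour for a baseline win (negative delta)


def delta_bars(
    rates: dict[str, dict[str, float]], baseline: str
) -> list[tuple[str, float, str]]:
    """Return (model, delta, colour) per model, where delta = Vera - baseline."""
    bars = []
    for model, by_mode in rates.items():
        delta = by_mode["vera"] - by_mode[baseline]
        colour = GREEN if delta >= 0 else RED
        bars.append((model, delta, colour))
    return bars


# Made-up pass rates in percent, for illustration only.
sample = {
    "model-a": {"vera": 90.0, "python": 85.0},
    "model-b": {"vera": 70.0, "python": 80.0},
}
result = delta_bars(sample, "python")
```

Subtracting the baseline from Vera, rather than the other way round, is what makes "Vera wins" land on the positive side of the zero reference line.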

Background choices

| Choice | Hex | Notes |
| --- | --- | --- |
| `paper` (default) | `#FAF7F0` | Off-white; soft, neutral, doesn't compete with chart colours |
| `white` | `#FFFFFF` | Pure white; baseline / high contrast |
| `cream` | `#FEEAD1` | On-brand (veralang.dev palette); warmer |
| `light-grey` | `#F4F4F2` | Neutral, "corporate clean" |

Dark mode deliberately not offered — requires cascading text-colour inversion that's out of scope for this talk's design.
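
One way the four choices could be represented is a plain name-to-hex mapping with `paper` as the fallback. This is a minimal sketch: the names and hex values come from the background table in this description, while `BACKGROUNDS`, `DEFAULT_BACKGROUND`, and `resolve_background` are hypothetical names that may not match the script.

```python
from typing import Optional

# Names and hex values are from the PR description; the identifiers are assumed.
BACKGROUNDS = {
    "paper": "#FAF7F0",
    "white": "#FFFFFF",
    "cream": "#FEEAD1",
    "light-grey": "#F4F4F2",
}
DEFAULT_BACKGROUND = "paper"


def resolve_background(name: Optional[str] = None) -> str:
    """Map a --background choice to its hex colour, defaulting to paper."""
    key = DEFAULT_BACKGROUND if name is None else name
    try:
        return BACKGROUNDS[key]
    except KeyError:
        valid = ", ".join(BACKGROUNDS)
        raise ValueError(f"unknown background {key!r}; choose from {valid}") from None
```

Keeping the palette in one mapping means the CLI choices and the colour lookup cannot drift apart, since both can be derived from the same keys.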

No version bump

New tooling, no methodology change. Mirrors the precedent of attribution / tooling-only PRs (#58, #59, #67, #69 — none of which bumped).

Test plan

- ruff check + ruff format --check clean
- All three slide types render cleanly on default paper background
- --background {white,cream,light-grey} exercises each branch successfully
- Numbers cross-check against the v0.0.7 documentation chart in assets/results-graph.png (24 cells × 3 panels, all match)
- scripts/README.md section flags the v0.0.7 lineup pin and the talk-ephemera nature of the PNGs

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a slide renderer that generates 16:9 benchmark PNGs for v0.0.7 with three layouts: delta comparisons, tier breakdowns and all-modes grouped analysis; selectable slide types and configurable background/output behaviour.
Documentation
- Added usage docs and examples covering slide types, background themes, output defaults and recommended locations.
Chores
- Updated .gitignore to exclude talk-rendered slide artefact PNGs.

Standalone script that renders the v0.0.7 result panels as 16:9 slides sized and styled for talk presentation.

Three slide types:
- delta: "Does Vera beat Python / TypeScript?" headline chart
- tiers: Flagship and Sonnet tier comparisons side-by-side
- all-modes: all 6 models × 4 modes in a single grouped-bar panel

Specialised, not general. The v0.0.7 model lineup (K2.5 in flagship, K2 Turbo in sonnet) is hard-coded because the live plot_results.MODELS registry now reflects the post-K2.6 migration (PR #69). Reuses palette + extract_data() from plot_results.py so slide numbers match the README chart by construction.

Typography is bumped roughly 3× from the documentation-chart sizes so the slide reads from the back of a room. Slide canvas is 16×9 in at dpi=180 (2880×1620 px), with a tunable --background flag offering four light-theme options (paper / white / cream / light-grey). Dark mode is deliberately out of scope: it would require cascading text-colour inversion that the current talk's design doesn't need.

Output handling:
- PNGs default to /tmp/ because they're talk-prep ephemera that belong in the speaker's deck rather than the repo
- assets/vera-bench_slide_*.png is gitignored for the case where someone outputs to assets/ for preview; the canonical artefact is the script itself and regeneration is cheap
- Brief section added to scripts/README.md flagging the v0.0.7 pin so future-me doesn't try to use this for a different release

Verified:
- ruff check + ruff format --check clean
- All three slides render cleanly on default paper background
- Numbers cross-check against the v0.0.7 documentation chart (each of 6 models × 4 modes = 24 cells)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
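
The canvas size quoted in the commit message follows from multiplying the figure size in inches by the DPI. The short check below reproduces that arithmetic; `canvas_pixels` is a hypothetical helper written for this illustration, not something the script is known to contain.

```python
FIGSIZE_IN = (16, 9)  # width, height in inches, as stated in the commit message
DPI = 180             # dots per inch, as stated in the commit message


def canvas_pixels(figsize_in: tuple[float, float], dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a raster canvas: inches multiplied by dots per inch."""
    width_in, height_in = figsize_in
    return round(width_in * dpi), round(height_in * dpi)


width_px, height_px = canvas_pixels(FIGSIZE_IN, DPI)
print(f"{width_px}x{height_px} px")  # 16 * 180 = 2880, 9 * 180 = 1620
```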

coderabbitai · 2026-05-22T10:23:50Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a8f1acd-1527-4d27-a638-c6c8a2c02f45

📥 Commits

Reviewing files that changed from the base of the PR and between 35de546 and 80e4c72.

📒 Files selected for processing (1)

scripts/plot_slide.py

📝 Walkthrough

This PR adds scripts/plot_slide.py, a CLI script that renders presentation-ready 16:9 slides (delta, tiers, all-modes) from v0.0.7 benchmark data by temporarily patching the model registry, plus README docs and a .gitignore entry for generated slide PNGs.
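
The "patch, extract, restore" pattern mentioned in the walkthrough can be sketched with a context manager that always restores the original registry, even if extraction raises. This is an alternative shape shown for illustration only: per the later commit message the script's helper `_patch_models_for_slide()` returns a tuple rather than acting as a context manager, and `patched_models` plus the `fake_plot_results` stand-in below are hypothetical.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Iterator


@contextmanager
def patched_models(module, pinned: list) -> Iterator[None]:
    """Temporarily replace module.MODELS, restoring it even on error."""
    original = module.MODELS
    module.MODELS = pinned
    try:
        yield
    finally:
        module.MODELS = original


# Stand-in for plot_results; the real module is not imported here.
fake_plot_results = SimpleNamespace(MODELS=["current-lineup"])
fake_plot_results.extract_data = lambda: list(fake_plot_results.MODELS)

with patched_models(fake_plot_results, ["v0.0.7-lineup"]):
    data = fake_plot_results.extract_data()  # sees the pinned lineup
```

The `try`/`finally` is the important part: without it, an exception during extraction would leave the shared registry pointing at the pinned lineup for any later caller.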

Changes

Talk Slide Rendering Script

| Layer / File(s) | Summary |
| --- | --- |
| Configuration and documentation / `.gitignore`, `scripts/README.md` | Gitignore entry for `assets/vera-bench_slide_*.png` and README section documenting `plot_slide.py`, supported slide types, the hard-coded v0.0.7 model lineup, styling/data reuse from `plot_results.py`, CLI usage, backgrounds, and output conventions. |
| Script setup and data loading / `scripts/plot_slide.py` | Matplotlib Agg backend, imports from `scripts.plot_results`, `MODELS_V_0_0_7`, typography and background palette constants, and `_load_v0_0_7_data()` helper that patches `plot_results.MODELS`, calls `extract_data()`, and restores the registry. |
| Styling and delta renderer / `scripts/plot_slide.py` | Global rcParams and `_style_ax()` helper, plus `render_delta()` producing a 16:9 horizontal delta bar chart with per-bar labels and zero reference line; uses `_save()` to tint and write PNG. |
| Tier slide renderer / `scripts/plot_slide.py` | `_draw_tier_panel()` builds grouped vertical bars with percentage labels; `render_tiers()` composes Flagship and Sonnet panels side-by-side and saves via `_save()`. |
| All-modes slide renderer / `scripts/plot_slide.py` | `render_all_modes()` renders grouped bars for Vera, Vera NL, Python, and TypeScript across models with per-bar labels and saves via `_save()`. |
| Save, renderers registry and CLI / `scripts/plot_slide.py` | `_save()` tints figure/axes and writes PNGs; `RENDERERS` maps type names to renderer functions; `main()` implements argparse CLI (`--type`, `--version`, `--results-dir`, `--output`, `--background`), validates `--output` usage, loads v0.0.7 data once, and writes the requested slide PNG(s). |
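
A name-to-function registry like the `RENDERERS` mapping described in the last row might look roughly as follows. This is a sketch under assumptions: the three stub renderers stand in for the real matplotlib functions, whose signatures are not shown in this PR page, and the default output path (`/tmp/vera-bench_slide_<type>.png`) is inferred from the gitignore glob and the `/tmp/` default rather than confirmed.

```python
from typing import Callable


# Stubs standing in for the real renderers; the real signatures are unknown here.
def render_delta(out: str) -> str:
    return f"delta -> {out}"


def render_tiers(out: str) -> str:
    return f"tiers -> {out}"


def render_all_modes(out: str) -> str:
    return f"all-modes -> {out}"


RENDERERS: dict[str, Callable[[str], str]] = {
    "delta": render_delta,
    "tiers": render_tiers,
    "all-modes": render_all_modes,
}


def render(slide_type: str, out_dir: str = "/tmp") -> str:
    """Look up the renderer for slide_type and call it with an assumed path."""
    renderer = RENDERERS[slide_type]
    out = f"{out_dir}/vera-bench_slide_{slide_type}.png"  # assumed naming scheme
    return renderer(out)
```

With a registry, the valid values for `--type` can be taken from `RENDERERS.keys()`, so adding a slide type only requires one new entry.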

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

aallan/vera-bench#59: The new scripts/plot_slide.py depends on the data handling in scripts/plot_results.py and temporarily patches plot_results.MODELS before calling plot_results.extract_data().

Suggested labels

ci

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarises the primary change: adding a new v0.0.7 talk-slide renderer script with 16:9 aspect ratio. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/plot_slide.py`:
- Around line 455-457: The CLI argument for "--version" currently allows
arbitrary strings while the code always patches the model registry to
MODELS_V_0_0_7; fix this by constraining or removing the flag: either remove the
"--version" argument from the ArgumentParser, or change its definition (the
place that sets default="0.0.7") to only accept the single choice "0.0.7" (use
the parser's choices parameter or equivalent) so that callers cannot pass other
versions that would be silently mis-mapped to MODELS_V_0_0_7.
- Line 110: Add explicit return type annotations to the listed function
definitions in scripts/plot_slide.py: annotate _patch_models_for_slide,
_load_v0_0_7_data, _slide_rcparams, _style_ax, _draw_tier_panel, and main with
the appropriate types (e.g. -> None for functions that don’t return a value; use
precise tuple[...] or other concrete typing for functions that return multiple
values or structures such as _load_v0_0_7_data and _draw_tier_panel). Update the
def lines (e.g. def _patch_models_for_slide(...) -> None:) and pick exact return
type signatures that match the actual return values in each function body so
static type checkers reflect the real outputs.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 82681587-d9a1-42ee-87c2-182e516c5daf

📥 Commits

Reviewing files that changed from the base of the PR and between 6915d87 and f8d8620.

📒 Files selected for processing (3)

.gitignore
scripts/README.md
scripts/plot_slide.py

codecov · 2026-05-22T10:29:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.65%. Comparing base (6915d87) to head (80e4c72).

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #71   +/-   ##
=======================================
  Coverage   83.65%   83.65%           
=======================================
  Files          10       10           
  Lines        1395     1395           
=======================================
  Hits         1167     1167           
  Misses        228      228

| Flag | Coverage Δ |
| --- | --- |
| python | `83.65% <ø> (ø)` |


Two CodeRabbit findings, both valid against current code:

1. --version accepted any string but the model lineup was always patched to MODELS_V_0_0_7. Passing e.g. --version 0.0.9 silently produced a chart with v0.0.7-era labels and v0.0.9 numbers, exactly the kind of label/data mis-mapping that would corrupt a slide deck without anyone noticing. Verified locally: --version 0.0.9 produced a PNG pre-fix.

   Fix: argparse choices=["0.0.7"] plus help text explaining why the lineup is pinned and what future-extension would require. Now --version 0.0.9 errors out loud:

       error: argument --version: invalid choice: '0.0.9' (choose from 0.0.7)

2. Six internal helpers lacked return type annotations. Added them:
   - _patch_models_for_slide() -> tuple[ModuleType, list[ModelSpec]]
   - _load_v0_0_7_data() -> tuple[dict[str, dict[str, int]], dict[str, dict[str, int]]]
   - _slide_rcparams() -> None
   - _style_ax() -> None
   - _draw_tier_panel() -> None
   - main() -> None

New import: `from types import ModuleType` for the patch helper's precise return type. Dict type spellings match the pattern in plot_results.py (no type alias introduced; inline tuple types stay consistent with the existing extract_data signature).

The ax parameters of _style_ax(ax) and _draw_tier_panel(ax) are left unannotated: out of scope (CR asked specifically for return types), and matplotlib's Axes typing has its own complications worth handling separately if/when we type the ax parameters globally.

Verified:
- ruff check + ruff format --check clean
- Default invocation still produces all three slides correctly
- --version 0.0.9 now errors at argparse rather than silently producing a mis-labeled chart

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
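
The `--version` fix described in this commit can be reproduced in isolation with a single-choice argparse option. This is a minimal sketch rather than the script's actual parser: `build_parser` and the help wording are hypothetical, and only the `--version` flag is shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --version flag accepts only the pinned release."""
    parser = argparse.ArgumentParser(prog="plot_slide.py")
    parser.add_argument(
        "--version",
        default="0.0.7",
        choices=["0.0.7"],  # any other value is rejected by argparse itself
        help="Results version; pinned because the model lineup is hard-coded to v0.0.7.",
    )
    return parser


args = build_parser().parse_args(["--version", "0.0.7"])
```

Passing any other value makes argparse print a usage error and exit with status 2, so the mismatch is caught before any data is loaded.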

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/plot_slide.py`:
- Line 44: The functions _style_ax and _draw_tier_panel currently have untyped
ax parameters; add an explicit Axes type from matplotlib to satisfy the
project's type-hint rule: add an import for Axes (e.g. from matplotlib.axes
import Axes) at the top and annotate the signatures as def _style_ax(ax: Axes)
-> None and def _draw_tier_panel(ax: Axes, ...) -> None (keep other parameter
names/annotations and return types unchanged).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3a30a5f6-15ab-4250-878f-270dd7889408

📥 Commits

Reviewing files that changed from the base of the PR and between f8d8620 and 35de546.

📒 Files selected for processing (1)

scripts/plot_slide.py

…panel

CodeRabbit pushed back on my earlier deferral of typing `ax` parameters, correctly pointing out that CLAUDE.md says "type hints everywhere" without an exemption for matplotlib types. My earlier "matplotlib Axes typing has its own complications" framing was sloppy: `from matplotlib.axes import Axes` is a one-line import, not actually complicated.

Fix:
- Added `from matplotlib.axes import Axes` import
- `_style_ax(ax: Axes) -> None`
- `_draw_tier_panel(ax: Axes, ...) -> None`

Note for follow-up: plot_results.py has the same untyped-ax pattern across three functions (_style_ax, plot_tier, plot_all_modes). CodeRabbit's finding would apply there too, but that's a different file's surface area and out of scope for this PR.

Verified: ruff clean, default invocation produces all three slides correctly, no behaviour change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


aallan merged commit 237ca81 into main May 22, 2026
10 checks passed

aallan deleted the slides/v0.0.7-talk-prep branch May 22, 2026 10:47

aallan mentioned this pull request May 22, 2026

Add AILANG as a baseline target language #70

Merged

aallan merged 3 commits into main from slides/v0.0.7-talk-prep
