Add scripts/plot_slide.py — v0.0.7 talk-slide renderer at 16:9#71

Merged
aallan merged 3 commits into main from slides/v0.0.7-talk-prep
May 22, 2026

@aallan aallan commented May 22, 2026

Owner

Summary

Adds scripts/plot_slide.py — a specialised slide renderer for talk presentation of the v0.0.7 result panels. Three slide types, 16:9 landscape (2880×1620 px), slide-readable typography, four light-theme background choices (paper / white / cream / light-grey).

Driven by an actual talk being written. This takes option 3 from the post-render discussion (commit the script, leave PNGs uncommitted), so the slides are reproducible without pinning ephemeral artefacts to the repo.

What's here

| File | Change |
| --- | --- |
| scripts/plot_slide.py | New: the slide renderer |
| scripts/README.md | New section documenting the script, its specialised scope, background choices, and the v0.0.7 lineup pin |
| .gitignore | Extends the assets/*.png chart-variant ignore pattern to cover assets/vera-bench_slide_*.png |

Specialised, not general

The v0.0.7 model lineup (Claude Opus 4 / GPT-4.1 / Kimi K2.5 in flagship; Claude Sonnet 4 / GPT-4o / Kimi K2 Turbo in sonnet) is hard-coded in MODELS_V_0_0_7, because plot_results.MODELS has since been updated to reflect the post-K2.6 migration (PR #69). The script reuses palette, typography constants, and extract_data() from plot_results.py so the slide numbers match the README chart by construction.

The README section flags this explicitly so future-me doesn't try to use it for v0.0.10 / v0.0.11 / v0.0.12 results without realising the lineup is pinned to v0.0.7.
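
A minimal sketch of the "pin the registry, extract, restore" idea described above. It does not import the real plot_results module: `plot_results`, `extract_data`, and the list contents below are stand-ins, and the context-manager shape is an assumption rather than the script's actual helper.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for the real plot_results module (hypothetical contents).
plot_results = SimpleNamespace(MODELS=["post-K2.6 lineup"])

# Placeholder for the hard-coded v0.0.7 lineup.
MODELS_V_0_0_7 = ["v0.0.7 lineup"]


def extract_data() -> list[str]:
    """Stand-in for plot_results.extract_data(): reads MODELS at call time."""
    return list(plot_results.MODELS)


@contextmanager
def pinned_models(module, models):
    """Temporarily replace module.MODELS, restoring it even if the body raises."""
    original = module.MODELS
    module.MODELS = models
    try:
        yield module
    finally:
        module.MODELS = original


with pinned_models(plot_results, MODELS_V_0_0_7):
    data = extract_data()  # sees the pinned lineup

# Outside the block, the live registry is back in place.
restored = plot_results.MODELS
```

The `try`/`finally` matters here: if extraction fails midway, the live registry is still restored, so other code sharing the module does not inherit the pinned lineup.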

Three slide types

  • delta — the "Does Vera beat Python / TypeScript?" horizontal-bar chart (the headline storytelling slide; Vera-wins read as green positive bars)
  • tiers — Flagship and Sonnet tier comparisons side-by-side, mirroring the top row of the documentation chart
  • all-modes — all 6 models × 4 modes (Vera, Vera NL, Python, TypeScript) in a single grouped-bar panel
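
As a rough illustration of what the delta slide plots, the sketch below assumes each bar is the Vera rate minus a baseline rate, in percentage points, so a positive value is a Vera win. The function name, the data layout, and the numbers are all made up for illustration; they are not benchmark results and not the script's actual code.

```python
def deltas(rates: dict[str, dict[str, float]], baseline: str) -> dict[str, float]:
    """Return Vera minus baseline, in percentage points, for each model."""
    return {
        model: round(modes["Vera"] - modes[baseline], 1)
        for model, modes in rates.items()
    }


# Made-up placeholder rates, NOT benchmark data.
example = {
    "model-a": {"Vera": 90.0, "Python": 85.0},
    "model-b": {"Vera": 70.0, "Python": 80.0},
}

result = deltas(example, "Python")

# A positive delta would be drawn as a "Vera wins" bar.
vera_wins = {model: delta > 0 for model, delta in result.items()}
```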

Background choices

| Choice | Hex | Notes |
| --- | --- | --- |
| paper (default) | #FAF7F0 | Off-white; soft, neutral, doesn't compete with chart colours |
| white | #FFFFFF | Pure white; baseline / high contrast |
| cream | #FEEAD1 | On-brand (veralang.dev palette); warmer |
| light-grey | #F4F4F2 | Neutral, "corporate clean" |

Dark mode deliberately not offered — requires cascading text-colour inversion that's out of scope for this talk's design.
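
One way the four choices could map onto a `--background` flag is sketched below. The hex values come from the table above; the `BACKGROUNDS` name and the `parse_background` helper are assumptions for illustration, not the script's real internals.

```python
import argparse

# Hex values as listed in the PR description.
BACKGROUNDS = {
    "paper": "#FAF7F0",
    "white": "#FFFFFF",
    "cream": "#FEEAD1",
    "light-grey": "#F4F4F2",
}


def parse_background(argv: list[str]) -> str:
    """Parse --background and return the matching hex colour (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--background",
        choices=sorted(BACKGROUNDS),
        default="paper",
        help="light-theme slide background",
    )
    args = parser.parse_args(argv)
    return BACKGROUNDS[args.background]


default_hex = parse_background([])
cream_hex = parse_background(["--background", "cream"])
```

Restricting the flag with `choices` means an unsupported value such as `dark` is rejected at parse time rather than falling through to a default colour.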

No version bump

New tooling, no methodology change. Mirrors the precedent of attribution / tooling-only PRs (#58, #59, #67, #69 — none of which bumped).

Test plan

  • ruff check + ruff format --check clean
  • All three slide types render cleanly on default paper background
  • --background {white,cream,light-grey} exercises each branch successfully
  • Numbers cross-check against the v0.0.7 documentation chart in assets/results-graph.png (24 cells × 3 panels, all match)
  • scripts/README.md section flags the v0.0.7 lineup pin and the talk-ephemera nature of the PNGs

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a slide renderer that generates 16:9 benchmark PNGs for v0.0.7 with three layouts: delta comparisons, tier breakdowns and all-modes grouped analysis; selectable slide types and configurable background/output behaviour.
  • Documentation

    • Added usage docs and examples covering slide types, background themes, output defaults and recommended locations.
  • Chores

    • Updated .gitignore to exclude talk-rendered slide artefact PNGs.

Standalone script that renders the v0.0.7 result panels as 16:9 slides
sized and styled for talk presentation. Three slide types:

- delta     — "Does Vera beat Python / TypeScript?" headline chart
- tiers     — Flagship and Sonnet tier comparisons side-by-side
- all-modes — all 6 models × 4 modes in a single grouped-bar panel

Specialised, not general. The v0.0.7 model lineup (K2.5 in flagship,
K2 Turbo in sonnet) is hard-coded because the live plot_results.MODELS
registry now reflects the post-K2.6 migration (PR #69). Reuses palette
+ extract_data() from plot_results.py so slide numbers match the README
chart by construction.

Typography is bumped roughly 3× from the documentation-chart sizes so
the slide reads from the back of a room. Slide canvas is 16×9 in at
dpi=180 (2880×1620 px), with a tunable --background flag offering four
light-theme options (paper / white / cream / light-grey). Dark mode is
deliberately out of scope — would require cascading text-colour
inversion that the current talk's design doesn't need.
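
The canvas size quoted above is just inches multiplied by dots per inch. A small check of that arithmetic, with a hypothetical helper name:

```python
def canvas_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Convert a figure size in inches to pixel dimensions at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# 16 in x 9 in at 180 dpi, as stated in the commit message.
width_px, height_px = canvas_pixels(16, 9, 180)
```

In matplotlib these two numbers would typically be supplied as `figsize=(16, 9)` and `dpi=180`; whether the script passes them at figure creation or at save time is not stated here.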

Output handling:
- PNGs default to /tmp/ because they're talk-prep ephemera that belong
  in the speaker's deck rather than the repo
- assets/vera-bench_slide_*.png is gitignored for the case where
  someone outputs to assets/ for preview — the canonical artefact is
  the script itself; regeneration is cheap
- Brief section added to scripts/README.md flagging the v0.0.7 pin so
  future-me doesn't try to use this for a different release
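
A sketch of the output convention described above. The ignore pattern is taken from the .gitignore change; the per-slide filename scheme and the `default_output` helper are assumptions, and `fnmatch` only approximates real gitignore matching.

```python
from fnmatch import fnmatch
from pathlib import Path

# Pattern added to .gitignore in this PR.
IGNORE_PATTERN = "assets/vera-bench_slide_*.png"


def default_output(slide_type: str, out_dir: Path = Path("/tmp")) -> Path:
    """Hypothetical naming scheme: vera-bench_slide_<type>.png inside out_dir."""
    return out_dir / f"vera-bench_slide_{slide_type}.png"


tmp_path = default_output("delta")                      # default: ephemeral location
preview_path = default_output("delta", Path("assets"))  # opt-in preview location

# The preview copy falls under the ignore pattern; the /tmp copy is outside the repo anyway.
preview_ignored = fnmatch(preview_path.as_posix(), IGNORE_PATTERN)
tmp_ignored = fnmatch(tmp_path.as_posix(), IGNORE_PATTERN)
```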

Verified:
- ruff check + ruff format --check clean
- All three slides render cleanly on default paper background
- Numbers cross-check against the v0.0.7 documentation chart (each of
  6 models × 4 modes = 24 cells)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a8f1acd-1527-4d27-a638-c6c8a2c02f45

📥 Commits

Reviewing files that changed from the base of the PR and between 35de546 and 80e4c72.

📒 Files selected for processing (1)
  • scripts/plot_slide.py

📝 Walkthrough

Walkthrough

This PR adds scripts/plot_slide.py, a CLI script that renders presentation-ready 16:9 slides (delta, tiers, all-modes) from v0.0.7 benchmark data by temporarily patching the model registry, plus README docs and a .gitignore entry for generated slide PNGs.

Changes

Talk Slide Rendering Script

| Layer / File(s) | Summary |
| --- | --- |
| Configuration and documentation (.gitignore, scripts/README.md) | Gitignore entry for assets/vera-bench_slide_*.png and README section documenting plot_slide.py, supported slide types, the hard-coded v0.0.7 model lineup, styling/data reuse from plot_results.py, CLI usage, backgrounds, and output conventions. |
| Script setup and data loading (scripts/plot_slide.py) | Matplotlib Agg backend, imports from scripts.plot_results, MODELS_V_0_0_7, typography and background palette constants, and _load_v0_0_7_data() helper that patches plot_results.MODELS, calls extract_data(), and restores the registry. |
| Styling and delta renderer (scripts/plot_slide.py) | Global rcParams and _style_ax() helper, plus render_delta() producing a 16:9 horizontal delta bar chart with per-bar labels and zero reference line; uses _save() to tint and write PNG. |
| Tier slide renderer (scripts/plot_slide.py) | _draw_tier_panel() builds grouped vertical bars with percentage labels; render_tiers() composes Flagship and Sonnet panels side-by-side and saves via _save(). |
| All-modes slide renderer (scripts/plot_slide.py) | render_all_modes() renders grouped bars for Vera, Vera NL, Python, and TypeScript across models with per-bar labels and saves via _save(). |
| Save, renderers registry and CLI (scripts/plot_slide.py) | _save() tints figure/axes and writes PNGs; RENDERERS maps type names to renderer functions; main() implements argparse CLI (--type, --version, --results-dir, --output, --background), validates --output usage, loads v0.0.7 data once, and writes the requested slide PNG(s). |
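
The registry-plus-dispatch pattern summarised in the last row can be sketched as follows. The renderer bodies are placeholders that return a label instead of drawing anything, and treating "no type given" as "render everything" is an assumption about how the CLI selects slides.

```python
from __future__ import annotations

from collections.abc import Callable


def render_delta() -> str:
    """Placeholder: the real renderer would draw and save a figure."""
    return "delta slide"


def render_tiers() -> str:
    return "tiers slide"


def render_all_modes() -> str:
    return "all-modes slide"


# Maps slide type names to renderer functions.
RENDERERS: dict[str, Callable[[], str]] = {
    "delta": render_delta,
    "tiers": render_tiers,
    "all-modes": render_all_modes,
}


def render(slide_type: str | None = None) -> list[str]:
    """Run one renderer by name, or all of them when no name is given (assumed)."""
    names = list(RENDERERS) if slide_type is None else [slide_type]
    return [RENDERERS[name]() for name in names]


one = render("tiers")
everything = render()
```

Keeping the mapping in one dictionary means the set of valid slide types and the dispatch logic cannot drift apart; the same keys could also feed an argparse `choices` list.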

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • aallan/vera-bench#59: The new scripts/plot_slide.py depends on the data handling in scripts/plot_results.py and temporarily patches plot_results.MODELS before calling plot_results.extract_data().

Suggested labels

ci

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarises the primary change: adding a new v0.0.7 talk-slide renderer script with 16:9 aspect ratio. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/plot_slide.py`:
- Around line 455-457: The CLI argument for "--version" currently allows
arbitrary strings while the code always patches the model registry to
MODELS_V_0_0_7; fix this by constraining or removing the flag: either remove the
"--version" argument from the ArgumentParser, or change its definition (the
place that sets default="0.0.7") to only accept the single choice "0.0.7" (use
the parser's choices parameter or equivalent) so that callers cannot pass other
versions that would be silently mis-mapped to MODELS_V_0_0_7.
- Line 110: Add explicit return type annotations to the listed function
definitions in scripts/plot_slide.py: annotate _patch_models_for_slide,
_load_v0_0_7_data, _slide_rcparams, _style_ax, _draw_tier_panel, and main with
the appropriate types (e.g. -> None for functions that don’t return a value; use
precise tuple[...] or other concrete typing for functions that return multiple
values or structures such as _load_v0_0_7_data and _draw_tier_panel). Update the
def lines (e.g. def _patch_models_for_slide(...) -> None:) and pick exact return
type signatures that match the actual return values in each function body so
static type checkers reflect the real outputs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 82681587-d9a1-42ee-87c2-182e516c5daf

📥 Commits

Reviewing files that changed from the base of the PR and between 6915d87 and f8d8620.

📒 Files selected for processing (3)
  • .gitignore
  • scripts/README.md
  • scripts/plot_slide.py

Comment thread scripts/plot_slide.py Outdated
Comment thread scripts/plot_slide.py Outdated
@codecov

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.65%. Comparing base (6915d87) to head (80e4c72).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #71   +/-   ##
=======================================
  Coverage   83.65%   83.65%           
=======================================
  Files          10       10           
  Lines        1395     1395           
=======================================
  Hits         1167     1167           
  Misses        228      228           
Flag Coverage Δ
python 83.65% <ø> (ø)

Two CodeRabbit findings, both valid against current code:

1. --version accepted any string but the model lineup was always patched
   to MODELS_V_0_0_7. Passing e.g. --version 0.0.9 silently produced a
   chart with v0.0.7-era labels and v0.0.9 numbers — exactly the kind
   of label/data mis-mapping that would corrupt a slide deck without
   anyone noticing. Verified locally: --version 0.0.9 produced a PNG
   pre-fix.

   Fix: argparse choices=["0.0.7"] plus help text explaining why the
   lineup is pinned and what future-extension would require. Now
   --version 0.0.9 errors out loud:

       error: argument --version: invalid choice: '0.0.9'
       (choose from 0.0.7)

2. Six internal helpers lacked return type annotations. Added them:

   - _patch_models_for_slide() -> tuple[ModuleType, list[ModelSpec]]
   - _load_v0_0_7_data() -> tuple[dict[str, dict[str, int]],
                                  dict[str, dict[str, int]]]
   - _slide_rcparams() -> None
   - _style_ax() -> None
   - _draw_tier_panel() -> None
   - main() -> None

   New import: `from types import ModuleType` for the patch helper's
   precise return type. Dict type spellings match the pattern in
   plot_results.py (no type alias introduced; inline tuple types
   stay consistent with the existing extract_data signature).

   The ax parameters of _style_ax(ax) and _draw_tier_panel(ax) are left
   unannotated for now: out of scope (CR asked specifically for return
   types), and matplotlib's Axes typing has its own complications worth
   handling separately if/when we type the ax parameters globally.

Verified:
- ruff check + ruff format --check clean
- Default invocation still produces all three slides correctly
- --version 0.0.9 now errors at argparse rather than silently
  producing a mislabelled chart

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
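
The `--version` fix from the commit message above can be sketched with argparse's `choices` parameter. The `build_parser` helper and the help wording are illustrative assumptions; only `choices=["0.0.7"]` and `default="0.0.7"` come from the discussion.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --version flag accepts only the pinned release."""
    parser = argparse.ArgumentParser(prog="plot_slide.py")
    parser.add_argument(
        "--version",
        choices=["0.0.7"],
        default="0.0.7",
        help="results version; pinned because the model lineup is hard-coded",
    )
    return parser


# Default invocation keeps working.
args = build_parser().parse_args([])

# Any other version is rejected at parse time: argparse prints an error
# to stderr and raises SystemExit instead of rendering a mislabelled chart.
try:
    build_parser().parse_args(["--version", "0.0.9"])
    rejected = False
except SystemExit:
    rejected = True
```

Failing at the parser is the cheapest place to catch the mismatch: nothing is loaded or drawn before the error, so there is no partially written PNG to clean up.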

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/plot_slide.py`:
- Line 44: The functions _style_ax and _draw_tier_panel currently have untyped
ax parameters; add an explicit Axes type from matplotlib to satisfy the
project's type-hint rule: add an import for Axes (e.g. from matplotlib.axes
import Axes) at the top and annotate the signatures as def _style_ax(ax: Axes)
-> None and def _draw_tier_panel(ax: Axes, ...) -> None (keep other parameter
names/annotations and return types unchanged).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3a30a5f6-15ab-4250-878f-270dd7889408

📥 Commits

Reviewing files that changed from the base of the PR and between f8d8620 and 35de546.

📒 Files selected for processing (1)
  • scripts/plot_slide.py

Comment thread scripts/plot_slide.py
…panel

CodeRabbit pushed back on my earlier deferral of typing `ax`
parameters, correctly pointing out that CLAUDE.md says "type hints
everywhere" without an exemption for matplotlib types. My earlier
"matplotlib Axes typing has its own complications" framing was
sloppy — `from matplotlib.axes import Axes` is a one-line import,
not actually complicated.

Fix:
- Added `from matplotlib.axes import Axes` import
- `_style_ax(ax: Axes) -> None`
- `_draw_tier_panel(ax: Axes, ...) -> None`

Note for follow-up: plot_results.py has the same untyped-ax pattern
across three functions (_style_ax, plot_tier, plot_all_modes).
CodeRabbit's finding would apply there too, but that's a different
file's surface area and out of scope for this PR.

Verified: ruff clean, default invocation produces all three slides
correctly, no behaviour change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aallan aallan merged commit 237ca81 into main May 22, 2026
10 checks passed
@aallan aallan deleted the slides/v0.0.7-talk-prep branch May 22, 2026 10:47