Add scripts/plot_slide.py: v0.0.7 talk-slide renderer at 16:9 (#71)
Standalone script that renders the v0.0.7 result panels as 16:9 slides sized and styled for talk presentation.

Three slide types:
- delta: "Does Vera beat Python / TypeScript?" headline chart
- tiers: Flagship and Sonnet tier comparisons side-by-side
- all-modes: all 6 models × 4 modes in a single grouped-bar panel

Specialised, not general. The v0.0.7 model lineup (K2.5 in flagship, K2 Turbo in sonnet) is hard-coded because the live plot_results.MODELS registry now reflects the post-K2.6 migration (PR #69). Reuses palette + extract_data() from plot_results.py so slide numbers match the README chart by construction.

Typography is bumped roughly 3× from the documentation-chart sizes so the slide reads from the back of a room. Slide canvas is 16×9 in at dpi=180 (2880×1620 px), with a tunable --background flag offering four light-theme options (paper / white / cream / light-grey). Dark mode is deliberately out of scope: it would require cascading text-colour inversion that the current talk's design doesn't need.

Output handling:
- PNGs default to /tmp/ because they're talk-prep ephemera that belong in the speaker's deck rather than the repo
- assets/vera-bench_slide_*.png is gitignored for the case where someone outputs to assets/ for preview; the canonical artefact is the script itself, and regeneration is cheap
- Brief section added to scripts/README.md flagging the v0.0.7 pin so future-me doesn't try to use this for a different release

Verified:
- ruff check + ruff format --check clean
- All three slides render cleanly on default paper background
- Numbers cross-check against the v0.0.7 documentation chart (each of 6 models × 4 modes = 24 cells)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
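The 24-cell cross-check listed under Verified can be sketched as a comparison of two nested mappings. This is a minimal sketch rather than the script's actual verification code: `cross_check` is a hypothetical helper, the model → mode → count shape is assumed from the `dict[str, dict[str, int]]` annotations used elsewhere in this PR, and the numbers are placeholders, not benchmark results.

```python
from __future__ import annotations


def cross_check(
    slide: dict[str, dict[str, int]],
    reference: dict[str, dict[str, int]],
) -> list[tuple[str, str]]:
    """Return the (model, mode) cells whose values differ between the two sources."""
    mismatches: list[tuple[str, str]] = []
    for model in sorted(set(slide) | set(reference)):
        slide_modes = slide.get(model, {})
        ref_modes = reference.get(model, {})
        for mode in sorted(set(slide_modes) | set(ref_modes)):
            # A cell that is missing on either side counts as a mismatch.
            if slide_modes.get(mode) != ref_modes.get(mode):
                mismatches.append((model, mode))
    return mismatches


# Placeholder values only; these are NOT v0.0.7 results.
slide_data = {"model-a": {"vera": 10, "python": 9}, "model-b": {"vera": 7, "python": 8}}
chart_data = {"model-a": {"vera": 10, "python": 9}, "model-b": {"vera": 7, "python": 6}}

print(cross_check(slide_data, slide_data))  # identical inputs: no mismatches
print(cross_check(slide_data, chart_data))  # one differing cell is reported
```

An empty result means every cell agrees. Returning the differing cells, rather than a bare boolean, makes a failed check easier to trace back to a specific bar.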
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/plot_slide.py`:
- Around line 455-457: The CLI argument for "--version" currently allows
arbitrary strings while the code always patches the model registry to
MODELS_V_0_0_7; fix this by constraining or removing the flag: either remove the
"--version" argument from the ArgumentParser, or change its definition (the
place that sets default="0.0.7") to only accept the single choice "0.0.7" (use
the parser's choices parameter or equivalent) so that callers cannot pass other
versions that would be silently mis-mapped to MODELS_V_0_0_7.
- Line 110: Add explicit return type annotations to the listed function
definitions in scripts/plot_slide.py: annotate _patch_models_for_slide,
_load_v0_0_7_data, _slide_rcparams, _style_ax, _draw_tier_panel, and main with
the appropriate types (e.g. -> None for functions that don’t return a value; use
precise tuple[...] or other concrete typing for functions that return multiple
values or structures such as _load_v0_0_7_data and _draw_tier_panel). Update the
def lines (e.g. def _patch_models_for_slide(...) -> None:) and pick exact return
type signatures that match the actual return values in each function body so
static type checkers reflect the real outputs.
📒 Files selected for processing (3): .gitignore, scripts/README.md, scripts/plot_slide.py
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #71 | +/- |
|---|---|---|---|
| Coverage | 83.65% | 83.65% | |
| Files | 10 | 10 | |
| Lines | 1395 | 1395 | |
| Hits | 1167 | 1167 | |
| Misses | 228 | 228 | |
Two CodeRabbit findings, both valid against current code:
1. --version accepted any string but the model lineup was always patched
to MODELS_V_0_0_7. Passing e.g. --version 0.0.9 silently produced a
chart with v0.0.7-era labels and v0.0.9 numbers — exactly the kind
of label/data mis-mapping that would corrupt a slide deck without
anyone noticing. Verified locally: --version 0.0.9 produced a PNG
pre-fix.
Fix: argparse choices=["0.0.7"] plus help text explaining why the
lineup is pinned and what future-extension would require. Now
--version 0.0.9 errors out loud:
error: argument --version: invalid choice: '0.0.9'
(choose from 0.0.7)
2. Six internal helpers lacked return type annotations. Added them:
- _patch_models_for_slide() -> tuple[ModuleType, list[ModelSpec]]
- _load_v0_0_7_data() -> tuple[dict[str, dict[str, int]],
dict[str, dict[str, int]]]
- _slide_rcparams() -> None
- _style_ax() -> None
- _draw_tier_panel() -> None
- main() -> None
New import: `from types import ModuleType` for the patch helper's
precise return type. Dict type spellings match the pattern in
plot_results.py (no type alias introduced; inline tuple types
stay consistent with the existing extract_data signature).
The `ax` arguments of _style_ax(ax) and _draw_tier_panel(ax) are left
unannotated: that is out of scope here (CR asked specifically for return
types), and matplotlib's Axes typing has its own complications worth
handling separately if/when we type the ax parameters globally.
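The annotation pattern in finding 2 can be illustrated with a small registry-swapping helper. This is a sketch under stated assumptions, not the code in `scripts/plot_slide.py`: `ModelSpec` here is a hypothetical stand-in dataclass, `patch_models` and `restore_models` are illustrative names with illustrative signatures, and a stub module replaces `plot_results` so the block runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import ModuleType


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical stand-in for the real ModelSpec; only a label is modelled."""

    label: str


def patch_models(
    module: ModuleType, pinned: list[ModelSpec]
) -> tuple[ModuleType, list[ModelSpec]]:
    """Swap module.MODELS for the pinned lineup; return the module and the old list."""
    original: list[ModelSpec] = module.MODELS
    module.MODELS = pinned
    return module, original


def restore_models(module: ModuleType, original: list[ModelSpec]) -> None:
    """Put the original lineup back; nothing is returned, hence -> None."""
    module.MODELS = original


# Stub module so the sketch runs without plot_results.py.
registry = ModuleType("registry_stub")
registry.MODELS = [ModelSpec("current-model")]

patched, saved = patch_models(registry, [ModelSpec("pinned-model")])
print([m.label for m in patched.MODELS])
restore_models(patched, saved)
print([m.label for m in registry.MODELS])
```

Spelling out the tuple return type lets a static checker catch a caller that unpacks the result in the wrong order.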
Verified:
- ruff check + ruff format --check clean
- Default invocation still produces all three slides correctly
- --version 0.0.9 now errors at argparse rather than silently
producing a mis-labeled chart
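
The `--version` fix from finding 1 can be sketched with the standard `argparse` `choices` parameter. This is a minimal illustration, not the script's actual parser: `build_parser` and the help text are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="plot_slide.py")
    parser.add_argument(
        "--version",
        default="0.0.7",
        choices=["0.0.7"],  # the lineup is pinned, so only this value is accepted
        help="Result version to render; pinned to 0.0.7 (hypothetical help text).",
    )
    return parser


parser = build_parser()
print(parser.parse_args([]).version)  # falls back to the default, "0.0.7"

try:
    parser.parse_args(["--version", "0.0.9"])
except SystemExit as exc:
    # argparse prints a usage error to stderr and exits with status 2.
    print("rejected, exit status", exc.code)
```

Keeping the flag with a single allowed value, rather than removing it, leaves room to extend the list once a lineup for another release is added.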
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/plot_slide.py`:
- Line 44: The functions _style_ax and _draw_tier_panel currently have untyped
ax parameters; add an explicit Axes type from matplotlib to satisfy the
project's type-hint rule: add an import for Axes (e.g. from matplotlib.axes
import Axes) at the top and annotate the signatures as def _style_ax(ax: Axes)
-> None and def _draw_tier_panel(ax: Axes, ...) -> None (keep other parameter
names/annotations and return types unchanged).
📒 Files selected for processing (1): scripts/plot_slide.py
…panel

CodeRabbit pushed back on my earlier deferral of typing `ax` parameters, correctly pointing out that CLAUDE.md says "type hints everywhere" without an exemption for matplotlib types. My earlier "matplotlib Axes typing has its own complications" framing was sloppy: `from matplotlib.axes import Axes` is a one-line import, not actually complicated.

Fix:
- Added `from matplotlib.axes import Axes` import
- `_style_ax(ax: Axes) -> None`
- `_draw_tier_panel(ax: Axes, ...) -> None`

Note for follow-up: plot_results.py has the same untyped-ax pattern across three functions (_style_ax, plot_tier, plot_all_modes). CodeRabbit's finding would apply there too, but that's a different file's surface area and out of scope for this PR.

Verified: ruff clean, default invocation produces all three slides correctly, no behaviour change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
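A typed `ax` parameter might look like the following. This is a sketch only: `style_ax` and its body are hypothetical (the real `_style_ax` is not shown in this conversation), and the tick-label size is a placeholder. It assumes matplotlib is installed and builds the figure directly from `matplotlib.figure.Figure`, so no display or pyplot state is needed.

```python
from __future__ import annotations

from matplotlib.axes import Axes
from matplotlib.figure import Figure


def style_ax(ax: Axes) -> None:
    """Hypothetical styling helper: hide the top/right spines, enlarge tick labels."""
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.tick_params(labelsize=24)  # placeholder size, not the script's value


# 16x9 in at dpi=180 matches the slide canvas described in this PR.
fig = Figure(figsize=(16, 9), dpi=180)
ax = fig.add_subplot(1, 1, 1)
style_ax(ax)
print(isinstance(ax, Axes), ax.spines["top"].get_visible())
```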
Summary
Adds `scripts/plot_slide.py`, a specialised slide renderer for talk presentation of the v0.0.7 result panels. Three slide types, 16:9 landscape (2880×1620 px), slide-readable typography, four light-theme background choices (paper / white / cream / light-grey).

Driven by an actual talk being written: option 3 from the post-render discussion (commit the script, leave PNGs uncommitted), so the slides are reproducible without pinning ephemeral artefacts to the repo.
What's here
- `scripts/plot_slide.py`
- `scripts/README.md`
- `.gitignore`: extends the `assets/*.png` chart-variant ignore pattern to cover `assets/vera-bench_slide_*.png`

Specialised, not general
The v0.0.7 model lineup (Claude Opus 4 / GPT-4.1 / Kimi K2.5 in flagship; Claude Sonnet 4 / GPT-4o / Kimi K2 Turbo in sonnet) is hard-coded in `MODELS_V_0_0_7`, because `plot_results.MODELS` has since been updated to reflect the post-K2.6 migration (PR #69). The script reuses palette, typography constants, and `extract_data()` from `plot_results.py` so the slide numbers match the README chart by construction.

The README section flags this explicitly so future-me doesn't try to use it for v0.0.10 / v0.0.11 / v0.0.12 results without realising the lineup is pinned to v0.0.7.
Three slide types
- `delta`: the "Does Vera beat Python / TypeScript?" horizontal-bar chart (the headline storytelling slide; Vera-wins read as green positive bars)
- `tiers`: Flagship and Sonnet tier comparisons side-by-side, mirroring the top row of the documentation chart
- `all-modes`: all 6 models × 4 modes (Vera, Vera NL, Python, TypeScript) in a single grouped-bar panel

Background choices
| `--background` | Hex |
|---|---|
| `paper` (default) | `#FAF7F0` |
| `white` | `#FFFFFF` |
| `cream` | `#FEEAD1` |
| `light-grey` | `#F4F4F2` |

Dark mode is deliberately not offered: it requires cascading text-colour inversion that's out of scope for this talk's design.
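
The background options and the canvas arithmetic can be captured in a few lines. This is a sketch under stated assumptions: the names `BACKGROUNDS`, `canvas_pixels`, and `resolve_background` are hypothetical and may not match the script. The hex values and the 16×9 in at dpi=180 canvas are taken from this PR.

```python
from __future__ import annotations

# Background names and hex values as listed in this PR description.
BACKGROUNDS: dict[str, str] = {
    "paper": "#FAF7F0",  # default
    "white": "#FFFFFF",
    "cream": "#FEEAD1",
    "light-grey": "#F4F4F2",
}

FIGSIZE_IN: tuple[int, int] = (16, 9)
DPI: int = 180


def canvas_pixels(
    figsize_in: tuple[int, int] = FIGSIZE_IN, dpi: int = DPI
) -> tuple[int, int]:
    """Pixel dimensions of the saved image: inches multiplied by dots per inch."""
    width_in, height_in = figsize_in
    return width_in * dpi, height_in * dpi


def resolve_background(name: str = "paper") -> str:
    """Look up the hex colour for a background name; unknown names raise ValueError."""
    try:
        return BACKGROUNDS[name]
    except KeyError:
        options = ", ".join(sorted(BACKGROUNDS))
        raise ValueError(f"unknown background {name!r}; choose from {options}") from None


print(canvas_pixels())  # (2880, 1620)
print(resolve_background("cream"))  # #FEEAD1
```

Rejecting unknown names up front means a request for a dark theme fails loudly instead of falling back to a light one.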
No version bump
New tooling, no methodology change. Mirrors the precedent of attribution / tooling-only PRs (#58, #59, #67, #69 — none of which bumped).
Test plan
- `ruff check` + `ruff format --check` clean
- `--background {white,cream,light-grey}` exercises each branch successfully
- Numbers cross-checked against `assets/results-graph.png` (24 cells × 3 panels, all match)
- `scripts/README.md` section flags the v0.0.7 lineup pin and the talk-ephemera nature of the PNGs

🤖 Generated with Claude Code