Commit a367de8
Add --parallel N flag to vera-bench run

Run N problems concurrently via ThreadPoolExecutor. Each worker
is I/O-bound: it mostly waits on its LLM HTTP call and on the
subprocess-based check/run. CPython releases the GIL during those
blocking waits, so the GIL is not a bottleneck.
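The sketch below shows why threads help here despite the GIL. It is self-contained and is not the vera-bench code. It assumes a hypothetical `fake_problem` that stands in for the HTTP call and check/run by calling `time.sleep`. Like a blocking socket read or a subprocess wait, `time.sleep` releases the GIL while it blocks.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_problem(problem_id: int, latency_s: float = 0.1) -> int:
    """Hypothetical stand-in for one problem: blocks the way an HTTP call would."""
    time.sleep(latency_s)  # blocking wait; the GIL is released while sleeping
    return problem_id


def timed_run(n_problems: int, workers: int, latency_s: float = 0.1) -> float:
    """Run n_problems fake problems on `workers` threads; return wall time in seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda pid: fake_problem(pid, latency_s), range(n_problems)))
    return time.perf_counter() - start


if __name__ == "__main__":
    seq = timed_run(8, workers=1)  # roughly 8 x 0.1 s
    par = timed_run(8, workers=8)  # roughly 1 x 0.1 s: the waits overlap
    print(f"sequential: {seq:.2f}s  parallel: {par:.2f}s")
```

Because each thread spends nearly all its time blocked, eight threads finish in about the time of one problem. CPU-bound work would not overlap this way under the GIL.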

Use case: slow models such as Kimi K2.5 averaged 49 s/problem
when run sequentially across the 60-problem AILANG sweep (~50 min
total). With --parallel 10 the same sweep should ideally drop to
~5 min (6 rounds of 10 concurrent problems at ~49 s each), which
makes release-time re-evals practical.
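The ~5 min figure comes from treating the sweep as rounds of N concurrent problems. Below is a back-of-the-envelope sketch. `estimate_sweep_minutes` is a hypothetical helper, and it assumes uniform per-problem latency and no provider-side throttling of concurrent requests.

```python
import math


def estimate_sweep_minutes(n_problems: int, avg_s_per_problem: float, parallel: int) -> float:
    """Idealized wall-clock estimate: problems run in rounds of `parallel`.

    Assumes every problem takes the same time and concurrent requests are
    not throttled, so real sweeps will usually be somewhat slower.
    """
    rounds = math.ceil(n_problems / parallel)
    return rounds * avg_s_per_problem / 60


print(f"{estimate_sweep_minutes(60, 49, 1):.1f} min")   # 49.0 min (sequential)
print(f"{estimate_sweep_minutes(60, 49, 10):.1f} min")  # 4.9 min (--parallel 10)
```

The `ceil` matters when N does not divide the problem count. For example, 10 problems at --parallel 4 still need 3 rounds.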

Implementation:
- ThreadPoolExecutor with max_workers=parallel.
- Per-problem futures are collected via as_completed.
- A threading.Lock guards the JSONL append so concurrent writes
  don't interleave. Each line is self-contained (it carries
  problem_id), so writing results in completion order rather
  than problem order is fine.
- Workers share the same work_dir; per-problem temp files are
  already made unique by problem_id (existing behavior).
- An exception in any worker is caught and logged, and the sweep
  continues with the remaining problems.
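The structure described in the list can be sketched as follows. This is a minimal illustration, not the actual diff, whose contents are not shown here. `run_parallel`, `solve`, and `out_path` are hypothetical names, and `solve` stands in for the LLM call plus check/run.

```python
import json
import logging
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable

log = logging.getLogger("parallel_sweep_sketch")


def run_parallel(
    problem_ids: list[str],
    solve: Callable[[str], dict],
    out_path: Path,
    parallel: int,
) -> int:
    """Run `solve` for each problem on up to `parallel` threads.

    Each worker appends one JSON line to `out_path`. Returns the number of
    problems that finished without raising.
    """
    write_lock = threading.Lock()

    def worker(pid: str) -> bool:
        try:
            result = solve(pid)  # hypothetical: LLM call + check/run in the real tool
        except Exception:
            # A failing problem is logged and skipped; other workers keep going.
            log.exception("problem %s failed", pid)
            return False
        # The line carries problem_id, so completion-order writes are fine.
        line = json.dumps({"problem_id": pid, **result}) + "\n"
        # Only one thread appends at a time, so lines never interleave.
        with write_lock:
            with out_path.open("a", encoding="utf-8") as f:
                f.write(line)
        return True

    finished = 0
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = [pool.submit(worker, pid) for pid in problem_ids]
        for fut in as_completed(futures):  # yields futures as they finish
            finished += fut.result()
    return finished


if __name__ == "__main__":
    def fake_solve(pid: str) -> dict:
        if pid == "p3":
            raise RuntimeError("simulated model failure")
        return {"run_correct": True}

    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "results.jsonl"
        print(run_parallel([f"p{i}" for i in range(10)], fake_solve, out, parallel=4))  # 9
```

Catching inside `worker` keeps one bad problem from aborting the sweep. The lock is held only for the short append, not for the slow `solve` call.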

The default, parallel=1, keeps the existing sequential path, so
behavior is unchanged unless --parallel is set above 1.
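The flag and its default could fit together as sketched below, assuming an argparse-based CLI. The commit does not show vera-bench's actual CLI framework, and `positive_int`, `build_parser`, and `run_all` are hypothetical names. The design point is that parallel=1 takes the plain loop, so the executor is involved only when the user opts in.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def positive_int(text: str) -> int:
    """argparse type that rejects 0 and negative worker counts."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError("--parallel must be >= 1")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vera-bench run")
    parser.add_argument(
        "--parallel", type=positive_int, default=1, metavar="N",
        help="number of problems to run concurrently (default: 1, sequential)",
    )
    return parser


def run_all(problem_ids: list[str], run_one: Callable[[str], None], parallel: int) -> None:
    """Dispatch to the sequential loop or a thread pool.

    run_one is assumed to catch and log its own exceptions, as each worker
    does in the commit description.
    """
    if parallel == 1:
        # Default path: the original sequential loop, so behavior is unchanged.
        for pid in problem_ids:
            run_one(pid)
        return
    # Opt-in path: fan the same per-problem function out across threads.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        list(pool.map(run_one, problem_ids))
```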

Smoke-tested with claude-haiku-4-5 --tier 1 --parallel 4:
10/10 problems completed, no duplicate results, 100%/100%
run_correct.
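A check like this smoke test can be scripted against the JSONL output. The sketch relies only on what the commit states: one JSON object per line, each carrying a problem_id. `check_results` and the file name are hypothetical.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def check_results(path: Path, expected: int) -> dict:
    """Summarize a results JSONL file for a quick post-run sanity check."""
    ids = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # json.loads typically raises on a corrupted line, e.g. two writes spliced together.
        ids.append(json.loads(line)["problem_id"])
    counts = Counter(ids)
    return {
        "lines": len(ids),
        "duplicates": sorted(pid for pid, n in counts.items() if n > 1),
        "missing": expected - len(counts),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "results.jsonl"
        demo.write_text('{"problem_id": "a"}\n{"problem_id": "b"}\n', encoding="utf-8")
        print(check_results(demo, expected=2))  # {'lines': 2, 'duplicates': [], 'missing': 0}
```

For a 10-problem tier-1 run, a clean result is 10 lines, an empty duplicates list, and 0 missing.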

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Parent: 5ad6475
2 files changed: 85 additions, 13 deletions (+13/-0 in the first file, +72/-13 in the second)