Context
We currently support Vera, Python, and TypeScript for both baselines and LLM generation. Go would be a valuable fourth comparison language — it sits between Python (dynamic types, no contracts) and Vera (static types, mandatory contracts) in the type safety spectrum.
Why Go
- Static types, no contracts — fills a gap in the comparison. If Go's type system catches bugs Python misses, but Vera's contracts catch bugs Go's types miss, that's a strong argument for the contract system.
- Strict compiler: go build rejects unused variables, unused imports, and type mismatches. Closer to Vera's strictness than Python/TypeScript.
- Error handling patterns: if err != nil is a natural parallel to Vera's Exn effect problems.
- Well-represented in training data: LLMs write good Go.
- Single binary runtime: go run file.go with no config, similar to npx tsx.
Implementation
Following the TypeScript pattern:
Baseline solutions (50 files)
solutions/go/VB_T1_001_absolute_value.go etc.
- Go uses PascalCase for exported functions (TypeScript uses camelCase)
- Each file is a standalone package main with the function
Code changes
- prompts.py: build_go_prompt(), using "You are an expert Go programmer"
- runner.py: _evaluate_go_code(), which writes the .go file, builds the test wrapper, and runs it via go run
- baseline_runner.py: run_go_baseline(), which executes the canonical Go solutions
- cli.py: add "go" to --language choices
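A minimal sketch of the "write .go file, run via go run" step, to show how little plumbing it needs. The helper name run_go_source, the result dict shape, and the go_binary parameter are all hypothetical and not taken from runner.py; the real _evaluate_go_code() would presumably follow whatever the TypeScript evaluator returns.

```python
import subprocess
import tempfile
from pathlib import Path


def run_go_source(source: str, timeout: float = 30.0, go_binary: str = "go") -> dict:
    """Write Go source to a temp main.go and execute it with `go run`.

    Hypothetical helper: the name and return shape are assumptions.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # A single stdlib-only file can be run without a go.mod.
        (Path(tmp) / "main.go").write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [go_binary, "run", "main.go"],
                cwd=tmp,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except FileNotFoundError:
            # The Go toolchain is not installed or not on PATH.
            return {"ok": False, "stage": "toolchain", "detail": f"{go_binary} not found"}
        except subprocess.TimeoutExpired:
            return {"ok": False, "stage": "timeout", "detail": f"exceeded {timeout}s"}
    # A non-zero exit covers both build failures and failing test runs;
    # stderr carries the compiler or panic message.
    return {"ok": proc.returncode == 0, "stage": "run", "detail": proc.stderr.strip()}
```

Returning a "toolchain" stage separately keeps a missing Go install from being counted as a failed solution.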
Test wrapper approach
Go doesn't have ES module imports like TypeScript. The wrapper would need to either:
- Write the generated function + test code into a single main.go file, or
- Use Go's package system with the generated code as a separate file in the same package
The single-file approach is simplest (same package, concatenate generated code with test harness).
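The single-file approach could be sketched as below. One constraint shapes it: Go requires all import declarations to come before other declarations, so a harness appended after the generated code cannot add its own imports. This sketch sidesteps that by using only the builtins panic and println. The function name build_go_test_file and the integer-only case format are assumptions for illustration, and it assumes the generated code does not define its own main().

```python
import re


def build_go_test_file(generated_code: str, func_name: str, cases: list) -> str:
    """Concatenate generated Go code with a minimal main() test harness.

    Hypothetical helper. `cases` is a list of (args, expected) pairs with
    integer values only, e.g. [([-3], 3)].
    """
    source = generated_code.rstrip()
    # Add a package clause if the generated code does not have one.
    if not re.search(r"^package\s+\w+", source, flags=re.MULTILINE):
        source = "package main\n\n" + source

    checks = []
    for args, expected in cases:
        call = f"{func_name}({', '.join(str(a) for a in args)})"
        checks.append(
            f"\tif got := {call}; got != {expected} {{\n"
            f"\t\tpanic(\"{call} returned wrong result\")\n"
            f"\t}}"
        )

    # Only builtins are used, so the harness needs no import block.
    harness = "func main() {\n" + "\n".join(checks) + "\n\tprintln(\"PASS\")\n}\n"
    return source + "\n\n" + harness
```

A failing case panics, which makes the process exit with a non-zero status, so the runner only needs to check the exit code.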
CI
- Add actions/setup-go@v5 to the test job (if Go baseline tests are added)
Naming convention
- Go exported functions: AbsoluteValue (PascalCase), Gcd, MaxOfThree
- Need _snake_to_pascal() converter (or extend _snake_to_camel with a capitalize-first option)
Expected results
Go should land between Python and TypeScript — strong type system catches some bugs at compile time, but no contract system means logic bugs pass silently (like Python/TypeScript). The comparison would be:
| Language | Type system | Contracts | Expected run_correct |
|---|---|---|---|
| Python | Dynamic | None | ~92% |
| Go | Static | None | ~90%? |
| TypeScript | Static | None | ~79% |
| Vera | Static | Mandatory | ~83% |