Add <Inference> algebraic effect — LLM calls as typed, verified effects (v0.0.101)#374
…ts (v0.0.101)
Implements Inference.complete(String → Result<String, String>) as a built-in host-backed effect following the Http pattern. Provider auto-detected from VERA_ANTHROPIC_API_KEY, VERA_OPENAI_API_KEY, or VERA_MOONSHOT_API_KEY; override with VERA_INFERENCE_PROVIDER and VERA_INFERENCE_MODEL. Browser runtime returns a rich Err explaining why API keys cannot be safely embedded in client-side JS.
Type system: registers Inference effect in environment.py with complete op.
Codegen: compilability scan, _inference_ops_used tracking through core/functions/modules/assembly, WASM import emission, qualified call translation, type inference.
Host runtime: _call_inference_provider() dispatches to Anthropic/OpenAI/Moonshot via urllib.request (zero new dependencies). host_inference_complete registered with wasmtime linker when inference_ops_used is non-empty.
Tests: TestInferenceChecker (6), TestInferenceCollection (6), conformance test ch09_inference.vera (level: check), examples/inference.vera (sentiment classify).
Docs: spec §9.5.5 overhauled, SKILL.md Inference section, AGENTS.md, README.md feature list + roadmap, vera/README.md, docs/index.html LLM Integration section + features grid entry + browser note, CHANGELOG, ROADMAP #61 struck through.
Version: 0.0.100 → 0.0.101. Conformance: 63 → 64. Examples: 29 → 30.
Limitations filed: #370 (max_tokens), #371 (embed), #372 (user handlers), #373 (float array host-alloc infrastructure).
Removes CodeRabbit badge.
Closes #61
Co-Authored-By: Claude <noreply@anthropic.invalid>
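The provider selection described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in vera/codegen/api.py: the function name `select_provider` is hypothetical, the precedence order (Anthropic, then OpenAI, then Moonshot) is an assumption based on the order the keys are listed, and the default model names are taken from the PR summary.

```python
from typing import Mapping, Optional, Tuple

# Default models as listed in the PR summary.
DEFAULT_MODELS = {
    "anthropic": "claude-haiku-4-5-20251001",
    "openai": "gpt-4o-mini",
    "moonshot": "moonshot-v1-8k",
}

# ASSUMPTION: auto-detection checks keys in the order the commit lists them.
KEY_VARS = (
    ("anthropic", "VERA_ANTHROPIC_API_KEY"),
    ("openai", "VERA_OPENAI_API_KEY"),
    ("moonshot", "VERA_MOONSHOT_API_KEY"),
)


def select_provider(env: Mapping[str, str]) -> Optional[Tuple[str, str, str]]:
    """Hypothetical helper: return (provider, model, api_key) or None."""
    keys = {name: env.get(var, "") for name, var in KEY_VARS}
    forced = env.get("VERA_INFERENCE_PROVIDER", "").strip().lower()
    if forced:
        provider = forced
    else:
        # First provider whose key is set wins.
        provider = next((name for name, _ in KEY_VARS if keys[name]), "")
    if provider not in DEFAULT_MODELS or not keys[provider]:
        return None
    model = env.get("VERA_INFERENCE_MODEL") or DEFAULT_MODELS[provider]
    return provider, model, keys[provider]
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the selection logic testable without touching the process environment.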
📝 Walkthrough
This PR implements the
Changes
Sequence Diagram(s)
sequenceDiagram
participant Compiler as Compiler<br/>(vera.codegen)
participant WasmModule as WASM Module<br/>(compiled)
participant HostRuntime as Host Runtime<br/>(vera.codegen.api)
participant Provider as API Provider<br/>(Anthropic/OpenAI/<br/>Moonshot)
participant Heap as WASM Heap
Compiler->>Compiler: Register Inference effect<br/>in environment
Compiler->>WasmModule: Emit import<br/>inference_complete(ptr, len)->i32
Note over WasmModule: Runtime execution
WasmModule->>Heap: Allocate prompt string
WasmModule->>HostRuntime: call $vera.inference_complete<br/>(prompt_ptr, prompt_len)
HostRuntime->>HostRuntime: Read prompt from<br/>WASM memory
HostRuntime->>HostRuntime: Detect provider from<br/>env vars (API_KEY)
HostRuntime->>Provider: POST /complete<br/>with prompt
alt Provider Success
Provider-->>HostRuntime: completion text
HostRuntime->>Heap: Allocate Ok(completion)
HostRuntime-->>WasmModule: i32 Result pointer
else Provider Error
Provider-->>HostRuntime: exception
HostRuntime->>Heap: Allocate Err(message)
HostRuntime-->>WasmModule: i32 Result pointer
else No API Key
HostRuntime->>Heap: Allocate Err(no provider)
HostRuntime-->>WasmModule: i32 Result pointer
end
WasmModule->>WasmModule: match Result<String, String><br/>Ok/Err branches
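The three branches of the diagram (provider success, provider error, no API key) can be modelled in a few lines. This is a sketch under stated assumptions: the `Result` tuple is a stand-in for the WASM heap allocation of `Result<String, String>`, `complete` and `call_provider` are hypothetical names, and only the Anthropic key is checked for brevity.

```python
from typing import Callable, Mapping, Tuple

# Stand-in for Result<String, String>: ("Ok", text) or ("Err", message).
Result = Tuple[str, str]


def complete(prompt: str,
             env: Mapping[str, str],
             call_provider: Callable[[str, str], str]) -> Result:
    """Hypothetical host-side flow: every outcome becomes an Ok or an Err."""
    api_key = env.get("VERA_ANTHROPIC_API_KEY", "")
    if not api_key:
        # "No API Key" branch of the diagram.
        return ("Err", "no inference provider configured")
    try:
        # "Provider Success" branch.
        return ("Ok", call_provider(api_key, prompt))
    except Exception as exc:
        # "Provider Error" branch: the exception never reaches the guest.
        return ("Err", f"{type(exc).__name__}: {exc}")


def failing_provider(api_key: str, prompt: str) -> str:
    raise RuntimeError("upstream unavailable")
```

The point of the shape is that the guest program only ever sees a value to `match` on; no host exception crosses the WASM boundary.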
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~50 minutes
The implementation spans multiple coordinated compiler/codegen modules (environment registration, type checking, WASM code generation, import declaration, host-import plumbing). Dense logic in …
🚥 Pre-merge checks | ✅ 5 passed
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #374      +/-   ##
==========================================
- Coverage   90.32%   90.30%   -0.03%
==========================================
  Files          49       49
  Lines       18999    19100     +101
  Branches      219      220       +1
==========================================
+ Hits        17161    17248      +87
- Misses       1834     1848      +14
  Partials        4        4
…age tests
- Fix check_spec_examples.py allowlist: nan (982→992) and url_decode (1388→1398) shifted 10 lines when the Inference §9.5.5 section was added
- Add TestInferenceProviderDispatch: unit tests for all three provider branches (anthropic/openai/moonshot) in _call_inference_provider, plus custom model passthrough and unknown provider error (covers lines 153–210 of api.py)
- Add openai/moonshot auto-detection and explicit VERA_INFERENCE_PROVIDER override tests to TestInferenceCollection (covers lines 2025/2027 of api.py)
- Update TESTING.md counts: 3087→3095 tests, test_codegen.py 830→838 tests
Co-Authored-By: Claude <noreply@anthropic.invalid>
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
vera/wasm/calls.py (1)
365-374: ⚠️ Potential issue | 🟠 Major
Use `self.needs_alloc`, not `_needs_alloc`, in qualified host-call paths.
Line 373 sets `self._needs_alloc = True`, which is inconsistent with the canonical allocator flag and can prevent allocator wiring from being marked when `Inference.*` is used.
Proposed fix
 if call.qualifier == "Http":
     wasm_name = f"http_{call.name}"
     self._http_ops_used.add(wasm_name)
-    self._needs_alloc = True
+    self.needs_alloc = True
     instructions.append(f"call $vera.{wasm_name}")
 elif call.qualifier == "Inference":
     wasm_name = f"inference_{call.name}"
     self._inference_ops_used.add(wasm_name)
-    self._needs_alloc = True
+    self.needs_alloc = True
     instructions.append(f"call $vera.{wasm_name}")
Based on learnings: In the allan/vera Python implementation, the canonical “allocator-needed” flag is `self.needs_alloc` (no underscore), including for host-import call sites in `vera/wasm`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vera/wasm/calls.py` around lines 365 - 374, In the host-call handling branches for call.qualifier == "Http" and "Inference" (the code that builds wasm_name like f"http_{call.name}" and f"inference_{call.name}" and appends instructions with call $vera.{wasm_name}), replace assignments to self._needs_alloc = True with self.needs_alloc = True so the canonical allocator flag is set; make this change in both the Http and Inference branches and keep the rest of the logic (adding to self._http_ops_used / self._inference_ops_used and appending the call instruction) unchanged.
vera/codegen/compilability.py (1)
43-52: ⚠️ Potential issue | 🟡 Minor
Update the unsupported-effect rationale.
Lines 50-51 still say only `pure`, `IO`, `Http`, `State<T>`, `Exn<E>`, and `Async` are compilable. After this change that `E603` rationale is now stale for users; please include `Inference` there as well.
✏️ Proposed fix
-    rationale="Only pure, IO, Http, State<T>, "
-    "Exn<E>, and Async effects are compilable.",
+    rationale="Only pure, IO, Http, Inference, "
+    "State<T>, Exn<E>, and Async effects are compilable.",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vera/codegen/compilability.py` around lines 43 - 52, The rationale message passed to self._warning for error_code "E603" is stale: it lists compilable effects but omits "Inference" which is now supported; update the rationale string in the same call (the branch handling effects where eff.name != "Inference" and not handled) to include "Inference" alongside "pure, IO, Http, State<T>, Exn<E>, and Async" so the message accurately reflects current compilable effects (refer to the self._warning call that uses decl, the f"Function '{decl.name}' uses unsupported effect '{eff.name}' — skipped." message, and error_code="E603").
vera/codegen/functions.py (1)
167-187: ⚠️ Potential issue | 🔴 Critical
Set `self.needs_alloc` (not `self._needs_alloc`) for Inference and Http qcalls.
Lines 368 and 373 in vera/wasm/calls.py incorrectly set `self._needs_alloc = True` for Http and Inference qualifiers. The canonical allocator flag, as defined at vera/wasm/context.py:129 and propagated by functions.py:168, is `self.needs_alloc` (no underscore). All other call paths (arrays, strings, regex, decimals, map, set, json, html) correctly use `self.needs_alloc = True`.
Change:
- vera/wasm/calls.py:368: `self._needs_alloc = True` → `self.needs_alloc = True`
- vera/wasm/calls.py:373: `self._needs_alloc = True` → `self.needs_alloc = True`
The export is currently generated as a side effect of ops_used propagation, but this violates the authoritative pattern and is fragile.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vera/codegen/functions.py` around lines 167 - 187, Two qcall handlers (the Http and Inference qualifier cases) incorrectly set the private flag self._needs_alloc instead of the canonical allocator flag self.needs_alloc; update the assignments in the Http and Inference qcall handling code to set self.needs_alloc = True (replacing self._needs_alloc = True) so they match other paths (arrays, strings, regex, decimals, map, set, json, html) and align with the WasmContext propagation used by functions.py and context.py.
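Why the stray underscore matters in Python can be shown with a small stand-alone sketch. The class below is a hypothetical stand-in, not the project's WasmContext; it only demonstrates that assigning to a misspelled attribute silently creates a new one and leaves the canonical flag untouched.

```python
class WasmContextSketch:
    """Hypothetical stand-in for the real context class."""

    def __init__(self) -> None:
        # Canonical flag that later passes are assumed to read.
        self.needs_alloc = False

    def translate_buggy(self) -> None:
        # Python creates a brand-new attribute here; no error is raised
        # and self.needs_alloc stays False.
        self._needs_alloc = True

    def translate_fixed(self) -> None:
        self.needs_alloc = True


buggy = WasmContextSketch()
buggy.translate_buggy()

fixed = WasmContextSketch()
fixed.translate_fixed()
```

After these calls `buggy.needs_alloc` is still `False` while `fixed.needs_alloc` is `True`. One possible guard against this class of typo is declaring `__slots__` on the context class, which turns assignment to an undeclared attribute into an `AttributeError`; whether that fits the project's class hierarchy is not something this PR states.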
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Line 18: Update the CHANGELOG bullet that currently reads "No token limits or
temperature controls — uses provider defaults" to accurately reflect that
Anthropic's max_tokens is hardcoded in the codebase; specifically replace or
append that line to mention "Anthropic: max_tokens is hardcoded" (or similar
wording) so users are not misled that all providers use defaults, referencing
the existing bullet text and the Anthropic max_tokens configuration.
In `@README.md`:
- Line 738: Remove the roadmap-status sentence starting with "The features on
the roadmap — \<Http\> ... All four pieces are now complete." from README.md and
instead add or update that status entry in ROADMAP.md (preserving the content
and any strike-throughs/issue links); then replace the removed sentence in
README.md with a short link/reference to ROADMAP.md (e.g., "See ROADMAP.md for
current feature status") so README only links to the canonical roadmap and
ROADMAP.md holds the authoritative status text.
- Line 66: The README line claiming "mockable" for LLM inference is too broad;
update the phrase referencing Inference.complete to qualify that mocking is only
supported on the host/test side (e.g., "host-side test mocking") and explicitly
note that handle[Inference] is still unsupported until issue `#372`, or remove the
"mockable" claim entirely until `#372` lands; specifically edit the sentence
mentioning Inference.complete to either append a parenthetical like "(host-side
test mocking only; handle[Inference] unsupported until `#372`)" or replace
"mockable" with a qualified phrase as described.
In `@ROADMAP.md`:
- Line 23: Update the v0.0.100 summary to remove or correct features that
actually belong to v0.0.101: specifically, detach the "<Inference>" effect and
the "64-program conformance suite" from the v0.0.100 sentence and either move
them to a v0.0.101 entry or rephrase the v0.0.100 line to exclude those items;
edit the text referencing "v0.0.100" so it only lists features that truly land
in that release and add a separate v0.0.101 bullet containing "<Inference>" and
the 64-program conformance count if needed.
In `@SKILL.md`:
- Around line 1089-1099: safe_classify calls classify(`@String.0`) without
ensuring classify's precondition string_length(`@String.0`) > 0; fix by either
strengthening safe_classify's requires clause to require
string_length(`@String.0`) > 0 or by guarding the call (check
string_length(`@String.0`) > 0 and only call classify when true, returning
"unknown" otherwise); reference the functions safe_classify and classify and the
predicate string_length(`@String.0`) > 0 when making the change.
In `@spec/09-standard-library.md`:
- Around line 525-565: Examples in the spec still treat Inference.complete as
returning a plain String but its signature is Result<String, String>; update
every example (notably the classify snippet and any lines that use "let `@String`
= Inference.complete(...)" or similar) to handle the Result type by binding the
result (e.g., let `@Result` = Inference.complete(...)) and then pattern-matching
or unwrapping (Ok(completion) / Err(message)) before using the completion;
ensure functions like classify either propagate the Result (return
Result<String,String>) or explicitly match and return the inner String on Ok or
propagate/convert Err, and apply the same change to all downstream examples that
call Inference.complete so they are valid Vera code against the new API
contract.
In `@tests/test_codegen.py`:
- Around line 9615-9623: The test currently calls os.environ.clear() then
os.environ.update(orig) in the finally block (around the execute(result) call
and the orig snapshot), which wipes the entire process environment and can break
parallel tests; instead, restore only the keys you changed: compute the set of
keys added during the test (set(os.environ) - set(orig)) and delete those, and
for keys present in orig restore their original values via
os.environ.update(orig) or use a test helper/monkeypatch to set only
VERA_ANTHROPIC_API_KEY; replace the os.environ.clear()/update(orig) sequence with
targeted removal/restoration of modified keys (reference the orig variable, the
os.environ.clear()/update(orig) calls, and the execute(result) test block to
locate where to apply the change).
- Around line 9566-9660: Add unit tests in TestInferenceCollection to exercise
provider/model selection and overrides: create tests that (1) simulate OpenAI
and Moonshot auto-detection by patching environment and the internal
_call_inference_provider to verify the chosen provider path (e.g.,
test_inference_auto_detect_openai, test_inference_auto_detect_moonshot), (2)
assert that setting VERA_INFERENCE_PROVIDER forces the provider used, and (3)
assert that VERA_INFERENCE_MODEL overrides the model selection; use the existing
patterns in test_inference_complete_mocked_success/failure for patching
"vera.codegen.api._call_inference_provider", manipulating os.environ, compiling
_CLASSIFY_SOURCE via _compile_ok and executing via execute, and restore
os.environ in finally blocks so tests clean up after themselves.
In `@vera/codegen/api.py`:
- Around line 170-172: The HTTP calls use _urlreq.urlopen without a timeout and
should be bounded: add a finite default timeout constant (e.g.,
DEFAULT_PROVIDER_TIMEOUT) and pass it into _urlreq.urlopen(req,
timeout=DEFAULT_PROVIDER_TIMEOUT) for the POST call that builds req via
_urlreq.Request; wrap the call in the same error handling used elsewhere so
timeout/socket/URLError exceptions are caught and returned as Err with an
expiry/timeout marker (use the existing Err type/symbol), and apply the
identical change to the other two provider branches that also call
_urlreq.urlopen (the occurrences around the other Request/urlopen blocks).
- Around line 2016-2041: host_inference_complete currently reads _os.environ
directly (VERA_INFERENCE_PROVIDER, VERA_ANTHROPIC_API_KEY, VERA_OPENAI_API_KEY,
VERA_MOONSHOT_API_KEY, VERA_INFERENCE_MODEL) which ignores the
execute(env_vars=...) sandbox; change it to resolve env via the same mechanism
as host_get_env or the env dict passed through execute, e.g., replace
_os.environ.get(...) calls with the host_get_env(key, default) helper or lookup
in the execute-provided env map and then use those values when selecting
provider and calling _call_inference_provider so tests and guest IO see the same
environment.
In `@vera/environment.py`:
- Around line 418-429: The Inference effect registration only defines the
"complete" operation but is missing the "embed" operation, so update the
effects["Inference"] EffectInfo to include an OpInfo entry for "embed" alongside
"complete"; add an "embed" key to the operations mapping (similar to how
"complete" is created with OpInfo) and supply the correct parameter and return
types per Issue `#61` (match the expected type signature used by the standard
library), ensuring the operation name "embed" and the EffectInfo/OpInfo usage
remain consistent with the existing patterns in effects["Inference"],
EffectInfo, and OpInfo.
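For the tests/test_codegen.py comment about `os.environ.clear()`, targeted restoration could look roughly like the sketch below. `scoped_env` is a hypothetical helper, not something in the repository; it records only the keys it overrides and puts exactly those back, leaving the rest of the process environment alone.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Mapping

_MISSING = object()  # sentinel: the key did not exist before the override


@contextmanager
def scoped_env(overrides: Mapping[str, str]) -> Iterator[None]:
    """Hypothetical helper: set some variables, then restore only those."""
    saved = {key: os.environ.get(key, _MISSING) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old in saved.items():
            if old is _MISSING:
                # Key was added by the test: remove it again.
                os.environ.pop(key, None)
            else:
                # Key existed before: put the original value back.
                os.environ[key] = old
```

A test would wrap the call under test in `with scoped_env({"VERA_ANTHROPIC_API_KEY": "test-key"}):`. Passing the variables explicitly to the function under test, where its API allows it, avoids mutating process state at all.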
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 875eac77-af6c-4b77-ab32-0b85092058c8
⛔ Files ignored due to path filters (6)
- docs/index.html is excluded by !docs/**
- docs/index.md is excluded by !docs/**
- docs/llms-full.txt is excluded by !docs/**
- docs/llms.txt is excluded by !docs/**
- examples/inference.vera is excluded by !**/*.vera
- tests/conformance/ch09_inference.vera is excluded by !**/*.vera
📒 Files selected for processing (30)
- AGENTS.md
- CHANGELOG.md
- CLAUDE.md
- README.md
- ROADMAP.md
- SKILL.md
- TESTING.md
- pyproject.toml
- scripts/check_readme_examples.py
- scripts/check_skill_examples.py
- scripts/check_spec_examples.py
- spec/00-introduction.md
- spec/09-standard-library.md
- tests/conformance/manifest.json
- tests/test_checker.py
- tests/test_codegen.py
- tests/test_html.py
- tests/test_verifier.py
- vera/README.md
- vera/__init__.py
- vera/browser/runtime.mjs
- vera/codegen/api.py
- vera/codegen/assembly.py
- vera/codegen/compilability.py
- vera/codegen/core.py
- vera/codegen/functions.py
- vera/environment.py
- vera/wasm/calls.py
- vera/wasm/context.py
- vera/wasm/inference.py
💤 Files with no reviewable changes (1)
- scripts/check_readme_examples.py
…imeout, docs
Bugs fixed:
- vera/wasm/calls.py: self._needs_alloc -> self.needs_alloc for Http and Inference qualified calls (was writing to an orphaned attribute, not the canonical WasmContext field read by codegen/functions.py propagation)
- vera/codegen/api.py host_inference_complete: read provider/API keys from the execute() env_vars dict when provided, falling back to os.environ
Improvements:
- vera/codegen/api.py: add _INFERENCE_TIMEOUT = 60s; pass to all urlopen calls
- tests/test_codegen.py: pass env_vars= directly to execute(), no os.environ mutation
- vera/codegen/compilability.py: add Inference to E603 rationale string
- CHANGELOG.md: clarify max_tokens hardcoded for Anthropic specifically
- README.md: qualify mockable as host-side mockable; rewrite roadmap paragraph
- ROADMAP.md: correct v0.0.100 blurb (removed v0.0.101 features that belong there)
- spec/09-standard-library.md: fix #61 -> #371 in limitations list
- SKILL.md safe_classify: add requires(string_length(@String.0) > 0)
- scripts/check_spec_examples.py: remove stale FUTURE allowlist entries for Inference
Co-Authored-By: Claude <noreply@anthropic.invalid>
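Two of the fixes in this commit, the bounded `urlopen` call and the `env_vars` fallback, can be sketched as follows. The names `lookup` and `post_json` are hypothetical, the tuple result is a stand-in for the guest-visible `Result<String, String>`, and treating a supplied `env_vars` dict as fully replacing `os.environ` (rather than falling back per key) is an assumption; the commit message does not say which it is.

```python
import os
import urllib.request
from typing import Mapping, Optional, Tuple

INFERENCE_TIMEOUT = 60  # seconds; mirrors the 60s value named in the commit


def lookup(key: str, env_vars: Optional[Mapping[str, str]]) -> str:
    """Hypothetical helper: prefer the sandbox dict, else the process env."""
    if env_vars is not None:
        # ASSUMPTION: a provided dict is authoritative for every key.
        return env_vars.get(key, "")
    return os.environ.get(key, "")


def post_json(url: str, body: bytes,
              timeout: float = INFERENCE_TIMEOUT) -> Tuple[str, str]:
    """Hypothetical helper: POST with a finite timeout, never raise on I/O."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return ("Ok", resp.read().decode("utf-8"))
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return ("Err", f"inference request failed: {exc}")
```

Without the `timeout` argument, `urlopen` uses the global default socket timeout, which is normally unset, so a stalled provider could block the host call indefinitely.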
Summary
- Implements `Inference.complete(String → Result<String, String>)` as a built-in host-backed algebraic effect, following the Http pattern exactly
- Providers: Anthropic (`claude-haiku-4-5-20251001`), OpenAI (`gpt-4o-mini`), Moonshot (`moonshot-v1-8k`); auto-detected from whichever API key env var is set; override with `VERA_INFERENCE_PROVIDER` / `VERA_INFERENCE_MODEL`
- Browser runtime returns a rich `Err` explaining why API keys cannot be safely embedded in client-side JavaScript (recommends Http proxy pattern)
- Zero new dependencies: `urllib.request` + `json` from Python stdlib
New files
- `examples/inference.vera`: sentiment classification example
- `tests/conformance/ch09_inference.vera`: conformance test (level: `check`)
Compiler changes
- `vera/environment.py`: registers `Inference` effect with `complete` op
- `vera/codegen/`: compilability, `_inference_ops_used` tracking, WASM import emission
- `vera/wasm/`: qualified call translation, type inference
- `vera/codegen/api.py`: `host_inference_complete` + `_call_inference_provider()` helper
- `vera/browser/runtime.mjs`: browser Inference stub
Limitations filed
- #370: `max_tokens` hardcoded to 1024 for Anthropic; no temperature override
- #371: `embed` operation (vector embeddings) deferred; blocked on #373
- #372: `handle[Inference]` blocks (user handlers)
Test plan
- `pytest tests/ -v`
- `mypy vera/`
- `python scripts/check_conformance.py`
- `python scripts/check_examples.py`
- `VERA_ANTHROPIC_API_KEY=sk-ant-... vera run examples/inference.vera`
Closes #61
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
- `Inference` effect with `complete` operation for LLM calls.
- `Result<String, String>` returns for error handling.
Documentation
Tests
Chores