Add repository-wide AGENT guidelines #1
🤖 Agent Working
I've picked up this issue and am working on it.
You'll receive updates as I make progress.
📋 Design Document Ready
I've created a design document for this issue.
Summary
📄 Design Document: design_docs/planned/SEASONAL_COTTAGE_SPRITES.md
Seasonal Cottage Sprites
Summary
Add seasonal sprite variations for the cottage (COTTAGE tile type) to display different appearances in summer and autumn seasons. Currently, the cottage only has spring and winter variants. This enhancement will provide visual variety and seasonal atmosphere to the game world.
Problem Statement
Design
Data Structures
Assets Configuration (
export const tileAssets = {
// ... existing assets ...
cottage_wooden: '/TwilightGame/assets-optimized/tiles/cottage_small_spring.png',
cottage_small_summer: '/TwilightGame/assets-optimized/tiles/cottage_small_summer.png',
cottage_small_autumn: '/TwilightGame/assets-optimized/tiles/cottage_small_autumn.png',
cottage_small_winter: '/TwilightGame/assets-optimized/tiles/cottage_small_winter.png',
};
Tile Legend (
[TileType.COTTAGE]: {
name: 'Cottage',
color: 'bg-palette-sage',
collisionType: CollisionType.SOLID,
image: [],
seasonalImages: {
spring: [tileAssets.cottage_wooden], // Spring version
summer: [tileAssets.cottage_small_summer], // New summer variant
autumn: [tileAssets.cottage_small_autumn], // New autumn variant
winter: [tileAssets.cottage_small_winter], // Winter version (unchanged)
default: [tileAssets.cottage_wooden],
},
},
Sprite Metadata (
Rendering Pipeline
The seasonal sprite rendering follows this flow:
File Structure
Implementation Steps
Step 1: Place Asset Files
Step 2: Update assets.ts
File:
Step 3: Update Tile Legend
Step 4: Optimize Assets
Step 5: Verify Game Engine
Step 6: Testing
Testing
Manual Testing Steps
Automated Checks
Regression Testing
Verify no existing functionality is broken:
Breaking Changes
None - this is purely additive:
File Changes Summary
Verification Checklist
After implementation:
Notes
Related Documentation
Next Steps
Once approved, I'll automatically create a sprint plan for implementation.
## Summary
Move design document for defensive type checking in builtin implementations from planned to implemented, documenting the completed fix for:
1. String comparison type mismatch panic (Issue #1)
   - Fixed by adding SafeAsString() helper and updating string comparison builtins
   - Now returns descriptive errors instead of panicking
2. Option pattern matching failures (Issue #2)
   - Fixed by ensuring TaggedValue construction for Option types matches pattern matcher expectations
   - Now works correctly with Some(x) and None patterns
## Changes
- Move design_docs/planned/v0_7_0/m-builtin-safety-type-checks.md to design_docs/implemented/v0_7_0/m-builtin-safety-type-checks.md
- Update status from 'Planned' to 'Implemented'
- Add comprehensive implementation report with:
  - Code locations and metrics
  - Before/after comparison
  - Test coverage summary
## Test Results
✅ All builtin tests pass (PASS ok github.com/sunholo/ailang/internal/builtins)
✅ String comparison works correctly
✅ Option pattern matching works correctly
✅ No regressions in related functionality
## Verification
Tested with:
- String comparison: `substring(s, 0, length(prefix)) == prefix` ✓
- Option pattern matching: `match Some(x) { Some(h) => ..., None => ... }` ✓
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
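The defensive pattern behind the string-comparison fix can be sketched as follows. This is a minimal illustration, not the project's code: the `Value`, `StringValue` and `IntValue` types, the `kindOf` helper, the `stringEq` builtin and the signature given to `SafeAsString` are all assumptions, since the commit message names the helper but does not show it.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for a runtime value; the real AILANG
// value types are not shown in the commit message.
type Value interface{}

// StringValue and IntValue are illustrative variants only.
type StringValue struct{ S string }
type IntValue struct{ N int }

// kindOf returns a user-facing kind name for the error message.
func kindOf(v Value) string {
	switch v.(type) {
	case StringValue:
		return "string"
	case IntValue:
		return "int"
	default:
		return "unknown"
	}
}

// SafeAsString returns the underlying string or a descriptive error,
// instead of panicking on a failed type assertion. Signature is assumed.
func SafeAsString(v Value, builtin string) (string, error) {
	s, ok := v.(StringValue)
	if !ok {
		return "", fmt.Errorf("%s: expected string, got %s", builtin, kindOf(v))
	}
	return s.S, nil
}

// stringEq is a hypothetical string-comparison builtin built on the helper.
func stringEq(a, b Value) (bool, error) {
	x, err := SafeAsString(a, "stringEq")
	if err != nil {
		return false, err
	}
	y, err := SafeAsString(b, "stringEq")
	if err != nil {
		return false, err
	}
	return x == y, nil
}

func main() {
	ok, err := stringEq(StringValue{"ab"}, StringValue{"ab"})
	fmt.Println(ok, err)
	_, err = stringEq(StringValue{"ab"}, IntValue{1})
	fmt.Println(err)
}
```

The point of the sketch is only that a mismatched operand surfaces as an `error` value the caller can report, rather than as a runtime panic.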
Round-1 sprint evaluation flagged three items (1 medium, 2 low). All three addressed in this follow-up commit; no new design-doc deviations.
#1 (medium): Snapshot test for streaming-vs-non-streaming AI span shape
- New cmd/ailang/configdriven_streaming_span_snapshot_test.go (197 LOC)
- TestStreamingAISpan_SameShapeAsNonStreaming asserts that ctx.RecordAIEffect produces the same TraceEvent shape for "call" and "streamCall", modulo OpName ("call" vs "streamCall") and Args content (1 vs 3 strings). Locks in the design-doc A4/A9 contract: streaming AI cannot silently degrade observability vs non-streaming AI.
- TestStreamingAISpan_RecordedFromAIStreamCallEndToEnd verifies the real aiStreamCall function reaches the recording call when invoked end-to-end against a mock SSE server. Belt-and-suspenders confirmation.
#2 (low): CapabilityNotSupported error code wiring
- Provider-registry misses (cmd/ailang/configdriven_streaming.go) now return ProtocolError("[ProviderNotFound] ...") rather than constructing a fake "ProviderNotFound" StreamErrorKind variant that wasn't in the declared ADT. Streaming-disabled / capabilities-streaming-false misses in BuildStreamRequest now carry "[CapabilityNotSupported]" prefix.
- Pattern: real StreamErrorKind variant + structured "[code]" prefix in the message string. Callers can pattern-match on ProtocolError AND switch on the [code] tag if needed. Documented inline.
- Tests updated to assert on (ProtocolError, [code] prefix) instead of fake-variant constructor names.
#3 (low): Recipe page pseudocode → concrete v1 snippet
- docs/docs/recipes/ai-token-streaming.md replaces the "pseudocode (v1.1 will expose this via parseDelta)" block with a working v1 extractDelta template using std/json.decode and std/json.getString. Honest about the v1 limitation that std/json doesn't yet ship a path-walker; the code shows the structural pattern callers should follow until v1.1's parseDelta.
All 6 packages still green: internal/pkg, internal/ai, internal/ai/configdriven, internal/effects, internal/builtins, cmd/ailang. Full make test passes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
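The "real variant + structured [code] prefix" pattern from item #2 can be illustrated with a small sketch. The `StreamError` struct, the `newCoded` constructor and the `Code` method are hypothetical names introduced here; only `ProtocolError` and the two bracketed codes come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// StreamErrorKind is a hypothetical mirror of the declared ADT; only
// ProtocolError is named in the commit message.
type StreamErrorKind int

const ProtocolError StreamErrorKind = iota

// StreamError pairs a real variant with a human-readable message.
type StreamError struct {
	Kind    StreamErrorKind
	Message string
}

// newCoded builds an error whose message starts with a "[code]" tag.
func newCoded(kind StreamErrorKind, code, detail string) StreamError {
	return StreamError{Kind: kind, Message: "[" + code + "] " + detail}
}

// Code extracts the bracketed tag, or "" when the message has none.
func (e StreamError) Code() string {
	if !strings.HasPrefix(e.Message, "[") {
		return ""
	}
	end := strings.Index(e.Message, "]")
	if end < 0 {
		return ""
	}
	return e.Message[1:end]
}

func main() {
	err := newCoded(ProtocolError, "CapabilityNotSupported", "streaming disabled for provider")
	// Match on the real variant first, then switch on the tag.
	if err.Kind == ProtocolError {
		switch err.Code() {
		case "CapabilityNotSupported":
			fmt.Println("fall back to the non-streaming call")
		case "ProviderNotFound":
			fmt.Println("check the provider registry")
		}
	}
}
```

Keeping the code in the message string means the declared variant set stays unchanged, at the cost of callers parsing a prefix.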
…NGELOG
In-repo Pillar 2 work:
- docker/Dockerfile.agent-motoko: clones sunholo-data/motoko_agent at
pinned commit 84fa449, installs bun + motoko-ext-* packages,
symlinks scripts/run-agent.sh to /usr/local/bin/motoko. Mirrors
Dockerfile.agent-pi (CLI-only, no Go toolchain).
- internal/dispatch/cloudrun/dispatcher.go: knownVariants["motoko"]=true.
- docker/agent-motoko-multivac-prs.md: step-by-step checklist for the
two ailang-multivac PRs (cloudbuild + cloudbuild-images sync per
EXECUTOR_SHAPE §6 drift warning; agent_executor_motoko Cloud Run
Job with cost-controlled secret bindings — OPENROUTER + OPENAI +
GEMINI only, NO ANTHROPIC per pi precedent).
Cross-repo work (NOT in this commit, requires ailang-multivac access):
- PR #1 to ailang-multivac: cloudbuild.yaml + cloudbuild-images.yaml
add build-agent-motoko + push-agent-motoko steps (in BOTH files).
- PR #2 to ailang-multivac: terraform/cloud_run_jobs.tf adds
agent_executor_motoko block with VPC connector + cost-controlled
env bindings. Smoke test: terraform apply to ailang-multivac-dev,
coordinator dispatch with --executor motoko.
M5 (threshold measurement) is queued — requires either the cloud Job
above or a local run with OPENROUTER_API_KEY budget. The eval-suite
command is documented in the CHANGELOG entry; numbers will be appended
under a follow-up entry once data exists.
Tests: full go test ./... green; whole-tree builds clean.
Closes M4 of M-MOTOKO-EXECUTOR-ADAPTER (in-repo portion). M5 deferred
to follow-up after cloud Job lands or local run executes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… 10 integration gaps
Today's live smoke testing of v0.18.0's M-MOTOKO-EXECUTOR-ADAPTER surfaced 10 interconnected gaps that prevent trustworthy benchmark numbers. Three got partial fixes during the day (HealthCheck no-spawn, MOTOKO_REPO fallback, MOTOKO_HEADLESS, run_summary-before-done reorder) but root causes remain across both repos.
User feedback: "we need it all I think. lets get to the bottom of the gaps - I think a design doc process will help."
This sprint sequences the fixes properly:
Phase 1: Investigation-first for gap #1 (run_summary not reaching disk on success path): debug:checkpoint markers + bisect. Non-negotiable; writing a fix without the cause is gambling.
Phase 2: motoko-side fixes (gap #1 root-cause fix + #6 extension visibility + #7 --headless flag + #8 --version mode + #10 TS process.exit removal so emission ordering doesn't matter)
Phase 3: AILANG-side fixes (gap #2 success-criteria fallback to thinking.finish_reason + #5 MOTOKO_REPO discovery from wrapper)
Phase 4: Cross-cutting (gap #4 session_id unification: adapter canonical, TS wrapper honors, AILANG runtime emits matching)
Phase 5: Config layer (gap #3 + #9 cost_rates source-of-truth in models.yml.pricing → env-var override of motoko's profile config)
Phase 6: End-to-end validation. TestEndToEnd_FullResultPopulation asserts every Result field; M5 paired-comparison motoko-claude-haiku-4-5 vs claude-haiku-4-5 produces real numbers.
Architectural posture: eliminate fragile assumptions at every layer. Today's adapter assumes things that aren't true (wrapper preserves session_id, cost_rates configured, run_summary always reaches disk, loaded_extensions field accurate). After this hardening, none of those assumptions remain; each is replaced with explicit observable contracts.
Net axiom score: +13 (no hard violations). Strong A2 (replayability: captured runs are fully reproducible), A7 (machines first: Result fields mechanically reliable), A9 (cost visibility: eliminates $0 reporting gap).
Estimated 3 working days, ~530 LOC including tests, across both repos. GATING for M5 of v0.18.0 (threshold-measurement) and v0.19.0 M-MOTOKO-EXT-PER-TASK (which needs accurate session_ids + extension visibility from this hardening).
Cross-references:
- v0.18.0 M-MOTOKO-EXECUTOR-ADAPTER Future Work updated to point at this hardening as the trustworthy-numbers prerequisite
- v0.19.0 M-MOTOKO-EXT-PER-TASK Dependencies updated to mark v0.18.1 as BLOCKING (was just "after local validation")
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…design docs
Phase 6 of v0.18.1 hardening sprint. Moves both design docs from design_docs/planned/v0_18_1/ to design_docs/implemented/v0_18_1/ and updates their status headers to "Implemented (2026-05-08)" with cross-repo commit references.
Adds the v0.18.1 entry to changelogs/v0.10-current.md covering all five phases:
- Phase 1 (gap #1): JSONL drain race in TS layer
- Phase 2 (gaps #6, #7, #8): extensions visibility, --headless, --version
- Phase 3 (gaps #2, #5): success fallback, MOTOKO_REPO discovery
- Phase 4 (gap #4): session_id unification
- Phase 5 (gaps #3, #9): cost rates env-var passthrough
Acceptance gate: 5 of 7 conditions met; the remaining 2 (CostUSD>0 end-to-end + smoke success) blocked on a separate Bedrock validation issue (extension tool names with `/` fail Anthropic's ^[a-zA-Z0-9_-]{1,128}$ pattern). The pricing env-var plumbing is verified by unit tests; live smoke needs the extension fix downstream.
LOC tally: ~80 AILANG-side + ~250 motoko-side + 11 new tests across both repos, in ~6 hours wall-clock vs the 3-day plan estimate.
Sprint retrospective: investigation-first paid off. The 12 debug: checkpoint markers in Phase 1 directly identified the silent-exit point as the TS process.exit-on-done race, which would have been maddening to find by code-reading alone. The resulting fix was tiny (~25 LOC across 2 TS files) but unblocked everything downstream.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
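The tool-name constraint quoted in the acceptance-gate note can be checked locally with a few lines of Go. This is a sketch under assumptions: the pattern is copied from the commit message, while `validToolName` and the sample names are hypothetical and say nothing about how the downstream extension fix is actually implemented.

```go
package main

import (
	"fmt"
	"regexp"
)

// toolNamePattern is the pattern quoted in the commit message.
var toolNamePattern = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,128}$`)

// validToolName reports whether name satisfies the quoted pattern.
func validToolName(name string) bool {
	return toolNamePattern.MatchString(name)
}

func main() {
	// Sample names are made up for illustration.
	for _, name := range []string{"ext/read_file", "ext_read_file", "read-file"} {
		fmt.Printf("%-14s valid=%v\n", name, validToolName(name))
	}
}
```

A check like this could run before dispatch so that a name containing `/` is reported as a configuration error rather than as a failed provider call.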
The first 3-harness paired comparison on `--agent-parallel 2` (run today
2026-05-08, after v0.18.1 shipped) revealed motoko has a parallel-execution
class of failures the serial-mode v0.18.1 hardening doesn't cover.
CONTEXT
=======
- v0.18.1 closed serial-mode gaps; serial smoke = 42/45 (93% — failures
are benchmark-correctness misses, not infrastructure)
- v1 parallel (no fixes) = 40/45 (88.9%)
- v2 parallel (with EADDRINUSE retry/yield fix) = 37/45 (82.2%) — REGRESSED
- 4 of 5 motoko parallel failures: dur_s=0 + 0-byte JSONL ("motoko
terminated without emitting run_summary") = crash BEFORE TS init
ROOT CAUSE (per cross-executor audit in design doc):
Motoko is the OUTLIER in the executor fleet. claude/gemini/codex/opencode/
pi all use `cmd.Dir = task.Workspace` + no shared filesystem state +
no embedded services. Motoko inherited a different design (long-lived
TUI with embedded env-server + cd-into-shared-MOTOKO_REPO) and the
v0.18.0 adapter wraps it without re-isolating.
SCOPE
=====
3 hypotheses to bisect in Phase 1 (investigation-first per the v0.18.1
gap #1 pattern that paid off):
H1: Cache-write race (.ailang/cache/compile/.../core.gob clobber)
H2: Per-task env-server isolation gap (EADDRINUSE handler routes to
sibling's env-server bound to sibling's workdir)
H3: Shared registry state (MOTOKO_REPO/src/core/ext/registry_generated)
PROPOSED FIX (3 coordinated layers, mirrors M-SERVE-API-CONCURRENCY's
per-request-isolation playbook):
1. Per-task MOTOKO_HOME (hardlink-mirror of MOTOKO_REPO per spawn)
2. Single env-server per session (drop inline OR drop auto_start)
3. Cache pre-warming opt-in via HealthCheck
ACCEPTANCE GATE
===============
5 consecutive runs of 15-benchmark smoke tier × motoko-claude-haiku-4-5
× --agent-parallel 4 see ≥95% success rate over 60 runs (≤3 failures,
all benchmark-correctness misses NOT infrastructure failures).
LOC + Time
==========
~250 LOC across both repos, 2 days estimate. Follows v0.18.1's pattern
(actual was ~330 LOC + 11 tests in ~6h vs 3-day estimate — let's see
if the per-task isolation reuse from M-SERVE-API-CONCURRENCY accelerates).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ows CI fixes
Addresses the two low-severity follow-up items from the round-1 sprint-evaluator verdict (PASS @ 91/100) plus Windows CI test flakes the user surfaced.
cmd/wasm/effects.go (266 LOC removed) + effects_cognition.go (290 LOC, new):
- Extract WasmDOMHandler + WasmMsgHandler + setDOM*/setMsg* + getOrCreate* + domPatchToJS into a dedicated file so each module stays under the 800-line AI-maintainability threshold
- cmd/wasm/effects.go drops from 918 → 652 LOC (back under threshold)
- effects_cognition.go is build-tagged js && wasm same as the original
- Shared helpers (awaitJSResult, jsGetString, jsGetInt, replInstance) continue to live in effects.go / effects_helpers.go (same package, so the split is purely organizational)
docs/docs/guides/wasm-integration.md (+108 LOC):
- New "Cognitive OS Substrate (v0.21.x)" section covering: shipped effects (DOM/Msg/Trace), step-pattern interface, cognitive event log + replay determinism claim, JS API for the bridges, runnable example pointer, end-to-end status table separating shipped vs deferred items across M-COG-RUNTIME / M-COG-RUNTIME-BROWSER / M-COG-MEMORY / M-COG-MESH
- The sprint plan named docs/docs/guides/wasm-runtime.md as the target; the actual existing guide is wasm-integration.md, so the section is added there
Windows CI test fixes (two flakes the user surfaced):
cmd/ailang/main_run_pipe_test.go (+8 LOC):
- TestRunCommand_PipedStdoutFlushesPerLine was failing on windows-latest with "EVENT_1 arrived at 1.6967s — too late". The load-bearing gap assertion (EVENT_1 → EVENT_2 ≥ 200ms) passed; only the belt-and-suspenders absolute-time check failed because the ailang binary cold-start cost on Windows runner VMs is ~1.7s vs <0.5s on Linux/macOS
- Fix: scale the upper bound to 3.5s on Windows via runtime.GOOS
- The gap check remains the load-bearing assertion at 200ms
internal/lsp/diagnostics_test.go (+19 / -6 LOC):
- TestDidSaveRepublishes was failing on windows-latest with "no diagnostics arrived after didSave" (5s timeout). LSP pipeline latency on Windows runners exceeds the 5s budget that works locally
- Fix: new diagWaitTimeout() helper returns 15s on Windows, 5s elsewhere; all four sink.wait(docURI, 5*time.Second) sites updated
- Server lifecycle context bumped to 3× the diag wait so the parent context doesn't expire while a wait is still in flight on Windows
Both tests pass locally (Linux/macOS) post-change. The Windows budgets preserve test intent (verify streaming / verify republish) without turning either test into a no-op.
Refs: .ailang/state/evaluations/eval_M-COG-RUNTIME_round_1.json (feedback items #1, #2)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
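The platform-scaled timeout described for internal/lsp/diagnostics_test.go can be sketched as below. The helper name `diagWaitTimeout` and the 15s / 5s / 3× values come from the commit message; the body, and the `waitTimeoutFor` split that makes the logic testable on any OS, are assumptions.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// waitTimeoutFor returns the diagnostics wait budget for a given GOOS:
// 15s on Windows, 5s elsewhere (values from the commit message).
func waitTimeoutFor(goos string) time.Duration {
	if goos == "windows" {
		return 15 * time.Second
	}
	return 5 * time.Second
}

// diagWaitTimeout mirrors the helper named in the commit; body is assumed.
func diagWaitTimeout() time.Duration {
	return waitTimeoutFor(runtime.GOOS)
}

func main() {
	wait := diagWaitTimeout()
	// The server lifecycle context is 3x the wait, so the parent context
	// cannot expire while a wait is still in flight.
	lifecycle := 3 * wait
	fmt.Println("wait:", wait, "lifecycle:", lifecycle)
}
```

Taking the OS name as a parameter keeps the branch verifiable from a Linux or macOS machine, which matters when the failing platform is only available in CI.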
M-AILANG-ERROR-QUALITY iter 3 (compiler error-msg #1): the type-checker was leaking Go internal type names like `*types.TList` to users (and to LLM eval agents) in unification error messages. The agent sees these and has no idea what they mean; `*types.TList` was never in any AILANG doc.
Replaces 5 occurrences of `%T` (Go-internal type sigil) with `.String()` (the canonical AILANG-level type printer that produces e.g. `[string]` or `(int) -> bool`):
- cannot unify function type with X
- cannot unify list type with X (2x: TCon fallback + general)
- cannot unify array type with X
- cannot unify map type with X
- cannot unify tuple type with X
- cannot unify type application with X
Now also includes BOTH sides of the unification (t1 and t2) so the error shows the full mismatch, not just the right-hand side.
Example improvement (the exact balanced_parens failure from Iter 1/2):
Before: type unification failed at [list pattern]: cannot unify function type with *types.TList
After: type unification failed at [list pattern]: cannot unify function type with [string]
The "function type" + "[string]" tells the agent: "you wrote what AILANG parsed as a function, but the context expected a list of strings". That's actionable; *types.TList was not.
Doesn't fix the "add a 'did you mean [head,...tail]' suggestion" gap from the design doc; that needs path-aware logic in inference_helpers.go that detects list-pattern context and adds a hint. Deferring that to iter 4 if iter 3 alone doesn't recover balanced_parens.
Build + full make ci pass (117s). Three further %T cases remain in unification_records.go which the eval data hasn't flagged as a problem yet; will revisit if record-pattern errors surface in later rotations.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
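The difference between formatting a type with `%T` and with its `String()` method is easy to reproduce in isolation. The types below (`TCon`, `TList`, `TFunc`) are hypothetical stand-ins for the checker's internals, and the wording of `newMessage` is an assumption about how both sides might be shown.

```go
package main

import "fmt"

// Type is a hypothetical stand-in for the checker's internal type interface.
type Type interface{ String() string }

type TCon struct{ Name string }

func (t *TCon) String() string { return t.Name }

type TList struct{ Elem Type }

func (t *TList) String() string { return "[" + t.Elem.String() + "]" }

type TFunc struct{ Param, Ret Type }

func (t *TFunc) String() string {
	return "(" + t.Param.String() + ") -> " + t.Ret.String()
}

// oldMessage formats the right-hand side with %T, which prints the Go
// type name of the value rather than the language-level type.
func oldMessage(t2 Type) string {
	return fmt.Sprintf("cannot unify function type with %T", t2)
}

// newMessage prints both sides through String(); exact wording is assumed.
func newMessage(t1, t2 Type) string {
	return fmt.Sprintf("cannot unify function type %s with %s", t1.String(), t2.String())
}

func main() {
	fn := &TFunc{Param: &TCon{Name: "int"}, Ret: &TCon{Name: "bool"}}
	list := &TList{Elem: &TCon{Name: "string"}}
	fmt.Println(oldMessage(list))
	fmt.Println(newMessage(fn, list))
}
```

The first line printed names a Go struct the user never wrote; the second names the two source-level types that failed to unify.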
… dispatch
Three concrete gaps prevent `ailang messages send eval-rig "task" --requires agent:motoko` from working end-to-end after M-COORD-MULTI-HOST-WORKERS v0.22.0 shipped the routing primitives:
1. Local daemon HTTP listener off by default (PORT env not in launchd plist)
2. `ailang messages send` CLI missing `--requires` flag
3. No cloud motoko fallback (Dockerfile exists, but no cloudbuild step and no Cloud Run Job)
Targets v0.23.0, estimated 1-2 days. Direct follow-on to M-COORD-MULTI-HOST-WORKERS: item #1 in its Future Work section ("Cloud-fallback routing").
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three small additions enable the daemon's HTTP listener on local-mode installs:
1. plist template gains PORT env var with new @HTTP_PORT@ token. Comment explains that without PORT, /api/messages and /health are unreachable and tag-routed sends fail silently.
2. install_coordinator.sh accepts --port N (default 8765, validated as unprivileged 1024-65535), AILANG_COORD_HTTP_PORT env override, and a final-line `curl http://127.0.0.1:$HTTP_PORT/health` reminder.
3. coordinator_lifecycle.go::printCoordinatorStatusOutput probes the listener and prints "HTTP: ✓ http://127.0.0.1:8765" or a clear "no PORT configured" hint pointing at make coord-install.
discoverCoordinatorHTTPPort reads AILANG_COORD_HTTP_PORT → PORT env → plist (single regexp; pulling in a plist parser for one key would be overkill). probeCoordinatorHTTP uses a 500ms timeout so the status command stays fast on misconfigured hosts.
Verified live on this Studio: reinstalled the plist with --port 8765, daemon bound the listener, /health returned 200, status command printed the new line.
The pre-existing v0.24.0 comment headers on the plist + installer were cleaned up to reflect v0.22.0 (M-COORD-MULTI-HOST-WORKERS) + v0.23.0 (this sprint); they were leftovers from the v0.22 relabel commit that didn't touch these files.
Refs: M-COORD-MULTI-HOST-WORKERS Future Work item #1 (cloud-fallback routing needs M3 to land too, but M1+M2 are the local-side prereqs).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
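The lookup order described for discoverCoordinatorHTTPPort (AILANG_COORD_HTTP_PORT, then PORT, then a single regexp over the plist) can be sketched as follows. Everything about the shape is assumed: the function name `discoverPort`, the injected `getenv` parameter, the regexp itself, and the choice to skip invalid candidates and try the next source.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strconv"
)

// plistPortRe captures the <string> value that follows <key>PORT</key>.
// A single regexp stands in for a full plist parser, as in the commit.
var plistPortRe = regexp.MustCompile(`<key>PORT</key>\s*<string>(\d+)</string>`)

// validPort accepts only unprivileged ports (1024-65535).
func validPort(p int) bool { return p >= 1024 && p <= 65535 }

// discoverPort checks AILANG_COORD_HTTP_PORT, then PORT, then the plist
// text. getenv is injected so the lookup order can be tested.
func discoverPort(getenv func(string) string, plist string) (int, bool) {
	candidates := []string{getenv("AILANG_COORD_HTTP_PORT"), getenv("PORT")}
	if m := plistPortRe.FindStringSubmatch(plist); m != nil {
		candidates = append(candidates, m[1])
	}
	for _, c := range candidates {
		if p, err := strconv.Atoi(c); err == nil && validPort(p) {
			return p, true
		}
	}
	return 0, false
}

func main() {
	plist := "<key>PORT</key>\n<string>8765</string>"
	port, ok := discoverPort(os.Getenv, plist)
	fmt.Println(port, ok)
}
```

Returning a boolean alongside the port lets the status command distinguish "no PORT configured" from a configured but unreachable listener.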
…s send`
Closes the v0.22.0 CHANGELOG-acknowledged gap. The flag accepts comma-separated worker tags (`--requires agent:motoko,ollama:gemma4`) and, when present, routes the message through the local daemon's HTTP /api/messages endpoint instead of the SQLite-only path. The daemon attaches the tags as Pub/Sub attributes so worker subscriptions can do tag-subset filtering per M-COORD-MULTI-HOST-WORKERS v0.22.0.
The HTTP path reuses M1's `discoverCoordinatorHTTPPort` + `probeCoordinatorHTTP` helpers (env → plist), so `--requires` automatically works on any host whose launchd plist was installed by the M1-updated install_coordinator.sh. If the daemon HTTP listener isn't reachable, the error is actionable (suggests `make coord-install` + the launchctl bootstrap command), not silent.
Without `--requires`, behavior is unchanged from v0.22.0: the SQLite path stays the default for fire-and-forget local queueing. The previous v0.22.0 comment block at messages_send.go:40 explaining "intentionally NOT extended with --requires" was replaced with the new behavior doc.
Coverage:
- TestSplitAndTrim: 8 cases for the comma-separated parser (single/multi/whitespace/empty/trailing-comma/all-empty)
- TestSendViaHTTP_PostsCorrectShape: verifies POST body matches the postMessageRequest fields in daemon_http.go (inbox/title/content/from/category/requires)
- TestSendViaHTTP_HonorsAPIKey: COORDINATOR_API_KEY env → Bearer header
- TestSendViaHTTP_ErrorWhenUnreachable: clear "no PORT" error path with next-step hint
All tests pass deterministically on -count=20. Live verified on this Studio: `ailang messages send eval-rig 'M2 smoke' --requires agent:motoko --from sprint-executor` → message landed in SQLite via the HTTP endpoint, daemon logs show the POST.
Refs: M-COORD-MULTI-HOST-WORKERS Future Work item #1 (local-CLI side closed; cloud-fallback Job is M3).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
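A comma-separated tag parser of the kind TestSplitAndTrim covers could be as small as the sketch below. The function name is taken from the test name; the body, and in particular the decision to drop empty entries, is an assumption inferred from the listed cases (whitespace, empty, trailing-comma, all-empty).

```go
package main

import (
	"fmt"
	"strings"
)

// splitAndTrim splits a comma-separated list, trims surrounding
// whitespace from each entry, and drops entries that end up empty.
func splitAndTrim(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if t := strings.TrimSpace(part); t != "" {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tags := splitAndTrim("agent:motoko, ollama:gemma4,")
	fmt.Println(len(tags), tags)
}
```

With this behaviour, an input made only of commas and spaces yields no tags, so the caller can treat it the same as an absent `--requires` flag.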
…oss-repo PR checklist v0.23.0 refresh
In-repo changes (the only M3 work that ships in this commit — the rest
lives in ailang-multivac):
1. cloudbuild-dev.yaml gains `build-agent-motoko` step mirroring
`build-agent-go` (registry-cached buildx, FROMs agent-base via
Dockerfile.agent-motoko's existing FROM). Push happens via
`--push` flag like the other agent-* builders. `deploy-services`
waitFor now includes build-agent-motoko so the deploy step doesn't
race ahead of the image being available.
2. docker/agent-motoko-multivac-prs.md refreshed for v0.23.0 scope:
- NEW: PR #0 (operational) — cloud `ailang-coordinator` is on a
2026-04-28 image (pre-v0.21.0); MUST redeploy before E2E can
exercise the v0.22.0 `requires` field
- PR #2 addendum — coordinator agent config (config.yaml in the
mounted ConfigMap) needs `motoko` agent entry with `worker_tags:
[agent:motoko]` so M-COORD-MULTI-HOST-WORKERS tag matcher
recognises the cloud Job as a valid dispatch target
- PR #2 Job spec gets `max_retries = 1` (motoko is non-idempotent
in cost — one retry max)
- PR #3 (NEW, deferred) — `ailang-openrouter-api-key` prod secret
resource. Currently only ailang-multivac-DEV has the secret;
prod motoko cloud-dispatch is gated on cost analysis from dev
throughput. Per-Job $0.30 cap on `motoko-or-gemma-4-26b` bounds
the blast radius.
- End-to-end smoke command updated to use the new --requires CLI
flag from M2 (closes the v0.22.0 CLI gap that necessitated
curl POST workarounds)
Acceptance gate refresh: 5 items, including the PR #0 pre-flight
("coordinator image timestamp shows post-v0.22.0 deploy").
What's NOT in this commit (intentional — cross-repo):
- The ailang-multivac terraform/cloud_run_jobs.tf addition (PR #2 body)
- The mounted coordinator config update (PR #2 addendum body)
- The prod secret resource (PR #3, deferred)
- The ailang-multivac cloudbuild.yaml + cloudbuild-images.yaml updates (PR #1)
Lints clean. cloudbuild-dev.yaml YAML validates (10 steps, build-agent-motoko
inserted between build-agent-go and push-coordinator).
Refs: M-COORD-MULTI-HOST-WORKERS Future Work item #1 (cloud-fallback
routing) — the local-side closures landed in M1/M2; this completes the
in-repo half of the cross-repo cloud-side work.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…trix
Three docs updates:
1. changelogs/v0.10-current.md: comprehensive sprint entry covering
M1 (launchd PORT + status probe), M2 (--requires CLI flag),
M3 (in-repo half of cloud-fallback: cloudbuild step + cross-repo
PR checklist). Explicit verification matrix shows what works locally
versus what's gated on the cross-repo / cross-deploy PRs:
- Scenario 1 (Studio→Studio): partial — HTTP send path verified
(M2 live smoke), but local dispatcher's requires-aware executor
selection is a follow-up
- Scenario 2 (laptop→cloud→Studio): deferred — gated on PR #0
(cloud coordinator redeploy from April-28 image)
- Scenario 3 (cloud-fallback Job): deferred — gated on PRs #1+#2
in ailang-multivac
2. docs/docs/guides/coordinator-workers.md: refreshed Example 2 with
the new `--requires` CLI invocation (replaces hand-rolled curl);
added "HTTP endpoint configuration" subsection (default port 8765,
override via env or --port flag, /health probe, route catalog with
per-route auth requirements + warning about exposing :8765 without
COORDINATOR_API_KEY).
3. docs/docs/guides/agent-messaging.md: new "Tag-routed sends (v0.23.0+)"
subsection with concrete --requires examples (single tag, multi-tag
intersection) + prerequisites callout (HTTP listener up, worker
advertising the tag set).
Honest accounting: the local-side surface (M1+M2) is feature-complete
and ready for use today. The cloud-side dispatch path (M3.x in
ailang-multivac repo) is documented but not in production. The
sprint plan called this out as expected — the in-repo half is what
ships in v0.23.0; the cross-repo PRs are tracked separately.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ed/v0_23_0/
Sprint complete: 4/4 milestones pass.
In-repo shipped:
- M1: launchd PORT env + status probe (commit 49664aa, 86 LOC)
- M2: --requires CLI flag + 4 tests (commit 9544139, 274 LOC)
- M3: cloudbuild build-agent-motoko + cross-repo PR checklist (commit e4df2f4, 135 LOC)
- M4: docs + CHANGELOG + verification matrix (commit 012cf39, 101 LOC)
Total: 596 LOC actual vs 305 estimated (overshot: docs were heavier than the design doc accounted for, and the cross-repo PR checklist refresh in M3 was richer than a thin update).
Verification matrix (honest):
- Scenario 1 (Studio→Studio): partial. HTTP send verified live; local dispatcher's requires-aware executor selection is a follow-up
- Scenario 2 + 3: deferred on the cross-repo PRs documented in docker/agent-motoko-multivac-prs.md (PR #0/#1/#2 in ailang-multivac repo, plus the operational cloud coordinator redeploy)
The local-side surface (M1+M2) is feature-complete and ships in v0.23.0.
Next: hand off to sprint-evaluator.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ation misdiagnosis
KEY FINDING: investigating the "token truncation" theory revealed it was wrong.
The failing run_length_encode/type_unify/red_black_tree outputs were only 161-442
tokens (8192 limit) — NOT harness-truncated. The real cause: `++` used for string
concatenation (type error since v0.13.0), after which the parser bails producing an
EOF-looking error downstream.
`++` for strings appears in 46% of ALL compile failures (1374/2948) — by far the
single largest AILANG compile-failure cause across every model tier. And it is
ALREADY in the teaching prompt (3 places) — so this is a SALIENCE problem, not a
coverage gap. The trained `++` reflex (Haskell/Elm/PureScript) overrides a buried
table row.
- NEW m-prompt-string-concat-plusplus (P0): salience redesign — top-of-prompt
hard-rules box + targeted type-error fix-it suggesting "${...}". Projected
+8-12pt CPR, dwarfing all other prompt fixes combined.
- m-prompt-concise-recursive-solutions: CORRECTED — demoted P2→P3, root cause
note added pointing at the ++ doc. The truncation theory was a misdiagnosis.
- m-prompt-single-file-module: completed (multi_module_imports, 4/4 compile fail).
The eval harness is NOT over-restricting output length (the user's question) —
8192 tokens is plenty; failures stop at <450 tokens due to genuine syntax errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CompareOutput did exact string match after trimming outer whitespace, so correct
JSON output failed on formatting: AILANG's std/json encode emits compact `{"a":1}`
while benchmarks (and Python's json.dumps default) expect spaced `{"a": 1}`. The
v0.24.1 analysis found 9/10 whitespace-only AILANG "logic_error" failures were
correct JSON failing byte-exact match — all of ast_patch_roundtrip (the #1
AILANG-vs-Python gap, which looked "genuinely hard" at 38% but was a grader artifact).
Fix: if BOTH expected and actual parse as valid JSON, compare canonical parsed forms
(reflect.DeepEqual on json.Unmarshal). This also handles int-vs-float (all JSON numbers
→ float64) and key order. SAFE: only triggers when both sides are valid JSON, so
non-JSON near-misses ("1 2" vs "12") and formatted-text benchmarks are unaffected —
exact match remains the fast path and genuinely-wrong JSON still fails.
Verified against real v0.24.1 data: 9 false failures resolved (ast_patch_roundtrip
38%→~95%). 13 CompareOutput unit tests incl. 2 safety cases.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
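The comparison rule described above (exact match as the fast path, then canonical comparison only when both sides parse as JSON) can be sketched with the standard library. The function name `compareOutput` and its signature are assumptions; the real CompareOutput is not shown in the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// compareOutput sketches the described behaviour: trimmed exact match
// first, then parsed-form equality when BOTH sides are valid JSON.
func compareOutput(expected, actual string) bool {
	e, a := strings.TrimSpace(expected), strings.TrimSpace(actual)
	if e == a {
		return true // fast path: exact match
	}
	var ev, av interface{}
	if json.Unmarshal([]byte(e), &ev) != nil || json.Unmarshal([]byte(a), &av) != nil {
		return false // at least one side is not valid JSON
	}
	// All JSON numbers decode to float64 and objects to maps, so spacing,
	// int-vs-float and key order no longer matter.
	return reflect.DeepEqual(ev, av)
}

func main() {
	fmt.Println(compareOutput(`{"a": 1, "b": [1, 2]}`, `{"b":[1,2],"a":1.0}`))
	fmt.Println(compareOutput("1 2", "12"))
	fmt.Println(compareOutput(`{"a": 1}`, `{"a":2}`))
}
```

Because the JSON branch is reached only after the exact match fails and only when both strings decode cleanly, formatted-text benchmarks keep their byte-exact semantics.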
Frequency analysis of 334 local-qwen agent trials (44 failures) shows ~36% are
a single family — expression-body (= expr) vs block-body ({ stmts }) /
statement-separator confusion — dominated (20.5%) by the
`func f() = let x = e; rest` reflex (PAR017: ';' not valid in expression-body
functions). match...with (PAR019) and ++-for-string-concat — the old big-model
top failures — are now rare/zero on qwen, so the card already works for those;
the small-model frequency banners undercount what's still live.
- Sharpen dialect-traps card trap #2 to name the exact `= let x = e; rest`
anti-pattern + both fixes (brace block, or let-in). Verified: anti-pattern
rejects (PAR017), both fixes run.
- Record the local-qwen frequency data in m-ailang-error-quality-for-llm-iteration
(re-prioritizes it): parser/card already cover PAR017 yet the model fails it and
can't recover (config_file_parser thrashed 66 turns) — the lever is making PAR017
recovery-actionable.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tatements
The #1 unactionable small-model failure in agent mode is the mirror of PAR017: a *missing* ';' between block statements. A model writes a { } block body and drops the separator (`pure func f() -> int { let n = length(s) if n > 0 ... }`), and the parser emitted a bare "PAR_UNEXPECTED_TOKEN: expected }, got if" with zero recovery signal; config_file_parser burned 66 agent turns on exactly this.
The parser now emits PAR020, "missing ';' between block statements (found `X` where `;` or `}` was expected)", with the concrete two-line fix and a docs link, when a block body (function-declaration path, parser_func.go) or block expression (parser_expr.go) is followed by a statement-starting token (let/letrec/if/match/identifier) instead of ';'/'}'. Shared via missingBlockSemicolonError() + peekStartsBlockStatement().
PAR017 (extra ';') + PAR020 (missing ';') now bookend the whole ';'-confusion family, ~32% of local-qwen agent failures. Found via the M-AILANG-ERROR-QUALITY frequency analysis of 334 qwen trials.
- TestPAR020_MissingBlockSemicolon: fires on the pattern; no false-positive on valid or single-expression blocks.
- parser/elaborate/pipeline suites green; make verify-examples at baseline (181/5/2).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
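The PAR020 decision can be modelled as a check on the token that follows a completed block statement. In this sketch the helper names `missingBlockSemicolonError` and `peekStartsBlockStatement` come from the commit message, but their signatures, the `TokenKind` type and the `checkAfterStatement` wrapper are assumptions made for illustration.

```go
package main

import "fmt"

// TokenKind is a hypothetical token classification; the real parser's
// token type is not shown in the commit message.
type TokenKind string

// peekStartsBlockStatement reports whether a token kind can begin a new
// block statement (let/letrec/if/match/identifier).
func peekStartsBlockStatement(k TokenKind) bool {
	switch k {
	case "let", "letrec", "if", "match", "IDENT":
		return true
	}
	return false
}

// missingBlockSemicolonError builds the PAR020 message for the found token.
func missingBlockSemicolonError(found string) error {
	return fmt.Errorf("PAR020: missing ';' between block statements (found `%s` where `;` or `}` was expected)", found)
}

// checkAfterStatement inspects the token after a completed statement.
func checkAfterStatement(kind TokenKind, literal string) error {
	switch {
	case kind == ";" || kind == "}":
		return nil // valid separator or end of block
	case peekStartsBlockStatement(kind):
		return missingBlockSemicolonError(literal)
	default:
		return fmt.Errorf("PAR_UNEXPECTED_TOKEN: expected }, got %s", literal)
	}
}

func main() {
	// `let n = length(s) if n > 0 ...` : the token after the let is `if`.
	fmt.Println(checkAfterStatement("if", "if"))
	fmt.Println(checkAfterStatement(";", ";"))
}
```

The generic unexpected-token error remains the fallback, so the targeted diagnostic fires only when the next token plausibly starts another statement.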
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68d645b6cd7c832dac5f52310aa94a5e