
Add repository-wide AGENT guidelines #1

Merged
MarkEdmondson1234 merged 1 commit into main from codex/analyze-codebase-and-create-agent.md
Sep 26, 2025

Conversation

@MarkEdmondson1234

Member

Summary

  • document key language concepts, project structure, and contribution expectations in a repository-wide AGENTS.md file
  • highlight design docs that implementers should consult, including row unification and typeclass dictionary references

Testing

  • not run (documentation only)

https://chatgpt.com/codex/tasks/task_e_68d645b6cd7c832dac5f52310aa94a5e

MarkEdmondson1234 merged commit 17d801e into main on Sep 26, 2025
MarkEdmondson1234 deleted the codex/analyze-codebase-and-create-agent.md branch on September 26, 2025 07:52
sunholo-voight-kampff added the coordinator:in-progress label (Task claimed by a coordinator instance - prevents duplicate work) on Jan 22, 2026
@sunholo-voight-kampff

Collaborator

🤖 Agent Working

I've picked up this issue and am working on it.

| Field | Value |
|---|---|
| Task ID | task-394d0cf7 |
| Agent | AILANG Coordinator |
| Stage | Design Document |
| Status | In Progress |

You'll receive updates as I make progress.

@sunholo-voight-kampff

Collaborator

📋 Design Document Ready

I've created a design document for this issue.

Summary

| Field | Value |
|---|---|
| Task ID | task-394d0cf7 |
| Duration | 3m9.02225125s |
| Cost | $0.3492 |
| Tokens | 19135 (9124 in / 10011 out) |

📄 Design Document: design_docs/planned/SEASONAL_COTTAGE_SPRITES.md

Seasonal Cottage Sprites

Summary

Add summer and autumn sprite variants for the cottage (COTTAGE tile type). Currently the cottage has only spring and winter sprites, so spring, summer, and autumn all render the spring sprite. The new variants add visual variety and seasonal atmosphere to the game world.

Problem Statement

  • The COTTAGE tile type (TileType.COTTAGE) currently uses only two sprites:
    • Spring/Summer/Autumn: cottage_small_spring.png
    • Winter: cottage_small_winter.png
  • All warm seasons show the same spring sprite, reducing visual variety
  • The game has full seasonal system support but underutilizes it for the cottage

Design

Data Structures

Assets Configuration (assets.ts):

export const tileAssets = {
  // ... existing assets ...
  cottage_wooden: '/TwilightGame/assets-optimized/tiles/cottage_small_spring.png',
  cottage_small_summer: '/TwilightGame/assets-optimized/tiles/cottage_small_summer.png',
  cottage_small_autumn: '/TwilightGame/assets-optimized/tiles/cottage_small_autumn.png',
  cottage_small_winter: '/TwilightGame/assets-optimized/tiles/cottage_small_winter.png',
};

Tile Legend (data/tiles.ts):

[TileType.COTTAGE]: {
  name: 'Cottage',
  color: 'bg-palette-sage',
  collisionType: CollisionType.SOLID,
  image: [],
  seasonalImages: {
    spring: [tileAssets.cottage_wooden],           // Spring version
    summer: [tileAssets.cottage_small_summer],     // New summer variant
    autumn: [tileAssets.cottage_small_autumn],     // New autumn variant
    winter: [tileAssets.cottage_small_winter],     // Winter version (unchanged)
    default: [tileAssets.cottage_wooden],
  },
},

Sprite Metadata (data/spriteMetadata.ts):

  • No changes required - uses tileAssets.cottage_wooden as default image
  • Seasonal rendering is handled by existing game engine at render time
  • The sprite metadata provides fallback/default image only

Rendering Pipeline

The seasonal sprite rendering follows this flow:

  1. Game State tracks current season (Spring, Summer, Autumn, Winter)
  2. TileRenderer (PixiJS layer) checks seasonalImages in TILE_LEGEND
  3. Sprite Selection:
    • If current season exists in seasonalImages: use that sprite
    • Otherwise: use default from seasonalImages
    • Fallback: use image array from tile definition
  4. TextureManager caches and loads selected sprite
  5. SpriteLayer renders the selected seasonal texture

File Structure

public/assets/tiles/
├── cottage_small_spring.png      (existing - spring; previously also summer/autumn)
├── cottage_small_summer.png      (NEW - summer variant)
├── cottage_small_autumn.png      (NEW - autumn variant)
└── cottage_small_winter.png      (existing - winter)

public/assets-optimized/tiles/    (automatically generated by optimize-assets script)
├── cottage_small_spring.png
├── cottage_small_summer.png
├── cottage_small_autumn.png
└── cottage_small_winter.png

Implementation Steps

Step 1: Place Asset Files

  • Add cottage_small_summer.png to /public/assets/tiles/
  • Add cottage_small_autumn.png to /public/assets/tiles/
  • These are high-quality source files (will be optimized automatically)

Step 2: Update assets.ts

  • Add cottage_small_summer asset reference
  • Add cottage_small_autumn asset reference
  • Follow naming convention: cottage_small_[season].png

File: assets.ts (lines 33-34)

  • Insert new asset definitions after existing cottage assets
  • Use optimized asset paths: /TwilightGame/assets-optimized/tiles/cottage_small_*.png

Step 3: Update Tile Legend

  • Modify TileType.COTTAGE in data/tiles.ts (lines 772-776)
  • Change seasonalImages to use individual assets per season:
    • spring: cottage_wooden (current behavior)
    • summer: cottage_small_summer (NEW)
    • autumn: cottage_small_autumn (NEW)
    • winter: cottage_small_winter (current behavior)

Step 4: Optimize Assets

  • Run npm run optimize-assets to generate optimized versions
  • Creates optimized PNG files in /public/assets-optimized/tiles/
  • Automatically detects "cottage" keyword and applies 1024px size, 97% quality

Step 5: Verify Game Engine

  • No changes needed to sprite metadata (uses default image fallback)
  • Rendering engine automatically uses seasonal images when present
  • Existing tests and validation will pass without modification

Step 6: Testing

  • Start dev server: npm run dev
  • Visit game at http://localhost:4000/TwilightGame/
  • Change seasons through TimeManager or debug tools
  • Verify cottage sprite changes match each season:
    • Spring: cottage_small_spring.png
    • Summer: cottage_small_summer.png
    • Autumn: cottage_small_autumn.png
    • Winter: cottage_small_winter.png

Testing

Manual Testing Steps

  1. Start Development Server

    npm run dev
  2. Launch Game

    • Open http://localhost:4000/TwilightGame/ in browser
    • Wait for game to fully load
  3. Navigate to Cottage

    • Move to any map location with a COTTAGE tile (e.g., village map)
    • Observe current cottage appearance
  4. Test Season Progression

    • Use debug tools or natural time progression to cycle through seasons
    • For each season, verify:
      • Correct sprite is displayed
      • Sprite loads without errors (check console)
      • Sprite is properly scaled and positioned
      • No visual glitches or alignment issues
  5. Console Validation

    • Open Chrome DevTools (F12)
    • Check for any texture loading errors
    • Verify no 404s for missing asset files
    • Confirm seasonal images are cached by TextureManager
  6. Visual Inspection

    • Spring: Cottage with green ivy/plants (existing sprite)
    • Summer: Cottage with full lush vegetation (new sprite)
    • Autumn: Cottage with autumn colors/foliage (new sprite)
    • Winter: Cottage with snow/seasonal changes (existing sprite)

Automated Checks

  • TypeScript compilation: npx tsc --noEmit (should pass with zero errors)
  • No new test framework required (seasonal rendering is existing feature)
  • Sanity checks in testUtils.ts automatically validate tile configurations

Regression Testing

Verify no existing functionality is broken:

  • Other seasonal tiles still change (trees, bushes, etc.)
  • Cottage collision/pathfinding still works
  • Cottage sprite metadata still renders correctly
  • Game performance unchanged (same number of sprites)

Breaking Changes

None - this is purely additive:

  • Existing seasonal image rendering pipeline unchanged
  • Sprite metadata remains backward compatible
  • Default fallback behavior preserved

File Changes Summary

| File | Changes | Lines |
|---|---|---|
| assets.ts | Add 2 new asset definitions | +2 |
| data/tiles.ts | Update COTTAGE seasonalImages | 3 (modify) |
| public/assets/tiles/ | Add 2 new PNG files | +2 files |
| public/assets-optimized/tiles/ | Auto-generated by script | +2 files |

Verification Checklist

After implementation:

  • Both new PNG files exist in /public/assets/tiles/
  • assets.ts has new asset definitions
  • data/tiles.ts COTTAGE definition uses all 4 seasonal images
  • npm run optimize-assets runs successfully
  • Optimized files generated in /public/assets-optimized/tiles/
  • Game loads without console errors
  • Cottage sprite changes with seasons
  • TypeScript compiles with zero errors: npx tsc --noEmit
  • All existing functionality still works

Notes

  • Asset optimization is automatic via npm run optimize-assets script
  • Cottage keyword triggers 1024px size (high-quality showcase tier)
  • Seasonal sprite selection happens at render time (no storage overhead)
  • Follows existing pattern used by: Shop, Garden Shed, Trees, etc.
  • Linear (smooth) scaling is already configured in TextureManager

Related Documentation

  • ASSETS.md - Asset management and guidelines
  • TIME_SYSTEM.md - Seasonal system documentation
  • docs/MAP_GUIDE.md - Tile and sprite placement guide
  • design_docs/planned/PIXI_MIGRATION.md - Rendering engine details

Next Steps

  1. Review the design document above
  2. Add the design-approved label to this issue to proceed to sprint planning
  3. Add the needs-revision label if changes are needed

Once approved, I'll automatically create a sprint plan for implementation.

sunholo-voight-kampff added the needs-design-approval label (Awaiting human approval of design document) on Jan 22, 2026
MarkEdmondson1234 pushed a commit that referenced this pull request Jan 27, 2026
## Summary
Move design document for defensive type checking in builtin implementations from planned to implemented, documenting the completed fix for:

1. String comparison type mismatch panic (Issue #1)
   - Fixed by adding SafeAsString() helper and updating string comparison builtins
   - Now returns descriptive errors instead of panicking

2. Option pattern matching failures (Issue #2)
   - Fixed by ensuring TaggedValue construction for Option types matches pattern matcher expectations
   - Now works correctly with Some(x) and None patterns

## Changes
- Move design_docs/planned/v0_7_0/m-builtin-safety-type-checks.md to
  design_docs/implemented/v0_7_0/m-builtin-safety-type-checks.md
- Update status from 'Planned' to 'Implemented'
- Add comprehensive implementation report with:
  - Code locations and metrics
  - Before/after comparison
  - Test coverage summary

## Test Results
✅ All builtin tests pass (PASS ok github.com/sunholo/ailang/internal/builtins)
✅ String comparison works correctly
✅ Option pattern matching works correctly
✅ No regressions in related functionality

## Verification
Tested with:
- String comparison: `substring(s, 0, length(prefix)) == prefix` ✓
- Option pattern matching: `match Some(x) { Some(h) => ..., None => ... }` ✓

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 4, 2026
Round-1 sprint evaluation flagged three items (1 medium, 2 low). All three
addressed in this follow-up commit; no new design-doc deviations.

#1 (medium): Snapshot test for streaming-vs-non-streaming AI span shape
- New cmd/ailang/configdriven_streaming_span_snapshot_test.go (197 LOC)
- TestStreamingAISpan_SameShapeAsNonStreaming asserts that ctx.RecordAIEffect
  produces the same TraceEvent shape for "call" and "streamCall" — modulo
  OpName ("call" vs "streamCall") and Args content (1 vs 3 strings). Locks
  in the design-doc A4/A9 contract: streaming AI cannot silently degrade
  observability vs non-streaming AI.
- TestStreamingAISpan_RecordedFromAIStreamCallEndToEnd verifies the real
  aiStreamCall function reaches the recording call when invoked end-to-end
  against a mock SSE server. Belt-and-suspenders confirmation.

#2 (low): CapabilityNotSupported error code wiring
- Provider-registry misses (cmd/ailang/configdriven_streaming.go) now
  return ProtocolError("[ProviderNotFound] ...") rather than constructing
  a fake "ProviderNotFound" StreamErrorKind variant that wasn't in the
  declared ADT. Streaming-disabled / capabilities-streaming-false misses
  in BuildStreamRequest now carry "[CapabilityNotSupported]" prefix.
- Pattern: real StreamErrorKind variant + structured "[code]" prefix in
  the message string. Callers can pattern-match on ProtocolError AND
  switch on the [code] tag if needed. Documented inline.
- Tests updated to assert on (ProtocolError, [code] prefix) instead of
  fake-variant constructor names.
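
The "real variant + [code] prefix" pattern described in #2 can be sketched roughly as follows. This is only an illustration: `StreamErrorKind` is modelled here as a plain struct, and `protocolError` and `codeTag` are hypothetical helper names, not the functions in cmd/ailang/configdriven_streaming.go.

```go
package main

import (
	"fmt"
	"strings"
)

// StreamErrorKind is a stand-in for the declared error ADT (assumed shape);
// only the ProtocolError variant is sketched here.
type StreamErrorKind struct {
	Variant string // e.g. "ProtocolError"
	Message string // e.g. "[ProviderNotFound] ..."
}

// protocolError (hypothetical) builds a ProtocolError whose message
// carries a structured "[code]" prefix.
func protocolError(code, detail string) StreamErrorKind {
	return StreamErrorKind{
		Variant: "ProtocolError",
		Message: fmt.Sprintf("[%s] %s", code, detail),
	}
}

// codeTag (hypothetical) extracts the leading "[code]" tag, if present.
func codeTag(msg string) (string, bool) {
	if !strings.HasPrefix(msg, "[") {
		return "", false
	}
	end := strings.Index(msg, "]")
	if end <= 1 { // no closing bracket, or an empty "[]" tag
		return "", false
	}
	return msg[1:end], true
}

func main() {
	err := protocolError("CapabilityNotSupported", "streaming is disabled for this provider")
	// Callers match on the real variant first, then switch on the tag if needed.
	if err.Variant == "ProtocolError" {
		if code, ok := codeTag(err.Message); ok {
			fmt.Println("code:", code)
		}
	}
}
```

Keeping the tag in the message string means no variant outside the declared ADT is ever constructed, at the cost of callers parsing a string when they need the finer-grained code.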

#3 (low): Recipe page pseudocode → concrete v1 snippet
- docs/docs/recipes/ai-token-streaming.md replaces the "pseudocode (v1.1
  will expose this via parseDelta)" block with a working v1 extractDelta
  template using std/json.decode and std/json.getString. Honest about the
  v1 limitation that std/json doesn't yet ship a path-walker — code shows
  the structural pattern callers should follow until v1.1's parseDelta.

All 6 packages still green: internal/pkg, internal/ai, internal/ai/configdriven,
internal/effects, internal/builtins, cmd/ailang. Full make test passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 8, 2026
…NGELOG

In-repo Pillar 2 work:
  - docker/Dockerfile.agent-motoko: clones sunholo-data/motoko_agent at
    pinned commit 84fa449, installs bun + motoko-ext-* packages,
    symlinks scripts/run-agent.sh to /usr/local/bin/motoko. Mirrors
    Dockerfile.agent-pi (CLI-only, no Go toolchain).
  - internal/dispatch/cloudrun/dispatcher.go: knownVariants["motoko"]=true.
  - docker/agent-motoko-multivac-prs.md: step-by-step checklist for the
    two ailang-multivac PRs (cloudbuild + cloudbuild-images sync per
    EXECUTOR_SHAPE §6 drift warning; agent_executor_motoko Cloud Run
    Job with cost-controlled secret bindings — OPENROUTER + OPENAI +
    GEMINI only, NO ANTHROPIC per pi precedent).

Cross-repo work (NOT in this commit, requires ailang-multivac access):
  - PR #1 to ailang-multivac: cloudbuild.yaml + cloudbuild-images.yaml
    add build-agent-motoko + push-agent-motoko steps (in BOTH files).
  - PR #2 to ailang-multivac: terraform/cloud_run_jobs.tf adds
    agent_executor_motoko block with VPC connector + cost-controlled
    env bindings. Smoke test: terraform apply to ailang-multivac-dev,
    coordinator dispatch with --executor motoko.

M5 (threshold measurement) is queued — requires either the cloud Job
above or a local run with OPENROUTER_API_KEY budget. The eval-suite
command is documented in the CHANGELOG entry; numbers will be appended
under a follow-up entry once data exists.

Tests: full go test ./... green; whole-tree builds clean.

Closes M4 of M-MOTOKO-EXECUTOR-ADAPTER (in-repo portion). M5 deferred
to follow-up after cloud Job lands or local run executes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 8, 2026
… 10 integration gaps

Today's live smoke testing of v0.18.0's M-MOTOKO-EXECUTOR-ADAPTER
surfaced 10 interconnected gaps that prevent trustworthy benchmark
numbers. Three got partial fixes during the day (HealthCheck no-spawn,
MOTOKO_REPO fallback, MOTOKO_HEADLESS, run_summary-before-done reorder)
but root causes remain across both repos. User feedback: "we need it
all I think. lets get to the bottom of the gaps - I think a design
doc process will help."

This sprint sequences the fixes properly:

  Phase 1: Investigation-first for gap #1 (run_summary not reaching
    disk on success path) — debug:checkpoint markers + bisect.
    Non-negotiable; writing a fix without the cause is gambling.

  Phase 2: motoko-side fixes (gap #1 root-cause fix + #6 extension
    visibility + #7 --headless flag + #8 --version mode + #10 TS
    process.exit removal so emission ordering doesn't matter)

  Phase 3: AILANG-side fixes (gap #2 success-criteria fallback to
    thinking.finish_reason + #5 MOTOKO_REPO discovery from wrapper)

  Phase 4: Cross-cutting (gap #4 session_id unification — adapter
    canonical, TS wrapper honors, AILANG runtime emits matching)

  Phase 5: Config layer (gap #3 + #9 cost_rates source-of-truth in
    models.yml.pricing → env-var override of motoko's profile config)

  Phase 6: End-to-end validation — TestEndToEnd_FullResultPopulation
    asserts every Result field; M5 paired-comparison
    motoko-claude-haiku-4-5 vs claude-haiku-4-5 produces real numbers.

Architectural posture: eliminate fragile assumptions at every layer.
Today's adapter assumes things that aren't true (wrapper preserves
session_id, cost_rates configured, run_summary always reaches disk,
loaded_extensions field accurate). After this hardening, none of those
assumptions remain — each replaced with explicit observable contracts.

Net axiom score: +13 (no hard violations). Strong A2 (replayability —
captured runs are fully reproducible), A7 (machines first — Result
fields mechanically reliable), A9 (cost visibility — eliminates $0
reporting gap).

Estimated 3 working days, ~530 LOC including tests, across both repos.
GATING for M5 of v0.18.0 (threshold-measurement) and v0.19.0
M-MOTOKO-EXT-PER-TASK (which needs accurate session_ids + extension
visibility from this hardening).

Cross-references:
- v0.18.0 M-MOTOKO-EXECUTOR-ADAPTER Future Work updated to point at
  this hardening as the trustworthy-numbers prerequisite
- v0.19.0 M-MOTOKO-EXT-PER-TASK Dependencies updated to mark v0.18.1
  as BLOCKING (was just "after local validation")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 8, 2026
…design docs

Phase 6 of v0.18.1 hardening sprint.

Moves both design docs from design_docs/planned/v0_18_1/ to
design_docs/implemented/v0_18_1/ and updates their status headers to
"Implemented (2026-05-08)" with cross-repo commit references.

Adds the v0.18.1 entry to changelogs/v0.10-current.md covering all
five phases:
  - Phase 1 (gap #1): JSONL drain race in TS layer
  - Phase 2 (gaps #6, #7, #8): extensions visibility, --headless, --version
  - Phase 3 (gaps #2, #5): success fallback, MOTOKO_REPO discovery
  - Phase 4 (gap #4): session_id unification
  - Phase 5 (gaps #3, #9): cost rates env-var passthrough

Acceptance gate: 5 of 7 conditions met; the remaining 2 (CostUSD>0
end-to-end + smoke success) blocked on a separate Bedrock validation
issue (extension tool names with `/` fail Anthropic's
^[a-zA-Z0-9_-]{1,128}$ pattern). The pricing env-var plumbing is
verified by unit tests; live smoke needs the extension fix downstream.
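
To make the validation failure concrete, here is a small sketch that checks names against the pattern quoted above. The two tool names are invented for illustration, and `isValidToolName` is a hypothetical helper, not part of either repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// toolNamePattern is the pattern quoted in the commit message.
var toolNamePattern = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,128}$`)

// isValidToolName (hypothetical) reports whether a name satisfies the pattern.
func isValidToolName(name string) bool {
	return toolNamePattern.MatchString(name)
}

func main() {
	// Both names are made up; the second contains "/" and is rejected.
	for _, name := range []string{"motoko_ext_search", "motoko-ext/search"} {
		fmt.Printf("%-20s valid=%v\n", name, isValidToolName(name))
	}
}
```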

LOC tally: ~80 AILANG-side + ~250 motoko-side + 11 new tests across
both repos, in ~6 hours wall-clock vs the 3-day plan estimate.

Sprint retrospective: investigation-first paid off — the 12 debug:
checkpoint markers in Phase 1 directly identified the silent-exit
point as the TS process.exit-on-done race, which would have been
maddening to find by code-reading alone. The resulting fix was tiny
(~25 LOC across 2 TS files) but unblocked everything downstream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 8, 2026
The first 3-harness paired comparison on `--agent-parallel 2` (run today
2026-05-08, after v0.18.1 shipped) revealed motoko has a parallel-execution
class of failures the serial-mode v0.18.1 hardening doesn't cover.

CONTEXT
=======
- v0.18.1 closed serial-mode gaps; serial smoke = 42/45 (93% — failures
  are benchmark-correctness misses, not infrastructure)
- v1 parallel (no fixes) = 40/45 (88.9%)
- v2 parallel (with EADDRINUSE retry/yield fix) = 37/45 (82.2%) — REGRESSED
- 4 of 5 motoko parallel failures: dur_s=0 + 0-byte JSONL ("motoko
  terminated without emitting run_summary") = crash BEFORE TS init

ROOT CAUSE (per cross-executor audit in design doc):
Motoko is the OUTLIER in the executor fleet. claude/gemini/codex/opencode/
pi all use `cmd.Dir = task.Workspace` + no shared filesystem state +
no embedded services. Motoko inherited a different design (long-lived
TUI with embedded env-server + cd-into-shared-MOTOKO_REPO) and the
v0.18.0 adapter wraps it without re-isolating.

SCOPE
=====
3 hypotheses to bisect in Phase 1 (investigation-first per the v0.18.1
gap #1 pattern that paid off):
  H1: Cache-write race (.ailang/cache/compile/.../core.gob clobber)
  H2: Per-task env-server isolation gap (EADDRINUSE handler routes to
      sibling's env-server bound to sibling's workdir)
  H3: Shared registry state (MOTOKO_REPO/src/core/ext/registry_generated)

PROPOSED FIX (3 coordinated layers, mirrors M-SERVE-API-CONCURRENCY's
per-request-isolation playbook):
  1. Per-task MOTOKO_HOME (hardlink-mirror of MOTOKO_REPO per spawn)
  2. Single env-server per session (drop inline OR drop auto_start)
  3. Cache pre-warming opt-in via HealthCheck

ACCEPTANCE GATE
===============
5 consecutive runs of 15-benchmark smoke tier × motoko-claude-haiku-4-5
× --agent-parallel 4 see ≥95% success rate over 60 runs (≤3 failures,
all benchmark-correctness misses NOT infrastructure failures).

LOC + Time
==========
~250 LOC across both repos, 2 days estimate. Follows v0.18.1's pattern
(actual was ~330 LOC + 11 tests in ~6h vs 3-day estimate — let's see
if the per-task isolation reuse from M-SERVE-API-CONCURRENCY accelerates).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MarkEdmondson1234 pushed a commit that referenced this pull request May 19, 2026
…ows CI fixes

Addresses the two low-severity follow-up items from the round-1
sprint-evaluator verdict (PASS @ 91/100) plus Windows CI test flakes
the user surfaced.

cmd/wasm/effects.go (266 LOC removed) + effects_cognition.go (290 LOC, new):
- Extract WasmDOMHandler + WasmMsgHandler + setDOM*/setMsg* + getOrCreate*
  + domPatchToJS into a dedicated file so each module stays under the
  800-line AI-maintainability threshold
- cmd/wasm/effects.go drops from 918 → 652 LOC (back under threshold)
- effects_cognition.go is build-tagged js && wasm same as the original
- Shared helpers (awaitJSResult, jsGetString, jsGetInt, replInstance)
  continue to live in effects.go / effects_helpers.go — same package,
  so the split is purely organizational

docs/docs/guides/wasm-integration.md (+108 LOC):
- New "Cognitive OS Substrate (v0.21.x)" section covering: shipped
  effects (DOM/Msg/Trace), step-pattern interface, cognitive event
  log + replay determinism claim, JS API for the bridges, runnable
  example pointer, end-to-end status table separating shipped vs
  deferred items across M-COG-RUNTIME / M-COG-RUNTIME-BROWSER /
  M-COG-MEMORY / M-COG-MESH
- The sprint plan named docs/docs/guides/wasm-runtime.md as the
  target; the actual existing guide is wasm-integration.md, so the
  section is added there

Windows CI test fixes (two flakes the user surfaced):

cmd/ailang/main_run_pipe_test.go (+8 LOC):
- TestRunCommand_PipedStdoutFlushesPerLine was failing on
  windows-latest with "EVENT_1 arrived at 1.6967s — too late". The
  load-bearing gap assertion (EVENT_1 → EVENT_2 ≥ 200ms) passed; only
  the belt-and-suspenders absolute-time check failed because the
  ailang binary cold-start cost on Windows runner VMs is ~1.7s vs
  <0.5s on Linux/macOS
- Fix: scale the upper bound to 3.5s on Windows via runtime.GOOS
- The gap check remains the load-bearing assertion at 200ms

internal/lsp/diagnostics_test.go (+19 / -6 LOC):
- TestDidSaveRepublishes was failing on windows-latest with
  "no diagnostics arrived after didSave" (5s timeout). LSP pipeline
  latency on Windows runners exceeds the 5s budget that works locally
- Fix: new diagWaitTimeout() helper returns 15s on Windows, 5s
  elsewhere; all four sink.wait(docURI, 5*time.Second) sites updated
- Server lifecycle context bumped to 3× the diag wait so the parent
  context doesn't expire while a wait is still in flight on Windows
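
A minimal sketch of the per-platform budget described above, assuming the helper simply branches on `runtime.GOOS`. The real `diagWaitTimeout()` body is not shown in this commit message; `diagWaitTimeoutFor` and `serverLifetime` are hypothetical names added so the rule can be exercised on any platform.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// diagWaitTimeoutFor (hypothetical) returns the diagnostics wait budget for
// a given GOOS value: 15s on Windows, 5s elsewhere.
func diagWaitTimeoutFor(goos string) time.Duration {
	if goos == "windows" {
		return 15 * time.Second
	}
	return 5 * time.Second
}

// diagWaitTimeout applies the rule to the platform the test runs on.
func diagWaitTimeout() time.Duration {
	return diagWaitTimeoutFor(runtime.GOOS)
}

// serverLifetime (hypothetical) is 3x the diagnostics wait, so the parent
// context outlives any single wait.
func serverLifetime(wait time.Duration) time.Duration {
	return 3 * wait
}

func main() {
	wait := diagWaitTimeout()
	fmt.Println("wait:", wait, "server context:", serverLifetime(wait))
}
```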

Both tests pass locally (Linux/macOS) post-change. The Windows budgets
preserve test intent (verify streaming / verify republish) without
turning either test into a no-op.

Refs:
  .ailang/state/evaluations/eval_M-COG-RUNTIME_round_1.json (feedback items #1, #2)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 23, 2026
M-AILANG-ERROR-QUALITY iter 3 (compiler error-msg #1): the type-checker
was leaking Go internal type names like `*types.TList` to users (and to
LLM eval agents) in unification error messages. The agent sees these and
has no idea what they mean — `*types.TList` was never in any AILANG doc.

Replaces 5 occurrences of `%T` (Go-internal type sigil) with `.String()`
(the canonical AILANG-level type printer that produces e.g. `[string]`
or `(int) -> bool`):

- cannot unify function type with X
- cannot unify list type with X (2x: TCon fallback + general)
- cannot unify array type with X
- cannot unify map type with X
- cannot unify tuple type with X
- cannot unify type application with X

Now also includes BOTH sides of the unification (t1 and t2) so the error
shows the full mismatch, not just the right-hand side.

Example improvement (the exact balanced_parens failure from Iter 1/2):
  Before: type unification failed at [list pattern]: cannot unify
          function type with *types.TList
  After:  type unification failed at [list pattern]: cannot unify
          function type with [string]

The "function type" + "[string]" tells the agent: "you wrote what AILANG
parsed as a function, but the context expected a list of strings". That's
actionable; *types.TList was not.

This commit does not close the "add a 'did you mean [head,...tail]' suggestion" gap from
the design doc; that needs path-aware logic in inference_helpers.go that
detects list-pattern context and adds a hint. That work is deferred to iter 4
if iter 3 alone doesn't recover balanced_parens.

Build + full make ci pass (117s). Three further %T cases remain in
unification_records.go which the eval data hasn't flagged as a problem
yet — will revisit if record-pattern errors surface in later rotations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 27, 2026
… dispatch

Three concrete gaps prevent `ailang messages send eval-rig "task"
--requires agent:motoko` from working end-to-end after
M-COORD-MULTI-HOST-WORKERS v0.22.0 shipped the routing primitives:

1. Local daemon HTTP listener off by default (PORT env not in launchd plist)
2. `ailang messages send` CLI missing `--requires` flag
3. No cloud motoko fallback (Dockerfile exists, but no cloudbuild step
   and no Cloud Run Job)

Targets v0.23.0, estimated 1-2 days. Direct follow-on to
M-COORD-MULTI-HOST-WORKERS — item #1 in its Future Work section
("Cloud-fallback routing").

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 27, 2026
Three small additions enable the daemon's HTTP listener on local-mode
installs:

1. plist template gains PORT env var with new @HTTP_PORT@ token.
   Comment explains that without PORT, /api/messages and /health are
   unreachable and tag-routed sends fail silently.

2. install_coordinator.sh accepts --port N (default 8765, validated as
   unprivileged 1024-65535), AILANG_COORD_HTTP_PORT env override, and a
   final-line `curl http://127.0.0.1:$HTTP_PORT/health` reminder.

3. coordinator_lifecycle.go::printCoordinatorStatusOutput probes the
   listener and prints "HTTP: ✓ http://127.0.0.1:8765" or a clear "no PORT
   configured" hint pointing at make coord-install. discoverCoordinatorHTTPPort
   reads AILANG_COORD_HTTP_PORT → PORT env → plist (single regexp; pulling
   in a plist parser for one key would be overkill). probeCoordinatorHTTP
   uses a 500ms timeout so the status command stays fast on misconfigured
   hosts.
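
The lookup order and port validation described in items 2 and 3 might look roughly like this. It is a sketch under assumptions: the plist is assumed to store PORT as a `<key>`/`<string>` pair, `validatePort` and `discoverPort` are hypothetical names (the real helper is `discoverCoordinatorHTTPPort`, whose signature is not shown), and falling through on an invalid value is a choice made here, not confirmed behavior.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// plistPortPattern assumes the plist stores PORT as a <key>/<string> pair.
var plistPortPattern = regexp.MustCompile(`<key>PORT</key>\s*<string>(\d+)</string>`)

// validatePort accepts only unprivileged ports in 1024-65535.
func validatePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("port %q is not a number", s)
	}
	if n < 1024 || n > 65535 {
		return 0, fmt.Errorf("port %d is outside 1024-65535", n)
	}
	return n, nil
}

// discoverPort checks AILANG_COORD_HTTP_PORT, then PORT, then the plist text.
// getenv is injected so the order can be exercised without touching the
// process environment; invalid values fall through to the next source.
func discoverPort(getenv func(string) string, plist string) (int, bool) {
	for _, key := range []string{"AILANG_COORD_HTTP_PORT", "PORT"} {
		if v := getenv(key); v != "" {
			if n, err := validatePort(v); err == nil {
				return n, true
			}
		}
	}
	if m := plistPortPattern.FindStringSubmatch(plist); m != nil {
		if n, err := validatePort(m[1]); err == nil {
			return n, true
		}
	}
	return 0, false
}

func main() {
	env := map[string]string{"PORT": "8765"}
	plist := "<key>PORT</key>\n<string>9000</string>"
	port, ok := discoverPort(func(k string) string { return env[k] }, plist)
	fmt.Println(port, ok) // the env value wins over the plist value
}
```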

Verified live on this Studio: reinstalled the plist with --port 8765,
daemon bound the listener, /health returned 200, status command printed
the new line. The pre-existing v0.24.0 comment headers on the plist +
installer were cleaned up to reflect v0.22.0 (M-COORD-MULTI-HOST-WORKERS)
+ v0.23.0 (this sprint) — leftover from the v0.22 relabel commit that
didn't touch these files.

Refs: M-COORD-MULTI-HOST-WORKERS Future Work item #1 (cloud-fallback
routing needs M3 to land too, but M1+M2 are the local-side prereqs).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 27, 2026
…s send`

Closes the v0.22.0 CHANGELOG-acknowledged gap. The flag accepts
comma-separated worker tags (`--requires agent:motoko,ollama:gemma4`) and,
when present, routes the message through the local daemon's HTTP
/api/messages endpoint instead of the SQLite-only path. The daemon
attaches the tags as Pub/Sub attributes so worker subscriptions can do
tag-subset filtering per M-COORD-MULTI-HOST-WORKERS v0.22.0.
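
As a rough illustration of the HTTP path, the request could be assembled as below. The struct mirrors the field list named later in this message (inbox/title/content/from/category/requires), but the JSON key names, the `buildSendRequest` helper, and the loopback URL shape are assumptions rather than the code in messages_send.go.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// postMessageRequest mirrors the fields named in the commit message;
// the JSON key names are assumed.
type postMessageRequest struct {
	Inbox    string   `json:"inbox"`
	Title    string   `json:"title"`
	Content  string   `json:"content"`
	From     string   `json:"from"`
	Category string   `json:"category"`
	Requires []string `json:"requires"`
}

// buildSendRequest (hypothetical) prepares a POST to /api/messages and
// attaches a Bearer token when apiKey is non-empty.
func buildSendRequest(port int, msg postMessageRequest, apiKey string) (*http.Request, error) {
	body, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("http://127.0.0.1:%d/api/messages", port)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

func main() {
	req, err := buildSendRequest(8765, postMessageRequest{
		Inbox:    "eval-rig",
		Content:  "M2 smoke",
		From:     "sprint-executor",
		Requires: []string{"agent:motoko"},
	}, "example-key")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```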

The HTTP path reuses M1's `discoverCoordinatorHTTPPort` + `probeCoordinatorHTTP`
helpers (env → plist), so `--requires` automatically works on any host
whose launchd plist was installed by the M1-updated install_coordinator.sh.
If the daemon HTTP listener isn't reachable, the error is actionable
(suggests `make coord-install` + the launchctl bootstrap command),
not silent.

Without `--requires`, behavior is unchanged from v0.22.0 — the SQLite
path stays the default for fire-and-forget local queueing.

The previous v0.22.0 comment block at messages_send.go:40 explaining
"intentionally NOT extended with --requires" was replaced with the new
behavior doc.

Coverage:
- TestSplitAndTrim: 8 cases for the comma-separated parser
  (single/multi/whitespace/empty/trailing-comma/all-empty)
- TestSendViaHTTP_PostsCorrectShape: verifies POST body matches the
  postMessageRequest fields in daemon_http.go (inbox/title/content/from/
  category/requires)
- TestSendViaHTTP_HonorsAPIKey: COORDINATOR_API_KEY env → Bearer header
- TestSendViaHTTP_ErrorWhenUnreachable: clear "no PORT" error path with
  next-step hint
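
For reference, a comma-separated tag parser with the behavior the TestSplitAndTrim cases describe could be as small as the sketch below. The name `splitAndTrim` is inferred from the test name; its real signature and edge-case behavior are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAndTrim splits on commas, trims surrounding whitespace, and drops
// empty entries, so trailing commas and all-empty input yield no tags.
func splitAndTrim(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", splitAndTrim("agent:motoko, ollama:gemma4,"))
	fmt.Printf("%q\n", splitAndTrim(" , ,"))
}
```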

All tests pass deterministically under -count=20.

Live verified on this Studio: `ailang messages send eval-rig 'M2 smoke'
--requires agent:motoko --from sprint-executor` → message landed in
SQLite via the HTTP endpoint, daemon logs show the POST.

Refs: M-COORD-MULTI-HOST-WORKERS Future Work item #1 (local-CLI side
closed; cloud-fallback Job is M3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 27, 2026
…oss-repo PR checklist v0.23.0 refresh

In-repo changes (the only M3 work that ships in this commit — the rest
lives in ailang-multivac):

1. cloudbuild-dev.yaml gains `build-agent-motoko` step mirroring
   `build-agent-go` (registry-cached buildx, FROMs agent-base via
   Dockerfile.agent-motoko's existing FROM). Push happens via
   `--push` flag like the other agent-* builders. `deploy-services`
   waitFor now includes build-agent-motoko so the deploy step doesn't
   race ahead of the image being available.

2. docker/agent-motoko-multivac-prs.md refreshed for v0.23.0 scope:
   - NEW: PR #0 (operational) — cloud `ailang-coordinator` is on a
     2026-04-28 image (pre-v0.21.0); MUST redeploy before E2E can
     exercise the v0.22.0 `requires` field
   - PR #2 addendum — coordinator agent config (config.yaml in the
     mounted ConfigMap) needs `motoko` agent entry with `worker_tags:
     [agent:motoko]` so M-COORD-MULTI-HOST-WORKERS tag matcher
     recognises the cloud Job as a valid dispatch target
   - PR #2 Job spec gets `max_retries = 1` (motoko is non-idempotent
     in cost — one retry max)
   - PR #3 (NEW, deferred) — `ailang-openrouter-api-key` prod secret
     resource. Currently only ailang-multivac-DEV has the secret;
     prod motoko cloud-dispatch is gated on cost analysis from dev
     throughput. Per-Job $0.30 cap on `motoko-or-gemma-4-26b` bounds
     the blast radius.
   - End-to-end smoke command updated to use the new --requires CLI
     flag from M2 (closes the v0.22.0 CLI gap that necessitated
     curl POST workarounds)

Acceptance gate refresh: 5 items, including the PR #0 pre-flight
("coordinator image timestamp shows post-v0.22.0 deploy").

What's NOT in this commit (intentional — cross-repo):
- The ailang-multivac terraform/cloud_run_jobs.tf addition (PR #2 body)
- The mounted coordinator config update (PR #2 addendum body)
- The prod secret resource (PR #3, deferred)
- The ailang-multivac cloudbuild.yaml + cloudbuild-images.yaml updates (PR #1)

Lints clean. cloudbuild-dev.yaml YAML validates (10 steps, build-agent-motoko
inserted between build-agent-go and push-coordinator).

Refs: M-COORD-MULTI-HOST-WORKERS Future Work item #1 (cloud-fallback
routing) — the local-side closures landed in M1/M2; this completes the
in-repo half of the cross-repo cloud-side work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 27, 2026
…trix

Three docs updates:

1. changelogs/v0.10-current.md: comprehensive sprint entry covering
   M1 (launchd PORT + status probe), M2 (--requires CLI flag),
   M3 (in-repo half of cloud-fallback: cloudbuild step + cross-repo
   PR checklist). Explicit verification matrix shows what works locally
   versus what's gated on the cross-repo / cross-deploy PRs:
     - Scenario 1 (Studio→Studio): partial — HTTP send path verified
       (M2 live smoke), but local dispatcher's requires-aware executor
       selection is a follow-up
     - Scenario 2 (laptop→cloud→Studio): deferred — gated on PR #0
       (cloud coordinator redeploy from April-28 image)
     - Scenario 3 (cloud-fallback Job): deferred — gated on PRs #1+#2
       in ailang-multivac

2. docs/docs/guides/coordinator-workers.md: refreshed Example 2 with
   the new `--requires` CLI invocation (replaces hand-rolled curl);
   added "HTTP endpoint configuration" subsection (default port 8765,
   override via env or --port flag, /health probe, route catalog with
   per-route auth requirements + warning about exposing :8765 without
   COORDINATOR_API_KEY).

3. docs/docs/guides/agent-messaging.md: new "Tag-routed sends (v0.23.0+)"
   subsection with concrete --requires examples (single tag, multi-tag
   intersection) + prerequisites callout (HTTP listener up, worker
   advertising the tag set).

Honest accounting: the local-side surface (M1+M2) is feature-complete
and ready for use today. The cloud-side dispatch path (M3.x in
ailang-multivac repo) is documented but not in production. The
sprint plan called this out as expected — the in-repo half is what
ships in v0.23.0; the cross-repo PRs are tracked separately.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 27, 2026
…ed/v0_23_0/

Sprint complete: 4/4 milestones pass.

In-repo shipped:
- M1: launchd PORT env + status probe (commit 49664aa, 86 LOC)
- M2: --requires CLI flag + 4 tests (commit 9544139, 274 LOC)
- M3: cloudbuild build-agent-motoko + cross-repo PR checklist (commit
  e4df2f4, 135 LOC)
- M4: docs + CHANGELOG + verification matrix (commit 012cf39, 101 LOC)

Total: 596 LOC actual vs 305 estimated (overshot — docs heavier than
the design doc accounted for, and the cross-repo PR checklist refresh
in M3 was richer than a thin update).

Verification matrix (honest):
- Scenario 1 (Studio→Studio): partial — HTTP send verified live; local
  dispatcher's requires-aware executor selection is a follow-up
- Scenario 2 + 3: deferred on the cross-repo PRs documented in
  docker/agent-motoko-multivac-prs.md (PR #0/#1/#2 in ailang-multivac
  repo, plus the operational cloud coordinator redeploy)

The local-side surface (M1+M2) is feature-complete and ships in v0.23.0.

Next: hand off to sprint-evaluator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request Jun 3, 2026
…ation misdiagnosis

KEY FINDING: investigating the "token truncation" theory revealed it was wrong.
The failing run_length_encode/type_unify/red_black_tree outputs were only 161-442
tokens (8192 limit) — NOT harness-truncated. The real cause: `++` used for string
concatenation (a type error since v0.13.0), after which the parser bails and
produces an EOF-looking error downstream.

`++` for strings appears in 46% of ALL compile failures (1374/2948) — by far the
single largest AILANG compile-failure cause across every model tier. And it is
ALREADY in the teaching prompt (3 places) — so this is a SALIENCE problem, not a
coverage gap. The trained `++` reflex (Haskell/Elm/PureScript) overrides a buried
table row.

- NEW m-prompt-string-concat-plusplus (P0): salience redesign — top-of-prompt
  hard-rules box + targeted type-error fix-it suggesting "${...}". Projected
  +8-12pt CPR, dwarfing all other prompt fixes combined.
- m-prompt-concise-recursive-solutions: CORRECTED — demoted P2→P3, root cause
  note added pointing at the ++ doc. The truncation theory was a misdiagnosis.
- m-prompt-single-file-module: completed (multi_module_imports, 4/4 compile fail).

The eval harness is NOT over-restricting output length (the user's question) —
8192 tokens is plenty; failures stop at <450 tokens due to genuine syntax errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request Jun 5, 2026
CompareOutput did exact string match after trimming outer whitespace, so correct
JSON output failed on formatting: AILANG's std/json encode emits compact `{"a":1}`
while benchmarks (and Python's json.dumps default) expect spaced `{"a": 1}`. The
v0.24.1 analysis found 9/10 whitespace-only AILANG "logic_error" failures were
correct JSON failing byte-exact match — all of ast_patch_roundtrip (the #1
AILANG-vs-Python gap, which looked "genuinely hard" at 38% but was a grader artifact).

Fix: if BOTH expected and actual parse as valid JSON, compare canonical parsed forms
(reflect.DeepEqual on json.Unmarshal). This also handles int-vs-float (all JSON numbers
→ float64) and key order. SAFE: only triggers when both sides are valid JSON, so
non-JSON near-misses ("1 2" vs "12") and formatted-text benchmarks are unaffected —
exact match remains the fast path and genuinely-wrong JSON still fails.
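
The comparison described above can be sketched as follows. `compareOutput` is a hypothetical stand-in written for illustration, not the actual CompareOutput source; it assumes the exact-match fast path runs first and that the JSON path is taken only when both sides parse.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// compareOutput reports whether actual matches expected. Exact match (after
// trimming outer whitespace) is the fast path; otherwise, if BOTH sides are
// valid JSON, their parsed forms are compared structurally.
func compareOutput(expected, actual string) bool {
	e, a := strings.TrimSpace(expected), strings.TrimSpace(actual)
	if e == a {
		return true
	}
	var ev, av any
	if json.Unmarshal([]byte(e), &ev) != nil || json.Unmarshal([]byte(a), &av) != nil {
		return false // at least one side is not JSON: no leniency
	}
	// Numbers decode to float64 and objects to maps, so int-vs-float and
	// key order differences disappear here.
	return reflect.DeepEqual(ev, av)
}

func main() {
	fmt.Println(compareOutput(`{"a": 1}`, `{"a":1}`)) // prints true
	fmt.Println(compareOutput("1 2", "12"))           // prints false
	fmt.Println(compareOutput(`{"a": 1}`, `{"a":2}`)) // prints false
}
```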

Verified against real v0.24.1 data: 9 false failures resolved (ast_patch_roundtrip
38%→~95%). 13 CompareOutput unit tests incl. 2 safety cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request Jun 10, 2026
Frequency analysis of 334 local-qwen agent trials (44 failures) shows ~36% are
a single family — expression-body (= expr) vs block-body ({ stmts }) /
statement-separator confusion — dominated (20.5%) by the
`func f() = let x = e; rest` reflex (PAR017: ';' not valid in expression-body
functions). match...with (PAR019) and ++-for-string-concat — the old big-model
top failures — are now rare/zero on qwen, so the card already works for those;
the small-model frequency banners undercount what's still live.

- Sharpen dialect-traps card trap #2 to name the exact `= let x = e; rest`
  anti-pattern + both fixes (brace block, or let-in). Verified: anti-pattern
  rejects (PAR017), both fixes run.
- Record the local-qwen frequency data in m-ailang-error-quality-for-llm-iteration
  (re-prioritizes it): parser/card already cover PAR017 yet the model fails it and
  can't recover (config_file_parser thrashed 66 turns) — the lever is making PAR017
  recovery-actionable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request Jun 10, 2026
…tatements

The #1 unactionable small-model failure in agent mode is the mirror of PAR017:
a *missing* ';' between block statements. A model writes a { } block body and
drops the separator (`pure func f() -> int { let n = length(s)  if n > 0 ... }`),
and the parser emitted a bare "PAR_UNEXPECTED_TOKEN: expected }, got if" with
zero recovery signal — config_file_parser burned 66 agent turns on exactly this.

The parser now emits PAR020 — "missing ';' between block statements (found `X`
where `;` or `}` was expected)" with the concrete two-line fix and a docs link —
when a block body (function-declaration path, parser_func.go) or block expression
(parser_expr.go) is followed by a statement-starting token (let/letrec/if/match/
identifier) instead of ';'/'}'. Shared via missingBlockSemicolonError() +
peekStartsBlockStatement(). PAR017 (extra ';') + PAR020 (missing ';') now bookend
the whole ';'-confusion family — ~32% of local-qwen agent failures.
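
As a rough illustration of the check described above: the helper names come from this commit message, but the token types, signatures, and the `checkAfterStatement` wrapper below are hypothetical and simplified, not the parser's real code.

```go
package main

import "fmt"

// TokenKind and Token are simplified stand-ins for the parser's token types.
type TokenKind int

const (
	Let TokenKind = iota
	LetRec
	If
	Match
	Ident
	Semicolon
	RBrace
	Other
)

type Token struct {
	Kind TokenKind
	Text string
}

// peekStartsBlockStatement reports whether t could begin a new block statement.
func peekStartsBlockStatement(t Token) bool {
	switch t.Kind {
	case Let, LetRec, If, Match, Ident:
		return true
	}
	return false
}

// missingBlockSemicolonError builds the PAR020 diagnostic for the found token.
func missingBlockSemicolonError(t Token) error {
	return fmt.Errorf("PAR020: missing ';' between block statements (found `%s` where `;` or `}` was expected)", t.Text)
}

// checkAfterStatement (hypothetical) inspects the token following a block
// statement: ';' or '}' is fine, a statement-starting token yields PAR020,
// and anything else falls back to the generic error.
func checkAfterStatement(next Token) error {
	switch {
	case next.Kind == Semicolon || next.Kind == RBrace:
		return nil
	case peekStartsBlockStatement(next):
		return missingBlockSemicolonError(next)
	default:
		return fmt.Errorf("PAR_UNEXPECTED_TOKEN: expected }, got %s", next.Text)
	}
}

func main() {
	// Mirrors `{ let n = length(s)  if n > 0 ... }` with the ';' dropped.
	fmt.Println(checkAfterStatement(Token{Kind: If, Text: "if"}))
}
```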

Found via the M-AILANG-ERROR-QUALITY frequency analysis of 334 qwen trials.

- TestPAR020_MissingBlockSemicolon: fires on the pattern; no false-positive on
  valid or single-expression blocks.
- parser/elaborate/pipeline suites green; make verify-examples at baseline (181/5/2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Labels

codex
coordinator:in-progress (Task claimed by a coordinator instance - prevents duplicate work)
needs-design-approval (Awaiting human approval of design document)
