Review design docs in planned/ folder by MarkEdmondson1234 · Pull Request #9 · sunholo-data/ailang

MarkEdmondson1234 · 2025-10-22T06:44:57Z

Comprehensive analysis of 15 design docs:

- 3 should move to implemented/ (M-DX1, M-EVAL round-robin complete)
- 8 should stay in planned/ (active future work for v0.4.0)
- 4 should be archived (analysis docs, outdated plans, fixed bugs)

Key findings:

- IO output bug (v0.3.6) is FIXED - verified in v0.3.17
- M-DX1 is 90% complete (all 52 builtins migrated & documented)
- Most planned docs are valid v0.4.0 roadmap items

Includes bash commands ready to execute for reorganization.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

Root cause: lookupPrefix() iterated a Go map nondeterministically when duplicate namespace prefixes mapped to the same URI (common in EPUB/OOXML).

Fix: check the default namespace first, before map iteration.

Performance:

- String() methods on ListValue, ArrayValue, TupleValue, RecordValue, TaggedValue used += concatenation (O(n²)). Switched to strings.Builder.
- Pre-allocated slices in evalCoreList/Array/Tuple and XML attribute parsing.
- Zero-allocation whitespace check for CharData.

Result: Moby Dick EPUB parse 62s → 11.5s (5.4x speedup).

Process: added determinism verification as sprint-executor principle #9 and builtin-developer validation rule #7.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
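
The two changes in this commit can be illustrated with a minimal Go sketch. It is not the project's code: the signature of `lookupPrefix` and the `joinElems` helper are hypothetical, and the sorted-key fallback is an extra assumption (the commit message only says the default namespace is checked before map iteration).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// lookupPrefix returns a prefix bound to uri. Hypothetical signature:
// the real function's parameters are not shown in the commit message.
func lookupPrefix(bindings map[string]string, uri string) (string, bool) {
	// Check the default namespace (empty prefix) first, as the fix describes.
	if bound, ok := bindings[""]; ok && bound == uri {
		return "", true
	}
	// Assumption beyond the commit message: Go randomizes map iteration
	// order, so sort the prefixes to make the remaining lookup deterministic.
	prefixes := make([]string, 0, len(bindings))
	for p := range bindings {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)
	for _, p := range prefixes {
		if bindings[p] == uri {
			return p, true
		}
	}
	return "", false
}

// joinElems renders "[a, b, c]" using strings.Builder, which appends into
// one growing buffer instead of copying the whole string on every +=.
func joinElems(elems []string) string {
	var b strings.Builder
	b.WriteByte('[')
	for i, e := range elems {
		if i > 0 {
			b.WriteString(", ")
		}
		b.WriteString(e)
	}
	b.WriteByte(']')
	return b.String()
}

func main() {
	// Two prefixes bound to the same URI, as is common in EPUB/OOXML.
	ns := map[string]string{
		"":     "http://www.w3.org/1999/xhtml",
		"h":    "http://www.w3.org/1999/xhtml",
		"epub": "http://www.idpf.org/2007/ops",
	}
	p, ok := lookupPrefix(ns, "http://www.w3.org/1999/xhtml")
	fmt.Printf("prefix=%q found=%v\n", p, ok)
	fmt.Println(joinElems([]string{"1", "2", "3"}))
}
```

Running the lookup repeatedly against the same map should always return the same prefix, which is the property a determinism check would assert.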

… 10 integration gaps

Today's live smoke testing of v0.18.0's M-MOTOKO-EXECUTOR-ADAPTER surfaced 10 interconnected gaps that prevent trustworthy benchmark numbers. Three got partial fixes during the day (HealthCheck no-spawn, MOTOKO_REPO fallback, MOTOKO_HEADLESS, run_summary-before-done reorder) but root causes remain across both repos.

User feedback: "we need it all I think. lets get to the bottom of the gaps - I think a design doc process will help."

This sprint sequences the fixes properly:

- Phase 1: Investigation-first for gap #1 (run_summary not reaching disk on success path), using debug:checkpoint markers + bisect. Non-negotiable; writing a fix without the cause is gambling.
- Phase 2: motoko-side fixes (gap #1 root-cause fix + #6 extension visibility + #7 --headless flag + #8 --version mode + #10 TS process.exit removal so emission ordering doesn't matter)
- Phase 3: AILANG-side fixes (gap #2 success-criteria fallback to thinking.finish_reason + #5 MOTOKO_REPO discovery from wrapper)
- Phase 4: Cross-cutting (gap #4 session_id unification: adapter canonical, TS wrapper honors, AILANG runtime emits matching)
- Phase 5: Config layer (gap #3 + #9 cost_rates source-of-truth in models.yml.pricing → env-var override of motoko's profile config)
- Phase 6: End-to-end validation. TestEndToEnd_FullResultPopulation asserts every Result field; M5 paired-comparison motoko-claude-haiku-4-5 vs claude-haiku-4-5 produces real numbers.

Architectural posture: eliminate fragile assumptions at every layer. Today's adapter assumes things that aren't true (wrapper preserves session_id, cost_rates configured, run_summary always reaches disk, loaded_extensions field accurate). After this hardening, none of those assumptions remain; each is replaced with explicit observable contracts.

Net axiom score: +13 (no hard violations). Strong A2 (replayability: captured runs are fully reproducible), A7 (machines first: Result fields mechanically reliable), A9 (cost visibility: eliminates $0 reporting gap).

Estimated 3 working days, ~530 LOC including tests, across both repos.

GATING for M5 of v0.18.0 (threshold-measurement) and v0.19.0 M-MOTOKO-EXT-PER-TASK (which needs accurate session_ids + extension visibility from this hardening).

Cross-references:

- v0.18.0 M-MOTOKO-EXECUTOR-ADAPTER Future Work updated to point at this hardening as the trustworthy-numbers prerequisite
- v0.19.0 M-MOTOKO-EXT-PER-TASK Dependencies updated to mark v0.18.1 as BLOCKING (was just "after local validation")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…5a-c, gaps #3 #9)

Phase 5 of v0.18.1 hardening sprint. Closes gap #3 (CostUSD always 0 for motoko runs) and gap #9 (per-profile cost_rates duplication burden).

PRE-FIX STATE
=============

motoko's profile config required cost_rates for EVERY model used. Adding a new model meant: (a) edit AILANG models.yml, (b) edit motoko's profile config separately, (c) keep them in sync forever. The dogfood profile shipped with no cost_rates for openrouter/anthropic models, so:

- cost_warning + cost_exhausted events never fired
- run_summary.total_cost_usd = 0
- adapter Result.CostUSD = $0.000000 across every motoko-* model
- eval-harness threshold-measurement experiments could not see cost

ARCHITECTURE
============

Single source of truth: AILANG models.yml. Per-task pricing flows:

AILANG models.yml (input_per_1k, output_per_1k)
  → eval-harness constructs CostBudget(InputPer1K, OutputPer1K)
  → motoko adapter converts to per-1M millicents
  → MOTOKO_COST_INPUT_PER_1M_MILLICENTS / OUTPUT env vars
  → motoko load_cost_rates reads env first, profile config as fallback

The conversion:

per_1k_usd × 1e8 = per_1m_millicents
(×1000 for K→M, ×100 for $→¢, ×1000 for ¢→m¢)

motoko profiles can keep their cost_rates blocks for non-eval interactive use; the env vars override per-task only when set.

M5a: Adapter env-var emit
=========================

internal/executor/motoko/motoko.go: when task.Budget != nil and rates are >0, append both MOTOKO_COST_*_PER_1M_MILLICENTS env vars before the spawn. Conversion is straight float×int, with no rounding loss for typical rates (a few cents per million tokens).

M5b: motoko-side env-var override
=================================

src/core/config.ail (motoko): new env_int helper + load_cost_rates now declares ! {Env}. Reads MOTOKO_COST_INPUT_PER_1M_MILLICENTS first; falls back to profile config json_int otherwise. The ! {Env} effect change propagates one level (load_runtime_config in config.ail already had ! {Env} from active_profile, so no caller-side churn).

M5c: Tests
==========

Two new tests in internal/executor/motoko/execute_test.go:

- TestExecute_BudgetEnvVarPassthrough: mock motoko writes received env vars to a side-channel file; test asserts both rates land at the expected per-1M millicent values for haiku-4-5 (25000/125000).
- TestExecute_NoBudget_NoEnvVar: when task.Budget is nil (coordinator-driven runs, smoke runs that don't enforce budgets), the adapter must NOT emit the env vars. Defends the back-compat path.

M5d: Smoke wiring
=================

cmd/smoke-motoko/main.go: hardcode haiku-4-5 rates in Budget so the smoke runner exercises the full pricing path. The eval harness derives this from models.yml automatically.

KNOWN LIMITATION
================

End-to-end smoke verification is BLOCKED on a separate Bedrock validation error: extensions register tools with names like "omnigraph/scaffold" which fail Anthropic's ^[a-zA-Z0-9_-]{1,128}$ pattern. The cost path WILL fire once that's fixed (handled in a separate ext-naming sprint). The unit tests above prove the env-var plumbing is correct; the live smoke needs the extension fix downstream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
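
The per-1k USD to per-1M millicents conversion and the nil-budget rule described in this commit can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the adapter's code: the `CostBudget` struct shape, the helper names, the use of `math.Round`, and the full name `MOTOKO_COST_OUTPUT_PER_1M_MILLICENTS` (inferred from the `MOTOKO_COST_*_PER_1M_MILLICENTS` pattern) are all hypothetical. The sample rates are back-computed from the 25000/125000 test values using the stated ×1e8 formula.

```go
package main

import (
	"fmt"
	"math"
)

// CostBudget mirrors the two fields named in the commit message.
// Hypothetical shape: the real struct may carry more fields.
type CostBudget struct {
	InputPer1K  float64 // USD per 1,000 input tokens
	OutputPer1K float64 // USD per 1,000 output tokens
}

// usdPer1KToMillicentsPer1M applies the ×1e8 factor:
// ×1000 (per-1K → per-1M), ×100 ($ → ¢), ×1000 (¢ → millicents).
// math.Round is an assumption added here to absorb float representation error.
func usdPer1KToMillicentsPer1M(usdPer1K float64) int64 {
	return int64(math.Round(usdPer1K * 1e8))
}

// costEnv returns the env entries to append before spawning motoko.
// It returns nil when there is no budget or a rate is not positive,
// so runs without a budget see no cost env vars at all.
func costEnv(b *CostBudget) []string {
	if b == nil || b.InputPer1K <= 0 || b.OutputPer1K <= 0 {
		return nil
	}
	return []string{
		fmt.Sprintf("MOTOKO_COST_INPUT_PER_1M_MILLICENTS=%d",
			usdPer1KToMillicentsPer1M(b.InputPer1K)),
		// Output variable name inferred from the MOTOKO_COST_*_ pattern.
		fmt.Sprintf("MOTOKO_COST_OUTPUT_PER_1M_MILLICENTS=%d",
			usdPer1KToMillicentsPer1M(b.OutputPer1K)),
	}
}

func main() {
	// Rates back-computed from the test's expected 25000/125000 values.
	budget := &CostBudget{InputPer1K: 0.00025, OutputPer1K: 0.00125}
	for _, kv := range costEnv(budget) {
		fmt.Println(kv)
	}
	fmt.Println(len(costEnv(nil))) // no budget: nothing emitted
}
```

Keeping the conversion in one small function makes the two test cases in M5c easy to mirror: one asserts the converted values, the other asserts that a nil budget emits nothing.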

…design docs

Phase 6 of v0.18.1 hardening sprint. Moves both design docs from design_docs/planned/v0_18_1/ to design_docs/implemented/v0_18_1/ and updates their status headers to "Implemented (2026-05-08)" with cross-repo commit references.

Adds the v0.18.1 entry to changelogs/v0.10-current.md covering all five phases:

- Phase 1 (gap #1): JSONL drain race in TS layer
- Phase 2 (gaps #6, #7, #8): extensions visibility, --headless, --version
- Phase 3 (gaps #2, #5): success fallback, MOTOKO_REPO discovery
- Phase 4 (gap #4): session_id unification
- Phase 5 (gaps #3, #9): cost rates env-var passthrough

Acceptance gate: 5 of 7 conditions met; the remaining 2 (CostUSD>0 end-to-end + smoke success) blocked on a separate Bedrock validation issue (extension tool names with `/` fail Anthropic's ^[a-zA-Z0-9_-]{1,128}$ pattern). The pricing env-var plumbing is verified by unit tests; live smoke needs the extension fix downstream.

LOC tally: ~80 AILANG-side + ~250 motoko-side + 11 new tests across both repos, in ~6 hours wall-clock vs the 3-day plan estimate.

Sprint retrospective: investigation-first paid off. The 12 debug:checkpoint markers in Phase 1 directly identified the silent-exit point as the TS process.exit-on-done race, which would have been maddening to find by code-reading alone. The resulting fix was tiny (~25 LOC across 2 TS files) but unblocked everything downstream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
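
The tool-name constraint that blocks the last two acceptance conditions can be checked up front. The following Go sketch only applies the pattern quoted in the commit message; the helper names are hypothetical, and it does not attempt the renaming that the separate ext-naming sprint is meant to handle.

```go
package main

import (
	"fmt"
	"regexp"
)

// toolNamePattern is the pattern quoted in the commit message.
var toolNamePattern = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,128}$`)

// validToolName reports whether name satisfies the quoted pattern.
func validToolName(name string) bool {
	return toolNamePattern.MatchString(name)
}

// invalidToolNames returns, in input order, the names that fail the
// pattern. Hypothetical helper for a pre-flight check before a request.
func invalidToolNames(names []string) []string {
	var bad []string
	for _, n := range names {
		if !validToolName(n) {
			bad = append(bad, n)
		}
	}
	return bad
}

func main() {
	// "omnigraph/scaffold" is the failing example from the commit message;
	// the other names are made up for illustration.
	names := []string{"omnigraph/scaffold", "read_file", "run-tests"}
	fmt.Println(invalidToolNames(names))
}
```

A check like this would surface the offending names before the provider rejects the whole request, which makes the failure easier to attribute to a specific extension.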

MarkEdmondson1234 merged commit ba863c4 into dev Oct 22, 2025
8 of 9 checks passed

MarkEdmondson1234 deleted the claude/review-design-docs-011CUMm9TBUxkAQkZ8r1Qo8M branch December 30, 2025 18:35

MarkEdmondson1234 merged 1 commit into dev from claude/review-design-docs-011CUMm9TBUxkAQkZ8r1Qo8M

2 participants
