Review design docs in planned/ folder #9
Merged
MarkEdmondson1234 merged 1 commit on Oct 22, 2025
Comprehensive analysis of 15 design docs:
- 3 should move to implemented/ (M-DX1, M-EVAL round-robin complete)
- 8 should stay in planned/ (active future work for v0.4.0)
- 4 should be archived (analysis docs, outdated plans, fixed bugs)

Key findings:
- IO output bug (v0.3.6) is FIXED - verified in v0.3.17
- M-DX1 is 90% complete (all 52 builtins migrated & documented)
- Most planned docs are valid v0.4.0 roadmap items

Includes bash commands ready to execute for reorganization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request on Mar 13, 2026
Root cause: lookupPrefix() iterated a Go map nondeterministically when duplicate namespace prefixes mapped to the same URI (common in EPUB/OOXML).

Fix: check the default namespace first, before map iteration.

Performance: String() methods on ListValue, ArrayValue, TupleValue, RecordValue, TaggedValue used += concatenation (O(n²)). Switched to strings.Builder. Pre-allocated slices in evalCoreList/Array/Tuple and XML attribute parsing. Zero-allocation whitespace check for CharData.

Result: Moby Dick EPUB parse 62s → 11.5s (5.4x speedup).

Process: added determinism verification as sprint-executor principle #9 and builtin-developer validation rule #7.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
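The two fixes described in this commit can be illustrated with a minimal Go sketch. The function signature, the `listString` helper, and the `urn:example:*` URIs are hypothetical (the commit does not show the real code), and sorting the candidate prefixes is an assumption added here to make ties deterministic; the commit itself only states that the default namespace is checked first.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// lookupPrefix is a hypothetical stand-in for the function named in the
// commit; the real signature is not shown there.
func lookupPrefix(defaultNS string, prefixes map[string]string, uri string) (string, bool) {
	// Check the default namespace first: it corresponds to the empty prefix.
	if defaultNS != "" && defaultNS == uri {
		return "", true
	}
	// Assumption (not from the commit): collect every matching prefix and
	// sort, so duplicate bindings resolve the same way on every run even
	// though Go randomizes map iteration order.
	var matches []string
	for prefix, u := range prefixes {
		if u == uri {
			matches = append(matches, prefix)
		}
	}
	if len(matches) == 0 {
		return "", false
	}
	sort.Strings(matches)
	return matches[0], true
}

// listString renders elements with one strings.Builder instead of repeated
// += concatenation, which copies the growing string on every step.
func listString(elems []string) string {
	var b strings.Builder
	b.WriteByte('[')
	for i, e := range elems {
		if i > 0 {
			b.WriteString(", ")
		}
		b.WriteString(e)
	}
	b.WriteByte(']')
	return b.String()
}

func main() {
	// Two prefixes bound to the same URI, as can happen in EPUB/OOXML.
	prefixes := map[string]string{
		"b": "urn:example:shared",
		"a": "urn:example:shared",
	}
	p, ok := lookupPrefix("urn:example:default", prefixes, "urn:example:shared")
	fmt.Println(p, ok)
	fmt.Println(listString([]string{"1", "2", "3"}))
}
```

Running the lookup repeatedly should return the same prefix each time, which is the property the determinism verification rule is meant to check.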
sunholo-voight-kampff added a commit that referenced this pull request on May 8, 2026
… 10 integration gaps

Today's live smoke testing of v0.18.0's M-MOTOKO-EXECUTOR-ADAPTER surfaced 10 interconnected gaps that prevent trustworthy benchmark numbers. Three got partial fixes during the day (HealthCheck no-spawn, MOTOKO_REPO fallback, MOTOKO_HEADLESS, run_summary-before-done reorder) but root causes remain across both repos.

User feedback: "we need it all I think. lets get to the bottom of the gaps - I think a design doc process will help."

This sprint sequences the fixes properly:

Phase 1: Investigation-first for gap #1 (run_summary not reaching disk on success path): debug:checkpoint markers + bisect. Non-negotiable; writing a fix without the cause is gambling.
Phase 2: motoko-side fixes (gap #1 root-cause fix + #6 extension visibility + #7 --headless flag + #8 --version mode + #10 TS process.exit removal so emission ordering doesn't matter)
Phase 3: AILANG-side fixes (gap #2 success-criteria fallback to thinking.finish_reason + #5 MOTOKO_REPO discovery from wrapper)
Phase 4: Cross-cutting (gap #4 session_id unification: adapter canonical, TS wrapper honors, AILANG runtime emits matching)
Phase 5: Config layer (gap #3 + #9 cost_rates source-of-truth in models.yml.pricing → env-var override of motoko's profile config)
Phase 6: End-to-end validation: TestEndToEnd_FullResultPopulation asserts every Result field; M5 paired-comparison motoko-claude-haiku-4-5 vs claude-haiku-4-5 produces real numbers.

Architectural posture: eliminate fragile assumptions at every layer. Today's adapter assumes things that aren't true (wrapper preserves session_id, cost_rates configured, run_summary always reaches disk, loaded_extensions field accurate). After this hardening, none of those assumptions remain; each is replaced with an explicit observable contract.

Net axiom score: +13 (no hard violations). Strong A2 (replayability: captured runs are fully reproducible), A7 (machines first: Result fields mechanically reliable), A9 (cost visibility: eliminates $0 reporting gap).

Estimated 3 working days, ~530 LOC including tests, across both repos.

GATING for M5 of v0.18.0 (threshold-measurement) and v0.19.0 M-MOTOKO-EXT-PER-TASK (which needs accurate session_ids + extension visibility from this hardening).

Cross-references:
- v0.18.0 M-MOTOKO-EXECUTOR-ADAPTER Future Work updated to point at this hardening as the trustworthy-numbers prerequisite
- v0.19.0 M-MOTOKO-EXT-PER-TASK Dependencies updated to mark v0.18.1 as BLOCKING (was just "after local validation")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request on May 8, 2026
…5a-c, gaps #3 #9)

Phase 5 of v0.18.1 hardening sprint. Closes gap #3 (CostUSD always 0 for motoko runs) and gap #9 (per-profile cost_rates duplication burden).

PRE-FIX STATE
=============

motoko's profile config required cost_rates for EVERY model used. Adding a new model meant: (a) edit AILANG models.yml, (b) edit motoko's profile config separately, (c) keep them in sync forever. The dogfood profile shipped with no cost_rates for openrouter/anthropic models, so:
- cost_warning + cost_exhausted events never fired
- run_summary.total_cost_usd = 0
- adapter Result.CostUSD = $0.000000 across every motoko-* model
- eval-harness threshold-measurement experiments could not see cost

ARCHITECTURE
============

Single source of truth: AILANG models.yml. Per-task pricing flows:

  AILANG models.yml (input_per_1k, output_per_1k)
    → eval-harness constructs CostBudget(InputPer1K, OutputPer1K)
    → motoko adapter converts to per-1M millicents
    → MOTOKO_COST_INPUT_PER_1M_MILLICENTS / OUTPUT env vars
    → motoko load_cost_rates reads env first, profile config as fallback

The conversion:

  per_1k_usd × 1e8 = per_1m_millicents
  (×1000 for K→M, ×100 for $→¢, ×1000 for ¢→m¢)

motoko profiles can keep their cost_rates blocks for non-eval interactive use; the env vars override per-task only when set.

M5a - Adapter env-var emit
==========================

internal/executor/motoko/motoko.go: when task.Budget != nil and rates are >0, append both MOTOKO_COST_*_PER_1M_MILLICENTS env vars before the spawn. Conversion is straight float×int - no rounding loss for typical rates (a few cents per million tokens).

M5b - motoko-side env-var override
==================================

src/core/config.ail (motoko): new env_int helper + load_cost_rates now declares ! {Env}. Reads MOTOKO_COST_INPUT_PER_1M_MILLICENTS first; falls back to profile config json_int otherwise. The ! {Env} effect change propagates one level (load_runtime_config in config.ail already had ! {Env} from active_profile, so no caller-side churn).

M5c - Tests
===========

Two new tests in internal/executor/motoko/execute_test.go:
- TestExecute_BudgetEnvVarPassthrough: mock motoko writes received env vars to a side-channel file; test asserts both rates land at the expected per-1M millicent values for haiku-4-5 (25000/125000).
- TestExecute_NoBudget_NoEnvVar: when task.Budget is nil (coordinator-driven runs, smoke runs that don't enforce budgets), the adapter must NOT emit the env vars. Defends the back-compat path.

M5d - Smoke wiring
==================

cmd/smoke-motoko/main.go: hardcode haiku-4-5 rates in Budget so the smoke runner exercises the full pricing path. The eval harness derives this from models.yml automatically.

KNOWN LIMITATION
================

End-to-end smoke verification is BLOCKED on a separate Bedrock validation error: extensions register tools with names like "omnigraph/scaffold" which fail Anthropic's ^[a-zA-Z0-9_-]{1,128}$ pattern. The cost path WILL fire once that's fixed (handled in a separate ext-naming sprint). The unit tests above prove the env-var plumbing is correct; the live smoke needs the extension fix downstream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
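As a rough illustration of the conversion and the M5a env-var emit described in this commit, here is a self-contained Go sketch. The `CostBudget` struct shape, the `per1MMillicents` and `costEnv` helpers, and the full output variable name `MOTOKO_COST_OUTPUT_PER_1M_MILLICENTS` are assumptions inferred from the commit text, not the adapter's actual code. The sketch also rounds to the nearest integer as a guard against floating-point error, which is a choice made here rather than something the commit states.

```go
package main

import (
	"fmt"
	"math"
)

// CostBudget mirrors the CostBudget(InputPer1K, OutputPer1K) shape named in
// the commit; the real type may differ. Rates are USD per 1K tokens.
type CostBudget struct {
	InputPer1K  float64
	OutputPer1K float64
}

// per1MMillicents applies per_1k_usd × 1e8 = per_1m_millicents
// (×1000 for K→M, ×100 for $→¢, ×1000 for ¢→m¢).
// Rounding is an assumption of this sketch.
func per1MMillicents(per1KUSD float64) int64 {
	return int64(math.Round(per1KUSD * 1e8))
}

// costEnv returns the env entries to append before spawning motoko.
// It returns nil when there is no budget or a rate is not positive, so the
// profile config stays in effect. The OUTPUT variable name is inferred from
// the MOTOKO_COST_*_PER_1M_MILLICENTS pattern in the commit.
func costEnv(b *CostBudget) []string {
	if b == nil || b.InputPer1K <= 0 || b.OutputPer1K <= 0 {
		return nil
	}
	return []string{
		fmt.Sprintf("MOTOKO_COST_INPUT_PER_1M_MILLICENTS=%d", per1MMillicents(b.InputPer1K)),
		fmt.Sprintf("MOTOKO_COST_OUTPUT_PER_1M_MILLICENTS=%d", per1MMillicents(b.OutputPer1K)),
	}
}

func main() {
	// Per-1K rates chosen so the result matches the 25000/125000 values
	// the commit's test asserts for haiku-4-5.
	budget := &CostBudget{InputPer1K: 0.00025, OutputPer1K: 0.00125}
	for _, kv := range costEnv(budget) {
		fmt.Println(kv)
	}
	fmt.Println(costEnv(nil) == nil)
}
```

With a nil budget the helper emits nothing, which corresponds to the back-compat behavior that TestExecute_NoBudget_NoEnvVar is described as defending.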
sunholo-voight-kampff added a commit that referenced this pull request on May 8, 2026
…design docs

Phase 6 of v0.18.1 hardening sprint. Moves both design docs from design_docs/planned/v0_18_1/ to design_docs/implemented/v0_18_1/ and updates their status headers to "Implemented (2026-05-08)" with cross-repo commit references.

Adds the v0.18.1 entry to changelogs/v0.10-current.md covering all five phases:
- Phase 1 (gap #1): JSONL drain race in TS layer
- Phase 2 (gaps #6, #7, #8): extensions visibility, --headless, --version
- Phase 3 (gaps #2, #5): success fallback, MOTOKO_REPO discovery
- Phase 4 (gap #4): session_id unification
- Phase 5 (gaps #3, #9): cost rates env-var passthrough

Acceptance gate: 5 of 7 conditions met; the remaining 2 (CostUSD>0 end-to-end + smoke success) blocked on a separate Bedrock validation issue (extension tool names with `/` fail Anthropic's ^[a-zA-Z0-9_-]{1,128}$ pattern). The pricing env-var plumbing is verified by unit tests; live smoke needs the extension fix downstream.

LOC tally: ~80 AILANG-side + ~250 motoko-side + 11 new tests across both repos, in ~6 hours wall-clock vs the 3-day plan estimate.

Sprint retrospective: investigation-first paid off. The 12 debug:checkpoint markers in Phase 1 directly identified the silent-exit point as the TS process.exit-on-done race, which would have been maddening to find by code-reading alone. The resulting fix was tiny (~25 LOC across 2 TS files) but unblocked everything downstream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
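The blocking validation issue mentioned above comes down to a single regular expression. The following Go sketch checks candidate tool names against the pattern quoted in the commit; the `invalidToolNames` helper and the second sample name are hypothetical, and the sketch only detects offending names rather than proposing how the separate ext-naming sprint renames them.

```go
package main

import (
	"fmt"
	"regexp"
)

// toolNamePattern is the pattern quoted in the commit message.
var toolNamePattern = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,128}$`)

// invalidToolNames is a hypothetical helper that returns the names the
// pattern rejects, in input order.
func invalidToolNames(names []string) []string {
	var bad []string
	for _, n := range names {
		if !toolNamePattern.MatchString(n) {
			bad = append(bad, n)
		}
	}
	return bad
}

func main() {
	// "omnigraph/scaffold" is the example from the commit; the slash is
	// outside the allowed character class. The second name is illustrative.
	names := []string{"omnigraph/scaffold", "omnigraph_scaffold"}
	fmt.Println(invalidToolNames(names))
}
```

A check like this could run when extensions register their tools, so that a bad name surfaces before a request is sent.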