Review design docs in planned/ folder#9

Merged
MarkEdmondson1234 merged 1 commit into dev from claude/review-design-docs-011CUMm9TBUxkAQkZ8r1Qo8M on Oct 22, 2025
Conversation

@MarkEdmondson1234

Comprehensive analysis of 15 design docs:

  • 3 should move to implemented/ (M-DX1, M-EVAL round-robin complete)
  • 8 should stay in planned/ (active future work for v0.4.0)
  • 4 should be archived (analysis docs, outdated plans, fixed bugs)

Key findings:

  • IO output bug (v0.3.6) is FIXED - verified in v0.3.17
  • M-DX1 is 90% complete (all 52 builtins migrated & documented)
  • Most planned docs are valid v0.4.0 roadmap items

Includes bash commands ready to execute for reorganization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@MarkEdmondson1234 merged commit ba863c4 into dev on Oct 22, 2025
8 of 9 checks passed
@MarkEdmondson1234 deleted the claude/review-design-docs-011CUMm9TBUxkAQkZ8r1Qo8M branch on December 30, 2025 18:35
sunholo-voight-kampff added a commit that referenced this pull request Mar 13, 2026
Root cause: lookupPrefix() iterated Go map nondeterministically when
duplicate namespace prefixes mapped to same URI (common in EPUB/OOXML).
Fix: check default namespace first before map iteration.
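
A minimal sketch of the lookup fix described above, under stated assumptions: the `nsScope` type and its fields are hypothetical stand-ins, not the real parser state. Only the "default namespace first" check comes from the commit message; the sorted tie-break among remaining prefixes is an extra assumption added here so the sketch is fully deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// nsScope is a hypothetical stand-in for the parser's namespace state.
type nsScope struct {
	defaultURI string            // URI bound by xmlns="..."
	prefixes   map[string]string // prefix -> URI
}

// lookupPrefix returns the prefix bound to uri. An empty prefix with
// ok == true means the default namespace matched.
func (s nsScope) lookupPrefix(uri string) (prefix string, ok bool) {
	// Check the default namespace before touching the map, so the answer
	// never depends on Go's randomized map iteration order.
	if s.defaultURI != "" && s.defaultURI == uri {
		return "", true
	}
	// Several prefixes may map to the same URI. Collect all of them and
	// pick the lexicographically smallest (an assumption of this sketch).
	var matches []string
	for p, u := range s.prefixes {
		if u == uri {
			matches = append(matches, p)
		}
	}
	if len(matches) == 0 {
		return "", false
	}
	sort.Strings(matches)
	return matches[0], true
}

func main() {
	s := nsScope{
		defaultURI: "http://www.w3.org/1999/xhtml",
		prefixes: map[string]string{
			"xhtml": "http://www.w3.org/1999/xhtml",
			"h":     "http://www.w3.org/1999/xhtml",
			"epub":  "http://www.idpf.org/2007/ops",
		},
	}
	p, ok := s.lookupPrefix("http://www.w3.org/1999/xhtml")
	fmt.Printf("%q %v\n", p, ok)
	p, ok = s.lookupPrefix("http://www.idpf.org/2007/ops")
	fmt.Printf("%q %v\n", p, ok)
}
```

Repeating the same lookup many times in a test is a cheap way to catch a regression, because Go randomizes map iteration order on each `range`.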

Performance: String() methods on ListValue, ArrayValue, TupleValue,
RecordValue, TaggedValue used += concatenation (O(n²)). Switched to
strings.Builder. Pre-allocated slices in evalCoreList/Array/Tuple and
XML attribute parsing. Zero-allocation whitespace check for CharData.
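
The two techniques named above can be sketched as follows. This is an illustration only: `listValue` is a simplified, hypothetical stand-in for the real value types, the `[a, b]` output format is assumed, and `isAllWhitespace` is a hypothetical name for the CharData check.

```go
package main

import (
	"fmt"
	"strings"
)

// listValue is a simplified stand-in for a runtime list value.
type listValue struct {
	elems []string
}

// String builds the rendering with strings.Builder. Repeated += on a Go
// string copies the accumulated prefix on every step (O(n^2) overall),
// while Builder appends into one growing buffer.
func (l listValue) String() string {
	var b strings.Builder
	b.WriteByte('[')
	for i, e := range l.elems {
		if i > 0 {
			b.WriteString(", ")
		}
		b.WriteString(e)
	}
	b.WriteByte(']')
	return b.String()
}

// isAllWhitespace scans the bytes in place instead of allocating a
// trimmed copy. It treats only XML whitespace (space, tab, CR, LF)
// as blank.
func isAllWhitespace(data []byte) bool {
	for _, c := range data {
		switch c {
		case ' ', '\t', '\n', '\r':
			// still whitespace, keep scanning
		default:
			return false
		}
	}
	return true
}

func main() {
	src := []string{"1", "2", "3"}
	// Pre-allocate the destination when the final length is known.
	elems := make([]string, 0, len(src))
	for _, s := range src {
		elems = append(elems, s)
	}
	fmt.Println(listValue{elems: elems})
	fmt.Println(isAllWhitespace([]byte(" \n\t")))
}
```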

Result: Moby Dick EPUB parse 62s → 11.5s (5.4x speedup).

Process: added determinism verification as sprint-executor principle #9
and builtin-developer validation rule #7.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 8, 2026
… 10 integration gaps

Today's live smoke testing of v0.18.0's M-MOTOKO-EXECUTOR-ADAPTER
surfaced 10 interconnected gaps that prevent trustworthy benchmark
numbers. Three got partial fixes during the day (HealthCheck no-spawn,
MOTOKO_REPO fallback, MOTOKO_HEADLESS, run_summary-before-done reorder)
but root causes remain across both repos. User feedback: "we need it
all I think. lets get to the bottom of the gaps - I think a design
doc process will help."

This sprint sequences the fixes properly:

  Phase 1: Investigation-first for gap #1 (run_summary not reaching
    disk on success path) — debug:checkpoint markers + bisect.
    Non-negotiable; writing a fix without the cause is gambling.

  Phase 2: motoko-side fixes (gap #1 root-cause fix + #6 extension
    visibility + #7 --headless flag + #8 --version mode + #10 TS
    process.exit removal so emission ordering doesn't matter)

  Phase 3: AILANG-side fixes (gap #2 success-criteria fallback to
    thinking.finish_reason + #5 MOTOKO_REPO discovery from wrapper)

  Phase 4: Cross-cutting (gap #4 session_id unification — adapter
    canonical, TS wrapper honors, AILANG runtime emits matching)

  Phase 5: Config layer (gap #3 + #9 cost_rates source-of-truth in
    models.yml.pricing → env-var override of motoko's profile config)

  Phase 6: End-to-end validation — TestEndToEnd_FullResultPopulation
    asserts every Result field; M5 paired-comparison
    motoko-claude-haiku-4-5 vs claude-haiku-4-5 produces real numbers.

Architectural posture: eliminate fragile assumptions at every layer.
Today's adapter assumes things that aren't true (wrapper preserves
session_id, cost_rates configured, run_summary always reaches disk,
loaded_extensions field accurate). After this hardening, none of those
assumptions remain — each replaced with explicit observable contracts.

Net axiom score: +13 (no hard violations). Strong A2 (replayability —
captured runs are fully reproducible), A7 (machines first — Result
fields mechanically reliable), A9 (cost visibility — eliminates $0
reporting gap).

Estimated 3 working days, ~530 LOC including tests, across both repos.
GATING for M5 of v0.18.0 (threshold-measurement) and v0.19.0
M-MOTOKO-EXT-PER-TASK (which needs accurate session_ids + extension
visibility from this hardening).

Cross-references:
- v0.18.0 M-MOTOKO-EXECUTOR-ADAPTER Future Work updated to point at
  this hardening as the trustworthy-numbers prerequisite
- v0.19.0 M-MOTOKO-EXT-PER-TASK Dependencies updated to mark v0.18.1
  as BLOCKING (was just "after local validation")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 8, 2026
…5a-c, gaps #3 #9)

Phase 5 of v0.18.1 hardening sprint. Closes gap #3 (CostUSD always 0
for motoko runs) and gap #9 (per-profile cost_rates duplication burden).

PRE-FIX STATE
=============
motoko's profile config required cost_rates for EVERY model used. Adding
a new model meant: (a) edit AILANG models.yml, (b) edit motoko's profile
config separately, (c) keep them in sync forever. The dogfood profile
shipped with no cost_rates for openrouter/anthropic models, so:

  - cost_warning + cost_exhausted events never fired
  - run_summary.total_cost_usd = 0
  - adapter Result.CostUSD = $0.000000 across every motoko-* model
  - eval-harness threshold-measurement experiments could not see cost

ARCHITECTURE
============
Single source of truth: AILANG models.yml. Per-task pricing flows:

  AILANG models.yml (input_per_1k, output_per_1k)
    → eval-harness constructs CostBudget(InputPer1K, OutputPer1K)
    → motoko adapter converts to per-1M millicents
    → MOTOKO_COST_INPUT_PER_1M_MILLICENTS / OUTPUT env vars
    → motoko load_cost_rates reads env first, profile config as fallback

The conversion: per_1k_usd × 1e8 = per_1m_millicents
  (×1000 for K→M, ×100 for $→¢, ×1000 for ¢→m¢)
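
As a worked example of that arithmetic, here is a small Go sketch. The function and constant names are hypothetical. The `math.Round` call is an assumption of this sketch, added to guard against binary floating-point representation error; the commit message describes the real conversion as a plain multiplication. The sample rates are chosen so the results match the per-1M millicent values quoted in the test description (25000/125000).

```go
package main

import (
	"fmt"
	"math"
)

// The three factors from the commit message; together they equal 1e8.
const (
	perKToPerM        = 1000.0 // per-1K tokens -> per-1M tokens
	centsPerDollar    = 100.0  // USD -> cents
	millicentsPerCent = 1000.0 // cents -> millicents
)

// per1KUSDToPer1MMillicents converts a USD-per-1K-token rate into
// millicents per 1M tokens.
func per1KUSDToPer1MMillicents(per1KUSD float64) int64 {
	v := per1KUSD * perKToPerM * centsPerDollar * millicentsPerCent
	return int64(math.Round(v)) // rounding is this sketch's assumption
}

func main() {
	fmt.Println(per1KUSDToPer1MMillicents(0.00025))
	fmt.Println(per1KUSDToPer1MMillicents(0.00125))
}
```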

motoko profiles can keep their cost_rates blocks for non-eval interactive
use; the env vars override per-task only when set.

M5a — Adapter env-var emit
==========================
internal/executor/motoko/motoko.go: when task.Budget != nil and rates
are >0, append both MOTOKO_COST_*_PER_1M_MILLICENTS env vars before the
spawn. Conversion is straight float×int — no rounding loss for typical
rates (a few cents per million tokens).
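
A hedged sketch of that guard-and-append logic, not the actual contents of motoko.go. The `task` and `costBudget` types and the `costEnv` helper are hypothetical; the field names follow the `CostBudget(InputPer1K, OutputPer1K)` shape mentioned earlier, and the output variable name is inferred from the `MOTOKO_COST_*_PER_1M_MILLICENTS` pattern.

```go
package main

import (
	"fmt"
	"math"
)

// costBudget and task are simplified stand-ins for the adapter's types.
type costBudget struct {
	InputPer1K  float64 // USD per 1K input tokens
	OutputPer1K float64 // USD per 1K output tokens
}

type task struct {
	Budget *costBudget
}

// toMillicents applies the per-1K-USD -> per-1M-millicents factor (1e8).
// Rounding is an assumption of this sketch.
func toMillicents(per1KUSD float64) int64 {
	return int64(math.Round(per1KUSD * 1e8))
}

// costEnv returns the extra KEY=VALUE entries for the child process,
// or nil when no budget (or a non-positive rate) is set, which keeps
// the no-budget path identical to the previous behaviour.
func costEnv(t task) []string {
	if t.Budget == nil || t.Budget.InputPer1K <= 0 || t.Budget.OutputPer1K <= 0 {
		return nil
	}
	return []string{
		fmt.Sprintf("MOTOKO_COST_INPUT_PER_1M_MILLICENTS=%d", toMillicents(t.Budget.InputPer1K)),
		fmt.Sprintf("MOTOKO_COST_OUTPUT_PER_1M_MILLICENTS=%d", toMillicents(t.Budget.OutputPer1K)),
	}
}

func main() {
	withBudget := task{Budget: &costBudget{InputPer1K: 0.00025, OutputPer1K: 0.00125}}
	fmt.Println(costEnv(withBudget))
	fmt.Println(len(costEnv(task{})))
}
```

With `os/exec`, the result would typically be appended before spawning, for example `cmd.Env = append(os.Environ(), costEnv(t)...)`.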

M5b — motoko-side env-var override
==================================
src/core/config.ail (motoko): new env_int helper + load_cost_rates now
declares ! {Env}. Reads MOTOKO_COST_INPUT_PER_1M_MILLICENTS first;
falls back to profile config json_int otherwise.
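
The real helper lives in AILANG (config.ail). The Go sketch below only illustrates the same env-first, config-fallback precedence. The `envInt` signature, the injected `lookup` function, and the choice to fall back on unparsable or non-positive values are all assumptions of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// envInt returns the integer stored under key if it is set and parses
// as a positive number; otherwise it returns fallback (the value that
// would come from the profile config).
func envInt(lookup func(string) (string, bool), key string, fallback int64) int64 {
	raw, ok := lookup(key)
	if !ok {
		return fallback
	}
	n, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

func main() {
	// 30000 stands in for a rate read from the profile config.
	profileRate := int64(30000)
	rate := envInt(os.LookupEnv, "MOTOKO_COST_INPUT_PER_1M_MILLICENTS", profileRate)
	fmt.Println(rate)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the precedence rule testable without mutating the process environment.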

The ! {Env} effect change propagates one level (load_runtime_config in
config.ail already had ! {Env} from active_profile, so no caller-side
churn).

M5c — Tests
===========
Two new tests in internal/executor/motoko/execute_test.go:

  - TestExecute_BudgetEnvVarPassthrough: mock motoko writes received env
    vars to a side-channel file; test asserts both rates land at the
    expected per-1M millicent values for haiku-4-5 (25000/125000).

  - TestExecute_NoBudget_NoEnvVar: when task.Budget is nil (coordinator-
    driven runs, smoke runs that don't enforce budgets), the adapter
    must NOT emit the env vars. Defends the back-compat path.

M5d — Smoke wiring
==================
cmd/smoke-motoko/main.go: hardcode haiku-4-5 rates in Budget so the
smoke runner exercises the full pricing path. The eval harness derives
this from models.yml automatically.

KNOWN LIMITATION
================
End-to-end smoke verification is BLOCKED on a separate Bedrock validation
error: extensions register tools with names like "omnigraph/scaffold"
which fail Anthropic's ^[a-zA-Z0-9_-]{1,128}$ pattern. The cost path
WILL fire once that's fixed (handled in a separate ext-naming sprint).
The unit tests above prove the env-var plumbing is correct; the live
smoke needs the extension fix downstream.
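
To make the validation failure concrete, here is a small Go check against the quoted pattern. The `validToolName` helper is hypothetical, and the underscore variant is only an example of a name that passes, not the naming fix planned for the ext-naming sprint.

```go
package main

import (
	"fmt"
	"regexp"
)

// toolNamePattern is the pattern quoted in the commit message.
var toolNamePattern = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,128}$`)

// validToolName reports whether name satisfies the pattern.
func validToolName(name string) bool {
	return toolNamePattern.MatchString(name)
}

func main() {
	for _, name := range []string{"omnigraph/scaffold", "omnigraph_scaffold"} {
		fmt.Printf("%s valid=%v\n", name, validToolName(name))
	}
}
```

The slash is the only offending character in `omnigraph/scaffold`; letters, digits, `_`, and `-` are all accepted up to 128 characters.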

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sunholo-voight-kampff added a commit that referenced this pull request May 8, 2026
…design docs

Phase 6 of v0.18.1 hardening sprint.

Moves both design docs from design_docs/planned/v0_18_1/ to
design_docs/implemented/v0_18_1/ and updates their status headers to
"Implemented (2026-05-08)" with cross-repo commit references.

Adds the v0.18.1 entry to changelogs/v0.10-current.md covering all
five phases:
  - Phase 1 (gap #1): JSONL drain race in TS layer
  - Phase 2 (gaps #6, #7, #8): extensions visibility, --headless, --version
  - Phase 3 (gaps #2, #5): success fallback, MOTOKO_REPO discovery
  - Phase 4 (gap #4): session_id unification
  - Phase 5 (gaps #3, #9): cost rates env-var passthrough

Acceptance gate: 5 of 7 conditions met; the remaining 2 (CostUSD>0
end-to-end + smoke success) blocked on a separate Bedrock validation
issue (extension tool names with `/` fail Anthropic's
^[a-zA-Z0-9_-]{1,128}$ pattern). The pricing env-var plumbing is
verified by unit tests; live smoke needs the extension fix downstream.

LOC tally: ~80 AILANG-side + ~250 motoko-side + 11 new tests across
both repos, in ~6 hours wall-clock vs the 3-day plan estimate.

Sprint retrospective: investigation-first paid off — the 12 debug:
checkpoint markers in Phase 1 directly identified the silent-exit
point as the TS process.exit-on-done race, which would have been
maddening to find by code-reading alone. The resulting fix was tiny
(~25 LOC across 2 TS files) but unblocked everything downstream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>