Replace flat $0.00025/token cost estimate with model-aware pricing #308

Description

@spboyer

Problem

The dashboard's cost figures are computed in internal/webapi/store.go with a flat, model-agnostic rate:

func estimateCost(tokens int) float64 {
    // ~$0.00025 per token as a rough estimate
    return float64(tokens) * 0.00025
}

tokens is just InputTokens + OutputTokens summed from SessionDigest.Usage. This drives the run table's Cost column, the Avg Cost KPI, the trends chart, the compare view, and the CSV export.

What's wrong

  1. Model-agnostic — Opus, Sonnet, Haiku, and GPT‑5 all priced identically at ~$250/M blended. Real rates vary by 10×+ across models.
  2. No input/output split — output tokens cost 3–5× more than input on most providers.
  3. Ignores cache tokens. CacheReadTokens (~10% of input price) and CacheWriteTokens (~125% of input price) are collected in UsageStats but excluded from the cost calc entirely.
  4. Ignores real cost data we already collect. The Copilot SDK reports mm.Requests.Cost per model in SessionShutdown, captured as ModelUsage.RequestCost in internal/execution/session_usage_collector.go:108 — but the dashboard never reads it. PremiumRequests is also unused for $ display.
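
To make points 2 and 3 concrete, here is a minimal Go sketch comparing the flat rate with a split-rate calculation. The usage and rates types and the per-million prices are hypothetical placeholders, not real provider prices. Only the 0.00025 flat rate and the rough ~10% / ~125% cache ratios come from the description above.

```go
package main

import "fmt"

// usage mirrors the four token counters named in the issue; the struct itself is hypothetical.
type usage struct {
	InputTokens, OutputTokens, CacheReadTokens, CacheWriteTokens int
}

// rates holds USD per 1M tokens. The values used in main are illustrative, not real prices.
type rates struct {
	Input, Output float64
}

// flatCost reproduces the current behavior: input + output at $0.00025/token, cache ignored.
func flatCost(u usage) float64 {
	return float64(u.InputTokens+u.OutputTokens) * 0.00025
}

// splitCost prices each token class separately, using the issue's rough cache ratios
// (cache read ~10% of input price, cache write ~125% of input price).
func splitCost(u usage, r rates) float64 {
	const perMillion = 1_000_000.0
	return (float64(u.InputTokens)*r.Input +
		float64(u.OutputTokens)*r.Output +
		float64(u.CacheReadTokens)*r.Input*0.10 +
		float64(u.CacheWriteTokens)*r.Input*1.25) / perMillion
}

func main() {
	u := usage{InputTokens: 800_000, OutputTokens: 200_000, CacheReadTokens: 2_000_000, CacheWriteTokens: 100_000}
	r := rates{Input: 4, Output: 16} // hypothetical: output priced 4x input
	fmt.Printf("flat:  $%.2f\n", flatCost(u))
	fmt.Printf("split: $%.2f\n", splitCost(u, r))
}
```

With these made-up numbers the flat estimate is $250.00 against $7.70 for the split calculation. The gap depends entirely on the placeholder rates, so it illustrates the mechanism rather than the real size of the error.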

Example (from current dashboard)

| Run | Tokens | Flat estimate | Real Opus 4.6 (rough) |
|---|---|---|---|
| 3P Full Suite | 41.7M | $10,430 | could be $3k–$15k depending on in/out/cache split |

The number is in the right order of magnitude for Opus by accident; for Haiku it would be ~10× too high.
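
The flat figure can be reproduced directly from the rate in estimateCost. This small Go sketch shows the arithmetic; flatEstimate is a hypothetical name for illustration only.

```go
package main

import "fmt"

// flatRatePerToken is the USD rate hard-coded in estimateCost.
const flatRatePerToken = 0.00025

// flatEstimate applies the flat rate to a total token count.
func flatEstimate(tokens int) float64 {
	return float64(tokens) * flatRatePerToken
}

func main() {
	fmt.Printf("per 1M tokens: $%.0f\n", flatEstimate(1_000_000))
	fmt.Printf("41.7M tokens:  $%.0f\n", flatEstimate(41_700_000))
}
```

This gives $250 per 1M tokens and about $10,425 for 41.7M tokens. The 41.7M figure is itself rounded, which would explain the small difference from the $10,430 shown on the dashboard.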

Proposed fix

  1. Replace estimateCost with a model-aware calculator that consumes per-model ModelUsage (input / output / cache‑read / cache‑write):
    • Per-model rate table (input, output, cache_read, cache_write per 1M tokens) for known models — Claude family, GPT family, Gemini.
    • Use RequestCost from the SDK's ModelMetrics instead of the rate table whenever it is present (treat SDK-reported cost as authoritative).
    • Fall back to a documented blended estimate only when neither is available, and surface that fact in the UI.
  2. Plumb ModelMetrics through to outcomeToSummary / RunDetail so the calculator has per-model token breakdowns instead of just a flat sum.
  3. Add an accuracy disclaimer in the dashboard (e.g. a small info tooltip on the Cost / Avg Cost headers) explaining whether the number is SDK-reported, model-table-priced, or a fallback estimate, and the date of the rate table. The disclaimer should appear regardless of which path produced the number.
  4. Tests:
    • Unit tests for the calculator (known model, unknown model fallback, cache token handling, SDK-reported cost path).
    • Update internal/webapi/handlers_test.go and internal/models/outcome_test.go as needed.
    • Update Playwright dashboard tests if the cost column / KPI rendering changes.
  5. Docs: brief note in site/ (dashboard guide) describing how cost is calculated and its accuracy caveats.
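
One possible shape for the calculator in step 1 is sketched below in Go. Everything here is an assumption for illustration: the ModelUsage struct is a simplified stand-in for the real type, the model names and prices in rateTable are placeholders, and RequestCost is assumed to already be in USD and modeled as a pointer so that "not reported" can be told apart from zero.

```go
package main

import "fmt"

// CostSource records which path produced a figure, so the UI can disclose it.
type CostSource string

const (
	SourceSDK      CostSource = "sdk-reported"
	SourceTable    CostSource = "model-table"
	SourceFallback CostSource = "fallback-estimate"
)

// ModelUsage is a hypothetical, simplified stand-in for the per-model usage record.
type ModelUsage struct {
	Model                                string
	Input, Output, CacheRead, CacheWrite int
	RequestCost                          *float64 // nil when the SDK reported no cost
}

// Rate is USD per 1M tokens for each token class.
type Rate struct{ Input, Output, CacheRead, CacheWrite float64 }

// rateTable uses placeholder model names and prices; real entries would carry dated provider rates.
var rateTable = map[string]Rate{
	"example-large": {Input: 10, Output: 50, CacheRead: 1, CacheWrite: 12.5},
	"example-small": {Input: 1, Output: 5, CacheRead: 0.1, CacheWrite: 1.25},
}

// fallbackPerToken is the existing blended estimate, kept only as the last resort.
const fallbackPerToken = 0.00025

// Cost resolves one model's cost: SDK-reported first, then the rate table, then the fallback.
func Cost(u ModelUsage) (float64, CostSource) {
	if u.RequestCost != nil {
		return *u.RequestCost, SourceSDK
	}
	if r, ok := rateTable[u.Model]; ok {
		total := float64(u.Input)*r.Input + float64(u.Output)*r.Output +
			float64(u.CacheRead)*r.CacheRead + float64(u.CacheWrite)*r.CacheWrite
		return total / 1_000_000, SourceTable
	}
	return float64(u.Input+u.Output) * fallbackPerToken, SourceFallback
}

func main() {
	sdk := 1.23
	runs := []ModelUsage{
		{Model: "example-large", Input: 100_000, Output: 20_000, RequestCost: &sdk},
		{Model: "example-small", Input: 100_000, Output: 20_000, CacheRead: 500_000},
		{Model: "unknown-model", Input: 100_000, Output: 20_000},
	}
	for _, u := range runs {
		c, src := Cost(u)
		fmt.Printf("%-14s $%.4f (%s)\n", u.Model, c, src)
	}
}
```

Returning the CostSource alongside the number lets the dashboard tooltip from step 3 say which path was used without recomputing anything. A run total would sum Cost over every model in the run; how to label a run that mixes sources is left open here.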

Acceptance criteria

  • estimateCost removed or kept only as the explicit fallback path.
  • Per-model pricing table in internal/webapi/ (or new internal/pricing/ package) with at least Claude Opus/Sonnet/Haiku and GPT‑5 family.
  • When SDK RequestCost is present, dashboard uses it directly.
  • Cache read/write tokens are included in cost when priced from the table.
  • Dashboard UI displays an accuracy disclaimer / tooltip explaining the calculation source.
  • Go tests pass (make test, make lint).
  • Web e2e tests pass (cd web && npx playwright test --project=chromium).
  • README and site/ docs updated.

References

  • internal/webapi/store.go:160-171 — current estimateCost
  • internal/webapi/storage_adapter.go:79-115: Summary() aggregation
  • internal/execution/session_usage_collector.go:95-121 — where RequestCost per model is collected but unused
  • internal/models/outcome.go:224: ModelUsage.RequestCost field
