Edge billing cutover #349

Merged: breardon2011 merged 3 commits into main from edge-billing-cutover on Jun 3, 2026
breardon2011 (Contributor) commented on Jun 3, 2026

Edge-first Pro billing (shadow-mode, behind flags)

What & why

Moves Pro billing off per-cell Postgres onto the global edge (D1 + Workers), so cells become
stateless-for-billing and billing is a single global concern. Free billing already runs on the
edge; this ports the Pro path (scale-event → Stripe meter pipeline).

Everything runs in shadow mode behind flags — no change to what anyone is billed until an
explicit, documented cutover.
The cell stays authoritative throughout the soak.

What's included

  • Producer: usage_ticker stamps memory_mb / cpu_count on usage_tick.
  • Ingest (events-ingest): writes pro ticks to D1 usage_samples; resolves plan
    authoritatively from D1, not the stale cell-PG value on the envelope.
  • Rollup (billing-rollup, new cron Worker): aggregates usage_samples →
    usage_meter_events → Stripe meter events. billing_mode-aware (legacy per-tier / unified
    flat), idempotent (deterministic ids + Stripe identifier). SHADOW gates the send;
    BILL_FROM gates the cutover boundary.
  • Provisioning: the Stripe webhook (which must terminate on the public edge) now creates
    the subscription itself, with all catalog prices + $30 credit, idempotently. The price
    catalog is published to D1 billing_prices by a one-shot, global cmd/ensure-products.
  • Authority switch: PRO_BILLING_AUTHORITY=cell|edge (CP) gates the three cell billers
    (usage_reporter/allocator/sender) and requires cap-tokens for creates (no edge bypass).
  • Parity: usage-parity checker (CP) + /internal/usage-parity (edge) diff edge vs cell
    GB-seconds during the soak.
  • Plan authority: effectivePlan resolves cap-token → D1 org-policy → cell-PG.
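
The plan-authority order in the last bullet reads as a first-match fallback chain. A minimal Python sketch of that idea, not the PR's code: the function name `effective_plan`, the three optional inputs, and the "free" default are assumptions made for illustration.

```python
from typing import Optional


def effective_plan(
    cap_token_plan: Optional[str],
    d1_org_policy_plan: Optional[str],
    cell_pg_plan: Optional[str],
    default: str = "free",  # assumption: what to return when no source knows the plan
) -> str:
    """Return the first known plan, in priority order:
    cap-token, then D1 org-policy, then cell-PG."""
    for plan in (cap_token_plan, d1_org_policy_plan, cell_pg_plan):
        if plan:
            return plan
    return default
```

With this ordering, a stale cell-PG value only matters when neither the cap-token nor D1 carries a plan.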

Flags (all default to no-op / shadow)

| Flag | Where | Default | Effect |
| --- | --- | --- | --- |
| SHADOW | billing-rollup var | true | compute but don't ship to Stripe |
| PRO_BILLING_AUTHORITY | CP env | cell | cell: cell bills while edge shadows; edge: cell billers stop and cap-tokens are required |
| BILL_FROM | billing-rollup var | unset/0 | bill only buckets starting ≥ T |
| OPENSANDBOX_USAGE_PARITY_URL | CP env | unset | enables parity checker (set to <edge-host>/internal/usage-parity) |
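
To make the "default to no-op" behaviour concrete, here is a hedged Python sketch of parsing these flags with their documented defaults. In the PR the flags live in two places (billing-rollup vars and CP env); combining them into one `BillingFlags` object, the `load_flags` name, and the rule that only the literal string "false" disables shadow mode are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BillingFlags:
    shadow: bool
    authority: str
    bill_from: int
    parity_url: Optional[str]


def load_flags(env: Mapping[str, str]) -> BillingFlags:
    authority = env.get("PRO_BILLING_AUTHORITY", "cell")
    if authority not in ("cell", "edge"):
        raise ValueError(f"unknown PRO_BILLING_AUTHORITY: {authority!r}")
    # Assumption: fail safe, so anything except "false" keeps shadow mode on.
    shadow = env.get("SHADOW", "true").strip().lower() != "false"
    # Unset or empty BILL_FROM is treated as 0, matching the "unset/0" default.
    bill_from = int(env.get("BILL_FROM") or 0)
    parity_url = env.get("OPENSANDBOX_USAGE_PARITY_URL") or None
    return BillingFlags(shadow, authority, bill_from, parity_url)
```

An empty environment yields shadow mode with the cell as authority, which is the state the PR deploys in.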

Data flow

worker ticks → events-ingest → D1 usage_samples → billing-rollup → usage_meter_events → Stripe; parity checker diffs edge vs cell sandbox_scale_events.
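
The rollup stage of this flow can be sketched as follows. This is an illustrative Python model, not the Worker's code: the `Sample` and `MeterEvent` shapes, the per-sample `seconds` field, the one-hour bucket size, and the SHA-256 id scheme are all assumptions. Only the ideas come from the description above: aggregate samples into buckets, derive deterministic ids so reruns are idempotent, and gate sending on SHADOW and BILL_FROM.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    org_id: str
    ts: int         # unix seconds at which the tick was taken
    memory_mb: int
    seconds: int    # assumption: how much wall time this tick covers


class MeterEvent(NamedTuple):
    event_id: str
    org_id: str
    bucket_start: int
    gb_seconds: float


def event_id(org_id: str, bucket_start: int) -> str:
    # Same (org, bucket) always hashes to the same id, so a rerun cannot
    # create a second, differently named event for the same bucket.
    return hashlib.sha256(f"{org_id}:{bucket_start}".encode()).hexdigest()[:32]


def rollup(samples: Iterable[Sample], bucket_seconds: int = 3600) -> List[MeterEvent]:
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for s in samples:
        bucket_start = s.ts - s.ts % bucket_seconds
        totals[(s.org_id, bucket_start)] += s.memory_mb / 1024 * s.seconds
    return [MeterEvent(event_id(o, b), o, b, gb) for (o, b), gb in sorted(totals.items())]


def events_to_send(events: Iterable[MeterEvent], shadow: bool, bill_from: int) -> List[MeterEvent]:
    if shadow:
        return []  # shadow mode: computed, never shipped
    return [e for e in events if e.bucket_start >= bill_from]
```

Because the id depends only on the org and the bucket start, it could also serve as the idempotency identifier passed along with the meter event, though how the PR actually builds that identifier is not stated.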

Stripe webhook

The webhook terminates on the edge (<edge-host>/webhooks/stripe; prod is already
https://app.opencomputer.dev/webhooks/stripe). Keep a single endpoint (the edge) — the cell
also has a /webhooks/stripe route but must NOT be registered, or you'd double-provision.
The edge handler needs these events subscribed:

  • checkout.session.completed: required; the setup-checkout completion drives
    provisioning + mark-pro.
  • customer.subscription.deleted: required; downgrade → mark-free.
  • customer.subscription.created: optional (redundant mark-pro); safe to include.

All other event types are ack'd and ignored.
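
The event handling described above, together with the provisioning-before-mark-pro ordering noted under deploy safety, can be modelled in a few lines. This is a hedged Python sketch using an in-memory `EdgeState` in place of D1 and Stripe; the class, the `handle_event` signature, and the use of an org id as the key are assumptions, and the $30 credit is omitted.

```python
from typing import Dict, List, Set


class EdgeState:
    """In-memory stand-in for the edge's catalog, subscriptions, and plan flags."""

    def __init__(self, catalog: List[str]):
        self.catalog = catalog
        self.subscriptions: Dict[str, List[str]] = {}
        self.pro_orgs: Set[str] = set()


def handle_event(state: EdgeState, event_type: str, org_id: str) -> int:
    """Return the HTTP status the webhook would answer with."""
    if event_type == "checkout.session.completed":
        if not state.catalog:
            # Provisioning runs first and fails with 5xx so Stripe retries;
            # the org is not marked pro until provisioning succeeds.
            return 500
        # setdefault makes a webhook retry a no-op for an already provisioned org.
        state.subscriptions.setdefault(org_id, list(state.catalog))
        state.pro_orgs.add(org_id)
        return 200
    if event_type == "customer.subscription.created":
        state.pro_orgs.add(org_id)  # redundant mark-pro
        return 200
    if event_type == "customer.subscription.deleted":
        state.pro_orgs.discard(org_id)  # downgrade, mark-free
        return 200
    return 200  # every other event type is acknowledged and ignored
```

The 500 path is what makes an unpublished catalog leave the first upgrade stuck until the catalog exists and Stripe retries.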

Validated end-to-end on dev (Stripe test mode)

  • Upgrade → edge provisions subscription (9 prices + $30 credit); idempotent on webhook retry.
  • Metering → rollup → live send → Stripe aggregated the GB-seconds against the subscription.
  • PRO_BILLING_AUTHORITY both modes (billers start/stop; direct API-key create → 401 in edge mode).
  • BILL_FROM boundary (below-T consumed-not-billed, ≥T billed).
  • Parity checker (edge vs cell GB-seconds, within tolerance).
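
For the parity item above, the comparison can be pictured as a per-org diff with a tolerance. A minimal Python sketch; the per-org dictionaries, the `parity_diff` name, and the 1% relative tolerance are assumptions, since the PR does not state the tolerance it uses.

```python
import math
from typing import Dict, Tuple


def parity_diff(
    edge: Dict[str, float],
    cell: Dict[str, float],
    rel_tol: float = 0.01,   # assumption: 1% relative tolerance
    abs_tol: float = 1e-6,   # assumption: floor for near-zero totals
) -> Dict[str, Tuple[float, float]]:
    """Return {org: (edge_gb_seconds, cell_gb_seconds)} for orgs out of tolerance."""
    mismatches: Dict[str, Tuple[float, float]] = {}
    for org in sorted(set(edge) | set(cell)):
        e = edge.get(org, 0.0)  # an org missing on one side counts as 0 there
        c = cell.get(org, 0.0)
        if not math.isclose(e, c, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches[org] = (e, c)
    return mismatches
```

An empty result over the soak window is the "parity is clean" condition the cutover sequence waits for.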

Deploy & rollout safety

No deploy-ordering dependency between worker/CP/edge: each is backward- and forward-compatible.
Old ticks lack dims, so their usage_samples are written as 0; new ticks are a superset whose
extra fields old consumers ignore. All money-moving paths default to off. Per component:

  • CP, worker, events-ingest, billing-rollup: no-op on deploy. Gated by
    SHADOW=true + PRO_BILLING_AUTHORITY=cell. Live-but-harmless effects: shadow data
    accumulates in D1 (no cleanup yet), and plan is now read authoritatively from D1 (correctness,
    not an amount change).
  • api-edge: NOT a no-op on deploy. Prod Stripe already points at the edge, so
    deploying adds provisionProSubscription to the live webhook. Hard rule:

    Publish the catalog to that env's D1 (ensure-products → billing_prices) BEFORE deploying
    api-edge.
    An empty catalog makes the first upgrade 500 (stuck): provisioning runs before
    mark-pro and returns 5xx so Stripe retries.

  • First api-edge deploy in an env may need the SandboxWsGateway DO migration-ledger
    reconciliation (delete-class), independent of the above.
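
The compatibility claim at the top of this section (ticks without dims are written as 0, and unknown fields are ignored) amounts to defaulting on read. A small Python sketch of that behaviour; `memory_mb` and `cpu_count` come from the description, while `sandbox_id`, `ts`, and the function name are hypothetical.

```python
from typing import Any, Dict, Mapping


def to_usage_sample(tick: Mapping[str, Any]) -> Dict[str, Any]:
    """Map a usage_tick envelope to a usage_samples row.

    Old producers omit memory_mb / cpu_count, so both default to 0.
    Any field not listed here is dropped, which is why extra fields
    from newer producers do not break an older consumer.
    """
    return {
        "sandbox_id": tick["sandbox_id"],  # hypothetical field name
        "ts": tick["ts"],                  # hypothetical field name
        "memory_mb": int(tick.get("memory_mb", 0)),
        "cpu_count": int(tick.get("cpu_count", 0)),
    }
```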

Cutover sequence (per environment, after parity is clean ≥24h)

  1. Pre-reqs: worker + CP + edge deployed; the overage/reserved Prices exist in that env's
    Stripe Dashboard; ensure-products has published the catalog to that env's D1 billing_prices;
    the edge /webhooks/stripe is the single registered endpoint with the events above.
  2. Pick a boundary T (a quiet bucket boundary, unix seconds).
  3. Drain the cell up to T: temporarily CAPACITY_ALLOCATOR_SETTLE=1m, restart CP, wait for
    the allocator to settle + the sender to flush — confirm no pending billable_events and no
    unbilled buckets < T.
  4. Edge: set BILL_FROM=T + SHADOW=false on billing-rollup, deploy → edge bills [T, ..).
  5. CP: set PRO_BILLING_AUTHORITY=edge, restart → cell billers stop, cap-tokens required.
  6. Verify: meter events flowing to Stripe; cell billers stopped; no bucket double/under-billed
    across the seam (/run on billing-rollup echoes cfg.billFrom for confirmation).
  • Rollback: revert flags (SHADOW=true, PRO_BILLING_AUTHORITY=cell) + restart. Already-sent
    Stripe meter events can't be un-sent — a post-cutover over-bill needs manual credit adjustments,
    which is why the shadow soak exists. Don't shorten it.
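
The verification in step 6 boils down to checking that every bucket is billed exactly once and on the correct side of T. A hedged Python sketch of such a check; the inputs (lists of bucket start times billed by each side, plus the full set of buckets with usage) and the `check_seam` name are assumptions, not something the PR ships.

```python
from typing import Dict, Iterable, List


def check_seam(
    cell_billed: Iterable[int],
    edge_billed: Iterable[int],
    all_buckets: Iterable[int],
    t: int,
) -> Dict[str, List[int]]:
    """Report buckets that violate the cutover boundary T (unix seconds)."""
    cell, edge = set(cell_billed), set(edge_billed)
    return {
        # billed by both sides
        "double_billed": sorted(cell & edge),
        # had usage but was billed by neither side
        "under_billed": sorted(b for b in set(all_buckets) if b not in cell and b not in edge),
        # cell billed at or after T, or edge billed before T
        "wrong_side": sorted([b for b in cell if b >= t] + [b for b in edge if b < t]),
    }
```

A clean cutover returns three empty lists; anything in `double_billed` is the case that would need a manual credit adjustment.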

@socket-security

Review the following changes in direct dependencies.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | npm/wrangler@4.97.0 | 99 | 100 | 92 | 96 | 100 |
| Added | npm/@cloudflare/workers-types@4.20260602.1 | 100 | 100 | 100 | 100 | 100 |

breardon2011 marked this pull request as ready for review on June 3, 2026, 01:12
breardon2011 merged commit cf0db1f into main on Jun 3, 2026
3 checks passed