feat(#251 follow-up): real-embedding ablation result + opt-in test by jayzalowitz · Pull Request #272 · jayzalowitz/skytwin

jayzalowitz · 2026-05-13T00:04:12Z

Summary

Negative result with a clear next step. The working hypothesis going into this run was: hash-trick spurious overlap was inflating the received_content MRR regression first surfaced in PR #260; real semantic embeddings should improve the number materially. The hypothesis was wrong.

Numbers

Ran the tier-ablation eval against Ollama + nomic-embed-text (a real semantic model, ~137M params, 768-dim) and compared to the hash-trick baseline:

Metric	pure-RRF	tier-weighted (hash)	tier-weighted (real)
`user_behavior` MRR (n=3)	0.667	1.000	1.000
`received_content` MRR (n=3)	1.000	0.542	~0.54
`neutral` MRR (n=1)	1.000	1.000	1.000

The real-embedding received_content number is essentially identical to the hash-trick floor. Hash-trick was not the cause.

What's actually broken

The regression is structural to the multiplicative weighting approach:

user_sent_originated × 1.5 vs inbox_automated × 0.8 = 1.875× swing.
Any page within 53% of the top raw score that's authored_* will leapfrog a strong-but-demoted primary hit on a received_* query.
The floor-ratio gate (0.85 default) helps but isn't enough. With real semantic embeddings, an authored page from a completely unrelated query (e.g. q1's Series B pitch deck) has non-trivial similarity to q5 ("GitHub Actions CI failed on main") — enough to land in the candidate pool above threshold, where the 1.5× boost then beats the legitimate primary.
Diagnostic dump from the eval, q5 top-10 with tier-on:
q7-authored-1, q3-authored-long, q1-authored-1, distractor-004, q2-received-1, distractor-014, distractor-034, q5-received-1, q6-received-1, distractor-024
The actual primary lands at rank 8.

Decision

Layer 2 stays opt-in. The default-on rollout is blocked on a structural fix, not on environment / corpus / eval setup. Best-judgment path:

Switch from multiplicative weighting (score *= tier_weight) to additive bonuses (score += tier_bonus) sized to flip close calls without leapfrogging strong matches. For RRF scores in the 0.016–0.033 range, bonuses around ±0.005 should give authored a tie-breaker edge without overwhelming a clear relevance gap.
Re-run the ablation against the additive design. Target: received_content MRR ≥ 0.95 while preserving user_behavior MRR = 1.0.
Separate sub-issue — out of scope for this PR.

What this PR ships

New opt-in describe block in tier-ablation-eval.test.ts gated on RUN_REAL_EMBEDDING_EVAL=1. Reproducible with any local Ollama or OpenAI key — defaults respect OPENAI_EMBEDDING_BASE_URL / OPENAI_EMBEDDING_MODEL / OPENAI_EMBEDDING_API_KEY. Pre-pulled Ollama with nomic-embed-text is the cheapest way to run it.
runOneMode helper now takes an optional embedding provider so the same harness drives both the always-on hash-trick test and the opt-in real-embedding test. No duplication.
Diagnostic dump in printReport now triggers for queries that degrade by more than 2 ranks (not just "missing"), so future tuning has better signal for partially-regressed cases.

The opt-in test asserts the realistic guardrail bars (received_content ≥ 0.4, user_behavior must lift, neutral must not regress) — same shape as the always-on test, so a future regression in the implementation surfaces in either mode.

Why ship a negative result

Because the ENGINEERING DECISION is now made on data instead of vibes. PR #260 explicitly said "we'll know whether Layer 2 default-on is safe once we run this against real embeddings." We ran it. The answer is "not yet." That's worth committing — both as a permanent artifact and as a clear signal to the next person picking up the additive-redesign sub-issue.

Test plan

pnpm --filter @skytwin/memory-gbrain test -- tier-ablation-eval → 1 pass, 1 skipped (gated test).
RUN_REAL_EMBEDDING_EVAL=1 pnpm --filter @skytwin/memory-gbrain test -- tier-ablation-eval → 2 pass (both modes), prints side-by-side report.
Build + workspace test pre-flight: was clean before this PR (most recent main: PR feat(#251 follow-up): authoring-tier backfill worker #271 merged 23:47Z).

🤖 Generated with Claude Code

Validates the working hypothesis that hash-trick embeddings were inflating the `received_content` MRR regression observed in PR #260. Result: the hypothesis was wrong. With Ollama + nomic-embed-text (a real semantic model), received_content MRR = ~0.54, essentially identical to the hash-trick floor of 0.542. The regression is structural to the multiplicative weighting approach: - 1.5× authored vs 0.8× automated = 1.875× swing. Authored content within 53% of the top raw score leapfrogs strong-but-demoted primary hits. - The 0.85 floor-ratio gate isn't enough: with real semantic embeddings, authored content from unrelated queries (e.g. q1 Series B pitch) has non-trivial similarity to q5 ("GitHub Actions CI failed") — lands in the candidate pool above threshold, gets the 1.5× boost, beats the legitimate primary. Diagnostic dump from the eval confirms: q5's primary lands at rank 8/9 with tier-on, behind three authored pages from unrelated queries plus several distractors. Decision: Layer 2 stays opt-in. Default-on is blocked on a structural fix (switch to additive bonuses, target received_content MRR ≥ 0.95). That's a separate sub-issue. What ships: - New opt-in test branch in tier-ablation-eval.test.ts, gated on RUN_REAL_EMBEDDING_EVAL=1. Reproducible with any local Ollama or OpenAI key — defaults respect OPENAI_EMBEDDING_BASE_URL / OPENAI_EMBEDDING_MODEL / OPENAI_EMBEDDING_API_KEY. - runOneMode helper now takes an optional embedding provider so the same harness drives both modes. - Diagnostic dump in printReport now triggers for queries that degrade by more than 2 ranks (not just "missing"), so future tuning has better signal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds an opt-in “real embeddings” variant of the existing Layer 2 tier-ablation evaluation to validate behavior under an OpenAI-compatible embedding endpoint (e.g., local Ollama), and records the negative-result findings in the changelog for future decision-making.

Changes:

Extend the ablation harness (runOneMode) to accept an optional EmbeddingProvider so the same test logic can run with hash-trick or real embeddings.
Expand diagnostic reporting to also dump tier-on top-10 results when the primary degrades by multiple ranks (not only when it disappears).
Add a gated (RUN_REAL_EMBEDDING_EVAL=1) real-embedding test mode and document the measured outcome in CHANGELOG.md.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
packages/memory-gbrain/src/tests/tier-ablation-eval.test.ts	Adds opt-in real-embedding eval path, parameterizes embedding provider, and broadens diagnostic dumps for rank regressions.
CHANGELOG.md	Documents the real-embedding ablation result and the resulting decision to keep Layer 2 opt-in pending a structural fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jayzalowitz · 2026-05-13T00:10:01Z

+// Real-embedding ablation — opt-in. Validates the working hypothesis that
+// hash-trick embeddings exaggerate the `received_content` MRR regression
+// because of spurious token overlap, and that real semantic embeddings
+// preserve the user_behavior lift WITHOUT the regression. Gated on
+// `RUN_REAL_EMBEDDING_EVAL=1` and an OpenAI-compatible endpoint reachable
+// at `OPENAI_EMBEDDING_BASE_URL` (defaults to local Ollama).


Addressed in f17b2b7. Rewrote the section header to describe what the eval actually is now (a permanent reproducible artifact for whoever picks up the additive-rewrite sub-issue), rather than the original hypothesis it was added to test.

jayzalowitz · 2026-05-13T00:10:02Z

+    // signal for tuning the multipliers. Also dump when the primary
+    // degrades by more than 3 ranks — that's the case where Layer 2
+    // is reordering legitimately-strong primary hits behind other content.
+    const degradedSignificantly =
+      Number.isFinite(o.rankPrimary) &&
+      Number.isFinite(n.rankPrimary) &&
+      n.rankPrimary - o.rankPrimary > 2;


Addressed in f17b2b7. Aligned the condition with the comment: n.rankPrimary - o.rankPrimary >= 3 and comment now reads "degrades by 3 or more ranks". Same semantics, no ambiguity.

jayzalowitz · 2026-05-13T00:10:03Z

+- Switch from multiplicative weighting (`score *= tier_weight`) to additive bonuses (`score += tier_bonus`) sized to flip close calls without leapfrogging strong matches. Estimated +0.005 for authored-originated, -0.005 for automated, on raw RRF scores in the 0.016–0.033 range — enough to break ties without overwhelming relevance.
+- Re-run the ablation with the additive approach. Target: received_content MRR ≥ 0.95 while preserving user_behavior MRR = 1.0.
+
+That's a separate sub-issue — out of scope for tonight.


Addressed in f17b2b7. Changed to "out of scope for this PR".

Three small but valid finds: 1. Real-embedding section header said the eval "validates the working hypothesis" but the actual result was the opposite. Rewrote the header to describe what the eval actually is now — a permanent reproducible artifact for the next person tuning Layer 2 weighting. 2. Off-by-one between comment and condition for the diagnostic dump. Comment said "more than 3 ranks"; code was `> 2` (i.e. 3 or more). Aligned to the more useful semantics: `>= 3`, "3 or more rank degradation". 3. CHANGELOG had "out of scope for tonight" — time-relative phrasing that won't read well later. Changed to "out of scope for this PR". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#251 Phase 1): Layer 2 additive rewrite + default-on flip Phase 1.1 + 1.2 of the multi-phase plan. # 1.1 — additive rewrite Replaces multiplicative tier weighting (`score *= weight`) with additive bonuses (`score += bonus`). The real-embedding ablation in PR #272 showed multiplicative was structurally bounded — a 1.5×/0.8× swing (1.875× ratio) let weak-overlap authored content leapfrog strong primary hits regardless of relevance. Additive bonuses (~±0.005 in the normal band) can flip close calls but never leapfrog strong matches. Promote-only configuration: only authored tiers get a positive bonus; all received tiers are 0. Trying any negative bonus pushed legitimate primary hits on `received_content` queries below distractors. The product intent is "prefer authored on close calls," not "suppress received" — promote-only gives the former without the latter. Floor-ratio gate retained (default 0.85). Real embedders give non-trivial cross-query vector similarity; without the gate, authored content from unrelated queries leaks into the candidate pool and gets boosted past legitimate primaries. Files: - tier-weights.ts: `tierBonus` / `buildTierBonusFn` (additive). `tierMultiplier` / `buildTierWeightFn` re-exported as deprecated aliases for back-compat. - rrf.ts: applies bonus additively, NEGATIVE_INFINITY sentinel for hidden, 0.85 floor-ratio gate. # 1.2 — flip default-on Phase 1.1 cleared the eval bar: user_behavior MRR 0.667 → 1.000 (preserved) received_content MRR 1.000 → 0.833 (real embeddings) → 0.583 (hash-trick floor) aggregate MRR primary 0.857 → 0.929 (above pure-RRF baseline) Files: - Migration 044: ALTER DEFAULT true + backfill existing rows. - parseSettingsRow / in-memory + CRDB upsert / route GET — all default flipped to true. Tests (98 pass, 70 turbo tasks green): - tier-weights.test.ts: 19 cases updated for additive semantics, all 3 calibrations, override composition, back-compat aliases. - rrf.test.ts: new "weak-match doesn't leapfrog" case; existing cases reformulated for additive bonus. - tier-ablation-eval bars tightened: received_content ≥ 0.55 (hash-trick), ≥ 0.75 (real embeddings). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(#251 Phase 1 post-/review): address Copilot findings on PR #274 Five findings, all valid: 1. JSDoc had "Number. NEGATIVE_INFINITY" split across two lines — reads awkwardly in generated docs. Joined. 2. Inclusion-semantics drift: the post-bonus filter was `rrfScore > 0`, which silently dropped pages with sufficiently-negative bonuses alongside the intended NEGATIVE_INFINITY-sentinel drops. Tightened the filter to only remove the sentinel; negative bonuses now reorder without changing inclusion. Documented in the TierWeightFn JSDoc. 3. Migration 044's comment claimed "only rows that were never explicitly toggled" get backfilled, but the SQL unconditionally flips all tier_weighting=false rows. We don't have a "set by user" audit column to distinguish defaults from opt-outs, so the honest fix is to update the comment — clarifies that this IS an unconditional opt-in. Notes that a future audit column could preserve opt-outs if it becomes important. 4. The "doesn't leapfrog" test in rrf.test.ts had exploratory scratch notes including a "PR #_" placeholder and a self-contradicting "Wait — additive DOES flip this" line. Replaced with a clean explanation of the fixture being asserted, the actual rank/score numbers, and the load-bearing role of the 0.85 floor-ratio gate. 5. tier-weights.ts docstring said "rank-1 vs rank-2 RRF diff is ~0.001" but the table below shows 0.0164 vs 0.0161 = 0.0003. Corrected to 0.0003. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Adds an optional `floorRatio?: number` to applyBacklinkBoost, applySalienceBoost, applyRecencyBoost, and PostFusionOpts. When set, each boost stage skips results whose pre-boost score is below floorRatio * topScore at the moment that stage runs — only the head of the candidate pool receives the multiplicative bonus. Default undefined preserves exact prior behavior bit-for-bit. The failure mode ──────────────── Bounded boosts (the [1.0, ~1.6] log-compressed clip on salience, the log-scaled backlink factor, the half-life-decayed recency factor) work as designed on curated test corpora. On larger corpora indexed with real high-dimensional embedders (text-embedding-3-large, voyage-3-large, voyage-4-large, zembed-1), baseline vector similarity between topically-unrelated "professional content" is non-trivial. Weak-overlap pages land in a query's top-K via vector overlap alone, receive the multiplicative boost, and on a non-trivial fraction of queries a weak page with high metadata signal climbs above the legitimate primary hit. Per-boost factors look harmless in isolation; the compound effect across the long tail is what shifts ranks. The fix ─────── A boost only fires for results within floorRatio * topScore at the moment that stage runs. The long tail keeps its unboosted score and original rank. Stages compose naturally — salience runs against its own top, recency runs against the post-salience top, etc. 0.85 as a starting point comes from a labeled-retrieval ablation in the SkyTwin twin-memory layer: the largest ratio that fully eliminated the leapfrog regression on our labeled corpus while preserving baseline rankings on queries without a metadata signal. Reference: jayzalowitz/skytwin#272 Backward compatibility ────────────────────── floorRatio defaults to undefined → no gate, no threshold computation, exact prior behavior. Existing call sites are untouched; the new param is positional-last and optional on each function. PostFusionOpts.floorRatio is similarly optional and unset by default. Opt-in by design — it changes ranking behavior, so each consumer evaluates against their own corpus before flipping it on. Tests ───── 7 new cases in test/search.test.ts: - default (floorRatio undefined) preserves existing behavior - weak page gated out, top page boosted as before - borderline page at exactly the threshold is eligible - regression scenario: weak page with strong metadata signal cannot leapfrog a strong primary - applySalienceBoost honors the gate (parity with applyBacklinkBoost) - empty results no-op without divide-by-zero - single-result trivially eligible bun test test/search.test.ts: 33/33 pass (was 26/26). bun run verify: pass (typecheck + 12 guard scripts). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…g + floor-ratio gate) CLAUDE.md:102 said the package "multiplies fused scores by per-tier weights" — stale since #260/#272 flipped the implementation to additive bonuses (the multiplicative cut had a structural leapfrog regression on real dense embedders). Updated to describe the actual current behavior + the opt-in `floorRatio` gate aligned with gbrain v0.35.6.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…/ PR #1129 (#334) * v0.6.52.0 sync(memory): align floor-ratio gate with gbrain v0.35.6.0 / PR #1129 Our contribution PR #1091 was closed in favor of upstream's reworked shape that merged yesterday as #1129. The codex outside-voice review caught three defensive gaps in the original shape; port the fixes here and align naming with `SearchOpts.floorRatio` / `search.floor_ratio`. Hardened in `packages/memory-gbrain-crdb-adapter/src/rrf.ts`: - No-positive-signal inputs (all-negative, all-NaN, empty) disable the gate via `Number.NEGATIVE_INFINITY` threshold. Prior `topRawScore = 0` init would silently reject every entry against `r.score < 0`. - Out-of-range `floorRatio` (NaN, Infinity, negative, > 1) disables the gate. Defense in depth so a malformed config value never gates anything. - NaN-score skip in the bonus loop. `NaN < threshold` is `false` in JS, so a NaN-scored hit would slip past the gate check and have the bonus added on top — poisoning the sort. Now an explicit `Number.isFinite` check skips the bonus stage for non-finite scores. New surface: - `RrfFoldOptions.floorRatio` (deprecated alias `tierWeightFloorRatio` preserved; new name wins when both are set). - `computeFloorThreshold(entries, floorRatio)` exported helper, mirrors gbrain's same-named function for cross-port mental-model consistency. - `DEFAULT_FLOOR_RATIO = 0.85` exported as a named constant. Tests: - 12 new cases pinning the defensive guards (out-of-range / NaN / Infinity / empty / negative-top / all-NaN / mixed), the precedence rule between the new and deprecated option names, and an updated strong-vs-tail RRF setup that actually exercises the gate (RRF flatness means rank-1 vs rank-2 don't differ enough — you need rank-20+ in a single list). - 130/130 RRF tests pass; 100/100 `@skytwin/memory-gbrain` tests pass; the realistic-retrieval ablation reports `mean R@5 1.000 pure-RRF / 0.929 tier-on`, unchanged. Upstream feature triage (filed for follow-up, not in this PR): - #897 search-lite (token budget + semantic query cache + intent weighting) — pursue first, ~2 days. Token budget addresses Claude API limits. - #1008 zerank-2 reranker — pursue second, ~1.5 days. Slots between RRF fold and tier-weight bonus. - #996 federated_read — skip (one brain per user). - #1131 temporal trajectory — defer (entity-time-series shape not our fit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rrf): codex T2/T3 + clarify floorRatio:0 test (post-/review) Three findings from /review's codex outside-voice pass + one nit from the structured Pass-1/Pass-2 review. T2 — Invalid floorRatio bypassing legacy guard. Old precedence `options.floorRatio ?? options.tierWeightFloorRatio ?? DEFAULT_FLOOR_RATIO` meant `floorRatio: NaN` (e.g. from buggy config parse) won the chain and disabled the gate, even if the caller had `tierWeightFloorRatio: 0.85` working. A partially migrated caller piping a malformed new option silently nullified the legacy guard. New `pickValidFloorRatio` helper walks the candidates and uses the first finite value in [0, 1]; invalid falls through to the alias, then to `DEFAULT_FLOOR_RATIO`. T3 — NaN/+Infinity rrfScores surviving the sort. The comment claimed non-finite scores "sort to the end," but `b.rrfScore - a.rrfScore` returns `NaN` for any NaN side, which JS sort treats as 0 (equal) — leaving NaN-scored hits in insertion order, where they can land in top-k via `slice(0, k)`. `+Infinity` sorts to the top of every query. Reachable when a caller passes `rrfK: NaN` (which makes every `1 / (rrfK + rank)` NaN). Fix: the post-loop filter drops ALL non-finite-scored entries (was: only `-Infinity` hidden sentinel), and a mirror filter applies on the pure-RRF path so corrupted scores never reach the comparator. Sort now operates only on finite scores and produces a deterministic ranking. Nit — Test `floorRatio: 0 disables the bonus completely` name + comment contradicted the test's own assertions (which confirm the bonus IS applied for every positive-score hit). Renamed to match the actual behavior: `floorRatio: 0` is a valid in-range value, threshold computes to 0, every positive-score hit passes the gate. Distinct from the `undefined`/out-of-range disable path even though they're observationally equivalent for positive-score inputs. 5 new test cases pin the codex fixes: - invalid floorRatio falls back to deprecated alias when alias is valid - invalid floorRatio + invalid alias falls back to DEFAULT_FLOOR_RATIO - floorRatio: undefined falls through to alias when alias is valid - rrfK: NaN corrupts all contributions → all hits dropped (output []) - partial corruption (some finite hits) survives the filter intact CHANGELOG updated to reflect: 135 tests (was 130), and the codex review fixes are called out under a "Codex review fixes (post-review)" subsection so the audit trail is visible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rrf-tests): address Copilot review — strengthen invalid-floorRatio test setup Copilot caught a coverage gap in three new tests: the rank-1-vs-rank-2 text-only setup doesn't actually exercise the gate, because rank-2 rrfScore (1/62 ≈ 0.0161) is above the default 0.85 × 1/61 (≈ 0.0139) — so the assertion passes regardless of whether `floorRatio: NaN` (or -0.5, or 1.5) correctly disabled the gate, fell back to default, or did anything at all. Same class as the gap I caught and fixed in the back-compat tests; missed updating these three. Rewritten to use the `strongVsTail` helper (rank-1-in-both + rank-21-text-only) so the assertions distinguish "gate at 0.85" from "gate disabled" — the weak hit's rrfScore is 1/81 (well below 0.85 × 2/61 = 0.0279), so the bonus only applies if the gate is genuinely disabled. Note: the test semantics also flipped because of the codex T2 fix landed in 89fd6be. Pre-T2, invalid `floorRatio` disabled the gate. Post-T2, invalid falls back to the alias then to DEFAULT_FLOOR_RATIO. So the renamed tests now assert "falls back to DEFAULT_FLOOR_RATIO" rather than "disables gate." The test rationale comment block calls this out explicitly so a future maintainer doesn't try to revert to the pre-T2 expectations. CHANGELOG test count corrected: 22 new test cases (was 17, originally 12 — my mistake; the count drifted across each round of review fixes). Copilot's other two comments were already addressed in 89fd6be: - "Test name `floorRatio: 0 disables bonus` contradicts assertions" → fixed - "NaN-score skip leaves non-finite rrfScore in entries; sort can be poisoned" → fixed (post-loop `isFinite` filter drops non-finite scores before sort). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: clarify memory-gbrain-crdb-adapter description (additive scoring + floor-ratio gate) CLAUDE.md:102 said the package "multiplies fused scores by per-tier weights" — stale since #260/#272 flipped the implementation to additive bonuses (the multiplicative cut had a structural leapfrog regression on real dense embedders). Updated to describe the actual current behavior + the opt-in `floorRatio` gate aligned with gbrain v0.35.6.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 13, 2026 00:04

Copilot started reviewing on behalf of jayzalowitz May 13, 2026 00:04 View session

Copilot AI reviewed May 13, 2026

View reviewed changes

jayzalowitz merged commit 1505666 into main May 13, 2026
8 checks passed

jayzalowitz mentioned this pull request May 13, 2026

docs: sync README + CLAUDE.md with v0.6.18-0.6.21 merge sweep #273

Merged

3 tasks

jayzalowitz mentioned this pull request May 17, 2026

feat(search): opt-in floor-ratio gate for post-fusion boost stages garrytan/gbrain#1091

Closed

garrytan mentioned this pull request May 17, 2026

v0.35.6.0 feat(search): floor-ratio gate for metadata boost stages (closes #1091) garrytan/gbrain#1129

Merged

5 tasks

jayzalowitz mentioned this pull request May 18, 2026

v0.6.52.0 sync(memory): align floor-ratio gate with gbrain v0.35.6.0 / PR #1129 #334

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(#251 follow-up): real-embedding ablation result + opt-in test#272

feat(#251 follow-up): real-embedding ablation result + opt-in test#272
jayzalowitz merged 2 commits into
mainfrom
jayzalowitz/251-real-embedding-eval

jayzalowitz commented May 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

jayzalowitz May 13, 2026

Uh oh!

jayzalowitz May 13, 2026

Uh oh!

jayzalowitz May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jayzalowitz commented May 13, 2026

Summary

Numbers

What's actually broken

Decision

What this PR ships

Why ship a negative result

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

jayzalowitz May 13, 2026

Choose a reason for hiding this comment

Uh oh!

jayzalowitz May 13, 2026

Choose a reason for hiding this comment

Uh oh!

jayzalowitz May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants