Skip to content

fix(memory): bm25RankToScore returns constant 1.0 for all negative BM25 ranks#14005

Closed
niceysam wants to merge 1 commit intoopenclaw:mainfrom
niceysam:fix/bm25-rank-to-score-negative-values
Closed

fix(memory): bm25RankToScore returns constant 1.0 for all negative BM25 ranks#14005
niceysam wants to merge 1 commit intoopenclaw:mainfrom
niceysam:fix/bm25-rank-to-score-negative-values

Conversation

@niceysam
Copy link
Contributor

@niceysam niceysam commented Feb 11, 2026

FTS5 bm25() returns negative values (more negative = more relevant), but bm25RankToScore was doing Math.max(0, rank) which clamps all negative scores to 0. That means 1 / (1 + 0) = 1.0 for every result, so BM25 relevance was completely thrown away in hybrid search.

Fixed by using Math.abs(rank) instead, which preserves the magnitude while keeping the normalization formula working.

Added test assertions for monotonic behavior with negative inputs.

Fixes #5767

…25 ranks

FTS5's bm25() returns negative values where more negative means more
relevant. The previous implementation used Math.max(0, rank) which
clamped all negative values to 0, causing every keyword match to
receive an identical normalized score of 1.0. This effectively
discarded BM25 relevance information in hybrid search.

Fix: use Math.abs(rank) so negative BM25 scores are correctly
converted to positive magnitudes before normalization.

Fixes openclaw#5767
@0xRaini

This comment was marked as spam.

TWFBusiness added a commit to TWFBusiness/openclaw-tw that referenced this pull request Feb 11, 2026
…pstream fixes

- Remove Discord, Telegram, Slack, Signal, iMessage, BlueBubbles extensions
- Remove @buape/carbon, grammy, and other gateway-specific dependencies
- Add session-based login system (cookie auth, rate limiting)
- Add type stubs for optional deps (baileys, ciao, qrcode-terminal)
- Apply upstream PRs: XSS fix (openclaw#13952), orphan tool_results (openclaw#13974),
  Anthropic tool ID sanitization (openclaw#13976), BM25 score fix (openclaw#14005),
  redaction fix (openclaw#14006), subagent timeout (openclaw#13996), context window
  guard (openclaw#13986), model default reset (openclaw#13993), cache token accounting (openclaw#13853)
- Fix all TypeScript compilation errors from channel removal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@niceysam
Copy link
Contributor Author

Thanks for the feedback! To be honest, I found this through code inspection, not empirical testing. I don't have before/after metrics — just noticed the bug in static analysis. Would be great if someone with production data could validate the improvement.

@papago2355
Copy link
Contributor

papago2355 commented Feb 16, 2026

Thanks for the feedback! To be honest, I found this through code inspection, not empirical testing. I don't have before/after metrics — just noticed the bug in static analysis. Would be great if someone with production data could validate the improvement.

Test by

  • hello world <->'"hello" AND "world"' AND "hei"
BM25 Score OLD NEW
-4.0 (best) 1.0 0.67
-2.0 (mid) 1.0 0.50
-0.5 (weak) 1.0 0.20

Haven't check long context retrieval.
For longer sessions/contexts, keyword matching would be quite weak.

@openclaw-barnacle
Copy link

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle bot added the stale Marked as stale due to inactivity label Feb 21, 2026
@openclaw-barnacle openclaw-barnacle bot removed the stale Marked as stale due to inactivity label Feb 23, 2026
@buyitsydney
Copy link

+1 from a CJK-heavy deployment.

We independently discovered the same bug and applied an identical fix locally. Can confirm:

  • Before fix: Every BM25 match gets textScore = 1.0, hybrid search degrades to boolean keyword matching
  • After fix: BM25 ranking properly differentiates relevant from less-relevant keyword hits

Our local fix uses a slightly different normalization formula (absRank / (1 + absRank) instead of 1 / (1 + Math.max(0, rank))), but the core insight is the same — stop clamping negative ranks to zero.

This is especially impactful for CJK users where BM25 is already weakened by FTS5 tokenization issues (#20730). Having the BM25 scores that do work actually carry meaningful weight in hybrid ranking is critical.

Would love to see this merged. 🙏

@buyitsydney
Copy link

Friendly ping — this PR has been open for 2+ weeks and the stale bot is circling. 🙂

The fix is straightforward (one-line Math.max(0, rank)Math.abs(rank)) and addresses a real scoring bug that collapses all BM25 matches to the same score. Multiple users have independently confirmed the issue (#5767, and our CJK deployment hit it too).

Would love to see this land — happy to help with any requested changes. @m1heng could this get a review when you have a moment?

@papago2355
Copy link
Contributor

Friendly ping — this PR has been open for 2+ weeks and the stale bot is circling. 🙂

The fix is straightforward (one-line Math.max(0, rank)Math.abs(rank)) and addresses a real scoring bug that collapses all BM25 matches to the same score. Multiple users have independently confirmed the issue (#5767, and our CJK deployment hit it too).

Would love to see this land — happy to help with any requested changes. @m1heng could this get a review when you have a moment?

Well, I was the one who originally posted PR #5214 a month ago, but apparently it is still not yet merged and closed. 🙂

@buyitsydney
Copy link

Friendly ping for maintainer review — this PR has been open 16 days and the stale bot has already flagged it.

Summary: One-line fix (Math.max(0, rank)Math.abs(rank)) that stops all BM25 scores collapsing to 1.0. Three independent confirmations:

  1. @papago2355 — test data showing score differentiation (see comment)
  2. Our CJK deployment — identical fix applied locally, verified in production
  3. @niceysam (author) — found via code inspection

This is the scoring half of the CJK search problem. I just opened #29396 for the MMR tokenization half. Together with #20730 (FTS5 tokenization), these form a cluster of fixes that would significantly improve memory search for CJK users (see also the real-world report from @KJT125, an ER doctor in Taiwan using Traditional Chinese).

The fix is minimal risk — 2 files, +6/−2, pure math correction. Would love to see this land. 🙏

@Takhoffman @steipete

@steipete
Copy link
Contributor

steipete commented Mar 7, 2026

This PR was closed automatically by AI triage due to current maintainer bandwidth limits. Sorry if this is incorrect — please reopen and comment if we got this wrong, and we will re-review.

Reason category: duplicate / superseded by #33757.

Why closing:

  • Current origin/main (b4bac484e34bac48f241640365a762c1ba79ee33) still has the #5767 bug in bm25RankToScore.
  • This PR commit e4771888932f07370e51677892668e1137e92fe1 fixes the constant-1.0 symptom, but it changes the mapping to 1 / (1 + Math.abs(rank)).
  • That reverses FTS5 BM25 ordering: a stronger match like -10 scores lower than a weaker match like -1, even though bm25() semantics are more-negative = more relevant.
  • PR fix(memory): preserve FTS5 BM25 relevance in bm25RankToScore (#5767) #33757 commit ec77563c8b8df08b0747cd4445b9d2b901cf28ea preserves the ordering correctly by mapping negative ranks to positive relevance before normalization.

Sorry for the churn here. If you want to rework this PR around the monotonic negative-rank mapping used in #33757, please reopen.

@steipete steipete closed this Mar 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: bm25RankToScore always return same value

6 participants