
Fix zero scores when querying hypertables with BM25 index#168

Merged
tjgreen42 merged 7 commits into main from fix-hypertable-zero-scores
Jan 25, 2026

@tjgreen42 tjgreen42 commented Jan 23, 2026

Summary

This PR fixes zero scores when querying hypertables with BM25 indexes.

Commit 1: Planner hook fix for CustomScan

  • Add T_CustomScan handling in plan_has_bm25_indexscan() to detect BM25 index scans nested inside custom scans (e.g., TimescaleDB's ConstraintAwareAppend)
  • Add T_CustomScan handling in replace_scores_in_plan() to replace score expressions in custom scan children
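
The detection half of this change can be sketched roughly as follows. This is a minimal, self-contained illustration and not the extension's actual code: `PlanNode`, `NodeKind`, and `has_bm25_indexscan()` are hypothetical stand-ins for PostgreSQL's `Plan` nodes, node tags, and `plan_has_bm25_indexscan()`, and the `children` array plays the role of `cscan->custom_plans`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node tags (not PostgreSQL's real NodeTag enum). */
typedef enum {
    NODE_SEQ_SCAN,
    NODE_INDEX_SCAN,
    NODE_APPEND,
    NODE_CUSTOM_SCAN
} NodeKind;

/* Hypothetical stand-in for a plan tree node. */
typedef struct PlanNode {
    NodeKind kind;
    bool is_bm25_index;          /* meaningful only for NODE_INDEX_SCAN */
    struct PlanNode **children;  /* child plans; custom_plans for a custom scan */
    size_t nchildren;
} PlanNode;

/* Return true if any node in the tree is a BM25 index scan. */
bool has_bm25_indexscan(const PlanNode *plan)
{
    if (plan == NULL)
        return false;

    /* A node that claims children must actually carry a child array. */
    assert(plan->nchildren == 0 || plan->children != NULL);

    switch (plan->kind) {
    case NODE_INDEX_SCAN:
        return plan->is_bm25_index;

    case NODE_APPEND:
    case NODE_CUSTOM_SCAN:
        /* Before the fix, the custom-scan case was missing, so index
         * scans below a custom scan were never seen. */
        for (size_t i = 0; i < plan->nchildren; i++) {
            if (has_bm25_indexscan(plan->children[i]))
                return true;
        }
        return false;

    default:
        return false;
    }
}
```

In this sketch, dropping the `NODE_CUSTOM_SCAN` case makes the walk return false for a custom scan wrapping a BM25 index scan, which corresponds to the situation in which the hook skipped score replacement.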

Commit 2: Standalone scoring fix for hypertable parent indexes

  • When using standalone BM25 scoring with a hypertable parent index name, the code was falling back to child index stats but NOT switching to the child's index relation and segment metadata
  • This caused IDF calculation to fail because it was looking up document frequencies in the parent index's segments (which are empty)
  • The fix switches to the child index for segment access when falling back to a child index's state

Testing

  • Verified fix with reproduction case from bug report
  • Added Test 5 in partitioned.sql for MergeAppend score expression replacement
  • Added test/scripts/hypertable.sh for optional TimescaleDB integration testing (runs only if TimescaleDB is installed)

-- Both queries now return proper BM25 scores:

-- Query through parent hypertable with ORDER BY
SELECT content, -(content <@> to_bm25query('database', 'hyper_idx')) as score
FROM hyper_docs
ORDER BY content <@> to_bm25query('database', 'hyper_idx')
LIMIT 5;

-- Standalone scoring with parent index name
SELECT content, (content <@> to_bm25query('database', 'hyper_idx')) as score
FROM hyper_docs
WHERE content LIKE '%database%';

The planner hook's plan_has_bm25_indexscan() and replace_scores_in_plan()
functions didn't handle CustomScan nodes (like TimescaleDB's
ConstraintAwareAppend). This caused the hook to miss BM25 index scans
nested inside custom scans, so score expressions weren't replaced with
stub functions that retrieve cached scores from the index scan.

As a result, standalone scoring was used instead, which looked up the
parent hypertable index (which has total_docs = 0), producing zero
scores for all results.

The fix adds T_CustomScan cases that recurse into cscan->custom_plans
to properly detect and process BM25 index scans.

When using standalone BM25 scoring on a hypertable with the parent index
name (e.g., `content <@> to_bm25query('database', 'parent_idx')`), the
code was falling back to child index stats (total_docs, avg_doc_len) but
NOT switching to the child's index relation and segment metadata. This
caused IDF calculation to fail because it was looking up document
frequencies in the parent index's segments (which are empty).

The fix switches to the child index for segment access when falling back
to a child index's state. This ensures that both memtable and segment
lookups use the correct child index.

Also adds:
- Test 5 in partitioned.sql for MergeAppend score expression replacement
- test/scripts/hypertable.sh for optional TimescaleDB integration testing

- Install TimescaleDB for PG 17 and 18 in CI
- Configure shared_preload_libraries for both system and test instances
- Add hypertable.sh to shell-based tests

The test gracefully skips if TimescaleDB installation fails for a
specific PG version.

Add WHERE id <= 100 filter to the top-k query. With 150K documents,
many have identical scores (same i % 15 pattern), making the result
order non-deterministic. Limiting to the first 100 IDs ensures
consistent results.

The system PostgreSQL config file path didn't exist. Instead, create
a dedicated PostgreSQL instance for shell tests with TimescaleDB
configured in shared_preload_libraries.
@tjgreen42 tjgreen42 merged commit 3244d05 into main Jan 25, 2026
12 checks passed
@tjgreen42 tjgreen42 deleted the fix-hypertable-zero-scores branch January 25, 2026 16:35
tjgreen42 added a commit that referenced this pull request Jan 25, 2026
## Summary

Backport of #168 to the 0.4.2 release branch.

Fixes zero scores when querying hypertables with BM25 indexes:

1. **Planner hook fix for CustomScan** - Add `T_CustomScan` handling in
`plan_has_bm25_indexscan()` to detect BM25 index scans nested inside
custom scans (e.g., TimescaleDB's ConstraintAwareAppend)

2. **Standalone scoring fix for hypertable parent indexes** - When using
standalone BM25 scoring with a hypertable parent index name, the code
was falling back to child index stats but NOT switching to the child's
index relation and segment metadata

## Testing

Cherry-picked from main where it passed all CI checks.