Skip to content

chore: bump LocalAGI + localrecall (fix pgvector hybrid search seqscan, #10186)#10192

Merged
mudler merged 1 commit into
masterfrom
fix/pgvector-hybrid-search-10186
Jun 6, 2026
Merged

chore: bump LocalAGI + localrecall (fix pgvector hybrid search seqscan, #10186)#10192
mudler merged 1 commit into
masterfrom
fix/pgvector-hybrid-search-10186

Conversation

@localai-bot

@localai-bot localai-bot commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

What

Bumps the agent stack to pull in the PostgreSQL hybrid-search fix:

go.mod/go.sum only; no LocalAI source changes (the localrecall Search API is unchanged).

Why

Closes #10186. The hybrid search query in localrecall wrapped the vector distance operator in a scalar similarity expression and sorted on the alias (ORDER BY similarity DESC). pgvector's HNSW/DiskANN index can only serve a bare ORDER BY embedding <=> $vec path, so the wrapped form degraded to a full sequential scan over every row. On the reporter's ~2.4M-row collection this exceeded the 10s statement timeout and returned {"count":0,"results":[]}.

The fix adopts the canonical Reciprocal Rank Fusion hybrid pattern recommended by pgvector and Timescale (pg_textsearch + pgvectorscale): each arm retrieves its top-N candidates via a bare, index-served operator and a rank; the arms are fused with a FULL OUTER JOIN + weighted RRF, then joined back by PK. Verified with EXPLAIN on the DiskANN + pg_textsearch stack: Seq Scan -> index scans on both arms (Order By: (embedding <=> / (full_text <@>), planner cost 5128 -> 648.

Chain

Pseudo-versions point at those branch HEADs and will update as the upstream PRs merge.

Fixes #10186

@mudler mudler force-pushed the fix/pgvector-hybrid-search-10186 branch 2 times, most recently from 8e10ecb to 25c774f Compare June 6, 2026 07:10
Bumps the agent stack to pull in the PostgreSQL hybrid-search fix:

- mudler/localrecall -> v0.6.3-...-9a3b3321a9cd (mudler/LocalRecall#46, merged)
- mudler/LocalAGI    -> ...-14aed1ae4336 (mudler/LocalAGI#477, merged)

localrecall's hybrid search previously sorted on a wrapped scalar
similarity expression, which blinded the planner into a full sequential
scan over every row and exceeded the statement timeout on large
collections, returning an empty result set. It now uses the canonical
Reciprocal Rank Fusion pattern (index-backed candidate retrieval + FULL
OUTER JOIN + weighted RRF).

Fixes #10186

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the fix/pgvector-hybrid-search-10186 branch from 25c774f to b2d3eaf Compare June 6, 2026 07:15
@mudler mudler merged commit 0e4cee9 into master Jun 6, 2026
55 of 57 checks passed
@mudler mudler deleted the fix/pgvector-hybrid-search-10186 branch June 6, 2026 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Vector collection search causes seqscan on 2M+ rows, hitting 10s context timeout (HNSW index ignored)

2 participants