MOD-7253: Fix wildcard latency by raz-mon · Pull Request #4869 · RediSearch/RediSearch

raz-mon · 2024-07-17T06:20:57Z

Describe the changes in the pull request

This PR fixes a latency issue in our wildcard and NOT iterators. They advance the doc-id one at a time, so they waste a lot of time probing doc-ids of documents that do not exist.
The problem shows up under heavy writes and deletions, which leave large "holes" in the doc-id range of the existing documents. Every id in such a hole is incremented to and checked for a corresponding document, even though none exists.
We fix this by adding an "existing-docs" inverted index that holds the doc-ids of all currently existing documents (accurate up to the latest GC update). The wildcard and NOT iterators use it to skip directly over large doc-id gaps instead of advancing one id at a time.
The new index is built on the existing inverted-index API, so no new API is introduced.

This inverted index is cleaned by the GC, just like most of the other inverted indexes we hold.

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

codecov · 2024-07-17T06:46:43Z

Codecov Report

Attention: Patch coverage is 77.48092% with 59 lines in your changes missing coverage. Please review.

Project coverage is 86.08%. Comparing base (6adbeea) to head (be5e1cf).
Report is 12 commits behind head on master.

Files with missing lines	Patch %	Lines
src/index.c	69.69%	30 Missing ⚠️
src/fork_gc.c	73.52%	27 Missing ⚠️
src/inverted_index.c	71.42%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4869      +/-   ##
==========================================
- Coverage   86.14%   86.08%   -0.07%     
==========================================
  Files         192      192              
  Lines       34256    34418     +162     
==========================================
+ Hits        29511    29629     +118     
- Misses       4745     4789      +44


src/fork_gc.c

GuyAv46

🦾

src/fork_gc.c

src/index.c

src/fork_gc.c

src/rules.c

kei-nan · 2024-09-08T07:00:41Z

src/info_command.c

    REPLY_KVSTR_SAFE("payload_field", rule->payload_field);
  }

+  if (rule->index_all) {


Maybe we should also output here the memory usage specifically for the new inverted index in case it was enabled?

I don't think we should add that here.
It could be added to the FT.INFO response, but I don't want it to cause confusion regarding the memory we use, since this memory is already counted in sp->stats.invertedSize.

kei-nan · 2024-09-08T07:01:24Z

src/info_command.c

+  if (rule->index_all) {
+    REPLY_KVSTR_SAFE("indexes_all", "true");
+  } else {
+    REPLY_KVSTR_SAFE("indexes_all", "false");


Maybe use the same strings you used for the input arguments, i.e., enable and disable, just to be consistent.

The thing is, this key has to be lowercase and underscore-separated to be consistent with the rest of the response, so it could become index_all to more closely resemble the creation argument. At that point it doesn't really matter anymore, but I can go with index_all.

src/index.c

MeirShpilraien

Looks good, a few comments and some general ones:

How do the tests verify that the new feature is being used? They would have passed with the old code too, right?
Let's improve the top comment with some additional info:
- An explanation of the problem we have today, and in which scenarios it happens.
- How we fix it.
- The new API that was added.

src/fork_gc.c

src/indexer.c

src/profile.c

Wildcard and NOT iterators latency fix

github-actions · 2024-09-09T16:07:57Z

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-4869-to-2.8 origin/2.8
cd .worktree/backport-4869-to-2.8
git switch --create backport-4869-to-2.8
git cherry-pick -x 428f8023e507dc9a9015a9d129093d473dd3d9f1

github-actions · 2024-09-09T16:08:00Z

Successfully created backport PR for 2.10:

[2.10] Fix wildcard latency (reverted) #5008

Wildcard and NOT iterators latency fix (cherry picked from commit 428f802)

MOD-7253: Fix wildcard latency (#4869) Wildcard and NOT iterators latency fix (cherry picked from commit 428f802) Co-authored-by: Raz Monsonego <74051729+raz-mon@users.noreply.github.com>

raz-mon commented Jul 17, 2024

src/fork_gc.c (resolved)

raz-mon force-pushed the razmon-fix_wildcard_latency branch 2 times, most recently from 8e64fe0 to 70fb1c2 on July 18, 2024 13:50

raz-mon added the backport 2.8, backport 2.10, and action:run-benchmark labels and removed the action:run-benchmark label Jul 29, 2024

raz-mon requested review from GuyAv46, MeirShpilraien, kei-nan and oshadmi September 5, 2024 10:59

GuyAv46 reviewed Sep 5, 2024

kei-nan reviewed Sep 8, 2024

src/fork_gc.c (resolved)

kei-nan reviewed Sep 8, 2024

src/rules.c (outdated, resolved)

kei-nan reviewed Sep 8, 2024

raz-mon requested a review from GuyAv46 September 8, 2024 07:47

GuyAv46 reviewed Sep 8, 2024

src/index.c (outdated, resolved)

raz-mon requested a review from GuyAv46 September 8, 2024 08:26

GuyAv46 previously approved these changes Sep 8, 2024

raz-mon dismissed GuyAv46’s stale review via 4e879a2 September 8, 2024 08:31

kei-nan force-pushed the master branch from 6adbeea to c1e17d9 on September 8, 2024 11:16

Wildcard and NOT iterators latency fix

be5e1cf

raz-mon force-pushed the razmon-fix_wildcard_latency branch from 4e879a2 to be5e1cf on September 8, 2024 14:11

MeirShpilraien reviewed Sep 8, 2024

src/fork_gc.c (resolved)

src/fork_gc.c (resolved)

src/indexer.c (resolved)

src/profile.c (resolved)

MeirShpilraien approved these changes Sep 9, 2024

raz-mon added this pull request to the merge queue Sep 9, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 9, 2024

raz-mon added this pull request to the merge queue Sep 9, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 9, 2024

raz-mon added this pull request to the merge queue Sep 9, 2024

github-merge-queue bot pushed a commit that referenced this pull request Sep 9, 2024

MOD-7253: Fix wildcard latency (#4869)

d643ac9

Wildcard and NOT iterators latency fix

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 9, 2024

raz-mon added this pull request to the merge queue Sep 9, 2024

Merged via the queue into master with commit 428f802 Sep 9, 2024

raz-mon deleted the razmon-fix_wildcard_latency branch September 9, 2024 16:07

github-actions bot pushed a commit that referenced this pull request Sep 9, 2024

MOD-7253: Fix wildcard latency (#4869)

5748f22

Wildcard and NOT iterators latency fix (cherry picked from commit 428f802)

github-actions bot mentioned this pull request Sep 9, 2024

[2.10] Fix wildcard latency (reverted) #5008

Merged

raz-mon removed backport 2.8 backport 2.10 labels Sep 10, 2024

This was referenced Sep 10, 2024

CP missing indexing memory reporting fix #5012

Merged

[8.0] MOD-7748: Add existing-indexing unit-test #5024

Merged

raz-mon mentioned this pull request Nov 14, 2024

[BUG] query with ft.search on index that frequently update documents is slow #5212

Closed

GuyAv46 mentioned this pull request Dec 4, 2024

[BUG] ft.aggregate slowdown with high frequency updates #4508

Closed

raz-mon mentioned this pull request Dec 19, 2024

MOD-8192: Optimize OPTIONAL iterator with existing-index #5386

Merged

kei-nan mentioned this pull request Apr 3, 2025

[BUG] Redisearch slow down after restarting Redis #5861

Closed

4 participants