Skip to content

MOD-7253: Fix wildcard latency#4869

Merged
raz-mon merged 1 commit intomasterfrom
razmon-fix_wildcard_latency
Sep 9, 2024
Merged

MOD-7253: Fix wildcard latency#4869
raz-mon merged 1 commit intomasterfrom
razmon-fix_wildcard_latency

Conversation

@raz-mon
Copy link
Copy Markdown
Collaborator

@raz-mon raz-mon commented Jul 17, 2024

Describe the changes in the pull request

This PR aspires to fix the latency issue we have with our wildcard and NOT iterators, in which the doc-id increment is done one-by-one, resulting in a lot of time wasted on looking for un-existing documents.
The problem is encountered when there are many writes and deletions, such that there are big "holes" in the docIds of the existing docs in the database. This caused us to waste time incrementing the docIds and checking whether their corresponding documents exist.
We opt to fix this by adding an "existing-docs" inverted-index, holding the doc-ids of all the currently existing docs (up to GC update), which the wildcard and NOT iterators can utilize to efficiently jump between large doc-id deltas instead of the single-increment jumps used currently.
We use the existing inverted-index API so no new API is introduced.

This inverted index is cleaned by the GC, just like most of the other inverted indexes we hold.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@codecov
Copy link
Copy Markdown

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 77.48092% with 59 lines in your changes missing coverage. Please review.

Project coverage is 86.08%. Comparing base (6adbeea) to head (be5e1cf).
Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
src/index.c 69.69% 30 Missing ⚠️
src/fork_gc.c 73.52% 27 Missing ⚠️
src/inverted_index.c 71.42% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4869      +/-   ##
==========================================
- Coverage   86.14%   86.08%   -0.07%     
==========================================
  Files         192      192              
  Lines       34256    34418     +162     
==========================================
+ Hits        29511    29629     +118     
- Misses       4745     4789      +44     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Collaborator

@GuyAv46 GuyAv46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🦾

REPLY_KVSTR_SAFE("payload_field", rule->payload_field);
}

if (rule->index_all) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should also output here the memory usage specifically for the new inverted index in case it was enabled?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add that here.
This may be added to the FT.INFO response, but I don't want it to create confusion with regarding the memory we use, since this memory is counted already in sp->stats.invertedSize

if (rule->index_all) {
REPLY_KVSTR_SAFE("indexes_all", "true");
} else {
REPLY_KVSTR_SAFE("indexes_all", "false");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use the same strings you used for the input arguments.
i.e enable and disable
just to be consistent

Copy link
Copy Markdown
Collaborator Author

@raz-mon raz-mon Sep 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thing is that this needs to be lower case and separated by _ (to be consistent with the rest of the response), so this can become index_all to more resemble the creation argument format. Then it doesn't really matter anymore. But I can go with index_all..

@raz-mon raz-mon requested a review from GuyAv46 September 8, 2024 07:47
@raz-mon raz-mon requested a review from GuyAv46 September 8, 2024 08:26
GuyAv46
GuyAv46 previously approved these changes Sep 8, 2024
@raz-mon raz-mon force-pushed the razmon-fix_wildcard_latency branch from 4e879a2 to be5e1cf Compare September 8, 2024 14:11
Copy link
Copy Markdown

@MeirShpilraien MeirShpilraien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, few comments and some other general comments:

  1. How do the tests verifies that the new feature is been used, they would have pass even with the old code right?
  2. Lets improve the top comment with some additional info:
    • Explanation of the problem we have today, at what scenarios it happened.
    • How do we fix it.
    • The new API that was added.

@raz-mon raz-mon added this pull request to the merge queue Sep 9, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 9, 2024
@raz-mon raz-mon added this pull request to the merge queue Sep 9, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 9, 2024
@raz-mon raz-mon added this pull request to the merge queue Sep 9, 2024
github-merge-queue bot pushed a commit that referenced this pull request Sep 9, 2024
Wildcard and NOT iterators latency fix
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 9, 2024
@raz-mon raz-mon added this pull request to the merge queue Sep 9, 2024
Merged via the queue into master with commit 428f802 Sep 9, 2024
@raz-mon raz-mon deleted the razmon-fix_wildcard_latency branch September 9, 2024 16:07
@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 9, 2024

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-4869-to-2.8 origin/2.8
cd .worktree/backport-4869-to-2.8
git switch --create backport-4869-to-2.8
git cherry-pick -x 428f8023e507dc9a9015a9d129093d473dd3d9f1

@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 9, 2024

Successfully created backport PR for 2.10:

github-actions bot pushed a commit that referenced this pull request Sep 9, 2024
Wildcard and NOT iterators latency fix

(cherry picked from commit 428f802)
github-merge-queue bot pushed a commit that referenced this pull request Sep 9, 2024
MOD-7253: Fix wildcard latency (#4869)

Wildcard and NOT iterators latency fix

(cherry picked from commit 428f802)

Co-authored-by: Raz Monsonego <74051729+raz-mon@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants