Skip to content

[MOD-8125] Create inverted index in write operations only#5229

Merged
meiravgri merged 15 commits intomasterfrom
meiravg_open_inv_index_for_read
Nov 25, 2024
Merged

[MOD-8125] Create inverted index in write operations only#5229
meiravgri merged 15 commits intomasterfrom
meiravg_open_inv_index_for_read

Conversation

@meiravgri
Copy link
Copy Markdown
Collaborator

@meiravgri meiravgri commented Nov 19, 2024

This PR ensures we create an inverted index only in write operations (when possible. see exceptions below)

Read operations

  • ft.info
  • INFO
  • queries
  • GC

New in this PR

FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME

Returnes

  • inverted_indexes_dict_size
  • inverted_indexes_memory

Exceptions

Some read operations still create the index to avoid breaking changes, or where a fundamental change was required:

debug commands

  • VecsimInfo and dumpHNSWData we need an index to create an info iterator.
  • DumpGeometryIndex calls abstract api->dump(idx, ctx). To allow avoiding from creating the index, we need a dump function that can imitate an empty index for idx == NULL.

Queries

  • Query_EvalGeometryNode: since the query string is validated after we are trying to open the geometry index, in case of a wrong query string and uninitialized index, we will return results as if the index was empty, instead of raising an error. However, If we create the index (as we are doing today), we can proceed to query validation.
    So until we change the order of verification, we open the index with CREATE option.

@meiravgri meiravgri force-pushed the meiravg_open_inv_index_for_read branch from d1fff41 to cc57b33 Compare November 19, 2024 14:50
@codecov
Copy link
Copy Markdown

codecov bot commented Nov 19, 2024

Codecov Report

Attention: Patch coverage is 98.27586% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.68%. Comparing base (d67d645) to head (0c64a3c).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/fork_gc.c 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5229      +/-   ##
==========================================
+ Coverage   86.64%   86.68%   +0.04%     
==========================================
  Files         193      193              
  Lines       34698    34710      +12     
==========================================
+ Hits        30063    30089      +26     
+ Misses       4635     4621      -14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@meiravgri meiravgri changed the title Meiravg_open_inv_index_for_readgi [MOD-8125] Create index in write operations only Nov 21, 2024
@meiravgri meiravgri changed the title [MOD-8125] Create index in write operations only [MOD-8125] Create inverted index in write operations only Nov 21, 2024
@meiravgri meiravgri requested a review from alonre24 November 21, 2024 13:33
return False
except KeyError:
if _assert:
env.assertTrue(False, message=msg + f" key {key} exists in {d1_name} but doesn't exist in {d2_name}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
env.assertTrue(False, message=msg + f" key {key} exists in {d1_name} but doesn't exist in {d2_name}")
env.assertContains(key, d2, message=msg + f" key {key} exists in {d1_name} but doesn't exist in {d2_name}")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertContains doesn't support message :(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fix it

d3 = d1.copy()
d1["rabbit"] = 5
try:
compare_numeric_dicts(env, d1, d3, _assert=False)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is _assert=False causing an exception aligned with the RLTest behavior?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used by the function as a hint to avoid assertions (to test expected failures)

Copy link
Copy Markdown
Collaborator

@alonre24 alonre24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! So except for vector and geometry, were there any other places where we created an index unnecessarily (except for the tag in the GC)?

RedisSearchCtx_LockSpecWrite(sctx);

InvertedIndex *idx = Redis_OpenInvertedIndex(sctx, term, len, 1, NULL);
InvertedIndex *idx = Redis_OpenInvertedIndex(sctx, term, len, DONT_CREATE_INDEX, NULL);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we understand why it was explicitly opened for writing here? Also, I didn't see a test validating the expected behavior with respect to GC...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. But if we got here the term existed before we forked
Regarding gc - don't we have unit tests for text?

add depth arg to pytest callbacks

add test_lazy_index_creation_info_modules

some tests improvments
GuyAv46
GuyAv46 previously approved these changes Nov 24, 2024

// TODO: open with DONT_CREATE_INDEX once the query string is validated before we get here.
// Currently, if we use DONT_CREATE_INDEX, and the index was not initalized yet, and the query is invalid,
// we return results as if the index was empty, instead of raising an error.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be done here? Parse the query before getting here? Fix something in our parser? Can we do something in this PR?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. not in the scope of this PR
I'm opening a ticket

@meiravgri meiravgri added this pull request to the merge queue Nov 25, 2024
Merged via the queue into master with commit 7617563 Nov 25, 2024
@meiravgri meiravgri deleted the meiravg_open_inv_index_for_read branch November 25, 2024 09:41
@redisearch-backport-pull-request
Copy link
Copy Markdown
Contributor

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-5229-to-2.8 origin/2.8
cd .worktree/backport-5229-to-2.8
git switch --create backport-5229-to-2.8
git cherry-pick -x 7617563a752f9e2ddc49d57e4c2d5b3397be67ef

@redisearch-backport-pull-request
Copy link
Copy Markdown
Contributor

Backport failed for 2.6, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.6
git worktree add -d .worktree/backport-5229-to-2.6 origin/2.6
cd .worktree/backport-5229-to-2.6
git switch --create backport-5229-to-2.6
git cherry-pick -x 7617563a752f9e2ddc49d57e4c2d5b3397be67ef

@redisearch-backport-pull-request
Copy link
Copy Markdown
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-5229-to-2.10 origin/2.10
cd .worktree/backport-5229-to-2.10
git switch --create backport-5229-to-2.10
git cherry-pick -x 7617563a752f9e2ddc49d57e4c2d5b3397be67ef

@redisearch-backport-pull-request
Copy link
Copy Markdown
Contributor

Backport failed for 8.0, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.0
git worktree add -d .worktree/backport-5229-to-8.0 origin/8.0
cd .worktree/backport-5229-to-8.0
git switch --create backport-5229-to-8.0
git cherry-pick -x 7617563a752f9e2ddc49d57e4c2d5b3397be67ef

meiravgri added a commit that referenced this pull request Nov 26, 2024
* open vector for read

* don't create vector index  in read mode

* use  OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags

* remove unused openindex function

* create vecsim index to capture info in debug commands

add create_if_missing variable to OpenGeometryIndex

 dont create text index in gc if doesn't exist

* use DONT_CREATE_INDEX in NewVectorIterator

* introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME

introduce test_index to test lazy intialziation

* fix

* cosmetics

* skip testSpecIndexesInfo in coord

* rename openVectorKeysDict to openVectorIndex

add depth arg to pytest callbacks

add test_lazy_index_creation_info_modules

some tests improvments

* fix field name in test

* pass

(cherry picked from commit 7617563)
github-merge-queue bot pushed a commit that referenced this pull request Nov 26, 2024
* [MOD-8125] Create inverted index in write operations only (#5229)

* open vector for read

* don't create vector index  in read mode

* use  OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags

* remove unused openindex function

* create vecsim index to capture info in debug commands

add create_if_missing variable to OpenGeometryIndex

 dont create text index in gc if doesn't exist

* use DONT_CREATE_INDEX in NewVectorIterator

* introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME

introduce test_index to test lazy intialziation

* fix

* cosmetics

* skip testSpecIndexesInfo in coord

* rename openVectorKeysDict to openVectorIndex

add depth arg to pytest callbacks

add test_lazy_index_creation_info_modules

some tests improvments

* fix field name in test

* pass

(cherry picked from commit 7617563)

* fix debug commands test
meiravgri added a commit that referenced this pull request Nov 26, 2024
* open vector for read

* don't create vector index  in read mode

* use  OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags

* remove unused openindex function

* create vecsim index to capture info in debug commands

add create_if_missing variable to OpenGeometryIndex

 dont create text index in gc if doesn't exist

* use DONT_CREATE_INDEX in NewVectorIterator

* introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME

introduce test_index to test lazy intialziation

* fix

* cosmetics

* skip testSpecIndexesInfo in coord

* rename openVectorKeysDict to openVectorIndex

add depth arg to pytest callbacks

add test_lazy_index_creation_info_modules

some tests improvments

* fix field name in test

* pass

(cherry picked from commit 7617563)
meiravgri added a commit that referenced this pull request Nov 26, 2024
* open vector for read

* don't create vector index  in read mode

* use  OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags

* remove unused openindex function

* create vecsim index to capture info in debug commands

add create_if_missing variable to OpenGeometryIndex

 dont create text index in gc if doesn't exist

* use DONT_CREATE_INDEX in NewVectorIterator

* introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME

introduce test_index to test lazy intialziation

* fix

* cosmetics

* skip testSpecIndexesInfo in coord

* rename openVectorKeysDict to openVectorIndex

add depth arg to pytest callbacks

add test_lazy_index_creation_info_modules

some tests improvments

* fix field name in test

* pass

(cherry picked from commit 7617563)
@meiravgri
Copy link
Copy Markdown
Collaborator Author

won't be cherry-picked to 2.6 due to discrepancy in open index functions signature

github-merge-queue bot pushed a commit that referenced this pull request Nov 27, 2024
[MOD-8125] Create inverted index in write operations only (#5229)

* open vector for read

* don't create vector index  in read mode

* use  OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags

* remove unused openindex function

* create vecsim index to capture info in debug commands

add create_if_missing variable to OpenGeometryIndex

 dont create text index in gc if doesn't exist

* use DONT_CREATE_INDEX in NewVectorIterator

* introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME

introduce test_index to test lazy intialziation

* fix

* cosmetics

* skip testSpecIndexesInfo in coord

* rename openVectorKeysDict to openVectorIndex

add depth arg to pytest callbacks

add test_lazy_index_creation_info_modules

some tests improvments

* fix field name in test

* pass

(cherry picked from commit 7617563)
github-merge-queue bot pushed a commit that referenced this pull request Nov 28, 2024
* [MOD-8125] Create inverted index in write operations only (#5229)

* open vector for read

* don't create vector index  in read mode

* use  OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags

* remove unused openindex function

* create vecsim index to capture info in debug commands

add create_if_missing variable to OpenGeometryIndex

 dont create text index in gc if doesn't exist

* use DONT_CREATE_INDEX in NewVectorIterator

* introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME

introduce test_index to test lazy intialziation

* fix

* cosmetics

* skip testSpecIndexesInfo in coord

* rename openVectorKeysDict to openVectorIndex

add depth arg to pytest callbacks

add test_lazy_index_creation_info_modules

some tests improvments

* fix field name in test

* pass

(cherry picked from commit 7617563)

* fix include

* fix number of expected args in new debug command

remove DUMP_HNSW from test_lazy_index_creation_debug_commands as not introduced to this version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants