[MOD-8125] Create inverted index in write operations only#5229
[MOD-8125] Create inverted index in write operations only#5229
Conversation
d1fff41 to
cc57b33
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5229 +/- ##
==========================================
+ Coverage 86.64% 86.68% +0.04%
==========================================
Files 193 193
Lines 34698 34710 +12
==========================================
+ Hits 30063 30089 +26
+ Misses 4635 4621 -14 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
add create_if_missing variable to OpenGeometryIndex dont create text index in gc if doesn't exist
introduce test_index to test lazy intialziation
| return False | ||
| except KeyError: | ||
| if _assert: | ||
| env.assertTrue(False, message=msg + f" key {key} exists in {d1_name} but doesn't exist in {d2_name}") |
There was a problem hiding this comment.
| env.assertTrue(False, message=msg + f" key {key} exists in {d1_name} but doesn't exist in {d2_name}") | |
| env.assertContains(key, d2, message=msg + f" key {key} exists in {d1_name} but doesn't exist in {d2_name}") |
There was a problem hiding this comment.
assertContains doesn't support message :(
| d3 = d1.copy() | ||
| d1["rabbit"] = 5 | ||
| try: | ||
| compare_numeric_dicts(env, d1, d3, _assert=False) |
There was a problem hiding this comment.
Is _assert=False causing an exception aligned with the RLTest behavior?
There was a problem hiding this comment.
It is used by the function as a hint to avoid assertions (to test expected failures)
alonre24
left a comment
There was a problem hiding this comment.
Good catch! So except for vector and geometry, were there any other places where we created an index unnecessarily (except for the tag in the GC)?
| RedisSearchCtx_LockSpecWrite(sctx); | ||
|
|
||
| InvertedIndex *idx = Redis_OpenInvertedIndex(sctx, term, len, 1, NULL); | ||
| InvertedIndex *idx = Redis_OpenInvertedIndex(sctx, term, len, DONT_CREATE_INDEX, NULL); |
There was a problem hiding this comment.
Do we understand why it was explicitly opened for writing here? Also, I didn't see a test validating the expected behavior with respect to GC...
There was a problem hiding this comment.
No. But if we got here the term existed before we forked
Regarding gc - don't we have unit tests for text?
add depth arg to pytest callbacks add test_lazy_index_creation_info_modules some tests improvments
|
|
||
| // TODO: open with DONT_CREATE_INDEX once the query string is validated before we get here. | ||
| // Currently, if we use DONT_CREATE_INDEX, and the index was not initalized yet, and the query is invalid, | ||
| // we return results as if the index was empty, instead of raising an error. |
There was a problem hiding this comment.
What should be done here? Parse the query before getting here? Fix something in our parser? Can we do something in this PR?
There was a problem hiding this comment.
Yes. not in the scope of this PR
I'm opening a ticket
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.8
git worktree add -d .worktree/backport-5229-to-2.8 origin/2.8
cd .worktree/backport-5229-to-2.8
git switch --create backport-5229-to-2.8
git cherry-pick -x 7617563a752f9e2ddc49d57e4c2d5b3397be67ef |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.6
git worktree add -d .worktree/backport-5229-to-2.6 origin/2.6
cd .worktree/backport-5229-to-2.6
git switch --create backport-5229-to-2.6
git cherry-pick -x 7617563a752f9e2ddc49d57e4c2d5b3397be67ef |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.10
git worktree add -d .worktree/backport-5229-to-2.10 origin/2.10
cd .worktree/backport-5229-to-2.10
git switch --create backport-5229-to-2.10
git cherry-pick -x 7617563a752f9e2ddc49d57e4c2d5b3397be67ef |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 8.0
git worktree add -d .worktree/backport-5229-to-8.0 origin/8.0
cd .worktree/backport-5229-to-8.0
git switch --create backport-5229-to-8.0
git cherry-pick -x 7617563a752f9e2ddc49d57e4c2d5b3397be67ef |
* open vector for read * don't create vector index in read mode * use OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags * remove unused openindex function * create vecsim index to capture info in debug commands add create_if_missing variable to OpenGeometryIndex dont create text index in gc if doesn't exist * use DONT_CREATE_INDEX in NewVectorIterator * introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME introduce test_index to test lazy intialziation * fix * cosmetics * skip testSpecIndexesInfo in coord * rename openVectorKeysDict to openVectorIndex add depth arg to pytest callbacks add test_lazy_index_creation_info_modules some tests improvments * fix field name in test * pass (cherry picked from commit 7617563)
* [MOD-8125] Create inverted index in write operations only (#5229) * open vector for read * don't create vector index in read mode * use OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags * remove unused openindex function * create vecsim index to capture info in debug commands add create_if_missing variable to OpenGeometryIndex dont create text index in gc if doesn't exist * use DONT_CREATE_INDEX in NewVectorIterator * introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME introduce test_index to test lazy intialziation * fix * cosmetics * skip testSpecIndexesInfo in coord * rename openVectorKeysDict to openVectorIndex add depth arg to pytest callbacks add test_lazy_index_creation_info_modules some tests improvments * fix field name in test * pass (cherry picked from commit 7617563) * fix debug commands test
* open vector for read * don't create vector index in read mode * use OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags * remove unused openindex function * create vecsim index to capture info in debug commands add create_if_missing variable to OpenGeometryIndex dont create text index in gc if doesn't exist * use DONT_CREATE_INDEX in NewVectorIterator * introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME introduce test_index to test lazy intialziation * fix * cosmetics * skip testSpecIndexesInfo in coord * rename openVectorKeysDict to openVectorIndex add depth arg to pytest callbacks add test_lazy_index_creation_info_modules some tests improvments * fix field name in test * pass (cherry picked from commit 7617563)
* open vector for read * don't create vector index in read mode * use OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags * remove unused openindex function * create vecsim index to capture info in debug commands add create_if_missing variable to OpenGeometryIndex dont create text index in gc if doesn't exist * use DONT_CREATE_INDEX in NewVectorIterator * introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME introduce test_index to test lazy intialziation * fix * cosmetics * skip testSpecIndexesInfo in coord * rename openVectorKeysDict to openVectorIndex add depth arg to pytest callbacks add test_lazy_index_creation_info_modules some tests improvments * fix field name in test * pass (cherry picked from commit 7617563)
|
won't be cherry-picked to 2.6 due to discrepancy in open index functions signature |
[MOD-8125] Create inverted index in write operations only (#5229) * open vector for read * don't create vector index in read mode * use OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags * remove unused openindex function * create vecsim index to capture info in debug commands add create_if_missing variable to OpenGeometryIndex dont create text index in gc if doesn't exist * use DONT_CREATE_INDEX in NewVectorIterator * introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME introduce test_index to test lazy intialziation * fix * cosmetics * skip testSpecIndexesInfo in coord * rename openVectorKeysDict to openVectorIndex add depth arg to pytest callbacks add test_lazy_index_creation_info_modules some tests improvments * fix field name in test * pass (cherry picked from commit 7617563)
* [MOD-8125] Create inverted index in write operations only (#5229) * open vector for read * don't create vector index in read mode * use OPEN_INDEX_READ or OPEN_INDEX_WRITE in tags * remove unused openindex function * create vecsim index to capture info in debug commands add create_if_missing variable to OpenGeometryIndex dont create text index in gc if doesn't exist * use DONT_CREATE_INDEX in NewVectorIterator * introduce FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAME introduce test_index to test lazy intialziation * fix * cosmetics * skip testSpecIndexesInfo in coord * rename openVectorKeysDict to openVectorIndex add depth arg to pytest callbacks add test_lazy_index_creation_info_modules some tests improvments * fix field name in test * pass (cherry picked from commit 7617563) * fix include * fix number of expected args in new debug command remove DUMP_HNSW from test_lazy_index_creation_debug_commands as not introduced to this version
This PR ensures we create an inverted index only in write operations (when possible. see exceptions below)
Read operations
New in this PR
FT.DEBUG SPEC_INVIDXES_INFO INDEX_NAMEReturnes
inverted_indexes_dict_sizeinverted_indexes_memoryExceptions
Some read operations still create the index to avoid breaking changes, or where a fundamental change was required:
debug commands
VecsimInfoanddumpHNSWDatawe need an index to create an info iterator.DumpGeometryIndexcalls abstractapi->dump(idx, ctx). To allow avoiding from creating the index, we need a dump function that can imitate an empty index foridx == NULL.Queries
Query_EvalGeometryNode: since the query string is validated after we are trying to open the geometry index, in case of a wrong query string and uninitialized index, we will return results as if the index was empty, instead of raising an error. However, If we create the index (as we are doing today), we can proceed to query validation.So until we change the order of verification, we open the index with
CREATEoption.