Skip to content

lb-rs: fix duplicate search results#4062

Merged
smailbarkouch merged 4 commits intomasterfrom
smail/fix-duplicate-search-results
Dec 28, 2025
Merged

lb-rs: fix duplicate search results#4062
smailbarkouch merged 4 commits intomasterfrom
smail/fix-duplicate-search-results

Conversation

@smailbarkouch
Copy link
Copy Markdown
Contributor

@smailbarkouch smailbarkouch commented Dec 27, 2025

Did a few things

  1. Added id_str_raw with STRING type - will be used to remove deleted docs/ids
  2. Updated the Event::MetadataChanged callback to pass the deleted ids to the delete field of update_tantivy
  3. Updated build_index to compute deleted

fixes #4061

@smailbarkouch smailbarkouch requested a review from Parth December 27, 2025 22:38
@smailbarkouch smailbarkouch requested a review from ad-tra December 27, 2025 22:41
@Parth
Copy link
Copy Markdown
Member

Parth commented Dec 28, 2025

Updated id_str from TEXT to STRING - did this because we don't want to tokenize it

I do want to be able to search via ID

let deleted_ids = replacement_index.compute_deleted(&current_index);
*lb.search.metadata_index.write().await = replacement_index;
lb.update_tantivy(vec![], deleted_ids).await;
lb.update_tantivy(deleted_ids, vec![]).await;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a crazy mistake for me to have made, nice catch

@Parth
Copy link
Copy Markdown
Member

Parth commented Dec 28, 2025

Updated build_index to compute deleted
I don't know if this change is totally needed just because this only happens on startup. There are some obscure situations where we call build_index like the CLI, but given the nature of how that gets called it's not needed there either.

I would remove, but I would not mind if you do not remove, could go either way.

Copy link
Copy Markdown
Member

@Parth Parth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, #4062 (comment) is the only thing I care about

@smailbarkouch
Copy link
Copy Markdown
Contributor Author

smailbarkouch commented Dec 28, 2025

Updated build_index to compute deleted
I don't know if this change is totally needed just because this only happens on startup. There are some obscure situations where we call build_index like the CLI, but given the nature of how that gets called it's not needed there either.

I would remove, but I would not mind if you do not remove, could go either way.

Agreed not totally needed - leaning towards leaving it since I would assume build_index removes deleted ids

@smailbarkouch
Copy link
Copy Markdown
Contributor Author

Updated id_str from TEXT to STRING - did this because we don't want to tokenize it

I do want to be able to search via ID

Using TEXT makes it impossible (?) to remove the term by ID, since delete_term performs an ID lookup and no ID exists as a standalone entry

@Parth
Copy link
Copy Markdown
Member

Parth commented Dec 28, 2025

The old code has this:

        for id in delete {
            let term = Term::from_field_text(id_str, &id.to_string());
            index_writer.delete_term(term);
        }

so does this not delete the whole entry? It just clears out that field?

I think having both id as text & string is prob fine, it's small. But yeah checkout the docs.

@smailbarkouch
Copy link
Copy Markdown
Contributor Author

so does this not delete the whole entry? It just clears out that field?

It doesn't delete anything since id_str is tokenized and it fails to find it

@Parth
Copy link
Copy Markdown
Member

Parth commented Dec 28, 2025

I see, unfortunate API design, sounds good best of luck

@smailbarkouch smailbarkouch requested a review from Parth December 28, 2025 21:55
@smailbarkouch smailbarkouch merged commit 41027d0 into master Dec 28, 2025
1 check passed
@smailbarkouch smailbarkouch deleted the smail/fix-duplicate-search-results branch December 28, 2025 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

duplicate search results after file updates or deletions

2 participants