Skip to content

Conversation

@RobertGlobant20
Copy link
Contributor

Purpose

Fixing missing nodes in Lucene nodes search
Due that we were using a StandardAnalyzer for Lucene searching we were not able to find some nodes in the results like "+", "*", "And". So I had to implement a Custom Analyzer so we will be using a specific Tokenizer that supports those special characters in the search term.
Also I've added a Unit Test that validates if a set of specific nodes are found or not.

Declarations

Check these if you believe they are true

  • The codebase is in a better state after this PR
  • Is documented according to the standards
  • The level of testing this PR includes is appropriate
  • User facing strings, if any, are extracted into *.resx files
  • All tests pass using the self-service CI.
  • Snapshot of UI changes, if any.
  • Changes to the API follow Semantic Versioning and are documented in the API Changes document.
  • This PR modifies some build requirements and the readme is updated
  • This PR contains no files larger than 50 MB

Release Notes

Fixing missing nodes in Lucene nodes search

Reviewers

@QilongTang @reddyashish

FYIs

Due that we were using a StandardAnalyzer for Lucene searching  we were not able to find some nodes in the results like "+", "*", "And". So I had to implement a Custom Analyzer so we will be using a specific Tokenizer that supports those special characters in the search term.
Also I've added a Unit Test that validates if a set of specific nodes are found or not.
@RobertGlobant20
Copy link
Contributor Author

Screenshot showing the result before my fix and after my fix:
image

@QilongTang QilongTang added this to the 2.19.0 milestone Aug 14, 2023
return new RussianAnalyzer(LuceneConfig.LuceneNetVersion);
default:
return new StandardAnalyzer(LuceneConfig.LuceneNetVersion);
return new LuceneCustomAnalyzer(LuceneConfig.LuceneNetVersion);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @RobertGlobant20 Why do we apply LuceneCustomAnalyzer only on certain cases? How about other languages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem is happening due that we are using the StandarAnalyzer (this use english by default), that removes some the english words like "a", "an", "and", "are", "as", "at", "be", "but", "by", "And", "Or", "If" and also some specific symbols like "+", "*", so when we execute a query those words/symbols are removed (or escaped) and the node won't listed in the results.

Not sure if for other languages this change will apply or not but we can test if is needed (e.g. if the language is Chinese Simplified probably will remove “并且”、“或者”、“如果” and I guess that the nodes "And", "Or", "If" are not translated (do they?) so there is not problem but we need to check with "+", "-", "*" nodes.

For doing this test we need to switch to Dynamo Chinese language and then create a package with several nodes in Chinese and then search the nodes "+", "And", "*" and also for the nodes in "Chinese", I will test this case tomorrow to see the behavior.

Let me know your thoughts about it.
Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, make sense

@QilongTang QilongTang merged commit 06beb16 into DynamoDS:master Aug 15, 2023
@QilongTang
Copy link
Contributor

@RobertGlobant20 Would you cherry-pick this?

QilongTang pushed a commit that referenced this pull request Aug 15, 2023
Due that we were using a StandardAnalyzer for Lucene searching  we were not able to find some nodes in the results like "+", "*", "And". So I had to implement a Custom Analyzer so we will be using a specific Tokenizer that supports those special characters in the search term.
Also I've added a Unit Test that validates if a set of specific nodes are found or not.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants