Fix not escaping special characters in search pattern#5938
Conversation
fixes JabRef#5892 * add method to get search pattern for searched words with escaped javascript regexp special characters (for search without regular expressions) * in preview viewer use search pattern with escaped javascript regexp special characters
| /* Returns a regular expression pattern in the form (w1)|(w2)| ... wi are escaped if no regular expression search is enabled | ||
| * @param escapeSpecialCharsForJS whether to escape characters in wi for javascript regexp (escaping all special characters) or for java (using \Q and \E) | ||
| */ | ||
| private Optional<Pattern> joinWordsToPattern(boolean escapeSpecialCharsForJS) { |
There was a problem hiding this comment.
Better is to use an enum here, eg.. EscapMode, with java and javascript as values:
https://www.teamten.com/lawrence/programming/prefer-enums-over-booleans.html
Siedlerchr
left a comment
There was a problem hiding this comment.
Thanks for your contribution! In general the fix looks good to me, just some minor code improvements!
| public class SearchQuery implements SearchMatcher { | ||
|
|
||
| // regexp pattern for escaping special characters in javascript regex | ||
| public static final String JAVASCRIPT_ESCAPED_CHARS_PATTERN = "[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]"; |
There was a problem hiding this comment.
You can use Pattern.compile to gain some performace improvements here
| StringJoiner joiner = new StringJoiner(")|(", "(", ")"); | ||
| for (String word : words) { | ||
| joiner.add(regularExpression ? word : Pattern.quote(word)); | ||
| joiner.add(regularExpression ? word : (escapeSpecialCharsForJS ? word.replaceAll(JAVASCRIPT_ESCAPED_CHARS_PATTERN, "\\\\$0") : Pattern.quote(word))); |
There was a problem hiding this comment.
Please extract this to a regular if-else, it's easier to understand on the first look than this chained conditions
* use enum to specify special characters escape mode * use compiled regex pattern instead of string
Siedlerchr
left a comment
There was a problem hiding this comment.
Thanks! Looks good to me!
For external contributions we have the policy, that a second dev looks over your code before it's merged.
calixtus
left a comment
There was a problem hiding this comment.
Got two comments, everything else looks good to me.
A test would be great...
| if (regularExpression) { | ||
| joiner.add(word); | ||
| } | ||
| else { |
There was a problem hiding this comment.
Isn't checkstyle complaining about putting this in two lines? Has no effect, but looks somewhat odd...
There was a problem hiding this comment.
Yeah, it's a mistake, I'll change it. Also looks odd to me. Checkstyle wasn't complaining.
| } | ||
|
|
||
| // Returns a regular expression pattern in the form (w1)|(w2)| ... wi are escaped for javascript if no regular expression search is enabled | ||
| public Optional<Pattern> getJsPatternForWords() { |
There was a problem hiding this comment.
IMHO, although "Js" is quite obvious, I would reword that to getJavaScriptPatternForWords, since you are creating a new Method just for the sole purpose to make its use more obvious in it's name. So why not go all the way?
| public static final Pattern JAVASCRIPT_ESCAPED_CHARS_PATTERN = Pattern.compile("[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]"); | ||
|
|
||
| /** | ||
| * Metod for escaping special characters in regular expressions |
Ok, I will write a test. |
| //first word contain all javascript special regex characters that should be escaped individually in text based search | ||
| String queryText = "([{\\\\^$|]})?*+./ word1 word2."; | ||
| SearchQuery textQueryWithSpecialChars = new SearchQuery(queryText, false, false); | ||
| String pattern = "(\\(\\[\\{\\\\\\^\\$\\|\\]\\}\\)\\?\\*\\+\\.\\/)|(word1)|(word2\\.)"; |
There was a problem hiding this comment.
I've created tests but I'm not sure about readbility of this approach.
There was a problem hiding this comment.
I know this looks totally crazy with those escaping, but it's fine.
Siedlerchr
left a comment
There was a problem hiding this comment.
Thanks again for creating the tests as well!
|
I think this is really great as well! Thank you! |
* upstream/master: followup fix Fixfetcher (#5948) Bump byte-buddy-parent from 1.10.7 to 1.10.8 (#5952) Added MenuButtons to IntegrityCheckDialog (#5955) Reimplement custom entry types dialog (#5799) Bump unirest-java from 3.4.03 to 3.5.00 (#5953) MySQL: Allow public key retrieval (#5909) Restructure and improve docs for setting up IntelliJ (#5960) Change syntax for Oracle multi-row insert SQL statement (#5837) Bump classgraph from 4.8.62 to 4.8.64 (#5954) Squashed 'src/main/resources/csl-styles/' changes from c531528..9e81857 Fix not escaping special characters in search pattern (#5938)
fixes #5892
The error when searching for "DOI 10.1210/endrev/bnz006" (or any phrases containing special javascript regular expression characters) is caused by not escaping special characters before passing search pattern to javascript script used for highlighting words in preview view.