Skip to content

Conversation

@robstoll
Copy link
Owner

because if a Pattern (in java) uses BnMS then chars with two code points (high and low surrogate) match even if the start index points to the lower surrogate.

fixes #1884


I confirm that I have read the Contributor Agreements v1.0, agree to be bound on them and confirm that my contribution is compliant.

because if a Pattern (in java) uses BnMS then chars with two code points
(high and low surrogate) match even if the start index points to the
lower surrogate.
@robstoll robstoll force-pushed the bugfix/1884-regex-surrogate-code-points branch from 404cc83 to 82b1e45 Compare December 27, 2024 09:08
@codecov
Copy link

codecov bot commented Dec 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.80%. Comparing base (4296ef1) to head (82b1e45).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #1886   +/-   ##
=========================================
  Coverage     91.80%   91.80%           
  Complexity      119      119           
=========================================
  Files           455      455           
  Lines          5014     5017    +3     
  Branches        239      240    +1     
=========================================
+ Hits           4603     4606    +3     
  Misses          364      364           
  Partials         47       47           
Flag Coverage Δ
current 91.42% <100.00%> (+<0.01%) ⬆️
current_windows 90.59% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@robstoll robstoll merged commit 6a108dd into main Dec 27, 2024
28 checks passed
@robstoll robstoll deleted the bugfix/1884-regex-surrogate-code-points branch December 27, 2024 09:24
override fun search(searchIn: CharSequence, searchFor: Regex): Int {
var counter = 0
var matchResult = searchFor.find(searchIn)
val searchInString = searchIn.toString()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering: does searchIn.findAll(searchFor): Sequence<MatchResult> work?
MatchResult.next(): MatchResult? work?

It looks a bit weird to hand-craft it

Copy link
Owner Author

@robstoll robstoll Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findAll would be a disjoint search, i.e. search aa in aaaa would find 2 occurrences. we use a non-disjoint search i.e. it finds it 3 times. Don't ask me why I went with a non-disjoint search tough. Looking at it now it seems a bit weird but back in 2017 I had a use case where it was relevant and it seemed natural to implement it that way. I guess it was an edge case as I cannot remember it. Most times it probably doesn't matter, so... I would keep it that way (only break it in a major version).

I wasn't aware of the MatchResult.next() method. I would hope it works. Will check it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MatchResult.next() and friends go for disjoint matches which is what I would have expected if I used toContain o exactly 3 regex.

If you implement overlapping occurrences, then the hand-rolled impl is ok. I was just curios, there were not comments, so I asked.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disjoint matches which is what I would have expected if I used toContain o exactly 3 regex.

I get that as I had the same assumption some time ago. It's documented in the KDoc (also for toContain o exactly 3 regex) but it's not intuitive. I guess I will change the behaviour with 2.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

toContain.exactly.matchFor doesn't work if BnMS Node is used in Pattern

3 participants