
sjamesr (Contributor) commented May 31, 2021

Omit leading empty matches from Pattern.split, improve performance
Fixes #131.

This change modifies Pattern.split to omit the leading empty string that an
empty (zero-width) match at the start of the input would otherwise produce.
JDK 8 specified this behavior, so the change brings RE2/J's split into line
with JDK 8 and later implementations.
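For reference, the JDK 8 contract can be observed directly with java.util.regex.Pattern, which RE2/J's split is meant to match after this change. The JDK 8 documentation distinguishes two cases: a zero-width match at the beginning of the input never produces a leading empty substring, while a positive-width match there does. The sketch below only exercises the JDK class (the class name LeadingEmptyMatchDemo is illustrative); it does not call RE2/J.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LeadingEmptyMatchDemo {
  public static void main(String[] args) {
    // "" matches (zero-width) before 'a' at index 0: JDK 8+ omits the leading "".
    System.out.println(Arrays.toString(Pattern.compile("").split("abc")));   // [a, b, c]
    // ',' matches (positive width) at index 0: the leading "" is kept.
    System.out.println(Arrays.toString(Pattern.compile(",").split(",a,b"))); // [, a, b]
  }
}
```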

Furthermore, the split function no longer needs to determine the number of
matches before assembling the result, so it makes one pass of find() calls
instead of two; the number of find() calls is halved in many cases. The
benchmark from the previous change shows RE2/J's average time per operation
roughly halved.
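A minimal sketch of the single-pass idea, assuming the JDK 8 split contract (limit handling, the leading zero-width rule, and trailing-empty removal when limit is 0). It is written against java.util.regex.Matcher rather than RE2/J's own Matcher so it runs with only the JDK; SinglePassSplit is a hypothetical name and this is not the RE2/J source. Pieces are collected in a growable list as matches are found, so no separate counting pass is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class SinglePassSplit {
  private SinglePassSplit() {}

  // Hypothetical helper: splits `input` around matches of `pattern` using a
  // single pass of find() calls, appending pieces to a list instead of
  // counting matches first to size the result array.
  public static String[] split(Pattern pattern, String input, int limit) {
    List<String> pieces = new ArrayList<>();
    Matcher m = pattern.matcher(input);
    int last = 0; // start index of the piece currently being built
    while (m.find()) {
      // JDK 8 rule: an empty match at index 0 never yields a leading "".
      if (m.end() == 0) {
        continue;
      }
      // Positive limit: once limit - 1 pieces exist, the rest of the input
      // becomes the final piece, so stop searching.
      if (limit > 0 && pieces.size() == limit - 1) {
        break;
      }
      pieces.add(input.substring(last, m.start()));
      last = m.end();
    }
    if (last == 0) {
      // No split point was used: the result is the whole input.
      return new String[] {input};
    }
    pieces.add(input.substring(last));
    if (limit == 0) {
      // limit == 0: trailing empty strings are discarded.
      while (!pieces.isEmpty() && pieces.get(pieces.size() - 1).isEmpty()) {
        pieces.remove(pieces.size() - 1);
      }
    }
    return pieces.toArray(new String[0]);
  }

  public static void main(String[] args) {
    Pattern comma = Pattern.compile(",");
    System.out.println(String.join("|", split(comma, "a,b,,", 0))); // prints a|b
  }
}
```

Each match is located exactly once; the trade-off is a final copy from the list into the returned array.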

Reference impl (JDK):
BenchmarkSplit.benchmarkSplit     JDK  avgt    5  14.217 ± 0.410  us/op

RE2J (before):
BenchmarkSplit.benchmarkSplit    RE2J  avgt    5  95.807 ± 6.737  us/op

RE2J (after):
BenchmarkSplit.benchmarkSplit    RE2J  avgt    5  49.092 ± 0.717  us/op

google-cla bot added the cla: yes label May 31, 2021
sjamesr force-pushed the improve_split branch 2 times, most recently from 79203fc to 354f528 May 31, 2021 07:50
sjamesr force-pushed the improve_split branch 4 times, most recently from 6e16977 to c1aec38 June 2, 2021 04:05
codecov-commenter commented Jun 2, 2021

Codecov Report

❌ Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.19%. Comparing base (5c06a9e) to head (bf0cf09).
⚠️ Report is 13 commits behind head on master.

Files with missing lines             Patch %   Lines
java/com/google/re2j/Pattern.java    95.83%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #138      +/-   ##
==========================================
- Coverage   89.21%   89.19%   -0.03%     
==========================================
  Files          19       19              
  Lines        3024     3026       +2     
  Branches      609      612       +3     
==========================================
+ Hits         2698     2699       +1     
  Misses        187      187              
- Partials      139      140       +1     


sjamesr force-pushed the improve_split branch 3 times, most recently from b624ac6 to 1ab286a June 6, 2021 14:09
sjamesr added 2 commits June 27, 2022 07:43
sjamesr merged commit 7bf197f into google:master Jun 27, 2022


Development

Successfully merging this pull request may close these issues.

Incorrect behavior for Pattern::split
