Conversation
|
It seems the CI failures are real. You could have missed something. |
|
@shyouhei Yes, I know that PR isn't ready to merge. But I can't seem to figure out these last issues. So before I spend a long time figuring the problem out, I'd rather check if the feature itself is acceptable. |
6fad863 to
69802c3
Compare
Actually scratch that. Not sure how but seems like the segfault I was tracking was solved by my rebase. I fixed the 3 legit spec & test failures. There is still some crash on Travis, but I failed to understand what it is about and wether it's legit or not. |
|
Actually the travis failure is the segfault I couldn't figure out: My theory is that the |
|
|
Thanks for the hints. I'll try to fix these issues. |
69802c3 to
d8717e0
Compare
|
I'm afraid I'm still stuck on the same issue. I'm not certain, but from my understanding, when the Regexp is freed, sometimes the I initially tried to prevent this with Since the Maybe a solution would be to record the hash value in Either way I don't think I can fix this issue myself without some help. |
|
We discussed how to implement it at dev-meeting with some other committers and we conclude it is better to have a cache system in onigumo layer with reference counting. Do you want to try it? Unfortunately, I'm very busy now so I can't help. |
That indeed sounds easier to implement, but I'm not convinced it would yield a better result from a Ruby user perspective.
Totally understandable. I'll give a few more tries at implementing either solution, but if anyone is reading this and feel they can implement it, feel free. |
95e525d to
faff237
Compare
Real world application contain many duplicated Regexp literals.
From a rails/console in Redmine:
```
>> ObjectSpace.each_object(Regexp).count
=> 6828
>> ObjectSpace.each_object(Regexp).uniq.count
=> 4162
>> ObjectSpace.each_object(Regexp).to_a.map { |r| ObjectSpace.memsize_of(r) }.sum
=> 4611957 # 4.4 MB total
>> ObjectSpace.each_object(Regexp).to_a.map { |r| ObjectSpace.memsize_of(r) }.sum - ObjectSpace.each_object(Regexp).to_a.uniq.map { |r| ObjectSpace.memsize_of(r) }.sum
=> 1490601 # 1.42 MB could be saved
```
Here's the to 10 duplicated regexps in Redmine:
```
147: /"/
107: /\s+/
103: //
89: /\n/
83: /'/
76: /\s+/m
37: /\d+/
35: /\[/
33: /./
33: /\\./
```
faff237 to
5473c1a
Compare
|
Kind of a personal braindump: I managed to reduce the crash a bit and have a nice feedback loop locally. After adding some debug output, the crash is caused by a Regexp instance that has:
What's weird is that this Regexp has the So I initially suspected it might be created via So I'm still digging here, I don't think there is a fundamental problem to the current implementation, however there's clearly some edge case I'm overlooking. |
1a747da to
7203ed4
Compare
Now the Travis was revived and only manages Arm (arm64, arm62), IBM (ppc64le, s390x) CPU architectures by the commit 9d4266f . You can rebase this PR based on the latest master to check if the Travis fails. |
Ruby ticket: https://bugs.ruby-lang.org/issues/16557
Context
Real world application contain many duplicated Regexp literals.
From a rails/console in Redmine:
Here's the to 10 duplicated regexps in Redmine:
Any empty Rails application will have a similar amount of regexps.
The feature
Since https://bugs.ruby-lang.org/issues/16377 made literal regexps frozen, it is possible to deduplicate literal regexps without changing any semantic.
This patch is heavily inspired by the
frozen_stringstable, but applied to literal regexps.Bugs
Unfortunately this patch segfault during GC on Linux, and I haven't managed to figure out why. Interestingly it doesn't segfault on OSX.