Skip to content

Conversation

@cmb69
Copy link
Member

@cmb69 cmb69 commented Jul 12, 2021

If new_len > alloc_len, we had a serious issue anyway. The
comparison only makes some sense if new_len == alloc_len, but in that
case we don't need to re-allocate. Only if new_len == 0 we need to
allocate an empty zend_string. However, trimming a potentially over-
allocated string appears to be reasonable, so we drop the condition
altogether.

If `new_len > alloc_len`, we had a serious issue anyway.  The
comparison only makes some sense if `new_len == alloc_len`, but in that
case we don't need to *re*-allocate.  Only if `new_len == 0` we need to
allocate an empty zend_string.  However, trimming a potentially over-
allocated string appears to be reasonable, so we drop the condition
altogether.
@cmb69 cmb69 added the Bug label Jul 12, 2021
@cmb69
Copy link
Member Author

cmb69 commented Jul 12, 2021

Oh, I just noticed that php_pcre_replace_func_impl() has the same issue.

@nikic
Copy link
Member

nikic commented Jul 12, 2021

If new_len > alloc_len, we had a serious issue anyway. The
comparison only makes some sense if new_len == alloc_len, but in that
case we don't need to re-allocate. Only if new_len == 0 we need to
allocate an empty zend_string.

I don't really follow. Why would this only happen if new_len == alloc_len? This should happen whenever the last (non-replaced) part of the string becomes larger than the allocation.

However, trimming a potentially over-
allocated string appears to be reasonable, so we drop the condition
altogether.

Right. I wonder if we should leave some leeway though? It's not so important to shrink the string if it's already close enough in size. From a quick look I see size < alloc_size/2 and alloc_size-size > 16 being used as truncation criteria occasionally. Not sure how relevant this would be in practice though. Maybe just always reallocating is fine.

I also want to note that the reallocation scheme in preg_replace looks pretty broken. It does alloc_size = alloc_size + 2*new_len, which means that memory usage is approximately tripled on each reallocation, rather than the doubling that was probably intended. At the same time, this always builds up the allocation from zero, while it would probably be more reasonable to base it off the original string size.

@cmb69
Copy link
Member Author

cmb69 commented Jul 12, 2021

I don't really follow.

Sorry, that was plain nonsense.

From a quick look I see size < alloc_size/2 and alloc_size-size > 16 being used as truncation criteria occasionally.

I wouldn't be happy with the former, since that may still be quite wasteful. The latter seems to be okay, but might not be that relevant in practice.

It does alloc_size = alloc_size + 2*new_len, which means that memory usage is approximately tripled on each reallocation, rather than the doubling that was probably intended.

Right; that should be fixed.

At the same time, this always builds up the allocation from zero, while it would probably be more reasonable to base it off the original string size.

I'm not sure about that. What if the result is much shorter than the original string? It probably boils down to whether it's better to have more smaller allocations or to have fewer larger allocations. Maybe leave this for now?

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with this for now.

@cmb69 cmb69 closed this in a6b4308 Jul 12, 2021
@cmb69 cmb69 deleted the cmb/81243 branch July 12, 2021 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants