Conversation
2aed5ac to
19a0d91
Compare
|
(cc potential reviewers: @damiendoligez @bobot @jhjourdan @stedolan @kayceesrk) |
|
Just to rephrase and complete to see if our understanding are in sync:
So:
I agree with the idea of the optimization, and in fact it seems that we could avoid the partial clean of |
runtime/caml/weak.h
Outdated
| hd = Hd_val (v); | ||
| size = Wosize_hd (hd); | ||
| for (i = 2; i < size; i++){ | ||
| for (i = offset; i < offset + count; i++){ |
There was a problem hiding this comment.
I would prefer to remove the + and -2 in caml_ephe_partial and caml_ephe_clean, by using after_last_offset instead of count in this function. But the name of the variable is bad so.
There was a problem hiding this comment.
That's reasonable. How about offset_start and offset_end? End of range being the position after the last element seems like an established enough convention in C.
There was a problem hiding this comment.
(pushed this change)
There was a problem hiding this comment.
Another nitpicking, could you add the ASSERT (EDIT: CAMLassert) corresponding to the requirement between 2 <= offset_start <= offset_end <= Wosize(Hd_val...) if it is true.
There was a problem hiding this comment.
Added that assertion. It's definitely better be true.
|
I agree on every point. I don't know why the clean phase takes much time (or what fraction of time it's supposed to take, anyway). Maybe in this benchmark it's extreme because most (all?) of the objects allocated on the major heap are ephemerons, but we've seen a noticeable performance hit in a real application where I think <5% of the heap consists of ephemerons. Anyway, whatever the proportion is, if it's constant we ended up with a quadratic cost of
That sounds mostly right. I don't know whether it's safe/desirable to call |
I think it is safe, because the only thing checked on the old key is if it is young. And a key can't be young and unclean.
The patch is a oneliner, but I have no experimental comparison. |
|
I pushed the optimization. It doesn't seem to affect the benchmark I have, presumably because the destination is always filled with nones and scanning those is very cheap (compared to caml_page_table_lookup). Still, cleaning the source array constitutes just over 10% of the cost of |
|
Wait, actually I don't agree with your safety argument. A key that's unclean and being overwritten is in fact guaranteed to visit the Given that it seems tricky to make a safety argument and the win is not great, I weakly prefer if we undo the optimization. What do you think? (or maybe I miscounted the number of negations :-D) |
|
Oh, maybe the answer is that this branch is visited in both cases (because |
|
The set is already done when |
|
Yeah, I do agree now. Thank you for clarifying. |
damiendoligez
left a comment
There was a problem hiding this comment.
Looks good to me. Approving on behalf of @bobot and myself.
3d8d8e9 to
5903aa0
Compare
|
I adjusted the changelog and collapsed the commits into two: one that introduces |
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
Faster Weak.blit
This is a fix for #9258.
This PR makes
caml_ephemeron_blit_keyfaster by no longer scanning the ephemeron keys that are not involved in the operation.Implementation details
We introduce
caml_ephe_clean_partial, a version ofcaml_ephe_cleanthat takes a range of keys to clean.One difference from
caml_ephe_cleanis that if we don't scan all the keys, we potentially end up withrelease_data = false. There are two potential concerns here:I claim that not releasing the data is OK because that's what happens when you call
caml_ephemeron_get_keyandcaml_ephemeron_set_key.I've adjusted the assertion accordingly.
Testing
The benchmark in #9258 takes ~0.1s with this patch instead of >10s without.
I have to admit that, even though I have run the test suite, I have not seen the (old version of) assertion fail. There might not be sufficient test coverage to hit this case.