Skip to content

Faster Weak.blit#9259

Merged
lpw25 merged 5 commits intoocaml:trunkfrom
aalekseyev:faster-Weak.blit
Feb 14, 2020
Merged

Faster Weak.blit#9259
lpw25 merged 5 commits intoocaml:trunkfrom
aalekseyev:faster-Weak.blit

Conversation

@aalekseyev
Copy link
Copy Markdown
Contributor

@aalekseyev aalekseyev commented Jan 23, 2020

This is a fix for #9258.

This PR makes caml_ephemeron_blit_key faster by no longer scanning the ephemeron keys that are not involved in the operation.

Implementation details

We introduce caml_ephe_clean_partial, a version of caml_ephe_clean that takes a range of keys to clean.

One difference from caml_ephe_clean is that if we don't scan all the keys, we potentially end up with release_data = false. There are two potential concerns here:

  • the assertion at the end of the function might fail
  • it's not obviously correct to not release the data

I claim that not releasing the data is OK because that's what happens when you call caml_ephemeron_get_key and caml_ephemeron_set_key.

I've adjusted the assertion accordingly.

Testing

The benchmark in #9258 takes ~0.1s with this patch instead of >10s without.

I have to admit that, even though I have run the test suite, I have not seen the (old version of) assertion fail. There might not be sufficient test coverage to hit this case.

@aalekseyev aalekseyev force-pushed the faster-Weak.blit branch 2 times, most recently from 2aed5ac to 19a0d91 Compare January 23, 2020 18:52
@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 23, 2020

(cc potential reviewers: @damiendoligez @bobot @jhjourdan @stedolan @kayceesrk)

@bobot
Copy link
Copy Markdown
Contributor

bobot commented Jan 24, 2020

Just to rephrase and complete to see if our understanding are in sync:

  • It is an optimization for the clean phase (I'm surprised that we spend so many time in this phase, do we know why?)
  • The clean phase is incremental, but we don't know which ephemerons have been cleaned (the dead values are removed)
  • The goal of the clean phase is that even if all the dead values are not yet removed from the ephemeron we want to do as if.
  • So each argument of blit could be uncleaned or already cleaned

So:

  • if dst (ard) has already been cleaned we should not add unclean keys: but the only keys added are the one blitted and they are checked by the new partial clean in src (ars).
  • if dst is unclean: we should not remove all the unclean keys without cleaning the data, the only keys removed in dst are checked in the new partial clean.

I agree with the idea of the optimization, and in fact it seems that we could avoid the partial clean of dst if the data of dst is caml_ephe_none (weak array case).

hd = Hd_val (v);
size = Wosize_hd (hd);
for (i = 2; i < size; i++){
for (i = offset; i < offset + count; i++){
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to remove the + and -2 in caml_ephe_partial and caml_ephe_clean, by using after_last_offset instead of count in this function. But the name of the variable is bad so.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's reasonable. How about offset_start and offset_end? End of range being the position after the last element seems like an established enough convention in C.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(pushed this change)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Copy Markdown
Contributor

@bobot bobot Jan 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nitpicking, could you add the ASSERT (EDIT: CAMLassert) corresponding to the requirement between 2 <= offset_start <= offset_end <= Wosize(Hd_val...) if it is true.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added that assertion. It's definitely better be true.

@aalekseyev
Copy link
Copy Markdown
Contributor Author

aalekseyev commented Jan 24, 2020

I agree on every point.

I don't know why the clean phase takes much time (or what fraction of time it's supposed to take, anyway). Maybe in this benchmark it's extreme because most (all?) of the objects allocated on the major heap are ephemerons, but we've seen a noticeable performance hit in a real application where I think <5% of the heap consists of ephemerons. Anyway, whatever the proportion is, if it's constant we ended up with a quadratic cost of blit elementwise blitting asymptotically.

it seems that we could avoid the partial clean of dst if the data of dst is caml_ephe_none

That sounds mostly right. I don't know whether it's safe/desirable to call do_set with an unclean destination, though. Do you think this is safe and worth doing?

@bobot
Copy link
Copy Markdown
Contributor

bobot commented Jan 24, 2020

Do you think this is safe and worth doing?

I think it is safe, because the only thing checked on the old key is if it is young. And a key can't be young and unclean.

Is it worth doing?

The patch is a oneliner, but I have no experimental comparison.

@aalekseyev
Copy link
Copy Markdown
Contributor Author

aalekseyev commented Jan 24, 2020

I pushed the optimization. It doesn't seem to affect the benchmark I have, presumably because the destination is always filled with nones and scanning those is very cheap (compared to caml_page_table_lookup).

Still, cleaning the source array constitutes just over 10% of the cost of Weak.blit (which itself, by the way, is only ~20% of the overall benchmark time now) so saving some cleaning could well be non-negligible in some cases.

@aalekseyev
Copy link
Copy Markdown
Contributor Author

aalekseyev commented Jan 24, 2020

Wait, actually I don't agree with your safety argument. A key that's unclean and being overwritten is in fact guaranteed to visit the if (!(Is_block (old) && Is_young (old))){ branch in do_set, and, therefore, add_to_ephe_ref_table, with consequences that are beyond my understanding.

Given that it seems tricky to make a safety argument and the win is not great, I weakly prefer if we undo the optimization.

What do you think? (or maybe I miscounted the number of negations :-D)

@aalekseyev
Copy link
Copy Markdown
Contributor Author

Oh, maybe the answer is that this branch is visited in both cases (because caml_ephe_none is not a young block either)?

@bobot
Copy link
Copy Markdown
Contributor

bobot commented Jan 24, 2020

The set is already done when add_to_ephe_ref_table (Caml_state->ephe_ref_table, ar, offset); is run, so old is not in the picture anymore (and in fact the key is not stored in the table). The value old is only given to Is_block and Is_young.

@aalekseyev
Copy link
Copy Markdown
Contributor Author

Yeah, I do agree now. Thank you for clarifying.

Copy link
Copy Markdown
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Approving on behalf of @bobot and myself.

@damiendoligez damiendoligez self-assigned this Jan 28, 2020
@aalekseyev
Copy link
Copy Markdown
Contributor Author

I adjusted the changelog and collapsed the commits into two: one that introduces caml_ephe_clean_partial and another that implements the optimization proposed by @bobot. Please let me know if I should do anything else before this PR can be merged.

@lpw25 lpw25 merged commit b807931 into ocaml:trunk Feb 14, 2020
stedolan pushed a commit to janestreet/ocaml that referenced this pull request Mar 17, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 16, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 20, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 20, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 21, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 21, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 30, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Jul 30, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 3, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 4, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 5, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 7, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 10, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 10, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 17, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 18, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 19, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 20, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Aug 28, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Sep 2, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Sep 2, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Sep 2, 2020
mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Sep 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants