Skip to content

Optimize reallocation of caml_frame_descriptors.#156

Closed
chambart wants to merge 5 commits intoocaml:trunkfrom
chambart:optimizes_caml_init_frame_descriptor
Closed

Optimize reallocation of caml_frame_descriptors.#156
chambart wants to merge 5 commits intoocaml:trunkfrom
chambart:optimizes_caml_init_frame_descriptor

Conversation

@chambart
Copy link
Copy Markdown
Contributor

It now only reallocate and reinitialise the whole table only when it
is too small. This avoids quadratic behavior when loading a lot of
module with dynlink.

This in problematic on frama-c when inlining increase the code size. The frame table
initialisation took ~0.5 second. This is quite noticeable on real examples where the
whole frama-c analysis is ~1.5s long.

@milanst
Copy link
Copy Markdown

milanst commented Mar 23, 2015

+1 for fixing this issue

I've noticed the bad behavior when I was debugging a surprisingly long startup time of one of our programs at Jane Street (although this particular thing wasn't the main culprit).

The caml_frame_descriptors table can be really big when you use big libraries (like Core or Async, they have many functions). Rehashing this whole table every time a small module is dynlinked can be quite expensive, not to mention quadratic behavior when you load many modules.

@bobot
Copy link
Copy Markdown
Contributor

bobot commented Mar 23, 2015

That's impressive! We remarked in frama-c that the statup was slow but we never imagined that was because of this. Thanks a lot for finding it and for fixing it!

I tried to review the code, but I had trouble to understand the different states of the book keeping of the hashtable. So here an attempt to explicit the different states: bobot/ocaml@878819d, bobot/ocaml@e65f499 (branch optimizes_caml_init_frame_descriptor) . Feel free to use them (or not).

Except that, the code is neat and a lot more efficient!

@xavierleroy
Copy link
Copy Markdown
Contributor

Yes, it's a good idea to grow the hash table only when needed.

I wonder, however, if the code couldn't be simplified further by having caml_register_frametable enter the frames directly in the hash table,, rather than maintaining a list of frame tables. The resizing and insertion logic would go into caml_register_frametable. In the resize case, the contents of the old hash table are reinserted in the new, bigger one.

Also, it would be nice to be able to unregister a frametable, removing its entries from the hash table. We can't use this (safely) for natdynlink (yet?), but the Coqonut project (a JIT for Coq) could make use of it.

@damiendoligez
Copy link
Copy Markdown
Member

I wonder, however, if the code couldn't be simplified further by having caml_register_frametable enter the frames directly in the hash table

The difference is startup time for very-short-lived programs that don't allocate enough to start a major collection before they exit.

Not sure we should care about those.

@xavierleroy
Copy link
Copy Markdown
Contributor

Even minor collections need the frame table to be properly set up. So,
you'd need really very short-lived programs for the current "lazy" strategy
to win against the simpler approach I outlined.

2015-03-24 20:55 GMT+01:00 Damien Doligez notifications@github.com:

I wonder, however, if the code couldn't be simplified further by having
caml_register_frametable enter the frames directly in the hash table

The difference is startup time for very-short-lived programs that don't
allocate enough to start a major collection before they exit.

Not sure we should care about those.


Reply to this email directly or view it on GitHub
#156 (comment).

@chambart
Copy link
Copy Markdown
Contributor Author

@xavierleroy I'll try that simplification. Maybe one way to have both the lazy behavior for non-allocating programs and a simpler situation would be to continue to do the first initialisation lazily but directly update the table on dynlink. For code unloading, it's probably possible to lazily reconstruct the table when the used part of the table is below 1/8 (to avoid reallocating if the size oscillate around a power of 2). But I wouldn't want to try that if there is no code using it yet.

@damiendoligez
Copy link
Copy Markdown
Member

I don't think you need to bother with shrinking the table when unloading code, but don't you need to rebuild the table whenever you remove some entries?

And I don't think we really care about lazy initialization.

It now only reallocate and reinitialise the whole table only when it
is too small. This avoids quadratic behavior when loading a lot of
module with dynlink.
If some code was unloaded, the frame table can be freed, so we cannot
dereference the frame_descr pointer in the table. To avoid that
problem, the return address is copied to the frame table.
@chambart chambart force-pushed the optimizes_caml_init_frame_descriptor branch from 394f90d to 138bd93 Compare March 25, 2015 15:30
@chambart
Copy link
Copy Markdown
Contributor Author

I got rid of the lazy initialization and couldn't resist trying to allow unloading.

@chambart
Copy link
Copy Markdown
Contributor Author

To allow unloading I needed to store the return addresses directly in the table and to add a check in the insertion. The check is probably almost free, but this increase the size of the table. Note that it also means that collecting the local gc roots may be a bit cheaper.

@xavierleroy
Copy link
Copy Markdown
Contributor

No, we don't want to double the size of this hashtable. Think of the caches...

I'm not sure why you think you need to put the retaddr in the hash table. Deletion in a linearly-hashed table is painful but can be done without extra info, see e.g. TAOCP vol 3 section 6.4 algorithm R.

@chambart
Copy link
Copy Markdown
Contributor Author

It's bigger, but that also means that we don't need to follow some pointers when we walk the stack. That's probably avoid trashing parts of the cache with some useless frame descriptors. Out of curiosity, I benchmark it a bit and it seems to be consistently slightly faster (less than 5%) on exception or gc heavy workload.

But I don't really propose that part for inclusion, only the first part.

@damiendoligez
Copy link
Copy Markdown
Member

It's not just the caches: in the case that @milanst investigated, the table's size was 16M and most Jane Street programs are probably at this order of magnitude. You don't want to double that if you can avoid it.

@chambart
Copy link
Copy Markdown
Contributor Author

Removal was not tested at all.

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 12, 2015

What is the status of this patch? My understanding is that @xavierleroy 's request for unloading frametables is implemented, and that the table is not doubled anymore. Would anyone be willing to review the resulting patch?

asmrun/roots.c Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 26, 2015

Merged in trunk, thanks for the patch!

@gasche gasche closed this Jul 26, 2015
@alainfrisch alainfrisch reopened this Nov 3, 2015
@alainfrisch
Copy link
Copy Markdown
Contributor

One of our customers has experienced segfaults when dynlinking a lot of code, and my colleague Roven Gabriel has tracked this down to this PR as the probable cause for the segfault.

Looking at the commits, two things seem fishy to me:

  • The global frametables variable is never updated (so it is always equal to NULL). I guess it should be set to new_frametables at the end of init_frame_descriptors.
  • In init_frame_descriptors, in the case of resizing, frametables is explicitly reset to NULL, but then tail->next is again updated to the value of frametables (now NULL again). I guess tail->next should not be updated in case of resizing.

I've submitted #279 as an attempt to fix these issues.

@chambart
Copy link
Copy Markdown
Contributor Author

chambart commented Nov 4, 2015

@alainfrisch I will try to take some time reread it seriously and verify your patch soon.

@mshinwell
Copy link
Copy Markdown
Contributor

I confirm this patch is the cause of some segfaults; we've seen this on Coq with flambda at high inlining levels.

mshinwell added a commit to mshinwell/ocaml that referenced this pull request Jun 11, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 23, 2020
lthls pushed a commit to lthls/ocaml that referenced this pull request Sep 24, 2020
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Sep 9, 2021
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
* Fix leaked file descriptor when querying package documentation

* Enable info logs in production
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants