Skip to content

fix random failure in Marshal.from_{string,bytes} (PR#12056)#12064

Merged
xavierleroy merged 1 commit intoocaml:trunkfrom
damiendoligez:fix-pr12056
Mar 4, 2023
Merged

fix random failure in Marshal.from_{string,bytes} (PR#12056)#12064
xavierleroy merged 1 commit intoocaml:trunkfrom
damiendoligez:fix-pr12056

Conversation

@damiendoligez
Copy link
Copy Markdown
Member

@damiendoligez damiendoligez commented Mar 2, 2023

The problem is pretty basic after all: a minor GC occurs while caml_input_val_from_bytes is holding an infix pointer into a string located in the minor heap.

The fix involves moving the call to intern_decompress_input after intern_alloc_storage and removing the comment that says not to do that. The comment was misleading because intern_alloc_storage only allocates in the minor heap while intern_decompress_input frees memory in malloc land, so they can't interact anyway.

Fixes (one part of) #12056

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 2, 2023

The comment was misleading because intern_alloc_storage only allocates in the minor heap while intern_decompress_input frees memory in malloc land, so they can't interact anyway.

I read the comment as not being about memory safety, but rather about concerns on the maximal memory usage of the process (so more like OOM conditions, etc.). If I take a large compressed payload that decompresses into an even larger uncompressed payload, I don't want to have to hold the two alive at the same time in my memory (irrespectively of whether they are GG- or malloc-owned).

My proposal for the fix:

  • change intern_decompress_input to return a boolean that is true if decompression happened
    (if that boolean is true in the current function, then intern_src is not inside a GC-owned block anymore)
  • after intern_alloc_storage, refresh intern_src if no decompression happened

Note: the bug that you are fixing was introduced by the decompression PR. The test was already flaky before that, but it failed much less often. There are probably other bugs still lurking.

@damiendoligez
Copy link
Copy Markdown
Member Author

I read the comment as not being about memory safety, but rather about concerns on the maximal memory usage of the process (so more like OOM conditions, etc.). If I take a large compressed payload that decompresses into an even larger uncompressed payload, I don't want to have to hold the two alive at the same time in my memory (irrespectively of whether they are GG- or malloc-owned).

That's exactly how I read it too, and it's still misleading because intern_alloc_storage doesn't allocate anything when the payload is large.

Note: the bug that you are fixing was introduced by the decompression PR. The test was already flaky before that, but it failed much less often. There are probably other bugs still lurking.

Yes, I forgot to mention that. I tried a bit, but failed to reproduce it.

@xavierleroy
Copy link
Copy Markdown
Contributor

OK, so it was my fault after all, sorry for having introduced this bug in #12006.

For intern_alloc_storage, we're both right / both wrong : it does allocate in the OCaml heap, which I missed completely, but it also allocates the shared object table using caml_stat_alloc. That's why I was tempted to decompress before doing this allocation, so that the block of compressed data allocated (using caml_stat_alloc too) in caml_input_val can be freed and its space reused. There's nothing "misleading" in that, just premature optimization, maybe.

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 2, 2023

(We get to blame the flaky test. It's its fault if we didn't trust it when it failed on the PR. Maybe.)

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix, thanks. I still like the comment you removed, but won't fight for it.

Changes Outdated
Comment on lines +617 to +618
- #12056, #12057, #??: Random failure in Marshal.from_{string,bytes}
(Damien Doligez, review by ??)
Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let the Changelog police tell you if you need a separate entry for these or if they should be merged with #12006 .

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

police heree: it's as Damien prefers.

Comment on lines 908 to +914
intern_alloc_storage(s, h.whsize, h.num_objects);
s->intern_src = &Byte_u(str, ofs + h.header_len); /* If a GC occurred */
/* Decompress if needed */
intern_decompress_input(s, "input_val_from_string", &h);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is obviously more correct :-) and there's no regrets w.r.t. memory usage, since str is in the OCaml heap (it's not malloc-ed) and live throughout the function, so intern_decompress_input has no chance to free it.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I also believe that the change is correct.)

@damiendoligez
Copy link
Copy Markdown
Member Author

For intern_alloc_storage, we're both right / both wrong : it does allocate in the OCaml heap, which I missed completely, but it also allocates the shared object table using caml_stat_alloc. That's why I was tempted to decompress before doing this allocation, so that the block of compressed data allocated (using caml_stat_alloc too) in caml_input_val can be freed and its space reused. There's nothing "misleading" in that, just premature optimization, maybe.

Oh that's right, I missed the shared object table. I'll restore the comment and update Changes, then I'll merge.

@xavierleroy
Copy link
Copy Markdown
Contributor

Looks great, thanks! Now merging...

@xavierleroy xavierleroy merged commit 0492b9a into ocaml:trunk Mar 4, 2023
@damiendoligez damiendoligez deleted the fix-pr12056 branch March 8, 2023 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants