Skip to content

Add a word for gc metadata in object headers#2790

Closed
osa1 wants to merge 7 commits intomasterfrom
osa1/large_object_bit
Closed

Add a word for gc metadata in object headers#2790
osa1 wants to merge 7 commits intomasterfrom
osa1/large_object_bit

Conversation

@osa1
Copy link
Copy Markdown
Contributor

@osa1 osa1 commented Sep 17, 2021

This will be a part of #2706. cc @ggreif

Submitted to get feedback.

Currently life-mut and stable-mutable fail. Other tests pass.

I compared Wasms for life-mut using master and this branch. Changes seem fine, so I think we may need to update a few more places in the compiler or RTS.

The idea is to have a byte in object headers for GC metadata. Initially this will be used to distinguish large object from others in #2706.

I will be moving unrelated changes (refactoring etc.) to separate PRs to keep #2706 as small as possible.

@osa1 osa1 requested a review from ggreif September 17, 2021 04:02
@osa1 osa1 force-pushed the osa1/large_object_bit branch from 606cbd0 to 8fd4801 Compare September 20, 2021 06:21
@crusso
Copy link
Copy Markdown
Contributor

crusso commented Sep 20, 2021

Why is the PR still failing?

Comment thread src/codegen/compile.ml
@@ -4488,7 +4491,8 @@ module Serialization = struct
(* Second time we see this *)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct thing to do is to just remove the preceding Sanity checks as they will now have false positives.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm confused, but I think those assertions are right. In the size calculation we replace mutable objects with StableSeen, so in this pass we shouldn't see mutable objects. All mutable objects should be turned into StableSeens, or their headers should be updated with their offsets in the serialization buffer.

@nomeata
Copy link
Copy Markdown
Contributor

nomeata commented Sep 20, 2021

Annoying that nix-build-uncached does not print the full log files. Reported Mic92/nix-build-uncached#42

Comment thread src/codegen/compile.ml
Comment thread src/codegen/compile.ml
(* Calculate relative offset *)
let (set_offset, get_offset) = new_local env "offset" in
get_tag ^^ get_data_buf ^^ G.i (Binary (Wasm.Values.I32 I32Op.Sub)) ^^
get_x ^^ Heap.load_field Tagged.tag_field ^^
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the initial load aboce get_x ^^ Tagged.load ^^ set_tag also load the full header and read get_x ^^ Heap.load_field Tagged.tag_field?

Note that the offsets need no be even or aligned in any way, AFAIK, since these are just offsets into the blob, not aligned heap addresses, right? Serialized objects are in no way aligned on word boundaries.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the offsets need no be even or aligned in any way, AFAIK, since these are just offsets into the blob, not aligned heap addresses, right? Serialized objects are in no way aligned on word boundaries.

Right, Joachim confirmed this in #2793 (comment)

osa1 added a commit that referenced this pull request Sep 22, 2021
This is in preparation for #2790 and #2706. With #2790 we will start
using rest of the headers for GC metadata (set some of the high bits)
which will break compacting GC as we won't be able to distinguish a
heap address from an object header by checking if the value is larger
than the max. tag value. This check assumes a heap address cannot be
smaller than the max. tag value, which holds because we have at least 64
KiB Rust stack, and then static data for the canister.

With the high bits of headers set, it's possible that some of the
headers will have a value larger than 64 * 1024. So the current check no
longer works.

To allow distinguishing heap locations from headers, this PR refactors
objects tags so that they will all have the least significant bit set.
Since objects and fields are all word aligned (so have the lowest 2 bits
unset, this invariant was established in #2764), we can now check the lowest
bit and distinguish an address from a header.
@osa1
Copy link
Copy Markdown
Contributor Author

osa1 commented Sep 22, 2021

Why is the PR still failing?

There were compile errors in tests, which I fixed, but compacting gc is still not quite right. Tests in run-drun fail with compacting gc. I'm currently debugging.

@osa1 osa1 force-pushed the osa1/large_object_bit branch 2 times, most recently from 5209214 to 5b932d2 Compare September 22, 2021 12:36
Co-authored-by: Gabor Greif <gabor@dfinity.org>
Co-authored-by: Claudio Russo <claudio@dfinity.org>
@osa1 osa1 force-pushed the osa1/large_object_bit branch from 5b932d2 to 05cc444 Compare September 22, 2021 14:59
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Sep 22, 2021

Comparing from 0314e24 to 7a177e0:
In terms of gas, 1 tests regressed and the mean change is +0.0%.
In terms of size, 3 tests regressed and the mean change is +0.0%.

@osa1
Copy link
Copy Markdown
Contributor Author

osa1 commented Sep 23, 2021

This StableSeen business is really tricky, but I think I figured it out. We can't make StableSeen a one-byte tag, it needs to be 32-bit -1 as before. Otherwise we won't be able to distinguish it from offsets, because small values can be valid offsets. 0xFFFFFFFF (-1) cannot be an offset because we can't write an object to that location, for two reasons: (1) that's the end of 4 GiB address space, but we can't allocate a 4 GiB serialization buffer as that's the entire heap and we always have others stuff in the heap (at the very least, the RTS stack). (2) we can't have an object at that address because objects are at least 2 words large.

Keeping StableSeen as 32-bit is fine as the objects overwritten with StableSeen (and later with offsets) won't be traced by the GC, so losing the other fields in the headers is fine. We already rely on this property, nothing new to this PR.

@osa1 osa1 closed this Jan 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants