Use 128-bit fat pointers for continuation objects #186
frank-emrich merged 18 commits into wasmfx:main
Conversation
Some benchmarking results: I now compare the performance impact of enabling vs. disabling the linearity check when using this PR (i.e., whether or not the […]
dhil left a comment:
I am confused about the removal of the `cont_twice.wast` test.
```yaml
# Crude check for whether
# `unsafe_disable_continuation_linearity_check` makes the test
# `cont_twice` fail.
- run: |
    (cargo test --features=unsafe_disable_continuation_linearity_check --test wast -- --exact Cranelift/tests/misc_testsuite/typed-continuations/cont_twice.wast && test $? -eq 101) || test $? -eq 101
```
Why are you deleting this test? It should still fail.
```rust
// continuation reference and the revision count.
// If `unsafe_disable_continuation_linearity_check` is enabled, the revision value is arbitrary.
// To denote the continuation being `None`, `init_contref` may be 0.
table_grow_cont_obj(vmctx: vmctx, table: i32, delta: i32, init_contref: pointer, init_revision: i64) -> i32;
```
I guess it would be nice to extend the interface of the libcalls API to support 128-bit-wide values.
There isn't a particularly nice way of doing that. We would effectively have to do the splitting into two i64 values at the libcall translation layer, and thus the implementation of the libcall in libcalls.rs would still receive two parameters. This only gets uglier when you then incorporate the switching between safe and unsafe mode. Given that we only have two libcalls actually taking these kinds of values, I'd rather avoid all of that.
I am not suggesting doing it now. But I think it will be simpler than you think. We should be able to map a hypothetical `i128` to Rust's `u128` just as `i32` maps to `u32`, etc.
```rust
// This test specifically checks that we catch a continuation being
// resumed twice, which we cannot detect in this mode.
if test.ends_with("cont_twice.wast") {
    return true;
}
```
Surely this is only true when `unsafe_disable_continuation_linearity_check` is toggled?
Yes, that check should read `test.ends_with("cont_twice.wast") && cfg!(feature = "unsafe_disable_continuation_linearity_check")`.
My intention is to make sure that the test suite passes normally with this feature enabled, so I disable this particular test in its presence. Given that, I had to remove the check regarding `cont_twice.wast` from `main.yml`. Or are you particularly interested in ensuring that the test does indeed fail if `unsafe_disable_continuation_linearity_check` is enabled? That's reasonable (the test case will just not trap with the expected message in that case), but beyond that I would consider the behaviour of that program to be undefined.
I want to make sure the test fails when `unsafe_disable_continuation_linearity_check` is toggled.
```rust
let overflow =
    builder
        .ins()
        .icmp_imm(IntCC::UnsignedLessThan, revision_plus1, 1 << 16);
builder.ins().trapz(overflow, ir::TrapCode::IntegerOverflow); // TODO(dhil): Consider introducing a designated trap code.
```
I'd like to preserve these traps in debug mode. I think that can prove useful.
And what should they check?
They should check whether the revision counter has wrapped around.
I noticed that there is an issue when continuation tables are allocated in a […]
What's the problem/error?
The […] However, all of this is based on the (hardcoded) assumption that all table entries across all table types are pointer-sized (i.e., […]). I will address this as follows: […]

In summary, these changes mean that while the table pool occupies more virtual address space, the amount of actually committed pages for non-continuation tables does not change. There are some other solutions, which seem less preferable: […]
This PR provides a prerequisite for #186, by implementing a solution for a problem originally described [here](#186 (comment)). To reiterate, the problem is as follows: For "static" tables (i.e., tables managed by a `TablePool`), the `TablePool` manages a single mmapped memory region from which it allocates all tables. To this end, it calculates the required overall size of this region as `max_number_of_allowed_tables * max_allowed_element_count_per_table * size_of_each_table_entry`. Thus, the memory for the table with index `i` in the pool starts at `i * max_allowed_element_count_per_table * size_of_each_table_entry`. However, all of this is based on the (hardcoded) assumption that all table entries across all table types are pointer-sized (i.e., `size_of_each_table_entry` is `sizeof(*mut u8)`). But once #186 lands, this is no longer the case.

This PR addresses this as follows:

1. Change the calculation of the overall size of the mmapped region to `max_number_of_allowed_tables * max_allowed_element_count_per_table * max_size_of_each_table_entry`, where `max_size_of_each_table_entry` will be `sizeof(VMContObj)` == 16 once #186 lands. This effectively doubles the amount of address space occupied by the table pool. The calculation of the start address of each table is changed accordingly.
2. Change the logic for allocating and deallocating tables from the pool so that we take the element size of that particular table type into account when committing and decommitting memory.

Note that the logic implemented in this PR is independent of the underlying element sizes. This means that this PR itself does not change the space occupied by the tables, as `max_size_of_each_table_entry` is currently still the size of a pointer. The necessary changes happen implicitly once #186 lands, which changes the size of `ContTableElem`, which in turn changes the constant `MAX_TABLE_ELEM_SIZE`.
In summary, these changes mean that in the future the table pool occupies more virtual address space, but the amount of actually committed pages for non-continuation tables does not change.
56cc3c8 to fc71dbc
This should be good to go now.
```diff
   # `cont_twice` fail.
   - run: |
-      (cargo test --features=unsafe_disable_continuation_linearity_check --test wast -- --exact Cranelift/tests/misc_testsuite/typed-continuations/cont_twice.wast && test $? -eq 101) || test $? -eq 101
+      (cargo run --features=unsafe_disable_continuation_linearity_check -- wast -W=exceptions,function-references,typed-continuations tests/misc_testsuite/typed-continuations/cont_twice.wast && test $? -eq 101) || test $? -eq 101
```
Why `cargo run` and not `cargo test`? I thought the `cargo test` artifact would already have been built (maybe the `run` artifact has, or hasn't, been built too?).
This particular test is now `#[ignore]`-d in the presence of `unsafe_disable_continuation_linearity_check`. Thus, you need to manually feed it into `wasmtime wast` to run it.

Edit: Sorry, it's not actually ignored using `#[ignore]`; it is manually skipped by the logic in tests/wast.rs. But the result is the same.
Alternatively, the following works:
```shell
cargo test --features=unsafe_disable_continuation_linearity_check --test wast -- --include-ignored --exact Cranelift/tests/misc_testsuite/typed-continuations/cont_twice.wast
```
In terms of avoiding additional building, it shouldn't make a difference anyway. As far as I can tell, this is the only place where we actually build with unsafe_disable_continuation_linearity_check, meaning that it will cause a separate rebuild of most stuff anyway.
OK, I am fine with either. I was just curious. Thanks!
This PR changes the representation introduced in #182, where continuation objects were turned into tagged pointers containing a pointer to a `VMContRef` as well as a 16-bit sequence counter to perform linearity checks. In this PR, the representation is changed from 64-bit tagged pointers to 128-bit fat pointers, where 64 bits are used for the pointer and 64 bits for the sequence counter.
Some implementation details:
- The overall approach of using `disassemble_contobj` and `assemble_contobj` to go from, effectively, `Optional<VMContObj>` to `Optional<VMContRef>` is preserved.
- The feature `unsafe_disable_continuation_linearity_check` is preserved: If it is enabled, we do not use fat (or tagged) pointers at all, and all revision checks are disabled.
- We use `I8X16` for any value of type `(ref $continuation)` and `(ref null $continuation)`. See the comment on `vm_contobj_type` in shared.rs for why we cannot use `I128` or `I64X2` instead.
- The `translate_*` functions in the `FuncEnvironment` trait now need to take a `FunctionBuilder` parameter instead of `FuncCursor`, which slightly increases the footprint of this PR.
- Support for `table.fill` for continuation tables was missing. I've added this and in the process extended `cont_table.wast` to be generally more exhaustive.
- Since libcalls cannot directly take a `VMContObj`, I've introduced dedicated versions for the `VMContObj` case, namely `table_fill_cont_obj` and `table_grow_cont_obj` in libcalls.rs. These manually split the `VMContObj` into two parts.