
Wasmtime: remove indirect-call caching.#8881

Merged
cfallin merged 1 commit into bytecodealliance:main from cfallin:remove-indirect-call-cache
Jun 27, 2024

Conversation

Member

@cfallin cfallin commented Jun 27, 2024

In the original development of this feature, guided by JS AOT compilation to Wasm of a microbenchmark heavily focused on IC sites, I was seeing a ~20% speedup. However, in more recent measurements, on full programs (e.g., the Octane benchmark suite), the benefit is more like 5%.

Moreover, in #8870, I attempted to switch over to a direct-mapped cache, to address a current shortcoming of the design, namely that it has a hard-capped number of callsites it can apply to (50k) to limit impact on VMContext struct size. With all of the needed checks for correctness, though, that change results in a 2.5% slowdown relative to no caching at all, so it was dropped.
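For illustration only (this is not Wasmtime's actual codegen or data layout; all names here are invented), a direct-mapped call-site cache like the one attempted in #8870 can be sketched in Rust. Each callsite maps to one slot; a tag comparison decides hit vs. miss. The tag check and slot load on every call are exactly the kind of "needed checks for correctness" whose cost can outweigh the saved table lookup:

```rust
// Illustrative model only: not Wasmtime's real structures.
// A direct-mapped cache keyed by callsite id: each slot remembers the
// last (table_index, code_ptr) pair seen at some callsite. On a hit we
// skip the full table lookup; on a miss we do the lookup and refill.

const CACHE_SLOTS: usize = 8; // tiny for the demo; real sizes trade space vs. conflicts

#[derive(Clone, Copy)]
struct CacheEntry {
    table_index: u32, // tag: which table entry filled this slot
    code_ptr: usize,  // stand-in for the cached raw code pointer
}

struct VmContextModel {
    cache: [Option<CacheEntry>; CACHE_SLOTS],
    table: Vec<usize>, // funcref table: index -> code "pointer"
}

impl VmContextModel {
    /// Returns (code_ptr, was_hit).
    fn call_indirect(&mut self, callsite_id: usize, table_index: u32) -> (usize, bool) {
        let slot = callsite_id % CACHE_SLOTS; // direct-mapped: one slot per residue class
        if let Some(e) = self.cache[slot] {
            if e.table_index == table_index {
                return (e.code_ptr, true); // hit: skip the full table lookup
            }
        }
        // Miss (or conflict): full lookup, then fill the slot.
        let code_ptr = self.table[table_index as usize];
        self.cache[slot] = Some(CacheEntry { table_index, code_ptr });
        (code_ptr, false)
    }
}

fn main() {
    let mut vm = VmContextModel { cache: [None; CACHE_SLOTS], table: vec![100, 200, 300] };
    assert_eq!(vm.call_indirect(0, 1), (200, false)); // cold miss fills slot 0
    assert_eq!(vm.call_indirect(0, 1), (200, true));  // hit
    assert_eq!(vm.call_indirect(8, 2), (300, false)); // 8 % 8 == 0: conflict evicts slot 0
    assert_eq!(vm.call_indirect(0, 1), (200, false)); // original callsite misses again
    println!("ok");
}
```

Unlike the hard-capped per-callsite scheme on `main`, a direct-mapped design covers unboundedly many callsites but pays for conflicts, as the eviction above shows.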

In the process of thinking through that, I discovered the current design on main incorrectly handles null funcrefs (EDIT: thanks @alexcrichton for finding the actual bug): it invokes a null code pointer, rather than loading a field from a null struct pointer. The latter was specifically designed to cause the necessary Wasm trap in #8159, but I had missed that the call to a null code pointer would not have the same effect. As a result, we actually can crash the VM (safely at least, but still no good vs. a proper Wasm trap!) with the feature enabled. (It's off by default still.) That could be fixed too, but at this point with the small benefit on real programs, together with the limitation on module size for full benefit, I think I'd rather opt for simplicity and remove the cache entirely.
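The distinction between the two failure modes can be modeled abstractly (all names invented; Wasmtime's real funcrefs are raw pointers whose null-dereference faults are classified by the signal handler via trap metadata). The intended slow path loads a field from the funcref struct, so a null funcref faults on a load at a PC the trap metadata maps to a clean Wasm trap; the cached path instead jumps straight to a cached code pointer that may itself be null, faulting at a PC with no trap metadata:

```rust
// Illustrative model only, not Wasmtime code.
#[derive(Debug, PartialEq)]
enum Outcome {
    Called(u32), // invoked function id
    WasmTrap,    // clean null-reference trap (faulting load covered by trap metadata)
    VmAbort,     // fault at an uncovered PC: safe crash, but not a proper Wasm trap
}

struct FuncRef {
    id: u32,
    code_ptr: usize, // stand-in for the native entry point
}

/// Slow path: dereference the funcref struct first. A null funcref is
/// modeled as `None`; the faulting field load becomes a clean trap.
fn call_slowpath(funcref: Option<&FuncRef>) -> Outcome {
    match funcref {
        None => Outcome::WasmTrap, // load-from-null at a PC known to trap metadata
        Some(f) => Outcome::Called(f.id),
    }
}

/// Buggy cached path: only the raw code pointer was cached, so a null
/// funcref is represented as code_ptr == 0 and we "call" it directly.
fn call_cached(cached_code_ptr: usize, cached_id: u32) -> Outcome {
    if cached_code_ptr == 0 {
        Outcome::VmAbort // jump-to-null: no trap metadata covers this PC
    } else {
        Outcome::Called(cached_id)
    }
}

fn main() {
    let f = FuncRef { id: 7, code_ptr: 0x1000 };
    assert_eq!(call_slowpath(Some(&f)), Outcome::Called(7));
    assert_eq!(call_slowpath(None), Outcome::WasmTrap); // the behavior #8159 designed for
    assert_eq!(call_cached(0, 0), Outcome::VmAbort);    // the bug: crash instead of trap
    println!("ok");
}
```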

Thus, this PR removes call-indirect caching. It's not a direct revert, because the original PR refactored the call-indirect generation into smaller helpers and IMHO it's a bit nicer to keep that. But otherwise all traces are gone: the setting, the code pre-scan during compilation, the special conditions tracked on tables, and the codegen changes.

@cfallin cfallin requested a review from a team as a code owner June 27, 2024 15:51
@cfallin cfallin requested review from elliottt and removed request for a team June 27, 2024 15:51
@cfallin cfallin force-pushed the remove-indirect-call-cache branch from f1125ab to 92930d5 Compare June 27, 2024 15:58
@elliottt
Member

Looks like the fuzzer is still setting the flag, but otherwise this looks good!

@cfallin cfallin force-pushed the remove-indirect-call-cache branch from 92930d5 to 601a96d Compare June 27, 2024 17:12
@cfallin cfallin requested a review from a team as a code owner June 27, 2024 17:12
@cfallin cfallin requested review from fitzgen and removed request for a team June 27, 2024 17:12
@cfallin cfallin enabled auto-merge June 27, 2024 17:12
@cfallin cfallin removed the request for review from fitzgen June 27, 2024 17:13
@cfallin cfallin added this pull request to the merge queue Jun 27, 2024
Merged via the queue into bytecodealliance:main with commit 7bf2b8f Jun 27, 2024
@cfallin cfallin deleted the remove-indirect-call-cache branch June 27, 2024 17:36
@fitzgen
Member

fitzgen commented Jun 27, 2024

(Whoops, I guess I'm late to the party!)

