
Wasmtime: remove indirect-call caching.#8881

Merged
cfallin merged 1 commit into bytecodealliance:main from cfallin:remove-indirect-call-cache
Jun 27, 2024

Conversation

Member

@cfallin cfallin commented Jun 27, 2024

In the original development of this feature, guided by JS AOT compilation to Wasm of a microbenchmark heavily focused on IC sites, I was seeing a ~20% speedup. However, in more recent measurements, on full programs (e.g., the Octane benchmark suite), the benefit is more like 5%.

Moreover, in #8870, I attempted to switch over to a direct-mapped cache, to address a current shortcoming of the design, namely that it has a hard-capped number of callsites it can apply to (50k) to limit impact on VMContext struct size. With all of the needed checks for correctness, though, that change results in a 2.5% slowdown relative to no caching at all, so it was dropped.
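For illustration only (this is not Wasmtime's actual codegen or data layout; all names here are invented), a direct-mapped call-site cache like the one attempted in #8870 can be sketched in Rust. Each callsite maps to one slot; a tag comparison decides hit vs. miss. The tag check and slot load on every call are exactly the kind of "needed checks for correctness" whose cost can outweigh the saved table lookup:

```rust
// Illustrative model only: not Wasmtime's real structures.
// A direct-mapped cache keyed by callsite id: each slot remembers the
// last (table_index, code_ptr) pair seen at some callsite. On a hit we
// skip the full table lookup; on a miss we do the lookup and refill.

const CACHE_SLOTS: usize = 8; // tiny for the demo; real sizes trade space vs. conflicts

#[derive(Clone, Copy)]
struct CacheEntry {
    table_index: u32, // tag: which table entry filled this slot
    code_ptr: usize,  // stand-in for the cached raw code pointer
}

struct VmContextModel {
    cache: [Option<CacheEntry>; CACHE_SLOTS],
    table: Vec<usize>, // funcref table: index -> code "pointer"
}

impl VmContextModel {
    /// Returns (code_ptr, was_hit).
    fn call_indirect(&mut self, callsite_id: usize, table_index: u32) -> (usize, bool) {
        let slot = callsite_id % CACHE_SLOTS; // direct-mapped: one slot per residue class
        if let Some(e) = self.cache[slot] {
            if e.table_index == table_index {
                return (e.code_ptr, true); // hit: skip the full table lookup
            }
        }
        // Miss (or conflict): full lookup, then fill the slot.
        let code_ptr = self.table[table_index as usize];
        self.cache[slot] = Some(CacheEntry { table_index, code_ptr });
        (code_ptr, false)
    }
}

fn main() {
    let mut vm = VmContextModel { cache: [None; CACHE_SLOTS], table: vec![100, 200, 300] };
    assert_eq!(vm.call_indirect(0, 1), (200, false)); // cold miss fills slot 0
    assert_eq!(vm.call_indirect(0, 1), (200, true));  // hit
    assert_eq!(vm.call_indirect(8, 2), (300, false)); // 8 % 8 == 0: conflict evicts slot 0
    assert_eq!(vm.call_indirect(0, 1), (200, false)); // original callsite misses again
    println!("ok");
}
```

Unlike the hard-capped per-callsite scheme on `main`, a direct-mapped design covers unboundedly many callsites but pays for conflicts, as the eviction above shows.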

In the process of thinking through that, I discovered the current design on main incorrectly handles null funcrefs (EDIT: thanks @alexcrichton for finding the actual bug): it invokes a null code pointer, rather than loading a field from a null struct pointer. The latter was specifically designed to cause the necessary Wasm trap in #8159, but I had missed that the call to a null code pointer would not have the same effect. As a result, we actually can crash the VM (safely at least, but still no good vs. a proper Wasm trap!) with the feature enabled. (It's off by default still.) That could be fixed too, but at this point with the small benefit on real programs, together with the limitation on module size for full benefit, I think I'd rather opt for simplicity and remove the cache entirely.
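The distinction between the two failure modes can be modeled abstractly (all names invented; Wasmtime's real funcrefs are raw pointers whose null-dereference faults are classified by the signal handler via trap metadata). The intended slow path loads a field from the funcref struct, so a null funcref faults on a load at a PC the trap metadata maps to a clean Wasm trap; the cached path instead jumps straight to a cached code pointer that may itself be null, faulting at a PC with no trap metadata:

```rust
// Illustrative model only, not Wasmtime code.
#[derive(Debug, PartialEq)]
enum Outcome {
    Called(u32), // invoked function id
    WasmTrap,    // clean null-reference trap (faulting load covered by trap metadata)
    VmAbort,     // fault at an uncovered PC: safe crash, but not a proper Wasm trap
}

struct FuncRef {
    id: u32,
    code_ptr: usize, // stand-in for the native entry point
}

/// Slow path: dereference the funcref struct first. A null funcref is
/// modeled as `None`; the faulting field load becomes a clean trap.
fn call_slowpath(funcref: Option<&FuncRef>) -> Outcome {
    match funcref {
        None => Outcome::WasmTrap, // load-from-null at a PC known to trap metadata
        Some(f) => Outcome::Called(f.id),
    }
}

/// Buggy cached path: only the raw code pointer was cached, so a null
/// funcref is represented as code_ptr == 0 and we "call" it directly.
fn call_cached(cached_code_ptr: usize, cached_id: u32) -> Outcome {
    if cached_code_ptr == 0 {
        Outcome::VmAbort // jump-to-null: no trap metadata covers this PC
    } else {
        Outcome::Called(cached_id)
    }
}

fn main() {
    let f = FuncRef { id: 7, code_ptr: 0x1000 };
    assert_eq!(call_slowpath(Some(&f)), Outcome::Called(7));
    assert_eq!(call_slowpath(None), Outcome::WasmTrap); // the behavior #8159 designed for
    assert_eq!(call_cached(0, 0), Outcome::VmAbort);    // the bug: crash instead of trap
    println!("ok");
}
```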

Thus, this PR removes call-indirect caching. It's not a direct revert, because the original PR refactored the call-indirect generation into smaller helpers and IMHO it's a bit nicer to keep that. But otherwise all traces are gone: the setting, the code pre-scan during compilation, the special conditions tracked on tables, and the codegen changes.

@cfallin cfallin requested a review from a team as a code owner June 27, 2024 15:51
@cfallin cfallin requested review from elliottt and removed request for a team June 27, 2024 15:51
@cfallin cfallin force-pushed the remove-indirect-call-cache branch from f1125ab to 92930d5 Compare June 27, 2024 15:58
@elliottt
Member

Looks like the fuzzer is still setting the flag, but otherwise this looks good!

@cfallin cfallin force-pushed the remove-indirect-call-cache branch from 92930d5 to 601a96d Compare June 27, 2024 17:12
@cfallin cfallin requested a review from a team as a code owner June 27, 2024 17:12
@cfallin cfallin requested review from fitzgen and removed request for a team June 27, 2024 17:12
@cfallin cfallin enabled auto-merge June 27, 2024 17:12
@cfallin cfallin removed the request for review from fitzgen June 27, 2024 17:13
@cfallin cfallin added this pull request to the merge queue Jun 27, 2024
Merged via the queue into bytecodealliance:main with commit 7bf2b8f Jun 27, 2024
@cfallin cfallin deleted the remove-indirect-call-cache branch June 27, 2024 17:36
@fitzgen
Member

fitzgen commented Jun 27, 2024

(Whoops, I guess I'm late to the party!)

