Skip to content

Conversation

@shravanrn
Copy link
Collaborator

For older CPUs (older than Ivybridge on Intel), the segment registers had to be set using system calls rather than userspace instructions. Update the segue implementation to fallback to system calls if userspace instructions are not supported.

@shravanrn shravanrn requested review from keithw and sbc100 July 15, 2024 22:10
@shravanrn
Copy link
Collaborator Author

@sbc100 Sorry to bug, just wanted to bump this up in the inbox. (Just a heads up, that there is one more perf optimization PR following this as well, after which, Segue should be ready for use in production for POSIXy systems)

@shravanrn
Copy link
Collaborator Author

@sbc100 @keithw Just bumping this in your inbox

return (void*)__builtin_ia32_rdgsbase64();
} else {
void* base;
if (syscall(SYS_arch_prctl, ARCH_GET_GS, &base) != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this syscall portable, or linux-specific?

Does the cost of this syscall still make the optimization worth while?

Would be instead simplify the code and just disable segue on these older CPUs?

Copy link
Collaborator Author

@shravanrn shravanrn Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"linux-specific" --- ah good point. I think it maybe linux specific, I'll add some more checks/cases for this.

Does the cost of this syscall still make the optimization worth while? Would be instead simplify the code and just disable segue on these older CPUs?

Yup, this a good set of questions. I think there are a few important points that made me add this.

  1. If a host that embeds a wasm2c compiled module can only ship one binary, the host have to make a decision during compilation whether they want to use Segue or not. Since the vast majority of x86 CPUs in use support the fast userspace instruction wrgsbase, this is something that we definitely want to enable. However, a wasm2c compiled binary that uses Segue must also run on older machines that don't have the wrgsbase instructions. So a fallback for setting the segment register is important here. (For scenarios where the host knows that it is primarily targeting a lot of older machines, disabling segue may be a simpler choice)

  2. I am following this PR with a separate optional optimization that sets the segment register only if it was changed (which will be enabled in the Firefox use). This is optional as the host needs to confirm that it doesn't use the segment register for anything else. This should reduce the amount of calls to wrgsbase/SYS_arch_prctl-ARCH_SET_GS which would make this optimization definitely worth it in general.

  3. But even without optimization in point 2, Segue is still useful depending on how often the host program invokes the wasm module. When invoking long running functions in Wasm modules, this optimization still improves net performance by a lot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure "Segue must also run on older machines" is true is it? Presumably one might want to build a binary that doesn't run on older machines and instead just aborts? Maybe such a micro-optimization is not useful?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbc100 At the moment, we don't really have a use case where we'd want Segue to abort if there is no userspace wrgsbase instruction. However, I can add a macro configuration to the wasm2c code for this if you think this is useful for certain use cases.

The use case I have in mind is that Firefox essentially ships a single binary for an OS --- it has a single binary independent of whether we are targetting pre-ivybridge CPUs or the latest CPUs. Additionally since extremely critical functionality like XML parsing in Firefox is compiled with wasm2c, we really need these features to function corretly on all x86 CPUs; an abort due to lack of wrgsbase is not a viable option here. An alternate approach of a fat-binary (a binary that includes both a Segue and non-Segue compiled code) would bloat the size of the browser significantly so is also not viable. So, given these constraints, the only path forward here is to potentially take a small hit in perf on older CPUs by setting the base with a syscall to get the perf benefit of Segue on newer CPUs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbc100 Sorry for the delay in submitting a fixed PR. I started adding support for other POSIX systems as part of my change and discovered that the patch was becoming too large. So i have made the change to just check for linux in this patch and will add support for other OS in future patches. Please sign-off if this looks good/let me know if I should change anything.

@shravanrn
Copy link
Collaborator Author

@sbc100 Just bumping this in your inbox

@shravanrn
Copy link
Collaborator Author

@sbc100 @kripken @keithw This has been blocked for a while. Could one of you help with landing this?

@shravanrn shravanrn enabled auto-merge (rebase) September 6, 2024 21:39
@shravanrn shravanrn merged commit c6853eb into WebAssembly:main Sep 6, 2024
@shravanrn shravanrn deleted the segue-syscall branch September 6, 2024 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants