-
Notifications
You must be signed in to change notification settings - Fork 790
wasm2c: Segue support for CPUs without wrgsbase instructions #2441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@sbc100 Sorry to bug, just wanted to bump this up in the inbox. (Just a heads up, that there is one more perf optimization PR following this as well, after which, Segue should be ready for use in production for POSIXy systems) |
src/template/wasm2c.declarations.c
Outdated
| return (void*)__builtin_ia32_rdgsbase64(); | ||
| } else { | ||
| void* base; | ||
| if (syscall(SYS_arch_prctl, ARCH_GET_GS, &base) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this syscall portable, or linux-specific?
Does the cost of this syscall still make the optimization worth while?
Would be instead simplify the code and just disable segue on these older CPUs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"linux-specific" --- ah good point. I think it maybe linux specific, I'll add some more checks/cases for this.
Does the cost of this syscall still make the optimization worth while? Would be instead simplify the code and just disable segue on these older CPUs?
Yup, this a good set of questions. I think there are a few important points that made me add this.
-
If a host that embeds a wasm2c compiled module can only ship one binary, the host have to make a decision during compilation whether they want to use Segue or not. Since the vast majority of x86 CPUs in use support the fast userspace instruction
wrgsbase, this is something that we definitely want to enable. However, a wasm2c compiled binary that uses Segue must also run on older machines that don't have the wrgsbase instructions. So a fallback for setting the segment register is important here. (For scenarios where the host knows that it is primarily targeting a lot of older machines, disabling segue may be a simpler choice) -
I am following this PR with a separate optional optimization that sets the segment register only if it was changed (which will be enabled in the Firefox use). This is optional as the host needs to confirm that it doesn't use the segment register for anything else. This should reduce the amount of calls to
wrgsbase/SYS_arch_prctl-ARCH_SET_GSwhich would make this optimization definitely worth it in general. -
But even without optimization in point 2, Segue is still useful depending on how often the host program invokes the wasm module. When invoking long running functions in Wasm modules, this optimization still improves net performance by a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure "Segue must also run on older machines" is true is it? Presumably one might want to build a binary that doesn't run on older machines and instead just aborts? Maybe such a micro-optimization is not useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbc100 At the moment, we don't really have a use case where we'd want Segue to abort if there is no userspace wrgsbase instruction. However, I can add a macro configuration to the wasm2c code for this if you think this is useful for certain use cases.
The use case I have in mind is that Firefox essentially ships a single binary for an OS --- it has a single binary independent of whether we are targetting pre-ivybridge CPUs or the latest CPUs. Additionally since extremely critical functionality like XML parsing in Firefox is compiled with wasm2c, we really need these features to function corretly on all x86 CPUs; an abort due to lack of wrgsbase is not a viable option here. An alternate approach of a fat-binary (a binary that includes both a Segue and non-Segue compiled code) would bloat the size of the browser significantly so is also not viable. So, given these constraints, the only path forward here is to potentially take a small hit in perf on older CPUs by setting the base with a syscall to get the perf benefit of Segue on newer CPUs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbc100 Sorry for the delay in submitting a fixed PR. I started adding support for other POSIX systems as part of my change and discovered that the patch was becoming too large. So i have made the change to just check for linux in this patch and will add support for other OS in future patches. Please sign-off if this looks good/let me know if I should change anything.
e185097 to
7f7ad7c
Compare
|
@sbc100 Just bumping this in your inbox |
7f7ad7c to
6428077
Compare
For older CPUs (older than Ivybridge on Intel), the segment registers had to be set using system calls rather than userspace instructions. Update the segue implementation to fallback to system calls if userspace instructions are not supported.