Cranelift: Get the `tail` calling convention as fast as `fast` on Sightglass #6759

@fitzgen

That is, regular calls with the tail calling convention should be as fast as regular calls with the fast calling convention.

#1065 (comment)

So @jameysharp and I did a little profiling/investigation of switching the internal Wasm calling convention over to tail on our Sightglass benchmarks. I was expecting this to have no measurable change, but unfortunately it looks like it has a ~7% overhead on bz2 and spidermonkey.wasm and a ~1% overhead on pulldown-cmark. This is surprising! We think this means that we fairly frequently call functions that don't have enough register pressure to clobber all callee-save registers. Since the tail calling convention has only caller-save registers and zero callee-save registers, any value that is live across a call has to be spilled by the caller, so we are doing more spills than we used to. Enough more that the difference is clearly measurable.
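To make the spill argument concrete, here is a toy cost model that counts loads and stores for one caller making several calls. It is only a sketch under stated assumptions: the function names, the constant of 5 callee-save registers, and the "spill and reload around every call" rule are illustrative and not taken from Cranelift; a real register allocator can often do better than this.

```rust
/// Callee-save registers available to the allocator. Assumed to be 5 here,
/// matching the maximum in the histograms; not taken from Cranelift's code.
const CALLEE_SAVES: u32 = 5;

/// Loads and stores when the convention HAS callee-save registers.
/// The caller keeps `live` values in callee-save registers across every call,
/// paying one save and one restore per register in its own prologue/epilogue.
/// Each callee pays only for the callee-save registers it actually clobbers.
fn ops_with_callee_saves(live: u32, callee_clobbers: &[u32]) -> u32 {
    assert!(live <= CALLEE_SAVES, "toy model: live values must fit in callee-save registers");
    let caller_frame = 2 * live;
    let callees: u32 = callee_clobbers
        .iter()
        .map(|&c| 2 * c.min(CALLEE_SAVES))
        .sum();
    caller_frame + callees
}

/// Loads and stores when the convention has NO callee-save registers.
/// Simplifying assumption: the caller spills and reloads every live value
/// around every call.
fn ops_caller_save_only(live: u32, calls: usize) -> u32 {
    2 * live * calls as u32
}

fn main() {
    // Three values live across four calls; only one callee has high register pressure.
    let mostly_small = [0, 0, 5, 0];
    println!(
        "small callees: with callee-saves = {}, caller-save only = {}",
        ops_with_callee_saves(3, &mostly_small),
        ops_caller_save_only(3, mostly_small.len())
    );

    // Same caller, but every callee clobbers all callee-save registers.
    let all_large = [5, 5, 5, 5];
    println!(
        "large callees: with callee-saves = {}, caller-save only = {}",
        ops_with_callee_saves(3, &all_large),
        ops_caller_save_only(3, all_large.len())
    );
}
```

In this model callee-save registers win when most callees clobber few of them (16 vs. 24 operations in the first case) and lose when every callee clobbers all of them (46 vs. 24 in the second), which is why the distribution of clobber counts matters.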

Here are histograms of the number of clobbered callee-save registers per function for some of our benchmarks:

pulldown-cmark

# Number of samples = 757
# Min = 0
# Max = 5
#
# Mean = 1.9682959048877162
# Standard deviation = 2.4428038716280174
# Variance = 5.967290755240832
#
# Each ∎ is a count of 9
#
 0 ..  1 [ 459 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 1 ..  2 [   0 ]: 
 2 ..  3 [   0 ]: 
 3 ..  4 [   0 ]: 
 4 ..  5 [   0 ]: 
 5 ..  6 [ 298 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 6 ..  7 [   0 ]: 
 7 ..  8 [   0 ]: 
 8 ..  9 [   0 ]: 
 9 .. 10 [   0 ]: 

spidermonkey

# Number of samples = 18279
# Min = 0
# Max = 5
#
# Mean = 1.8119153126538674
# Standard deviation = 2.4034432514706436
# Variance = 5.77653946303978
#
# Each ∎ is a count of 233
#
 0 ..  1 [ 11655 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 1 ..  2 [     0 ]: 
 2 ..  3 [     0 ]: 
 3 ..  4 [     0 ]: 
 4 ..  5 [     0 ]: 
 5 ..  6 [  6624 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 6 ..  7 [     0 ]: 
 7 ..  8 [     0 ]: 
 8 ..  9 [     0 ]: 
 9 .. 10 [     0 ]: 

bz2

# Number of samples = 127
# Min = 0
# Max = 5
#
# Mean = 0.5511811023622047
# Standard deviation = 1.5659198268780583
# Variance = 2.452104904209808
#
# Each ∎ is a count of 2
#
 0 ..  1 [ 113 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 1 ..  2 [   0 ]: 
 2 ..  3 [   0 ]: 
 3 ..  4 [   0 ]: 
 4 ..  5 [   0 ]: 
 5 ..  6 [  14 ]: ∎∎∎∎∎∎∎
 6 ..  7 [   0 ]: 
 7 ..  8 [   0 ]: 
 8 ..  9 [   0 ]: 
 9 .. 10 [   0 ]: 
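
The summary lines can be cross-checked from the bucket counts, since every function lands in either the 0 bucket or the 5 bucket. A minimal sketch; the struct and function names are made up for this illustration, and only the counts come from the histograms above:

```rust
/// One histogram, reduced to its two non-empty buckets: functions that
/// clobber 0 callee-save registers and functions that clobber 5.
struct Histogram {
    name: &'static str,
    zero_clobbers: u64,
    five_clobbers: u64,
}

impl Histogram {
    /// Total number of functions sampled.
    fn samples(&self) -> u64 {
        self.zero_clobbers + self.five_clobbers
    }

    /// Mean number of clobbered callee-save registers per function.
    fn mean(&self) -> f64 {
        (5 * self.five_clobbers) as f64 / self.samples() as f64
    }

    /// Fraction of functions that clobber no callee-save register at all.
    fn zero_fraction(&self) -> f64 {
        self.zero_clobbers as f64 / self.samples() as f64
    }
}

/// Bucket counts copied from the three histograms.
fn histograms() -> [Histogram; 3] {
    [
        Histogram { name: "pulldown-cmark", zero_clobbers: 459, five_clobbers: 298 },
        Histogram { name: "spidermonkey", zero_clobbers: 11655, five_clobbers: 6624 },
        Histogram { name: "bz2", zero_clobbers: 113, five_clobbers: 14 },
    ]
}

fn main() {
    for h in histograms().iter() {
        println!(
            "{:<15} n = {:>5}  mean = {:.4}  zero-clobber = {:.1}%",
            h.name,
            h.samples(),
            h.mean(),
            100.0 * h.zero_fraction()
        );
    }
}
```

The recomputed sample counts and means agree with the reported ones, and roughly 61% (pulldown-cmark), 64% (spidermonkey) and 89% (bz2) of the sampled functions clobber no callee-save register at all.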

I think we just need to support callee-save registers in the tail calling convention. For simplicity, we can probably just match sys-v / the default native calling convention. This is a little unfortunate: in a chain of tail calls, each function will save and restore callee-save registers that the next function isn't going to use (their values won't be used again until the chain completes). But we definitely can't pessimize regular calls for the sake of tail call chains.
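The cost to tail call chains can be illustrated with a small count of save/restore operations. This is a hedged sketch, not how Cranelift lowers tail calls: the function names are hypothetical, and the "ideal" figure assumes every function clobbers a prefix of the same callee-save register list, so the number of distinct registers touched by the chain is the per-function maximum.

```rust
/// Save/restore operations along one chain of tail calls, assuming the tail
/// convention had callee-save registers. `clobbers[i]` is the number of
/// callee-save registers that function `i` in the chain overwrites.
/// Each function saves them in its prologue and restores them before its
/// tail call, because its frame is gone once control transfers.
fn chain_save_restore_ops(clobbers: &[u32]) -> u32 {
    clobbers.iter().map(|&c| 2 * c).sum()
}

/// Hypothetical ideal: save each register once when it is first clobbered and
/// restore it once when the whole chain completes.
/// Assumes functions clobber a prefix of one shared register list.
fn chain_ideal_ops(clobbers: &[u32]) -> u32 {
    2 * clobbers.iter().copied().max().unwrap_or(0)
}

fn main() {
    // A chain of four functions; three of them clobber all 5 callee-save registers.
    let chain = [5, 5, 0, 5];
    let actual = chain_save_restore_ops(&chain);
    let ideal = chain_ideal_ops(&chain);
    println!(
        "actual = {}, ideal = {}, redundant = {}",
        actual,
        ideal,
        actual - ideal
    );
}
```

For this example chain the count is 30 operations against an ideal of 10, so 20 are redundant. That overhead only appears inside tail call chains, whereas the spill overhead of having no callee-save registers applies to every regular call.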

Labels: cranelift, cranelift:goal:optimize-speed
