Codecov Report

```
@@            Coverage Diff            @@
##            master      #70    +/-  ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files            3        3
  Lines          674      937   +263
=========================================
+ Hits           674      937   +263
```
I inlined the code so that no SetBytesN calls another SetBytesN (most of these calls were not being inlined). This requires a final round of cleanups.
Nice improvement!
Not sure -- are you saying we should copy-paste the remaining functions from binary.BigEndian too? I'd rather not, if there's no gain in it. And another thing: since the "specific" version is ~10x faster than the "generic" one, it might make sense to replace the "generic" with a "switch + specific"?
Oh - Using a switch: vs
Pushed a naive version now, along with a failing testcase as a reminder to fix it :)
Both variants generate the same assembly (https://godbolt.org/z/wJE3K_):

```go
func bigEndianUint40_1(b []byte) uint64 {
	_ = b[4] // bounds check hint to compiler; see golang.org/issue/14808
	return uint64(b[4]) | uint64(b[3])<<8 | uint64(b[2])<<16 | uint64(b[1])<<24 |
		uint64(b[0])<<32
}

func bigEndianUint40_2(b []byte) uint64 {
	_ = b[4] // bounds check hint to compiler; see golang.org/issue/14808
	return uint64(binary.BigEndian.Uint32(b[1:5])) | uint64(b[0])<<32
}
```

So there are 3 options here:
Yes, makes sense. I actually thought this was already done, but it was only code in
I don't really have a strong opinion... All else being equal, I think we should take whichever looks cleanest, so that the code is easy to understand and easy for a reviewer to vet. I personally think
Oh right, another thing. In

```go
bytearr := hex2Bytes("AAAA12131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031BBBB")
z.SetBytes(bytearr)
fmt.Printf("%x\n", z)  // 1415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031bbbb
z2.SetBytesx(bytearr)
fmt.Printf("%x\n", z2) // aaaa12131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f
```

The current master uses the last bytes, whereas my poc used the first bytes. Which makes more sense, and why?
I guess, if we want to align with
chfast left a comment:
Looks good, but I'd like to rebase to test on big-endian. I can take care of that.
uint256.go (Outdated)
```go
// SetBytes interprets buf as the bytes of a big-endian unsigned
// integer, sets z to that value, and returns z.
```

```go
// Uint64WithOverflow returns the lower 64-bits of z and bool whether overflow occurred
func (z *Int) Uint64WithOverflow() (uint64, bool) {
	return z[0], z[1] != 0 || z[2] != 0 || z[3] != 0
}
```

with the suggested change:

```go
	return z[0], (z[1] | z[2] | z[3]) != 0
```
Either this isn't go-formatted, or the lines at 800, 805 aren't go-formatted.
I double-checked. It looks like gofmt formats this differently depending on whether it is part of a longer boolean expression (&&, ||) or not.
also improve SetBytes-methods via compiler hints

```
name                          old time/op  new time/op  delta
SetBytes/SetBytes-generic-6   1.34µs ± 1%  1.34µs ± 3%    ~     (p=0.465 n=8+9)
SetBytes/SetBytes-specific-6   178ns ± 5%   165ns ± 3%  -7.39%  (p=0.000 n=10+9)
```
@chfast I rebased it, squashed it lightly and removed the junk
Merge!
This was a little experiment that turned out pretty nicely.
Go-ethereum branch for experimentation: https://github.com/holiman/go-ethereum/tree/fixedbig_optimizations
Using size-specific setbyte methods, things got a lot faster, 1.34µs vs 165ns.

By generating code for PUSHX instructions, I got the following improvements in geth:
Further (but unrelated to this PR), by also generating code for SWAPX/DUPX, I got the following: