Use 8 bytes to store int64 components of database keys #107
Use 8 bytes to store int64 components of database keys #107liamsi merged 3 commits intocosmos:developfrom
Conversation
|
This is an incredible performance win! I'm especially glad to see that the source of the large memory allocations is solved! |
| for _, ikey := range []byte{0x11, 0x32, 0x50, 0x72, 0x99} { | ||
| key := []byte{ikey} | ||
| tree.Set(key, []byte(rand.Str(8))) | ||
| tree.Set(key, []byte(random.Str(8))) |
There was a problem hiding this comment.
I made this rename because rand clashes with standard math/rand which I wanted to use. I could of aliased by I thought it was better to avoid clashing with stdlib (maybe better still to not use a global...? (even better to perhaps do without tendermint/common :o - but I should tread lightly for now :) )).
| value := []byte{} | ||
| for i := 0; i < 1000000; i++ { | ||
| t.Set(i2b(int(cmn.RandInt32())), nil) | ||
| t.Set(i2b(int(cmn.RandInt32())), value) |
There was a problem hiding this comment.
go test -bench=. was panicking on develop without this change
ca4bc10 to
dc76584
Compare
|
@ValarDragon negative gives should be fine - I've added a test. |
28d5706 to
d125383
Compare
ValarDragon
left a comment
There was a problem hiding this comment.
This change looks great to me! Using this KeyFormat makes the code much cleaner / more readable.
However I'm unfamiliar with alot of the internals of iavl, so I'll leave it to @jlandrews, @liamsi to approve.
a2a8ab4 to
d265b7d
Compare
|
Just want to note, this performance increase is actually super helpful for our state machine as well. Some benchmarks I was doing had the sprintf calls from GetNode and SaveAs as taking up 1.5% of the entire state machines time. (Includes goleveldb compaction) |
liamsi
left a comment
There was a problem hiding this comment.
Awesome pull request. Thanks a lot @silasdavis 👍
Left a few nits/questions in the review. Will merge tonight.
key_format.go
Outdated
| } | ||
| } | ||
|
|
||
| // Format the byte segments into the key format - will panic if the segment lengths to do match the layout. |
There was a problem hiding this comment.
will panic if the segment lengths to not match the layout.
| return string([]byte{kf.prefix}) | ||
| } | ||
|
|
||
| func scan(a interface{}, value []byte) { |
There was a problem hiding this comment.
I would rename this to scanValue. Probably just a matter of preferences.
key_format.go
Outdated
| return key[:n] | ||
| } | ||
|
|
||
| // Format the args passed into the key format - will panic if the arguments passed to not match the length |
There was a problem hiding this comment.
will panic if the arguments passed do not match the length
key_format.go
Outdated
| binary.BigEndian.PutUint64(bs, uint64(v)) | ||
| return bs | ||
| case int: | ||
| return format(int64(v)) |
There was a problem hiding this comment.
Couldn't all these non-byte cases be simplified to:
case uint64, uint, int64, int:
bs := make([]byte, 8)
binary.BigEndian.PutUint64(bs, uint64(v))
return bs?
There was a problem hiding this comment.
When I read that I thought, oh really, how did I not know you could do that, but sadly you cannot. The compile-time type of v becomes interface{} as soon as you combine types which cannot be type converted.
You must have been confusing this with a language that has type system :)
I can make it a bit more consistent though:
func format(a interface{}) []byte {
switch v := a.(type) {
case uint64:
bs := make([]byte, 8)
binary.BigEndian.PutUint64(bs, v)
return bs
case uint:
return format(uint64(v))
case int64:
return format(uint64(v))
case int:
return format(uint64(v))
case []byte:
return v
default:
panic(fmt.Errorf("KeyFormat format() does not support formatting value of type %T: %v", a, a))
}
}There was a problem hiding this comment.
I'd prefer not to do recursive calls, but instead have a second "formatuint64" function which all of these sub-cases call. A second interface conversions seems like an unnecessary performance hit, given that this may be called a ton.
key_format.go
Outdated
| } | ||
|
|
||
| // Create a []byte key format based on a single byte prefix and fixed width key segments each of whose length is | ||
| // specified by by the corresponding element of layout |
There was a problem hiding this comment.
Maybe, a little example would be great here?
| value := []byte{} | ||
| for i := 0; i < 1000000; i++ { | ||
| t.Set(i2b(int(cmn.RandInt32())), nil) | ||
| t.Set(i2b(int(cmn.RandInt32())), value) |
key_format.go
Outdated
|
|
||
| // Format the args passed into the key format - will panic if the arguments passed to not match the length | ||
| // of the segment to which they correspond. When called with no arguments returns the raw prefix (useful as a start | ||
| // element of the entire keys space when sorted lexicographically) |
There was a problem hiding this comment.
Nit: comments should end with a full stop (here and everywhere else). Don't bother, I'll add these.
| case *int64: | ||
| *v = int64(binary.BigEndian.Uint64(value)) | ||
| case *uint64: | ||
| *v = binary.BigEndian.Uint64(value) |
There was a problem hiding this comment.
Question: Maybe I'm overlooking sth obvious here but why are there only two cases (int64 and uint64) when reading in the value? When formatting the we allow for uint64, uint, int64, int.
There was a problem hiding this comment.
It's convenient to be able to call kf.Key(0, 1, 2) and not have to cast each argument, kf.Key(0) is the example that exists, and in tests.
| func scan(a interface{}, value []byte) { | ||
| switch v := a.(type) { | ||
| case *int64: | ||
| *v = int64(binary.BigEndian.Uint64(value)) |
There was a problem hiding this comment.
Will this still behave as intended when this overflows?
There was a problem hiding this comment.
Yeah it's fine the MSB just becomes the sign bit - I've added a test to demonstrate. Provided the user is consistent with using int64 for signed numbers.
|
@liamsi I'll make these changes shortly - thanks. |
Signed-off-by: Silas Davis <silas@monax.io>
…ids string manipulatio/scanning
100ed1e to
8290d65
Compare
|
@liamsi changes done |
|
Awesome! Merging and will prep a release this week. |
See full changelog here: https://github.com/tendermint/iavl/blob/develop/CHANGELOG.md#0110-september-7-2018 * Update to CircleCI 2.0 (#108) * Use 8 bytes to store int64 components of database keys (#107) * Introduce KeyFormat that uses a full 8 bytes for int64 values and avoids string manipulatio/scanning * Add node key and orphan key benchmark * Fix various issue from PR: punctuation, add overflow test, and improve scan function * Remove architecture dependent getter functions (#96) * Remove architecture dependent getter functions * update changelog * Prep Release 0.11.0 (#111) * Bump version and update change-log
* 'develop' of github.com:danil-lashin/iavl: Prep Release 0.11.0 (cosmos#111) Remove architecture dependent getter functions (cosmos#96) Use 8 bytes to store int64 components of database keys (cosmos#107) Update to CircleCI 2.0 (cosmos#108) Release 0.10.0: Update Changelog and bump version (cosmos#99) delete empty file (relict from merging master into develop) Mutable/Immutable refactor and GetImmutable snapshots (cosmos#92) Remove unreachable code Remove unused variable dep: Change tendermint dep to be ^v0.22.0 (cosmos#91) Fix benchmark scripts for current code (cosmos#89) release v0.9.2 (cosmos#82) Various lints (cosmos#80) Jae/rangeprooffix (cosmos#75)
* Introduce KeyFormat that uses a full 8 bytes for int64 values and avoids string manipulatio/scanning * Add node key and orphan key benchmark * Fix various issue from PR: punctuation, add overflow test, and improve scan function
See full changelog here: https://github.com/tendermint/iavl/blob/develop/CHANGELOG.md#0110-september-7-2018 * Update to CircleCI 2.0 (cosmos#108) * Use 8 bytes to store int64 components of database keys (cosmos#107) * Introduce KeyFormat that uses a full 8 bytes for int64 values and avoids string manipulatio/scanning * Add node key and orphan key benchmark * Fix various issue from PR: punctuation, add overflow test, and improve scan function * Remove architecture dependent getter functions (cosmos#96) * Remove architecture dependent getter functions * update changelog * Prep Release 0.11.0 (cosmos#111) * Bump version and update change-log
This fixes #104 by introducing
KeyFormatthat uses a fixed-width key format over the full 8 bytes provided by int64 versions in orphan, and root keys. It also relies on the fixed width format to provide more efficient code for formatting into and scanning values from keys.As a result of this it seems to lead to a ~10% speedup in
BenchmarkTreeLoadAndDeleteby avoiding the relatively expensivefmt.Sprintfandfmt.Sscanffunctions, see below.Before
KeyFormatintroductiongo test -bench=. goos: linux goarch: amd64 pkg: github.com/tendermint/iavl BenchmarkNodeKey-8 5000000 316 ns/op BenchmarkOrphanKey-8 2000000 613 ns/op ok, starting BenchmarkImmutableAvlTreeMemDB-8 200 5783362 ns/op 1104190 B/op 22488 allocs/op BenchmarkTreeLoadAndDelete/LoadAndDelete-8 2 707800784 ns/op PASS ok github.com/tendermint/iavl 129.478sAfter
KeyFormatintroductiongo test -bench=. goos: linux goarch: amd64 pkg: github.com/tendermint/iavl BenchmarkNodeKey-8 30000000 50.9 ns/op BenchmarkOrphanKey-8 5000000 256 ns/op BenchmarkImmutableAvlTreeMemDB-8 500 4048465 ns/op 82591 B/op 1546 allocs/op BenchmarkTreeLoadAndDelete/LoadAndDelete-8 5 641415439 ns/op PASS ok github.com/tendermint/iavl 121.982sFor
BenchmarkTreeLoadAndDelete/LoadAndDelete-8we have 1 - 641415439/707800784 ~= 9.3%