sql: introduce new FIFO cache for txnID cache #76350
craig[bot] merged 2 commits into cockroachdb:master
Conversation
kv95 benchmark:
Microbenchmark:
Force-pushed from 972dee5 to 30c10c9
ajwerner
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)
pkg/sql/contention/txnidcache/fifo_cache.go, line 68 at r2 (raw file):
}
c.mu.eviction.insertBlock(block)
I worry about cases where you flush due to a timer as opposed to because the blocks are full and then you end up with mostly empty blocks in the cache. Perhaps it'd be better to have evictionNode have its own embedded messageBlock (as opposed to *messageBlock) and copy out of the block in add and into the current tail, filling up blocks in the FIFO as you go. Then you know you'll only ever have at most one non-full block. Then, sync.Pool the evictionNode.
Force-pushed from 30c10c9 to db5eb95
Azhng
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/sql/contention/txnidcache/fifo_cache.go, line 68 at r2 (raw file):
Previously, ajwerner wrote…
I worry about cases where you flush due to a timer as opposed to because the blocks are full and then you end up with mostly empty blocks in the cache. Perhaps it'd be better to have evictionNode have its own embedded messageBlock (as opposed to *messageBlock) and copy out of the block in add and into the current tail, filling up blocks in the FIFO as you go. Then you know you'll only ever have at most one non-full block. Then, sync.Pool the evictionNode.
I went down the route of just directly hijacking the blocks and using them in the eviction list (mostly because I misunderstood your comment when I first read it).
Thoughts on this approach?
Force-pushed from db5eb95 to c5af5a4
Compared to the previous approach (that risked having many empty blocks), I think the new overhead here is about 7% in the microbenchmark. I will follow up with the kv95 result once it's ready.
Force-pushed from c5af5a4 to fd7287a
Updated kv95 benchmark. Hmm, seems like it's a bit slower than the previous simple eviction list implementation.

Raw files output from both kv95 runs
blockSize++
}

if blockHijacked := c.mu.eviction.insertBlock(block, blockSize); !blockHijacked {
some of this hijacking feels too clever to me. I feel like just embedding the messageBlock as a value in the eviction list and pooling that will be simpler from a data abstraction perspective. You can do it as a single for loop over the block. I don't feel that strongly though.
Done. Switched to using pool for node allocation.
}
}

type evictionNode struct {
I think these things need to be pooled. I'd like to see the benchmark allocating on average zero or near zero bytes as opposed to the current ~70. I wouldn't focus so much on the runtime in terms of nanoseconds for this, if you get the allocations to zero, my money is that we won't see big impact on the kv workload.
Force-pushed from fd7287a to 273fcbe
nodePool: &sync.Pool{
	New: func() interface{} {
		return &evictionNode{}
	},
},
sync.Pool objects should be global vars. There's no good reason to have more than one for a given type. Every instance gets registered in the runtime and updated on every gc cycle.
Done. Also created #76418 to clean up the rest of the sync.Pool usage.
Force-pushed from e3c74ed to 4103821
This commit introduces the new BenchmarkWriter benchmark to measure the write performance for TxnID Cache.

Informs cockroachdb#76013

Release note: None
Force-pushed from 4103821 to 04610e9
ajwerner
left a comment
Just cleanup nits from me. I didn't scrutinize the tests all that closely, but they seemed fine.
Modulo the nits (and the naming one really is up to you), 👍 LGTM
}

// insertBlock insert the messageBlock into the eviction list.
func (e *evictionList) insertBlock(block *messageBlock, blockSize int) {
I think it might be more elegant if you just passed the block in here as a slice. It's an extra word on the stack, but who cares.
}

// Handle when the tail block is full.
if e.tail.isFull() {
what if, instead of this method, you put a method on the evictionNode which was:
func (e *evictionList) append(msgs []ResolvedTxnID) {
	msgs = e.appendToTail(msgs)
	for len(msgs) > 0 {
		e.addNode()
		msgs = e.appendToTail(msgs)
	}
}

func (e *evictionList) addNode() {
	newNode := nodePool.Get().(*evictionNode)
	if e.head == nil {
		e.head = newNode
	} else {
		e.tail.next = newNode
	}
	e.tail = newNode
	e.tailBlockIdx = 0
}

func (e *evictionList) appendToTail(msgs []ResolvedTxnID) (remaining []ResolvedTxnID) {
	if e.head == nil {
		return msgs
	}
	toCopy := messageBlockSize - e.tailBlockIdx
	if toCopy > len(msgs) {
		toCopy = len(msgs)
	}
	copy(e.tail.messageBlock[e.tailBlockIdx:], msgs[:toCopy])
	e.tailBlockIdx += toCopy
	return msgs[toCopy:]
}

// fifoCache sometimes directly hijacks the input messageBlock to form the
// eviction list and often even directly overwrites its content to minimize
// memory allocation.
func (c *fifoCache) Add(block *messageBlock) {
how come this is exported?
@@ -20,6 +20,10 @@ const messageBlockSize = 1024

type messageBlock [messageBlockSize]ResolvedTxnID
If I were to be nit-picky, I feel like I'd just call this block. The fact that things are "messages" sort of muddles things for me. I think I'd go through and eliminate the word message/msg everywhere: messageSink would become blockSink, messageBlockSize -> blockSize, msgBlockPool -> blockPool. Then for the list, maybe blockListNode and blockList?
@@ -20,6 +20,10 @@ const messageBlockSize = 1024
1024 feels pretty big to me. How do you feel about maybe 170? If each ResolvedTxnID is 24 bytes, that gives you as close as you can get with a 4k page and your nodes would be still under 4k too. I think at 170 you've already amortized a ton.
// tailBlockIdx is an index pointing into the next empty slot in messageBlock
// stored in the tail pointer.
tailBlockIdx int
tailBlockIdx feels like one too many words. tailIdx?
func forkBlock(b *messageBlock) *messageBlock {
	newBlock := &messageBlock{}
	copy(newBlock[:], b[:])
	return newBlock
}
Force-pushed from bcae826 to 35d2a40
Azhng
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/sql/contention/txnidcache/concurrent_write_buffer.go, line 20 at r6 (raw file):
Previously, ajwerner wrote…
1024 feels pretty big to me. How do you feel about maybe 170? If each ResolvedTxnID is 24 bytes, that gives you as close as you can get with a 4k page and your nodes would be still under 4k too. I think at 170 you've already amortized a ton.
Set it to 168 so that it makes some math a bit easier without doing rounding.
pkg/sql/contention/txnidcache/concurrent_write_buffer.go, line 21 at r6 (raw file):
Previously, ajwerner wrote…
If I were to be nit-picky, I feel like I'd just call this block. The fact that things are "messages" sort of muddles things for me. I think I'd go through and eliminate the word message/msg everywhere: messageSink would become blockSink, messageBlockSize -> blockSize, msgBlockPool -> blockPool. Then for the list, maybe blockListNode and blockList?
Done.
pkg/sql/contention/txnidcache/fifo_cache.go, line 55 at r6 (raw file):
Previously, ajwerner wrote…
tailBlockIdxfeels like one too many words.tailIdx?
Done.
pkg/sql/contention/txnidcache/fifo_cache.go, line 75 at r6 (raw file):
Previously, ajwerner wrote…
how come this is exported?
Done. Unexported.
pkg/sql/contention/txnidcache/fifo_cache.go, line 139 at r6 (raw file):
Previously, ajwerner wrote…
I think it might be more elegant if you just passed the block in here as a slice. It's an extra word on the stack, but who cares.
Done.
pkg/sql/contention/txnidcache/fifo_cache.go, line 153 at r6 (raw file):
Previously, ajwerner wrote…
what if, instead of this method, you put a method on the evictionNode which was:
[the append/addNode/appendToTail sketch quoted earlier in this thread]
Done.
pkg/sql/contention/txnidcache/fifo_cache_test.go, line 211 at r6 (raw file):
Previously, ajwerner wrote…
nit: cloneBlock?
Done.
Previously, txnID cache relied on cache.UnorderedCache for storage and FIFO eviction behavior. However, due to unintended allocation behavior within cache.UnorderedCache, this yielded about a 20% throughput drop in the kv95 benchmark.

This commit introduces a new FIFO cache, built to remove all allocation during write operations. This removes the performance hit caused by the previous use of cache.UnorderedCache.

Resolves cockroachdb#76013

Release note: None
Force-pushed from 35d2a40 to 8c35183
TFTR! bors r+
Build succeeded:
Reviewer note: first commit belongs to #76244