Skip to content

Nil pointer dereference in the mempool reactor #5408

@kostko

Description

@kostko

While running our usual batch of E2E tests on the just released v0.34-rc4, it seems that the newly introduced tx batching (commit ffe2742 part of #5321) seems to introduce a possible nil pointer dereference which causes the node to crash.

It seems to occurr in the txs function on line 275:

for {
memTx := next.Value.(*mempoolTx)
if _, ok := memTx.senders.Load(peerID); !ok {
// If current batch + this tx size is greater than max => return.
batchMsg := protomem.Message{
Sum: &protomem.Message_Txs{
Txs: &protomem.Txs{Txs: append(batch, memTx.tx)},
},
}
if batchMsg.Size() > memR.config.MaxBatchBytes {
return batch
}
batch = append(batch, memTx.tx)
}
if next.Next() == nil {
return batch
}
next = next.Next()
}

Did not investigate this much yet but on first glance it could be a concurrency issue -- at the end of the loop you have this check:

if next.Next() == nil {
return batch
}
next = next.Next()

But each call to Next() seems to take a read lock:

// Nonblocking, may return nil if at the end.
func (e *CElement) Next() *CElement {
e.mtx.RLock()
val := e.next
e.mtx.RUnlock()
return val
}

While you do check that Next() is non-nil, you invoke Next() again when assigning to next. And for this second call (I assume that the internal mutex is there because e.next can change concurrently) you don't check whether that is nil. I would suggest that you only call Next() once, check the value for nil and then set next to that value without calling Next() the second time.

Of course the root cause could also be something else as I didn't really investigate this further.

Metadata

Metadata

Assignees

Labels

C:mempoolComponent: MempoolT:bugType Bug (Confirmed)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions