pubsub masks complex deadlock bugs. #951

@jaekwon

The pubsub channel can fill up and end up blocking the entire program.

This can happen with sufficient activity and a single function somewhere that (in the same goroutine) reads from and also writes to the pubsub.

func main() {
  // ...
  pubsub.Subscribe(ctx, "client", query, ch)
  go myRoutine(ch, pubsub)
}

func myRoutine(ch, pubsub) {
  for {
    item := <-ch // line 1
    pubsub.PublishWithTags(...) // line 2
  }
}

The example above is problematic because between line 1 and line 2, another goroutine somewhere else may have published a million other events, so by the time we run line 2, the publish may block. Since myRoutine is also the only reader of ch, nothing drains ch while it is blocked, and the pubsub can never make progress. It helps if ch's capacity is larger, but it's still not correct because ch may still fill up depending on how the goroutines race.
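To make the interleaving concrete, here is a minimal sketch that replays it against a toy pubsub. Everything in it (toyPubsub, tryPublish, subscriberWouldBlock, the capacities, the 50ms timeout) is hypothetical and is not the real pubsub API. The timeout exists only so the demo terminates; a real publish would just block.

```go
package main

import (
	"fmt"
	"time"
)

// toyPubsub is a hypothetical stand-in for a buffered pubsub server:
// publishers write to in, and a forwarding goroutine copies each
// message to the single subscriber channel out.
type toyPubsub struct {
	in  chan int
	out chan int
}

func newToyPubsub(inCap, outCap int) *toyPubsub {
	p := &toyPubsub{in: make(chan int, inCap), out: make(chan int, outCap)}
	go func() {
		for msg := range p.in {
			p.out <- msg // blocks once the subscriber stops draining out
		}
	}()
	return p
}

// tryPublish reports whether msg was accepted within d.
// A real publish would block instead of giving up.
func (p *toyPubsub) tryPublish(msg int, d time.Duration) bool {
	select {
	case p.in <- msg:
		return true
	case <-time.After(d):
		return false
	}
}

// subscriberWouldBlock replays the interleaving described above and
// reports whether the subscriber's own publish (line 2) would block.
func subscriberWouldBlock(inCap, outCap int) bool {
	p := newToyPubsub(inCap, outCap)
	p.tryPublish(0, time.Second)
	<-p.out // line 1: the subscriber receives an item

	// Between line 1 and line 2, "another goroutine" floods the pubsub
	// until it accepts nothing more.
	for i := 1; p.tryPublish(i, 50*time.Millisecond); i++ {
	}

	// line 2: the subscriber publishes. It is the only reader of out,
	// so nothing can free up space while it waits here.
	return !p.tryPublish(-1, 50*time.Millisecond)
}

func main() {
	fmt.Println("publish would block:", subscriberWouldBlock(4, 4))
}
```

Under these assumptions subscriberWouldBlock reports true for any capacities: bigger buffers only change how many events it takes to reach the stuck state. The forwarding goroutine is deliberately left blocked when the function returns, which is acceptable for a demo but not for real code.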

I believe other pubsub systems solve this with persistent storage.

I don't know a good way to catch these errors in general, but forcing pubsub's internal channel to have a capacity of 0 will reveal bugs sooner, if the subscriber also has a 0 or small capacity ch to pull from.

So currently we have a pubsub with an elastic inner capacity (which I've tried to show is actually bad), and it somewhat (but not completely) compensates by letting the subscriber's ch capacity be flexible. But I think this is bad because it's prone to being used incorrectly, resulting in buggy software.

An alternative is to have a synchronous pubsub/event system that only provides the caller with a 0-capacity ch to receive from, or a callback function (I prefer callback functions when they suffice; I'll explain more later here: https://github.com/tendermint/internal/issues/36). In addition, we create something on the client connection side (i.e. outside the pubsub module) that does the following:

// NOTE: does not block.
func (c client) mustSendMessage(msg interface{}) {
  select {
  case c.egressCh <- msg:
  case <-c.quitCh:
    c.close()
  default:
    c.closeWithError()
  }
}

Labels

T:bug (Type Bug (Confirmed))
T:security (Type: Security (specify priority))
