
Backlog spikes reported every hour #850

@pjm17971

Description

We run many workers on GKE that essentially pull messages from a Pubsub subscription at a rate of about 120k messages/min, do some processing, and write the results into Bigtable. As part of a recent change we upgraded this Pubsub client library from 0.29.1 to 1.2 and immediately started to see alerts.

After this upgrade we started to see spikes in the reported backlog (and in the oldest unacked message age), occurring hourly. However, our service did not appear to suffer, and continued to output its product at a steady rate.

Here is an overview of the upgrade as seen in Stackdriver (the period running Pubsub v1.2 is highlighted in red; after 10am Friday we reverted JUST the pubsub version and the process returned to normal):

[Screenshot, 2020-01-11: Stackdriver overview of the upgrade and revert]

Zooming in to approximately Friday 12am through noon, with backlog shown at the top and oldest unacked message age at the bottom:

[Screenshot, 2020-01-11: zoomed view showing hourly backlog spikes (top) and oldest unacked message age (bottom)]

It is pretty clearly something that happens every hour.

I know there's another GitHub issue for memory spikes, but as far as we can tell that's not the case for us. In fact, I don't think we saw a real impact on processing output. This assessment is based on: 1) we didn't see lag in our downstream client, which is usually the case with actual backlogs, and 2) we didn't see an increase in worker CPU when the backlog recovered. The biggest problem is that we use the alerts on this subscription as our main indicator that the process may be experiencing problems.
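For reference, the two metrics in question can be selected in Stackdriver / Cloud Monitoring with filters along these lines (a sketch; the exact alert policy we use is not shown here):

```
resource.type = "pubsub_subscription"
metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages"

resource.type = "pubsub_subscription"
metric.type = "pubsub.googleapis.com/subscription/oldest_unacked_message_age"
```

Both are reported by the Pub/Sub service itself, which is why spikes here are surprising when the workers' actual throughput looks unchanged.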

Environment details

  • OS: GKE
  • Node.js version: 12.x
  • npm version: 6.11.3
  • @google-cloud/pubsub version: 1.2

Steps to reproduce

Not sure we do anything special. Each worker instance creates a simple client which processes messages. We run this on GKE, with one node instance per pod. Approximately 64 workers are all pulling from the same subscription. We generally just ack messages regardless of successful processing because in this application it's ok to just drop them (a separate process will take care of it).
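For what it's worth, each worker's message flow is roughly the following (a minimal sketch, not our production code; `makeHandler` and the `writeToBigtable` callback are hypothetical names for illustration):

```javascript
// Sketch of the per-worker handler described above.
// The handler acks unconditionally: if processing fails, the message is
// simply dropped, and a separate process takes care of recovery.
function makeHandler(processMessage) {
  return message => {
    try {
      processMessage(message.data);
    } catch (err) {
      // Intentionally ignored; dropped messages are handled elsewhere.
    } finally {
      message.ack(); // ack regardless of processing outcome
    }
  };
}

// Wiring against the real client looks roughly like:
// const {PubSub} = require('@google-cloud/pubsub');
// const subscription = new PubSub().subscription('my-subscription');
// subscription.on('message', makeHandler(data => writeToBigtable(data)));
```

One of these runs per pod, with roughly 64 pods pulling from the same subscription.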

Hope this is helpful. Let me know if I can provide any additional data.

Thanks!

Metadata

Labels

  • 🚨 This issue needs some love.
  • api: pubsub (Issues related to the googleapis/nodejs-pubsub API.)
  • priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
