Bug Report: Throttler config updates are not seen by vttablet after topo watch error #18209

@brendar

Overview of the Issue

When the throttler on vttablet encounters a topo watch error (e.g. zookeeper closing the connection), it doesn’t recreate the watch on the SrvKeyspace, so it never receives any further throttler config updates.

In the vttablet logs we see this error message:

E0425 10:45:01.036900   40711 throttler.go:381] WatchSrvKeyspaceCallback error: ResilientWatch stream failed for zone1.commerce: zk: zookeeper is closing

That comes from here:

if err != nil {
	if !topo.IsErrType(err, topo.Interrupted) && !errors.Is(err, context.Canceled) {
		log.Errorf("WatchSrvKeyspaceCallback error: %v", err)
	}
	return false
}

The problem is that the callback function returns false when it is called with an error, and it looks like a callback that returns false is effectively removed from the list of listeners here:

listeners := entry.listeners
entry.listeners = entry.listeners[:0]
for _, callback := range listeners {
	if callback(entry.value, entry.lastError) {
		entry.listeners = append(entry.listeners, callback)
	}
}
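
To make the interaction concrete, here is a minimal standalone sketch of the behavior described above. It is a simplified model, not the Vitess code: `listener`, `watchEntry`, `notify` and `simulate` are hypothetical names made up for illustration, and only the filtering loop is copied from the snippet above. It shows that a callback which returns false on an error never sees later values, while one that returns true stays registered. Whether returning true is the right fix for the throttler is not something this sketch settles.

```go
package main

import (
	"errors"
	"fmt"
)

// listener has the same shape as the callbacks in the snippet above: it gets
// the latest value and error, and returns whether it wants to stay registered.
// (Hypothetical type, for illustration only.)
type listener func(value string, err error) bool

// watchEntry is a hypothetical stand-in for the real watch entry.
type watchEntry struct {
	value     string
	lastError error
	listeners []listener
}

// notify records the update and then runs the same filtering loop as the
// snippet above: only callbacks that return true are kept.
func (e *watchEntry) notify(value string, err error) {
	e.value, e.lastError = value, err
	listeners := e.listeners
	e.listeners = e.listeners[:0]
	for _, callback := range listeners {
		if callback(e.value, e.lastError) {
			e.listeners = append(e.listeners, callback)
		}
	}
}

// simulate registers one callback, then delivers a value, an error, and a
// second value. keepOnError controls what the callback returns on an error.
// It returns the values the callback saw and how many listeners remain.
func simulate(keepOnError bool) (applied []string, remaining int) {
	e := &watchEntry{}
	e.listeners = append(e.listeners, func(value string, err error) bool {
		if err != nil {
			return keepOnError
		}
		applied = append(applied, value)
		return true
	})

	e.notify("threshold:1", nil)
	e.notify("", errors.New("zk: zookeeper is closing"))
	e.notify("threshold:2", nil)
	return applied, len(e.listeners)
}

func main() {
	applied, remaining := simulate(false)
	fmt.Println(applied, remaining) // [threshold:1] 0

	applied, remaining = simulate(true)
	fmt.Println(applied, remaining) // [threshold:1 threshold:2] 1
}
```

The first run matches what the reproduction steps below show: after the error the listener list is empty, so the second config value is never delivered.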

Reproduction Steps

I can reproduce this easily using zookeeper as the topo server because connection errors get surfaced as watch errors. I'm not sure how to force a watch error with etcd.

  • Start a cluster using zookeeper as the topo server
    • TOPO=zk2 ./101_initial_cluster.sh
  • Tail the vttablet logs
  • Enable the throttler:
    • vtctldclient --server localhost:15999 UpdateThrottlerConfig --enable --threshold 1.0 commerce
    • After a few seconds you should see something like this in the vttablet logs:
      I0425 10:43:45.794495   40711 throttler.go:425] Throttler: applying topo config: enabled:true threshold:1
      I0425 10:43:45.794520   40711 throttler.go:531] Throttler: enabling
      
  • Stop zookeeper
    • TOPO=zk2 CELL=zone1 ../common/scripts/zk-down.sh
    • You should see something like this in the vttablet logs:
      E0425 10:45:01.033478   40711 watch.go:211] ResilientWatch stream failed for zone1.commerce: zk: zookeeper is closing
      received a non-OK event for /vitess/zone1/keyspaces/commerce/SrvKeyspace
      E0425 10:45:01.036900   40711 throttler.go:381] WatchSrvKeyspaceCallback error: ResilientWatch stream failed for zone1.commerce: zk: zookeeper is closing
      
  • Start zookeeper
    • TOPO=zk2 CELL=zone1 ../common/scripts/zk-up.sh
  • Modify the throttler config:
    • vtctldclient --server localhost:15999 UpdateThrottlerConfig --enable --threshold 2.0 commerce
    • The change will never be seen by the vttablet

Binary Version

Impacts main and earlier versions

Operating System and Environment details

N/A

Log Fragments
