[BUG] etcd authentication performance issue and registry cache penetration #2821

@DingYuan0118

Describe the bug

The background is that we need to enable etcd server auth due to security concerns.

The etcd server's authentication design causes a serious performance problem in the /etcdserverpb.Auth/Authenticate API.

From our observation, a typical 3-node etcd cluster (spec: 64C, 256G, HDD) can only sustain fewer than about 100 QPS of authentication requests.

With the current default go-micro registry plugin and gRPC server settings, the gRPC server re-registers by calling KeepAliveOnce every RegisterInterval (default 30s). Each KeepAliveOnce call invokes /etcdserverpb.Auth/Authenticate once to establish the stream.

In our prod environment, we have a k8s cluster with over 4000 service pods, which results in a steady /etcdserverpb.Auth/Authenticate QPS of around 110.
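As a rough sanity check on these numbers, here is a minimal sketch of the arithmetic. It assumes every pod issues exactly one KeepAliveOnce (and therefore one Authenticate) per RegisterInterval, which is a simplification. The helper names `authQPS` and `minInterval` are hypothetical and not part of go-micro or etcd.

```go
package main

import (
	"fmt"
	"time"
)

// authQPS estimates the steady-state Authenticate rate, assuming every pod
// issues exactly one KeepAliveOnce (and thus one Authenticate) per interval.
func authQPS(pods int, registerInterval time.Duration) float64 {
	return float64(pods) / registerInterval.Seconds()
}

// minInterval returns the register interval at which the estimated
// Authenticate rate equals budgetQPS; longer intervals stay under the budget.
func minInterval(pods int, budgetQPS float64) time.Duration {
	return time.Duration(float64(pods) / budgetQPS * float64(time.Second))
}

func main() {
	// 4000 pods and the default 30s RegisterInterval, as described above.
	fmt.Printf("estimated auth QPS: %.1f\n", authQPS(4000, 30*time.Second))
	// Interval needed to stay at a budget of 100 Authenticate QPS.
	fmt.Printf("interval for 100 QPS budget: %s\n", minInterval(4000, 100))
}
```

Under this assumption the estimate is about 133 QPS, the same order of magnitude as the roughly 110 QPS we observe, and above the roughly 100 QPS the cluster can sustain. Stretching RegisterInterval to 40s would only just meet that budget, so tuning the interval alone leaves little headroom.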

When we enabled etcd auth, the etcd cluster could not handle that /etcdserverpb.Auth/Authenticate QPS, so the services' KeepAliveOnce calls failed and the services were deregistered from the etcd server after registryTTL.

The upstream services Watch the change and Delete the downstream server nodes from their registry cache, which eventually leaves the cache empty.

Once the cache has been cleared, a second problem appears: cache penetration. When the cache is empty or invalid, every gRPC call queries etcd for the downstream nodes, but etcd does not have that information at this point, because the downstream services cannot keep their registry heartbeat alive due to the /etcdserverpb.Auth/Authenticate problem.

The result is that every gRPC request penetrates to etcd and ultimately fails.

We want to address these two problems:

    1. Limit the requests to etcd when the cache is empty, to avoid the penetration issue.
    2. Use "KeepAlive" instead of "KeepAliveOnce" to address the /etcdserverpb.Auth/Authenticate QPS issue.
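To illustrate the first point, here is a minimal sketch of one possible way to bound lookups when the cache is empty. It is not the go-micro cache implementation. `lookupGate`, `getNodes`, `errThrottled` and `simulate` are hypothetical names, and `fetch` stands in for whatever call actually queries etcd. The gate permits at most one registry lookup per service per window and fails fast for the rest.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errThrottled tells the caller that the lookup was skipped to protect etcd.
var errThrottled = errors.New("registry lookup throttled")

// lookupGate (hypothetical) permits at most one registry lookup per service
// within each minInterval window.
type lookupGate struct {
	mu          sync.Mutex
	minInterval time.Duration
	last        map[string]time.Time // time of the last permitted lookup
}

func newLookupGate(minInterval time.Duration) *lookupGate {
	return &lookupGate{minInterval: minInterval, last: make(map[string]time.Time)}
}

// getNodes calls fetch only if no lookup for service was permitted within the
// last minInterval; otherwise it fails fast with errThrottled.
func (g *lookupGate) getNodes(service string, fetch func(string) ([]string, error)) ([]string, error) {
	g.mu.Lock()
	now := time.Now()
	if t, ok := g.last[service]; ok && now.Sub(t) < g.minInterval {
		g.mu.Unlock()
		return nil, errThrottled
	}
	g.last[service] = now
	g.mu.Unlock()
	return fetch(service)
}

// simulate fires n concurrent lookups against an empty cache and reports how
// many reached the backend and how many were throttled.
func simulate(n int) (backendCalls, throttled int) {
	var mu sync.Mutex
	fetch := func(string) ([]string, error) {
		mu.Lock()
		backendCalls++
		mu.Unlock()
		return nil, errors.New("no nodes registered")
	}
	g := newLookupGate(time.Hour) // long window keeps the demo deterministic
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := g.getNodes("downstream", fetch); errors.Is(err, errThrottled) {
				mu.Lock()
				throttled++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return backendCalls, throttled
}

func main() {
	b, t := simulate(100)
	fmt.Printf("backend calls: %d, throttled: %d\n", b, t)
}
```

With 100 concurrent callers, only one lookup reaches the backend and the other 99 are rejected locally. A real fix would probably use a much shorter window and might prefer returning the last known nodes over an error; that choice is left open here.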

To Reproduce

Steps to reproduce the behavior:

  1. Create 4000+ service pods that use the default go-micro registry settings.
  2. Preconfigure the etcd username and password.
  3. Enable etcd cluster auth.

Environment

  • Go Micro version:
    • github.com/go-micro/plugins/v4/client/grpc v1.2.1
    • github.com/go-micro/plugins/v4/registry/etcd v1.2.0
    • github.com/go-micro/plugins/v4/server/grpc v1.2.0
    • go-micro.dev/v4 v4.9.0
    • go.etcd.io/etcd/client/v3 v3.5.2
  • Go version: 1.18
  • OS: Ubuntu 20.04
  • Plugins used:
    • etcd registry

Logs

[Image: server-side monitor]
