
Unexpected behaviour with SDS / CDS APIs #431


@timperrett

Hi gang,

I think there may be a few separate issues with the CDS API.

Overly frequent calling of SDS/CDS

The periodic timer on which Envoy calls the CDS endpoint seems to be off by an order of magnitude. Given the following config:

"cluster_manager": {
    "sds": {
      "cluster": {
        "name": "sds",
        "connect_timeout_ms": 250,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://localhost:4000"
          }
        ]
      },
      "refresh_delay_ms": 30000
    },
    "cds": {
      "cluster": {
        "name": "cds",
        "connect_timeout_ms": 250,
        "type": "strict_dns",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://localhost:4000"
          }
        ]
      },
      "refresh_delay_ms": 30000
    },
    "clusters": [ ]
  }

This should result in 30 seconds between calls, but the log shows otherwise (I have a local testing harness for exercising the CDS/SDS flow, with our custom backend providing the contract Envoy needs):

2017/02/06 03:49:48 GET	/v1/clusters/service1/172.17.0.2	988.023µs
2017/02/06 03:49:49 GET	/v1/registration/consul	812.934µs
2017/02/06 03:50:23 GET	/v1/registration/consul	1.580317ms
2017/02/06 03:50:46 GET	/v1/clusters/service1/172.17.0.2	1.218456ms
2017/02/06 03:51:00 GET	/v1/registration/consul	1.185353ms
2017/02/06 03:51:32 GET	/v1/clusters/service1/172.17.0.2	1.422009ms
2017/02/06 03:51:52 GET	/v1/registration/consul	1.386849ms
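For what it's worth, here is a quick sketch (my own analysis script, not Envoy code) that computes the gaps between successive calls per endpoint from the log above. The observed intervals are not a constant 30 s, which might suggest Envoy randomizes (jitters) the refresh interval rather than using a fixed period, though I haven't confirmed that in the source:

```python
from datetime import datetime

# Log lines copied verbatim from the harness output above.
log = """2017/02/06 03:49:48 GET /v1/clusters/service1/172.17.0.2 988.023µs
2017/02/06 03:49:49 GET /v1/registration/consul 812.934µs
2017/02/06 03:50:23 GET /v1/registration/consul 1.580317ms
2017/02/06 03:50:46 GET /v1/clusters/service1/172.17.0.2 1.218456ms
2017/02/06 03:51:00 GET /v1/registration/consul 1.185353ms
2017/02/06 03:51:32 GET /v1/clusters/service1/172.17.0.2 1.422009ms
2017/02/06 03:51:52 GET /v1/registration/consul 1.386849ms"""

# Group request timestamps by path.
times_by_path = {}
for line in log.splitlines():
    date, time, _method, path, _duration = line.split()
    ts = datetime.strptime(f"{date} {time}", "%Y/%m/%d %H:%M:%S")
    times_by_path.setdefault(path, []).append(ts)

# Seconds between consecutive calls to the same endpoint.
deltas_by_path = {
    path: [int((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]
    for path, ts in times_by_path.items()
}

for path, deltas in deltas_by_path.items():
    print(path, deltas)
```

Running this against the log gives deltas of 58 s and 46 s for the clusters endpoint and 34 s, 37 s and 52 s for the registration endpoint, i.e. varying intervals rather than a steady 30 s cadence.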

Envoy appears to be calling these APIs rather frequently - perhaps something is misconfigured?

Appending rather than replacing clusters internally?

Perhaps I'm misreading the output, but when looking locally at the Envoy admin endpoint (asking for /clusters), I'm not sure the internal SDS/CDS state is accurate, as there are many duplicates. I've posted the output here: https://gist.github.com/timperrett/773c9c707b54077533ecf7144058cba1 - to me this looks as though the same cluster is held in memory multiple times. My expectation would be a single entry per backend.
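To make the duplication concrete, here is a hypothetical helper (not part of Envoy; the `cluster::host::stat::value` line format is my reading of the admin /clusters dump) that flags any stat line appearing more than once. In a healthy dump each line should be unique, so repeats would indicate the same cluster being tracked multiple times:

```python
from collections import Counter

def repeated_lines(clusters_text):
    """Return lines from an admin /clusters dump that occur more than once,
    mapped to their occurrence count. Blank lines are ignored."""
    counts = Counter(line for line in clusters_text.splitlines() if line)
    return {line: n for line, n in counts.items() if n > 1}

# Illustrative input: the first stat line is duplicated, mimicking the
# behaviour seen in the gist linked above.
sample = (
    "service1::172.17.0.2:8080::cx_active::0\n"
    "service1::172.17.0.2:8080::cx_active::0\n"
    "sds::127.0.0.1:4000::cx_active::1\n"
)
print(repeated_lines(sample))
```

On a correct /clusters dump this should print an empty dict; on output like the gist it would list each duplicated entry.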

Thanks for your time. Happy to answer any questions on Gitter if this was not sufficient information.
