
Panic mode on cluster membership changes? #6653

@akropp-stripe

Description

We are investigating some odd panic-routing metrics that Envoy is emitting. As far as we can tell, nothing is actually wrong with our request pipelines, but we want to track this down because it's causing spurious alerts.

We can clearly correlate cluster membership changes:

(graph of cluster membership changes omitted)

with panic routing:

(graph of panic-routing spikes omitted)

Our stats say that all attempted health checks passed; there are no failures:

(graph of health check stats omitted)

We added health check event logging, and when the cluster membership changes we see:

```
add_healthy_event: {
  first_check: true
}
```

I'm opening this issue to ask whether there is any insight into what could be causing this, or suggestions for how to debug it.
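For context on our suspicion: as far as we understand, Envoy enters panic mode for a priority level when the percentage of healthy hosts falls below `healthy_panic_threshold` (50% by default). One workaround we considered (a sketch only, with a hypothetical 25% value we have not validated) is lowering that threshold on the cluster via `common_lb_config`:

```json
{
 "cluster": {
  "name": "internal_cluster",
  "common_lb_config": {
   "healthy_panic_threshold": {
    "value": 25.0
   }
  }
 }
}
```

This would only mask the symptom, though; we'd still like to understand why membership changes trip panic at all.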

Our relevant cluster configuration looks like:

"dynamic_active_clusters": [
    {
     "version_info": "d63a8ee91ca7f647e623c3c5113a61d62be6fc23e09dbd2b73a7dc85a2e50e37",
     "cluster": {
      "name": "internal_cluster",
      "type": "STRICT_DNS",
      "connect_timeout": "2s",
      "health_checks": [
       {
        "timeout": "3s",
        "interval": "4s",
        "unhealthy_threshold": 2,
        "healthy_threshold": 2,
        "http_health_check": {
         "path": "/healthcheck"
        },
        "no_traffic_interval": "4s",
        "event_log_path": "/var/log/envoy_health_event.log"
        }
       ],
       "http2_protocol_options": {},
      "upstream_connection_options": {
       "tcp_keepalive": {
        "keepalive_time": 120
       }
      },
      "load_assignment": {
       "cluster_name": "apiori",
       "endpoints": [
        {
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "def.dns.entry",
              "port_value": 10652
             }
            }
           },
           "load_balancing_weight": 50
          },
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "abc.dns.entry",
              "port_value": 10652
             }
            }
           },
           "load_balancing_weight": 50
          }
         ]
        },
        {
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "xyz.dns.entry",
              "port_value": 10652
             }
            }
           },
           "load_balancing_weight": 100
          }
         ],
         "priority": 1
        },
        {
         "lb_endpoints": [
          {
           "endpoint": {
            "address": {
             "socket_address": {
              "address": "xyz.dns.entry",
              "port_value": 10652
             }
            }
           },
           "load_balancing_weight": 100
          }
         ],
         "priority": 2
        }
       ],
       "policy": {
        "overprovisioning_factor": 198
       }
      }
     },
     "last_updated": "2019-04-18T21:13:33.619Z"
    },

When we change clusters via the connected CDS server, we update the cluster's LB endpoints to point to different DNS entries, but everything else stays the same.
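To illustrate why such a swap could transiently trip panic even though every attempted check passes: our working theory (a hedged sketch of the default behavior, not Envoy source; the 50% default and the treatment of newly added hosts as unhealthy until their first active check passes are our assumptions from the docs) is that the replacement hosts briefly drag the healthy percentage below the threshold.

```python
# Sketch of the default panic-mode check (our model, not Envoy source).
# Panic mode engages when the healthy-host fraction in a priority level
# drops below the panic threshold (50% by default).

PANIC_THRESHOLD = 0.50  # Envoy's default healthy_panic_threshold


def in_panic(healthy_hosts: int, total_hosts: int) -> bool:
    """Return True if the load balancer would be in panic mode."""
    if total_hosts == 0:
        return True
    return healthy_hosts / total_hosts < PANIC_THRESHOLD


# Steady state: both priority-0 endpoints healthy -> no panic.
print(in_panic(2, 2))   # False

# Right after a CDS swap: the new hosts have not yet passed their first
# active health check, so the host set is momentarily 0/2 healthy -> panic,
# even though no check ever *failed*.
print(in_panic(0, 2))   # True
```

If this model is right, it would explain panic spikes that coincide with membership changes while all health check counters show only successes.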

We are on Envoy version: `envoy 0/1.9.0-dev//RELEASE live 1394162 3549119 1`
Thanks!

Labels: question (Questions that are neither investigations, bugs, nor enhancements)