Description
Describe the bug
Yandex Cloud providers are broken in some dual-stack environments because grpc-go does not fully support dual-stack backends.
To Reproduce
- Run ESO in a dual-stack environment with ULA IPv6 addresses, so IPv4 will be preferred
- Do not provide IPv4 connectivity to Yandex Cloud API – by having wrong routes or not allowing this traffic in network policies and other firewalls
- Create a Secret Store
- Notice Secret Store being ready
- Create an External Secret
Expected behavior
External Secret is successfully synced via IPv6
Screenshots
External Secret won't be synced due to timeouts via IPv4:
Warning UpdateFailed 3m31s (x17 over 9m3s) external-secrets error retrieving secret at .data[0], key: KEYID, err: unable to request secret payload to get secret: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 84.201.168.170:443: i/o timeout"
Here 84.201.168.170 is the IPv4 address of the lockbox-payload API endpoint:
❯ grpcurl -d '{"api_endpoint_id": "lockbox-payload"}' api.cloud.yandex.net:443 yandex.cloud.endpoint.ApiEndpointService/Get
{
"id": "lockbox-payload",
"address": "payload.lockbox.api.cloud.yandex.net:443"
}
❯ host payload.lockbox.api.cloud.yandex.net
payload.lockbox.api.cloud.yandex.net is an alias for public-dpl.lockbox.cloud.yandex.net.
public-dpl.lockbox.cloud.yandex.net has address 84.201.168.170
public-dpl.lockbox.cloud.yandex.net has IPv6 address 2a0d:d6c1:0:1c::1c6
Additional context
Yandex Cloud API is based on gRPC, which does not explicitly support dual-stack backends at the time of writing. Proposal A61 (IPv4 and IPv6 Dualstack Backend Support) covers this, but it is not implemented in grpc-go yet; work is in progress in grpc/grpc-go#7498
However, notice that the SecretStore is initialised successfully and the lockbox/certificate manager endpoints are also discovered successfully, but API calls to them during external secret sync fail.
This happens because ycsdk is used for the API calls that initialise the Secret Store and authenticate, while a gRPC client is used directly for the calls that fetch the external secrets.
ycsdk uses the deprecated DialContext call to create its gRPC client:
https://github.com/yandex-cloud/go-sdk/blob/1018f7c96dc7bc49822d5fd96be72e8506ed0533/pkg/grpcclient/conn_context.go#L97
This implicitly sets the gRPC resolver to passthrough instead of the default dns:
https://github.com/grpc/grpc-go/blob/005b092ca3c279e352f1247c4316b0351dec3a56/clientconn.go#L218-L222
The gRPC client in external-secrets, by contrast, is created via the non-deprecated NewClient call:
external-secrets/pkg/provider/yandex/common/sdk.go
Lines 60 to 68 in 233ede3
    return grpc.NewClient(serviceAPIEndpoint.Address,
        grpc.WithTransportCredentials(credentials.NewTLS(tlsConfig)),
        grpc.WithKeepaliveParams(keepalive.ClientParameters{
            Time:                time.Second * 30,
            Timeout:             time.Second * 10,
            PermitWithoutStream: false,
        }),
        grpc.WithUserAgent("external-secrets"),
    )
Short description of passthrough and dns resolvers:
- passthrough resolver just passes addresses to the load balancer; it does not resolve them at all
- dns resolver tries to discover configuration and endpoints via SRV _grpclb._tcp and TXT _grpc_config DNS records, and falls back to discovering IP addresses via A/AAAA records. The order of the A/AAAA records returned is determined by Go's standard library LookupHost, which sorts the returned addresses according to RFC6724.
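To see the address order that LookupHost hands back on a given machine, a small standalone program can help. This is only a sketch using the Go standard library; the `family` helper is a name made up for illustration, and the printed order depends on the host's own addresses and on whether the Go or cgo resolver is in use.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/netip"
	"time"
)

// family is a hypothetical helper that labels an address string
// as "IPv4", "IPv6" or "invalid".
func family(addr string) string {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return "invalid"
	}
	if ip.Unmap().Is4() {
		return "IPv4"
	}
	return "IPv6"
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Hostname taken from the endpoint lookup earlier in this issue.
	addrs, err := net.DefaultResolver.LookupHost(ctx, "payload.lockbox.api.cloud.yandex.net")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	// The first entry is the one a "first address only" client would dial.
	for i, a := range addrs {
		fmt.Printf("%d: %s (%s)\n", i, a, family(a))
	}
}
```

Running it with `GODEBUG=netdns=go` should exercise the same pure-Go resolver path as the grpcurl runs in this issue.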
These addresses and the configuration (if discovered) are passed to the balancer, which is pick_first by default. pick_first should try all addresses serially in the order given by the resolver, but each attempt uses the deadline of the whole Dial, which means that all the time can be spent trying one address family. This is not ideal and goes against best practices like Happy Eyeballs:
https://github.com/grpc/grpc-go/blob/2da976983bbb33feb3e25b7daaa8f60b9769adb5/clientconn.go#L1254-L1260
https://github.com/grpc/grpc-go/blob/2da976983bbb33feb3e25b7daaa8f60b9769adb5/clientconn.go#L1329-L1331
RFC6724 sorting prefers IPv4-to-IPv4 over ULA-to-GUA IPv6, so on a dual-stack client with a ULA IPv6 address IPv4 will be preferred – and in fact only IPv4 will be tried, because pick_first effectively tries only the first address.
Note that it goes the other way too: if the client has a GUA IPv6 address but IPv6 is silently broken, IPv4 will never be tried and the connection won't be established.
Giving pods ULA IPv6 addresses instead of GUA ones is quite common in managed k8s.
You can test these methods with grpcurl. Here I'm running them on macOS with dual-stack networking and a GUA IPv6 address, explicitly setting the Go resolver to native instead of CGO:
verbose logs for passthrough resolver
❯ GODEBUG=netdns=go+2 GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info grpcurl -d '{"api_endpoint_id": "lockbox-payload"}' passthrough:///api.cloud.yandex.net:443 yandex.cloud.endpoint.ApiEndpointService/Get
2024/08/27 10:06:05 INFO: [core] [Channel #1] Channel created
2024/08/27 10:06:05 INFO: [core] [Channel #1] original dial target is: "passthrough:///api.cloud.yandex.net:443"
2024/08/27 10:06:05 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"passthrough", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/api.cloud.yandex.net:443", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
2024/08/27 10:06:05 INFO: [core] [Channel #1] Channel authority set to "api.cloud.yandex.net:443"
2024/08/27 10:06:05 INFO: [core] [Channel #1] Resolver state updated: {
"Addresses": [
{
"Addr": "api.cloud.yandex.net:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "api.cloud.yandex.net:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
} (resolver returned new addresses)
2024/08/27 10:06:05 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
2024/08/27 10:06:05 INFO: [core] [pick-first-lb 0x14000335890] Received new config {
"shuffleAddressList": false
}, resolver state {
"Addresses": [
{
"Addr": "api.cloud.yandex.net:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "api.cloud.yandex.net:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
}
2024/08/27 10:06:05 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
2024/08/27 10:06:05 INFO: [core] [Channel #1] Channel Connectivity change to CONNECTING
2024/08/27 10:06:05 INFO: [core] [Channel #1] Channel exiting idle mode
2024/08/27 10:06:05 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING
2024/08/27 10:06:05 INFO: [core] [Channel #1 SubChannel #2] Subchannel picks a new address "api.cloud.yandex.net:443" to connect
2024/08/27 10:06:05 INFO: [core] [pick-first-lb 0x14000335890] Received SubConn state update: 0x14000335a10, {ConnectivityState:CONNECTING ConnectionError:<nil>}
go package net: confVal.netCgo = false netGo = true
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(api.cloud.yandex.net) = files,dns
2024/08/27 10:06:06 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY
2024/08/27 10:06:06 INFO: [core] [pick-first-lb 0x14000335890] Received SubConn state update: 0x14000335a10, {ConnectivityState:READY ConnectionError:<nil>}
2024/08/27 10:06:06 INFO: [core] [Channel #1] Channel Connectivity change to READY
{
"id": "lockbox-payload",
"address": "payload.lockbox.api.cloud.yandex.net:443"
}
2024/08/27 10:06:06 INFO: [core] [Channel #1] Channel Connectivity change to SHUTDOWN
2024/08/27 10:06:06 INFO: [core] [Channel #1] Closing the name resolver
2024/08/27 10:06:06 INFO: [core] [Channel #1] ccBalancerWrapper: closing
2024/08/27 10:06:06 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN
2024/08/27 10:06:06 INFO: [core] [Channel #1 SubChannel #2] Subchannel deleted
2024/08/27 10:06:06 INFO: [transport] [client-transport 0x140003bc008] Closing: rpc error: code = Canceled desc = grpc: the client connection is closing
2024/08/27 10:06:06 INFO: [transport] [client-transport 0x140003bc008] loopyWriter exiting with error: transport closed by client
2024/08/27 10:06:06 INFO: [core] [Channel #1] Channel deleted
verbose log for dns resolver with GUA
❯ GODEBUG=netdns=go+2 GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info grpcurl -d '{"api_endpoint_id": "lockbox-payload"}' dns:///api.cloud.yandex.net:443 yandex.cloud.endpoint.ApiEndpointService/Get
2024/08/27 10:06:41 INFO: [core] [Channel #1] Channel created
2024/08/27 10:06:41 INFO: [core] [Channel #1] original dial target is: "dns:///api.cloud.yandex.net:443"
2024/08/27 10:06:41 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"dns", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/api.cloud.yandex.net:443", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
2024/08/27 10:06:41 INFO: [core] [Channel #1] Channel authority set to "api.cloud.yandex.net:443"
2024/08/27 10:06:41 INFO: [core] [Channel #1] Channel exiting idle mode
go package net: confVal.netCgo = false netGo = true
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(api.cloud.yandex.net) = files,dns
2024/08/27 10:06:43 INFO: [core] [Channel #1] Resolver state updated: {
"Addresses": [
{
"Addr": "[2a0d:d6c1:0:1c::4e]:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
},
{
"Addr": "84.201.181.26:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "[2a0d:d6c1:0:1c::4e]:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
},
{
"Addresses": [
{
"Addr": "84.201.181.26:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
} (resolver returned new addresses)
2024/08/27 10:06:43 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
2024/08/27 10:06:43 INFO: [core] [pick-first-lb 0x140000caa80] Received new config {
"shuffleAddressList": false
}, resolver state {
"Addresses": [
{
"Addr": "[2a0d:d6c1:0:1c::4e]:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
},
{
"Addr": "84.201.181.26:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "[2a0d:d6c1:0:1c::4e]:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
},
{
"Addresses": [
{
"Addr": "84.201.181.26:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
}
2024/08/27 10:06:43 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
2024/08/27 10:06:43 INFO: [core] [Channel #1] Channel Connectivity change to CONNECTING
2024/08/27 10:06:43 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING
2024/08/27 10:06:43 INFO: [core] [Channel #1 SubChannel #2] Subchannel picks a new address "[2a0d:d6c1:0:1c::4e]:443" to connect
2024/08/27 10:06:43 INFO: [core] [pick-first-lb 0x140000caa80] Received SubConn state update: 0x140000cac00, {ConnectivityState:CONNECTING ConnectionError:<nil>}
2024/08/27 10:06:44 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY
2024/08/27 10:06:44 INFO: [core] [pick-first-lb 0x140000caa80] Received SubConn state update: 0x140000cac00, {ConnectivityState:READY ConnectionError:<nil>}
2024/08/27 10:06:44 INFO: [core] [Channel #1] Channel Connectivity change to READY
{
"id": "lockbox-payload",
"address": "payload.lockbox.api.cloud.yandex.net:443"
}
2024/08/27 10:06:47 INFO: [core] [Channel #1] Channel Connectivity change to SHUTDOWN
2024/08/27 10:06:47 INFO: [core] [Channel #1] Closing the name resolver
2024/08/27 10:06:47 INFO: [core] [Channel #1] ccBalancerWrapper: closing
2024/08/27 10:06:47 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN
2024/08/27 10:06:47 INFO: [core] [Channel #1 SubChannel #2] Subchannel deleted
2024/08/27 10:06:47 INFO: [transport] [client-transport 0x140001f1b08] Closing: rpc error: code = Canceled desc = grpc: the client connection is closing
2024/08/27 10:06:47 INFO: [transport] [client-transport 0x140001f1b08] loopyWriter exiting with error: transport closed by client
2024/08/27 10:06:47 INFO: [core] [Channel #1] Channel deleted
And here I'm running the DNS resolver on macOS with non-GUA IPv6 addresses; notice how the IPv4/IPv6 order changes:
dns resolver with non-GUA IPv6 address
❯ GODEBUG=netdns=go+2 GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info grpcurl -d '{"api_endpoint_id": "lockbox-payload"}' dns:///api.cloud.yandex.net:443 yandex.cloud.endpoint.ApiEndpointService/Get
2024/08/27 10:07:29 INFO: [core] [Channel #1] Channel created
2024/08/27 10:07:29 INFO: [core] [Channel #1] original dial target is: "dns:///api.cloud.yandex.net:443"
2024/08/27 10:07:29 INFO: [core] [Channel #1] parsed dial target is: resolver.Target{URL:url.URL{Scheme:"dns", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/api.cloud.yandex.net:443", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}}
2024/08/27 10:07:29 INFO: [core] [Channel #1] Channel authority set to "api.cloud.yandex.net:443"
2024/08/27 10:07:29 INFO: [core] [Channel #1] Channel exiting idle mode
go package net: confVal.netCgo = false netGo = true
go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(api.cloud.yandex.net) = files,dns
2024/08/27 10:07:30 INFO: [core] [Channel #1] Resolver state updated: {
"Addresses": [
{
"Addr": "84.201.181.26:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
},
{
"Addr": "[2a0d:d6c1:0:1c::4e]:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "84.201.181.26:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
},
{
"Addresses": [
{
"Addr": "[2a0d:d6c1:0:1c::4e]:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
} (resolver returned new addresses)
2024/08/27 10:07:30 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
2024/08/27 10:07:30 INFO: [core] [pick-first-lb 0x140001111a0] Received new config {
"shuffleAddressList": false
}, resolver state {
"Addresses": [
{
"Addr": "84.201.181.26:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
},
{
"Addr": "[2a0d:d6c1:0:1c::4e]:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Endpoints": [
{
"Addresses": [
{
"Addr": "84.201.181.26:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
},
{
"Addresses": [
{
"Addr": "[2a0d:d6c1:0:1c::4e]:443",
"ServerName": "",
"Attributes": null,
"BalancerAttributes": null,
"Metadata": null
}
],
"Attributes": null
}
],
"ServiceConfig": null,
"Attributes": null
}
2024/08/27 10:07:30 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
2024/08/27 10:07:30 INFO: [core] [Channel #1] Channel Connectivity change to CONNECTING
2024/08/27 10:07:30 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING
2024/08/27 10:07:30 INFO: [core] [Channel #1 SubChannel #2] Subchannel picks a new address "84.201.181.26:443" to connect
2024/08/27 10:07:30 INFO: [core] [pick-first-lb 0x140001111a0] Received SubConn state update: 0x14000111320, {ConnectivityState:CONNECTING ConnectionError:<nil>}
2024/08/27 10:07:30 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY
2024/08/27 10:07:30 INFO: [core] [pick-first-lb 0x140001111a0] Received SubConn state update: 0x14000111320, {ConnectivityState:READY ConnectionError:<nil>}
2024/08/27 10:07:30 INFO: [core] [Channel #1] Channel Connectivity change to READY
{
"id": "lockbox-payload",
"address": "payload.lockbox.api.cloud.yandex.net:443"
}
2024/08/27 10:07:31 INFO: [core] [Channel #1] Channel Connectivity change to SHUTDOWN
2024/08/27 10:07:31 INFO: [core] [Channel #1] Closing the name resolver
2024/08/27 10:07:31 INFO: [core] [Channel #1] ccBalancerWrapper: closing
2024/08/27 10:07:31 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN
2024/08/27 10:07:31 INFO: [core] [Channel #1 SubChannel #2] Subchannel deleted
2024/08/27 10:07:31 INFO: [transport] [client-transport 0x14000193b08] Closing: rpc error: code = Canceled desc = grpc: the client connection is closing
2024/08/27 10:07:31 INFO: [transport] [client-transport 0x14000193b08] loopyWriter exiting with error: transport closed by client
2024/08/27 10:07:31 INFO: [core] [Channel #1] Channel deleted
Possible solutions
- Set resolver to passthrough explicitly, while keeping current code and architecture of Yandex Cloud secret stores – provided in fix: set grpc resolver explicitly in yandex #3838
- Do not use grpc directly in any way and only use ycsdk calls and methods
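As an illustration of the first option, the target string could be given an explicit scheme before it reaches NewClient. This is only a sketch, not necessarily what #3838 does: `withPassthrough` is a hypothetical helper, and checking for `://` is a simplification of how grpc-go actually parses targets.

```go
package main

import (
	"fmt"
	"strings"
)

// withPassthrough prefixes a bare "host:port" target with the passthrough
// scheme and leaves targets that already carry a scheme untouched.
func withPassthrough(target string) string {
	if strings.Contains(target, "://") {
		return target
	}
	return "passthrough:///" + target
}

func main() {
	// Endpoint address as returned by ApiEndpointService/Get in this issue.
	fmt.Println(withPassthrough("payload.lockbox.api.cloud.yandex.net:443"))
	// An explicit scheme is preserved.
	fmt.Println(withPassthrough("dns:///api.cloud.yandex.net:443"))
}
```

The result would then be passed as the first argument of the NewClient call shown earlier, in place of the bare serviceAPIEndpoint.Address.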