discovery/aws: Fix AWS SDK v2 credentials handling for EC2 and Lightsail discovery #17355
sysadmind
left a comment
LGTM. I wish we had a good environment to test with. Thanks!
bwplotka
left a comment
Thanks for jumping in!
Should we merge it to the release branch first? cc @krajorama
…ail discovery
After the upgrade to AWS SDK v2, the EC2 and Lightsail service discovery stopped working when using the default AWS credential chain (environment variables, IAM roles, EC2 instance metadata, etc.).
The issue was that the code unconditionally created a StaticCredentialsProvider with empty credentials when access_key and secret_key were not configured. In AWS SDK v2, this causes a "static credentials are empty" error and prevents the SDK from falling back to its default credential chain.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
I've built the image from the PR, and I get these errors when running in EKS. With 3.6.0 I didn't get errors. I don't have any resources in my little test env, so I didn't see targets either. Config map:
I think we'll need someone to test this who knows what they are doing. I was just trying to do a dumb smoke test, see #17355 (comment). cc @sysadmind
LoadDefaultConfig should do the right thing for anything running in AWS. That seems to be all most projects need. The EBS CSI driver only uses that:
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/pkg/cloud/cloud.go#L391
I think the only reason we have to do anything custom is that we have our own config options to be passed in.
Ok, I was able to set up an AWS account to do some testing.
TL;DR: This PR should resolve the issues.
I created an EC2 instance to run Docker and set up security groups so that I could access the SSH and Prometheus ports. I created the following config file:
scrape_configs:
  - job_name: testsd
    ec2_sd_configs:
      - refresh_interval: 30s
        access_key: aaaaaaa
        secret_key: bbbbbbbb
I ran this against v3.6.0 and everything worked correctly. Then I added an IAM role to the instance and removed the
I changed the image to use v3.7.1 and started receiving an error that region was required. I opened #17375 to track this. After adding region to the config, I received the error that the original issue reported.
I built a new Docker image with this PR and uploaded it to my EC2 instance. Running that image, there is no more error and the EC2 instance shows up in SD. I was able to get SD working with and without the
krajorama
left a comment
thank you @sysadmind for the tests
After the upgrade to AWS SDK v2, the EC2 and Lightsail service discovery stopped working when using the default AWS credential chain (environment variables, IAM roles, EC2 instance metadata, etc.).
The issue was that the code unconditionally created a StaticCredentialsProvider with empty credentials when access_key and secret_key were not configured. In AWS SDK v2, this causes a "static credentials are empty" error and prevents the SDK from falling back to its default credential chain.
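The failure mode described above can be illustrated with a minimal, standard-library-only sketch. It is a simplified model, not the actual Prometheus or AWS SDK code: `Credentials`, `Provider`, `staticProvider`, `defaultChain`, `brokenSelect` and `fixedSelect` are all hypothetical names, and the exact condition used in the real change (both keys non-empty) is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
)

// Credentials is a hypothetical stand-in for a resolved key pair.
type Credentials struct {
	AccessKey string
	SecretKey string
	Source    string // which provider produced the credentials
}

// Provider is a hypothetical stand-in for a credentials provider.
type Provider func() (Credentials, error)

// errEmptyStatic models the "static credentials are empty" error.
var errEmptyStatic = errors.New("static credentials are empty")

// staticProvider models a static provider: empty keys are an error,
// not a signal to try another source.
func staticProvider(accessKey, secretKey string) Provider {
	return func() (Credentials, error) {
		if accessKey == "" || secretKey == "" {
			return Credentials{}, errEmptyStatic
		}
		return Credentials{AccessKey: accessKey, SecretKey: secretKey, Source: "static"}, nil
	}
}

// defaultChain models the default credential chain (environment
// variables, IAM roles, instance metadata). The values are placeholders.
func defaultChain() (Credentials, error) {
	return Credentials{AccessKey: "chain-key", SecretKey: "chain-secret", Source: "default-chain"}, nil
}

// brokenSelect models the bug: the static provider is always installed,
// even when access_key and secret_key are not configured.
func brokenSelect(accessKey, secretKey string) Provider {
	return staticProvider(accessKey, secretKey)
}

// fixedSelect models the fix: the static provider is used only when both
// keys are configured (assumed condition); otherwise the default chain is kept.
func fixedSelect(accessKey, secretKey string) Provider {
	if accessKey != "" && secretKey != "" {
		return staticProvider(accessKey, secretKey)
	}
	return defaultChain
}

func main() {
	_, err := brokenSelect("", "")()
	fmt.Println("broken, no keys:", err)

	creds, err := fixedSelect("", "")()
	fmt.Println("fixed, no keys:", creds.Source, err)

	creds, err = fixedSelect("aaaaaaa", "bbbbbbbb")()
	fmt.Println("fixed, with keys:", creds.Source, err)
}
```

In this model the unconditional override is what blocks the fallback: once a static provider is installed, nothing else is consulted, so the selection has to happen before the provider is chosen rather than after it fails.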
Which issue(s) does the PR fix:
Fixes #17343
Does this PR introduce a user-facing change?