
AWS SDK IMDS lookups cause 3-second delay in log-ingestor job creation #1915

@junhaoliao


Bug

When creating an S3 scanner ingestion job via the log-ingestor API, the AWS SDK attempts to load region information from EC2 Instance Metadata Service (IMDS), even though the region is explicitly provided in the request. This causes:

  1. 3-second delay during job creation (3 x 1-second IMDS timeout retries)
  2. Spurious warning logs that may confuse operators:
    {"level":"WARN","message":"failed to load region from IMDS","err":"failed to load IMDS session token: dispatch failure: timeout: client error (Connect): HTTP connect timeout occurred after 1s: timed out"}
    

Expected behavior: When region and credentials are explicitly provided, the AWS SDK should not attempt to load configuration from IMDS or other environment sources.

Root Cause

let base_config = aws_config::defaults(BehaviorVersion::latest()).load().await;

The load().await call triggers the AWS SDK's default credential/region provider chain, which includes IMDS as a fallback. In non-EC2 environments, IMDS is unreachable, causing timeout retries.

Potential fix

Build the S3/SQS client configuration directly without using the default environment provider chain, since credentials and region are already explicitly provided by the caller.

See: https://docs.rs/aws-config/latest/aws_config/struct.ConfigLoader.html

Overriding a component will skip the standard resolution chain for that component.

In theory, supplying the credentials provider and region up front should then bypass the default provider chain for both:

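    // With both overrides set, `load()` should skip the default credential
    // and region chains, so IMDS is not expected to be consulted.
    let config = aws_config::defaults(BehaviorVersion::latest())
        .credentials_provider(credentials)
        .region(Region::new(region_id.to_string()))
        .load()
        .await;

    // Derive the S3 client config from the shared config, then apply the
    // S3-specific settings (path-style addressing, optional custom endpoint).
    let mut s3_config_builder = Builder::from(&config).force_path_style(true);
    s3_config_builder.set_endpoint_url(endpoint.map(std::string::ToString::to_string));
    Client::from_conf(s3_config_builder.build())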
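
To make the override semantics concrete, here is a minimal, std-only model of a fallback chain. It is not the AWS SDK's implementation: `Provider`, `Resolution`, and `resolve_region` are hypothetical names, and the IMDS timeouts are only accounted for, not actually waited on. It shows why an unset region falls through to IMDS and accumulates 3 x 1 second, while an explicit region skips the chain entirely.

```rust
use std::time::Duration;

/// Outcome of resolving a region: the value (if any) and the simulated time spent.
#[derive(Debug, PartialEq)]
struct Resolution {
    region: Option<String>,
    simulated_delay: Duration,
}

/// A hypothetical provider in the fallback chain (illustrative only).
enum Provider {
    /// Answers immediately, e.g. an environment variable or profile lookup.
    Instant(Option<String>),
    /// Models an unreachable IMDS endpoint: every attempt times out.
    UnreachableImds { attempts: u32, timeout: Duration },
}

/// Resolves a region. An explicit override short-circuits the whole chain;
/// otherwise providers are tried in order until one yields a value.
fn resolve_region(explicit: Option<&str>, chain: &[Provider]) -> Resolution {
    if let Some(region) = explicit {
        return Resolution {
            region: Some(region.to_string()),
            simulated_delay: Duration::ZERO,
        };
    }
    let mut delay = Duration::ZERO;
    for provider in chain {
        match provider {
            Provider::Instant(Some(region)) => {
                return Resolution {
                    region: Some(region.clone()),
                    simulated_delay: delay,
                };
            }
            // Nothing configured here: fall through to the next provider.
            Provider::Instant(None) => {}
            // Each failed attempt costs one full timeout.
            Provider::UnreachableImds { attempts, timeout } => {
                delay += *timeout * *attempts;
            }
        }
    }
    Resolution { region: None, simulated_delay: delay }
}
```

With a chain of `[Provider::Instant(None), Provider::UnreachableImds { attempts: 3, timeout: Duration::from_secs(1) }]`, calling `resolve_region(None, &chain)` reports a simulated delay of 3 seconds and no region, whereas `resolve_region(Some("us-east-2"), &chain)` returns the region with zero delay.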

CLP version

01d35a6

Environment

  • Kubernetes cluster (local, non-EC2 environment)
  • Helm chart deployment with logs_input.type: "s3"
  • log-ingestor pod running

Reproduction steps

  1. Deploy CLP using the Helm chart with S3 logs input:

    clpConfig:
      logs_input:
        type: "s3"
        aws_authentication:
          type: "credentials"
          credentials:
            access_key_id: "<your-access-key>"
            secret_access_key: "<your-secret-key>"
  2. Create an S3 scanner ingestion job:

    curl --location 'localhost:30302/s3_scanner' \
      --header 'Content-Type: application/json' \
      --data '{
        "bucket_name": "your-bucket",
        "key_prefix": "/",
        "region": "us-east-2"
      }'
  3. Observe the log-ingestor logs:

    kubectl logs -f <log-ingestor-pod>
  4. Notice the IMDS timeout warnings and the ~3-second delay before the job is created.
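
Besides watching the logs, the delay could be checked programmatically by timing the call that builds the client. The sketch below is a std-only helper under that assumption. `timed` and `within_budget` are hypothetical names, not part of CLP or the AWS SDK, and the closure is synchronous, so in async code the same two `Instant` calls would bracket the `.await` instead.

```rust
use std::time::{Duration, Instant};

/// Runs `operation` once and returns its result together with the
/// wall-clock time it took.
fn timed<T>(operation: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = operation();
    (result, start.elapsed())
}

/// Returns true when the measured duration does not exceed the budget.
fn within_budget(elapsed: Duration, budget: Duration) -> bool {
    elapsed <= budget
}
```

A regression test could then assert something like `within_budget(elapsed, Duration::from_secs(1))` for client creation in a non-EC2 environment. The one-second budget is an arbitrary choice for illustration.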
