Figure out how to performance test the aws-s3 input in polling mode #77

@zmoog

I want to run performance tests on the aws-s3 input, using Elastic Agent 8.12.0 to ingest CloudTrail logs.

The test will run on the following EC2 instance.

Instance type   Architecture   vCPU   Memory (GiB)   Network bandwidth (Gbps)
c7g.2xlarge     arm64          8      16             Up to 15

The goals are:

  • Measure performance

Authentication

We will use an EC2 instance profile only, so there will be no authentication-related options in the integration settings.

Dataset

As a test dataset, I will use an S3 bucket containing 1.2M CloudTrail log objects.

I set up the integration to process only the S3 objects created in 2024 (301,254 objects at the time of the test). To achieve this, I updated the file selector regex to the following expression:

/CloudTrail/[a-z]{2}-[a-z]+-\d+/2024/.*$
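To sanity-check the selector before a run, here is a minimal Python sketch that applies the same expression to a few object keys. The keys and the `is_selected` helper are hypothetical, made up for illustration. The sketch also assumes the input applies the selector as an unanchored search against the full object key. The input itself evaluates the expression with Go's regexp engine; this particular pattern uses only syntax that behaves the same way in Python.

```python
import re

# Same expression as the first file selector in the policy.
SELECTOR = re.compile(r"/CloudTrail/[a-z]{2}-[a-z]+-\d+/2024/.*$")


def is_selected(key: str) -> bool:
    """Return True if the object key matches the selector anywhere in the key.

    Assumption: the selector is applied as an unanchored search.
    """
    return SELECTOR.search(key) is not None


# Hypothetical object keys, not taken from the real bucket.
sample_keys = [
    "AWSLogs/123456789012/CloudTrail/us-east-1/2024/01/15/example.json.gz",
    "AWSLogs/123456789012/CloudTrail/us-east-1/2023/12/31/example.json.gz",
    "AWSLogs/123456789012/CloudTrail-Digest/us-east-1/2024/01/15/example.json.gz",
]

for key in sample_keys:
    print(is_selected(key), key)
```

Only the first key is selected: the second is from 2023, and the third sits under `CloudTrail-Digest`, which the policy covers with its own selector.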

Here is the agent policy:

  - id: aws-s3-cloudtrail-fb93f962-4474-4f15-9032-0473c01ae54b
    name: aws-21
    revision: 12
    type: aws-s3
    use_output: 6a7e2784-665c-4208-afac-380bac62351e
    meta:
      package:
        name: aws
        version: 2.8.5
    data_stream:
      namespace: sdh3988.3
    package_policy_id: fb93f962-4474-4f15-9032-0473c01ae54b
    streams:
      - id: aws-s3-aws.cloudtrail-fb93f962-4474-4f15-9032-0473c01ae54b
        data_stream:
          dataset: aws.cloudtrail
          type: logs
        bucket_arn: '<REDACTED>'
        number_of_workers: 100
        bucket_list_interval: 1m
        file_selectors:
          - regex: '/CloudTrail/[a-z]{2}-[a-z]+-\d+/2024/.*$'
            expand_event_list_from_field: Records
          - regex: '/CloudTrail-Digest/[a-z]{2}-[a-z]+-\d+/2024/.*$'
          - regex: '/CloudTrail-Insight/[a-z]{2}-[a-z]+-\d+/2024/.*$'
            expand_event_list_from_field: Records
        expand_event_list_from_field: Records
        content_type: application/json
        tags:
          - forwarded
          - aws-cloudtrail
        publisher_pipeline.disable_host: true

Test process

  • Set up the agent policy to use a new data stream on each run.
  • Remove the state registry (unassigning the agent policy from the agent clears the state registry).
  • Start the Agent.
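One way to get a new data stream on each run is to change the namespace in the policy, since data streams are named `{type}-{dataset}-{namespace}`. The sketch below assumes the trailing number in a namespace such as `sdh3988.3` is a run counter. That is a guess based on the policy above, and both helper names are hypothetical.

```python
import re


def next_namespace(namespace: str) -> str:
    """Bump the trailing run counter, e.g. 'sdh3988.3' -> 'sdh3988.4'.

    Assumption: the namespace ends in '.<number>'. If it does not,
    start a counter at 1.
    """
    match = re.fullmatch(r"(.*\.)(\d+)", namespace)
    if match is None:
        return f"{namespace}.1"
    prefix, run = match.groups()
    return f"{prefix}{int(run) + 1}"


def data_stream_name(ds_type: str, dataset: str, namespace: str) -> str:
    """Build the data stream name from its type, dataset, and namespace."""
    return f"{ds_type}-{dataset}-{namespace}"


namespace = next_namespace("sdh3988.3")
print(data_stream_name("logs", "aws.cloudtrail", namespace))
```

Setting the new namespace in the policy before each run sends that run's documents to a separate data stream, so document counts from different runs do not mix.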
