Support the Beats disk queue in Elastic Agent #3490

@cmacknz

Beats today supports a disk queue that has been GA for some time, but it cannot be used with the Elastic Agent. Part of the reason is that the Elastic Agent does not allow the queue to be configured at all, but this will change after elastic/beats#36693 is merged.

Those changes would allow a user to enable the Beats disk queue, which, with no other changes, would instruct each Beat to create a disk queue in the same directory. The disk queue is not shared between processes: there is one disk queue per process, and the per-process disk queues will conflict when they attempt to use the same files in the same directory.

For the disk queue to work properly when running under the Elastic Agent without a dedicated shipper process, we need to orchestrate the queue directories correctly in the agent itself. Specifically, we need to:

  1. Create a dedicated directory in the agent installation path for the disk queue files. The natural choice for the disk queue location would be the per-component run directory in the versioned data path, but this would require the entire queue to be copied on upgrade. I think we should avoid this because the disk queue can be large (100+ MB depending on configuration and usage), and instead create a dedicated directory outside of the versioned data path that is shared between versions of the Elastic Agent. We will likely need a file lock in the directory to ensure only one version can read from this directory at a time.

  2. In the dedicated queue directory, provision a unique disk queue sub-directory for each component since queues cannot be shared between processes. The disk queue for a component should be removed when the component is removed from the agent policy.

  3. Allow the user to configure the dedicated disk queue directory. Users may want the disk queue to reside on a dedicated volume, which will be particularly important when the Elastic Agent is running on Kubernetes and the user wishes for the disk queues to be stored on a persistent volume claim.

We will also need to performance test the Elastic Agent running with the disk queue, and compare it to the Elastic Agent without the disk queue. The disk queue has a performance penalty because events must be serialized before being written to disk. We should quantify what this penalty is, particularly when the Elastic Agent is supervising multiple Beats each with their own disk queue.

The final caveat to this implementation is that the disk queue will only be supported for inputs which are based on Beats. We should add the ability for agent specification files to declare whether they support the disk queue configuration. The one special case to consider is endpoint-security which always uses a disk queue that is different from the one implemented in Beats. We will need to make this obvious to users.
