Elastic Agent is waiting for initial configuration forever when all providers are disabled #4648

@eyalkraft

When no providers are enabled, Elastic Agent stalls forever.
The bug appears to be in this loop:

for {
DEBOUNCE:
	for {
		select {
		case <-ctx.Done():
			cleanupFn()
			return ctx.Err()
		case <-stateChangedChan:
			t.Reset(100 * time.Millisecond)
			c.logger.Debugf("Variable state changed for composable inputs; debounce started")
			drainChan(stateChangedChan)
			break DEBOUNCE
		}
	}
	// notification received, wait for batch

Here, the break DEBOUNCE statement is never reached when no provider sends an update, so the inner loop blocks until the context is cancelled.

A possible fix could check the number of configured providers, or rely on the wait group, which should return immediately in this case because its counter is 0 when there are no providers:

wg.Add(len(c.contextProviders) + len(c.dynamicProviders))

The bug was discovered while working on the agentless controller. We currently use a workaround to avoid it.

Bug details:

  • Version: 8.13.2
  • Operating System: all
  • Steps to Reproduce:

Setup (agent-bug is the working directory name; it appears in the prompt before every command)

➜  test cd agent-bug
➜  agent-bug docker pull docker.elastic.co/beats/elastic-agent:8.13.2
8.13.2: Pulling from beats/elastic-agent
c93e5d1261d3: Pull complete
9204e5b4f4d9: Pull complete
77db57972e9d: Pull complete
5a09faecb150: Pull complete
ef1d475c705b: Pull complete
9be0bc4d4489: Pull complete
97bac83776bc: Pull complete
594c586edc1b: Pull complete
b45e1922fc73: Pull complete
a9c1a4bc09dd: Pull complete
83a50bca82ec: Pull complete
4ca545ee6d5d: Pull complete
Digest: sha256:1b1346f6228c4cfcc8bd6b05e0eb24f15bcfd616935d0f2fffe0754d7d3fe31b
Status: Downloaded newer image for docker.elastic.co/beats/elastic-agent:8.13.2
docker.elastic.co/beats/elastic-agent:8.13.2

Get the original config file

➜  agent-bug docker run --rm -d --name elastic-agent docker.elastic.co/beats/elastic-agent:8.13.2 container
76af237adeb219eb7591c6b7647c3bfe523e4d73121bf1677388720b79e29d85
➜  agent-bug docker cp elastic-agent:/usr/share/elastic-agent/elastic-agent.yml ./
➜  agent-bug docker stop elastic-agent

Modify it to disable all providers

➜  agent-bug cat <<EOF >> elastic-agent.yml
providers:
  agent:
    enabled: false
  docker:
    enabled: false
  env:
    enabled: false
  host:
    enabled: false
  kubernetes:
    enabled: false
  kubernetes_leaderelection:
    enabled: false
  kubernetes_secrets:
    enabled: false
  local:
    enabled: false
  local_dynamic:
    enabled: false
  path:
    enabled: false
EOF

Start the agent with the modified config

➜  agent-bug docker run --rm -d --name elastic-agent -v ./elastic-agent.yml:/usr/share/elastic-agent/elastic-agent.yml docker.elastic.co/beats/elastic-agent:8.13.2 container
decb5e6e6d443e11d228a9ff32e8a9ad3ed78f466b7c830e85aec0a3818b9aa4

Enter the container

➜  agent-bug docker exec -it elastic-agent /bin/bash

The agent is stuck waiting for initial configuration

elastic-agent@decb5e6e6d44:~$ elastic-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   └─ status: (STARTING) Waiting for initial configuration and composable variables

... waiting ...

elastic-agent@decb5e6e6d44:~$ elastic-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   └─ status: (STARTING) Waiting for initial configuration and composable variables

Agent configuration (for some reason inspect also stalls, so I had to kill it with Ctrl+C)

elastic-agent@decb5e6e6d44:~$ elastic-agent inspect
agent:
  logging:
    to_stderr: true
inputs:
- data_stream.namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream.dataset: system.cpu
    metricsets:
    - cpu
  - data_stream.dataset: system.memory
    metricsets:
    - memory
  - data_stream.dataset: system.network
    metricsets:
    - network
  - data_stream.dataset: system.filesystem
    metricsets:
    - filesystem
  type: system/metrics
  use_output: default
outputs:
  default:
    hosts: http://elasticsearch:9200
    password: changeme
    preset: balanced
    type: elasticsearch
    username: elastic
providers:
  agent:
    enabled: false
  docker:
    enabled: false
  env:
    enabled: false
  host:
    enabled: false
  kubernetes:
    enabled: false
  kubernetes_leaderelection:
    enabled: false
  kubernetes_secrets:
    enabled: false
  local:
    enabled: false
  local_dynamic:
    enabled: false
  path:
    enabled: false

^CError: could not load agent info: could not get agent info from store: failed to load from ioStore: failed to ensure key during encrypted disk store Load: could not get agent key: failed to acquire exclusive lock: /usr/share/elastic-agent/state/vault/.lock, err: context canceled
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.13/fleet-troubleshooting.html
elastic-agent@decb5e6e6d44:~$ exit

Cleanup

➜  agent-bug docker stop elastic-agent
elastic-agent
