When no providers are enabled, the Elastic Agent stalls forever.
The bug appears to be in elastic-agent/internal/pkg/composable/controller.go (lines 188 to 203 in 0c7212f):
    for {
    DEBOUNCE:
        for {
            select {
            case <-ctx.Done():
                cleanupFn()
                return ctx.Err()
            case <-stateChangedChan:
                t.Reset(100 * time.Millisecond)
                c.logger.Debugf("Variable state changed for composable inputs; debounce started")
                drainChan(stateChangedChan)
                break DEBOUNCE
            }
        }

        // notification received, wait for batch
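There is no way to reach break DEBOUNCE if no provider is updating: with no enabled providers, nothing appears to signal stateChangedChan, so the select blocks until the context is cancelled.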
A possible solution could be based on the number of configured providers, or on the wait group, which should return immediately in this case because its counter is 0 when there are no providers (elastic-agent/internal/pkg/composable/controller.go, line 128 in 0c7212f):
    wg.Add(len(c.contextProviders) + len(c.dynamicProviders))
The bug was discovered during work on the agentless controller. We currently use a workaround to avoid it.
Bug details:
- Version: 8.13.2
- Operating System: all
- Steps to Reproduce:
Setup (agent-bug is the directory name which shows up before every command)
➜ test cd agent-bug
➜ agent-bug docker pull docker.elastic.co/beats/elastic-agent:8.13.2
8.13.2: Pulling from beats/elastic-agent
c93e5d1261d3: Pull complete
9204e5b4f4d9: Pull complete
77db57972e9d: Pull complete
5a09faecb150: Pull complete
ef1d475c705b: Pull complete
9be0bc4d4489: Pull complete
97bac83776bc: Pull complete
594c586edc1b: Pull complete
b45e1922fc73: Pull complete
a9c1a4bc09dd: Pull complete
83a50bca82ec: Pull complete
4ca545ee6d5d: Pull complete
Digest: sha256:1b1346f6228c4cfcc8bd6b05e0eb24f15bcfd616935d0f2fffe0754d7d3fe31b
Status: Downloaded newer image for docker.elastic.co/beats/elastic-agent:8.13.2
docker.elastic.co/beats/elastic-agent:8.13.2
Get the original config file
➜ agent-bug docker run --rm -d --name elastic-agent docker.elastic.co/beats/elastic-agent:8.13.2 container
76af237adeb219eb7591c6b7647c3bfe523e4d73121bf1677388720b79e29d85
➜ agent-bug docker cp elastic-agent:/usr/share/elastic-agent/elastic-agent.yml ./
➜ agent-bug docker stop elastic-agent
Modify it to disable all providers
➜ agent-bug cat <<EOF >> elastic-agent.yml
providers:
  agent:
    enabled: false
  docker:
    enabled: false
  env:
    enabled: false
  host:
    enabled: false
  kubernetes:
    enabled: false
  kubernetes_leaderelection:
    enabled: false
  kubernetes_secrets:
    enabled: false
  local:
    enabled: false
  local_dynamic:
    enabled: false
  path:
    enabled: false
EOF
Start the agent with the modified config
➜ agent-bug docker run --rm -d --name elastic-agent -v ./elastic-agent.yml:/usr/share/elastic-agent/elastic-agent.yml docker.elastic.co/beats/elastic-agent:8.13.2 container
decb5e6e6d443e11d228a9ff32e8a9ad3ed78f466b7c830e85aec0a3818b9aa4
Enter the container
➜ agent-bug docker exec -it elastic-agent /bin/bash
The agent is stuck waiting for the initial configuration
elastic-agent@decb5e6e6d44:~$ elastic-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   └─ status: (STARTING) Waiting for initial configuration and composable variables
... waiting ...
elastic-agent@decb5e6e6d44:~$ elastic-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   └─ status: (STARTING) Waiting for initial configuration and composable variables
Agent configuration (for some reason inspect stalls, so I had to kill it)
elastic-agent@decb5e6e6d44:~$ elastic-agent inspect
agent:
  logging:
    to_stderr: true
inputs:
- data_stream.namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream.dataset: system.cpu
    metricsets:
    - cpu
  - data_stream.dataset: system.memory
    metricsets:
    - memory
  - data_stream.dataset: system.network
    metricsets:
    - network
  - data_stream.dataset: system.filesystem
    metricsets:
    - filesystem
  type: system/metrics
  use_output: default
outputs:
  default:
    hosts: http://elasticsearch:9200
    password: changeme
    preset: balanced
    type: elasticsearch
    username: elastic
providers:
  agent:
    enabled: false
  docker:
    enabled: false
  env:
    enabled: false
  host:
    enabled: false
  kubernetes:
    enabled: false
  kubernetes_leaderelection:
    enabled: false
  kubernetes_secrets:
    enabled: false
  local:
    enabled: false
  local_dynamic:
    enabled: false
  path:
    enabled: false
^CError: could not load agent info: could not get agent info from store: failed to load from ioStore: failed to ensure key during encrypted disk store Load: could not get agent key: failed to acquire exclusive lock: /usr/share/elastic-agent/state/vault/.lock, err: context canceled
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.13/fleet-troubleshooting.html
elastic-agent@decb5e6e6d44:~$ exit
Cleanup
➜ agent-bug docker stop elastic-agent
elastic-agent