Kibana version:
8.16.1
Elasticsearch version:
8.16.1
Server OS version:
ELK/Fleet: Debian/Ubuntu
Agents: Windows Server 2019, 2022
Original install method (e.g. download page, yum, from source, etc.):
On-prem, Debian package
Describe the bug:
Since version 8.16.0, agents that have been offline for some time show up as "unenrolled" in Kibana (visible on mouseover of the information icon) and can no longer be upgraded via Fleet.
Steps to reproduce:
- Upgrade Elasticsearch, Kibana, Fleet, and Agents from 8.15.4 -> 8.16.0 -> 8.16.1
- Stop the Elastic Agent service on Windows for 1-2 days (we didn't stop it manually; it simply didn't start automatically after the patch-day reboot)
- Start the Elastic Agent service again
Expected behavior:
The agent should still show up as enrolled in Fleet and remain upgradeable even after being offline for a few days.
How is it even possible for "healthy" agents to be "unenrolled" at the same time?
Screenshots (if relevant):

Errors in browser console (if relevant):
Failed to load resource: the server responded with a status of 500 (Internal Server Error)
:5601/api/fleet/agents?page=1&perPage=20&kuery=status%3Aonline%20or%20(status%3Aerror%20or%20status%3Adegraded)%20or%20(status%3Aupdating%20or%20status%3Aunenrolling%20or%20status%3Aenrolling)%20or%20status%3Aoffline&sortField=enrolled_at&sortOrder=desc&showInactive=false&showUpgradeable=false&getStatusSummary=true&withMetrics=true:1
Provide logs and/or server output (if relevant):
elastic-agent version:
Binary: 8.16.1 (build: b6da7f8ebb1d0d06c1f1929dfed8458708a5bedf at 2024-11-19 02:02:29 +0000 UTC)
Daemon: 8.16.1 (build: b6da7f8ebb1d0d06c1f1929dfed8458708a5bedf at 2024-11-19 02:02:29 +0000 UTC)
Agent-Log (example host) from day of alleged "unenrollment" (shutdown caused by a server reboot):
Nov 30, 2024
20:00:49.673 elastic_agent [elastic_agent][info] signal "terminated" received
20:00:49.673 elastic_agent [elastic_agent][info] Shutting down Elastic Agent and sending last events...
20:00:49.680 elastic_agent [elastic_agent][warn] Possible transient error during checkin with fleet-server, retrying
20:00:49.687 elastic_agent [elastic_agent][error] failed accept conn info connection: use of closed network connection
20:00:49.687 elastic_agent [elastic_agent][info] stopping endpoint service runtime
20:00:49.881 elastic_agent [elastic_agent][info] Shutting down completed.
20:00:49.881 elastic_agent [elastic_agent][info] Stopping monitoring server
20:00:49.882 elastic_agent [elastic_agent][info] Stats endpoint (127.0.0.1:6791) finished: accept tcp 127.0.0.1:6791: use of closed network connection
Agent-Log (example host) from day of turning ElasticAgent-Service manually back on:
Dec 2, 2024
09:17:31.073 elastic_agent [elastic_agent][info] Elastic Agent started
09:17:31.369 elastic_agent [elastic_agent][info] Starting upgrade watcher
09:17:31.403 elastic_agent [elastic_agent][info] Upgrade Watcher invoked
09:17:31.602 elastic_agent [elastic_agent][info] APM instrumentation disabled
09:17:31.656 elastic_agent [elastic_agent][info] Gathered system information
09:17:31.781 elastic_agent [elastic_agent][info] Upgrade Watcher started
09:17:31.821 elastic_agent [elastic_agent][info] update marker not present at 'C:\Program Files\Elastic\Agent\data'
09:17:31.848 elastic_agent [elastic_agent][info] Detected available inputs and outputs
09:17:31.848 elastic_agent [elastic_agent][info] Capabilities file not found in C:\Program Files\Elastic\Agent\capabilities.yml
09:17:31.848 elastic_agent [elastic_agent][info] Determined allowed capabilities
09:17:31.848 elastic_agent [elastic_agent][info] Loading baseline config from C:\Program Files\Elastic\Agent\elastic-agent.yml
09:17:32.004 elastic_agent [elastic_agent][info] GRPC comms socket listening at localhost:6789
09:17:32.023 elastic_agent [elastic_agent][info] Parsed configuration and determined agent is managed by Fleet
09:17:32.024 elastic_agent [elastic_agent][warn] SSL/TLS verifications disabled.
09:17:36.657 elastic_agent [elastic_agent][info] GRPC control socket listening at npipe:///elastic-agent-system
09:17:36.671 elastic_agent [elastic_agent][info] Docker provider skipped, unable to connect: protocol not available
09:17:36.678 elastic_agent [elastic_agent][info] Starting grpc control protocol listener on port 6789 with max_message_size 104857600
09:17:36.886 elastic_agent [elastic_agent][info] restoring current policy from disk
09:17:36.930 elastic_agent [elastic_agent][info] Setting fallback log level from policy
09:17:36.990 elastic_agent [elastic_agent][info] Fleet gateway started
09:17:37.012 elastic_agent [elastic_agent][info] Source URI changed from "https://artifacts.elastic.co/downloads/" to "https://artifacts.elastic.co/downloads/"
09:17:37.013 elastic_agent [elastic_agent][info] Starting monitoring server with cfg &config.MonitoringConfig{Enabled:true, MonitorLogs:true, MonitorMetrics:false, MetricsPeriod:"", LogMetrics:true, HTTP:(*config.MonitoringHTTPConfig)(0xc0005ddcb0), Namespace:"default", Pprof:(*config.PprofConfig)(nil), MonitorTraces:false, APM:config.APMConfig{Environment:"", APIKey:"", SecretToken:"", Hosts:[]string(nil), GlobalLabels:map[string]string(nil), TLS:config.APMTLS{SkipVerify:false, ServerCertificate:"", ServerCA:""}, SamplingRate:(*float32)(nil)}, Diagnostics:config.Diagnostics{Uploader:config.Uploader{MaxRetries:10, InitDur:1000000000, MaxDur:600000000000}, Limit:config.Limit{Interval:60000000000, Burst:1}}}
09:17:37.013 elastic_agent [elastic_agent][info] creating monitoring API with cfg api.Config{Enabled:true, Host:"http://localhost:6791", Port:6791, User:"", SecurityDescriptor:"", Timeout:5000000000}
09:17:37.015 elastic_agent [elastic_agent][info] Starting stats endpoint
09:17:37.032 elastic_agent [elastic_agent][info] Metrics endpoint listening on: 127.0.0.1:6791 (configured: http://localhost:6791)
09:17:37.039 elastic_agent [elastic_agent][info] Updating running component model
09:17:37.039 elastic_agent [elastic_agent][warn] SSL/TLS verifications disabled.
09:17:37.354 elastic_agent [elastic_agent][info] Creating connection info server for endpoint service, address: npipe:///.eaci.sock
09:17:37.354 elastic_agent [elastic_agent][info] check if endpoint service is installed
09:17:37.364 elastic_agent endpoint-default [elastic_agent][info] Spawned new component endpoint-default: Starting: endpoint service runtime
09:17:37.389 elastic_agent endpoint-default [elastic_agent][info] Spawned new unit endpoint-default: Starting: endpoint service runtime
09:17:37.389 elastic_agent endpoint-default [elastic_agent][info] Spawned new unit endpoint-default-85821b10-0064-11ee-b676-af36e033a9ae: Starting: endpoint service runtime
09:17:38.232 elastic_agent [elastic_agent][info] component model updated
09:17:38.232 elastic_agent [elastic_agent][info] Updating running component model
09:17:42.354 elastic_agent [elastic_agent][error] 2024-12-02 08:17:42: info: Main.cpp:569 Verifying existing installation
09:17:42.355 elastic_agent [elastic_agent][error] 2024-12-02 08:17:42: info: InstallLib.cpp:611 Running [C:\Program Files\Elastic\Endpoint\elastic-endpoint.exe] [version --log stdout]
09:17:42.355 elastic_agent [elastic_agent][error] 2024-12-02 08:17:42: debug: Service.cpp:804 PPL is supported. This process is unprotected. (TrustLevelSid: absent)
09:17:44.589 elastic_agent [elastic_agent][info] after check if endpoint service is installed, err:
09:17:44.659 elastic_agent winlog-default [elastic_agent][info] Spawned new component winlog-default: Starting: spawned pid '12520'
09:17:44.659 elastic_agent winlog-default [elastic_agent][info] Spawned new unit winlog-default-winlog-system-85821b11-0064-11ee-b676-af36e033a9ae: Starting: spawned pid '12520'
09:17:44.660 elastic_agent winlog-default [elastic_agent][info] Spawned new unit winlog-default-winlog-windows-85821b12-0064-11ee-b676-af36e033a9ae: Starting: spawned pid '12520'
09:17:44.660 elastic_agent winlog-default [elastic_agent][info] Spawned new unit winlog-default: Starting: spawned pid '12520'
09:17:44.660 elastic_agent [elastic_agent][error] 2024-12-02 08:17:44:
09:17:48.372 elastic_agent [elastic_agent][info] control checkin v2 protocol has chunking enabled
09:17:48.372 elastic_agent [elastic_agent][info] control checkin v2 protocol has chunking enabled
09:17:48.374 elastic_agent winlog-default [elastic_agent][info] Component state changed winlog-default (STARTING->HEALTHY): Healthy: communicating with pid '12520'
09:17:49.402 elastic_agent winlog-default [elastic_agent][info] Unit state changed winlog-default-winlog-system-85821b11-0064-11ee-b676-af36e033a9ae (STARTING->HEALTHY): Healthy
09:17:49.402 elastic_agent winlog-default [elastic_agent][info] Unit state changed winlog-default-winlog-windows-85821b12-0064-11ee-b676-af36e033a9ae (STARTING->HEALTHY): Healthy
09:17:49.402 elastic_agent winlog-default [elastic_agent][info] Unit state changed winlog-default (STARTING->HEALTHY): Healthy
09:17:55.396 elastic_agent endpoint-default [elastic_agent][info] Component state changed endpoint-default (STARTING->HEALTHY): Healthy: communicating with endpoint service
09:18:02.119 elastic_agent [elastic_agent][info] component model updated
09:18:02.119 elastic_agent [elastic_agent][info] Updating running component model
09:18:15.393 elastic_agent endpoint-default [elastic_agent][info] Unit state changed endpoint-default (STARTING->CONFIGURING): Applied policy {85821b10-0064-11ee-b676-af36e033a9ae}
09:18:15.393 elastic_agent endpoint-default [elastic_agent][info] Unit state changed endpoint-default-85821b10-0064-11ee-b676-af36e033a9ae (STARTING->CONFIGURING): Applied policy {85821b10-0064-11ee-b676-af36e033a9ae}
09:18:15.875 elastic_agent endpoint-default [elastic_agent][info] Unit state changed endpoint-default-85821b10-0064-11ee-b676-af36e033a9ae (CONFIGURING->HEALTHY): Applied policy {85821b10-0064-11ee-b676-af36e033a9ae}
09:18:15.875 elastic_agent endpoint-default [elastic_agent][info] Unit state changed endpoint-default (CONFIGURING->HEALTHY): Applied policy {85821b10-0064-11ee-b676-af36e033a9ae}
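To put a number on "offline for some time", the gap between the last shutdown log line and the first startup log line can be computed. This is just a sketch and assumes both timestamps above are shown in the same time zone; the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamps taken from the two log excerpts above
# (assumption: both are displayed in the same time zone).
shutdown_completed = datetime(2024, 11, 30, 20, 0, 49, 881000)  # "Shutting down completed."
agent_started = datetime(2024, 12, 2, 9, 17, 31, 73000)         # "Elastic Agent started"

offline: timedelta = agent_started - shutdown_completed
offline_hours = offline.total_seconds() / 3600

print(f"Agent was offline for {offline} (about {offline_hours:.1f} hours)")
```

The result is a little over one and a half days, which matches the 1-2 days mentioned in the steps to reproduce.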
Any additional context:
When the Elastic Agent service is started again, the error message below shows up in Kibana under Fleet -> Agents on the next refresh:
