Skip to content

The elastic-agent upgrade command can fail even though the upgrade succeeds #3890

@cmacknz

Description

@cmacknz

First observed in https://buildkite.com/elastic/elastic-agent/builds/5460#018c4990-62ec-4dc1-a9c0-81953d923d80. One of the package version tests failed because the elastic-agent ugprade command exited with an EOF error.

=== RUN   TestUpgradeBrokenPackageVersion
    upgrade_broken_package_test.go:59: Testing Elastic Agent upgrade from 8.12.0-SNAPSHOT to 8.11.2-SNAPSHOT...
    fetcher.go:91: Using existing artifact elastic-agent-8.12.0-SNAPSHOT-linux-arm64.tar.gz
    fixture.go:261: Extracting artifact elastic-agent-8.12.0-SNAPSHOT-linux-arm64.tar.gz to /tmp/TestUpgradeBrokenPackageVersion2415228739/001
    fixture.go:274: Completed extraction of artifact elastic-agent-8.12.0-SNAPSHOT-linux-arm64.tar.gz to /tmp/TestUpgradeBrokenPackageVersion2415228739/001
    fixture.go:717: Components were not modified from the fetched artifact
    fetcher.go:91: Using existing artifact elastic-agent-8.11.2-SNAPSHOT-linux-arm64.tar.gz
    fixture.go:261: Extracting artifact elastic-agent-8.11.2-SNAPSHOT-linux-arm64.tar.gz to /tmp/TestUpgradeBrokenPackageVersion2415228739/002
    fixture.go:274: Completed extraction of artifact elastic-agent-8.11.2-SNAPSHOT-linux-arm64.tar.gz to /tmp/TestUpgradeBrokenPackageVersion2415228739/002
    fixture.go:717: Components were not modified from the fetched artifact
    fixture.go:520: >> running binary with: [/tmp/TestUpgradeBrokenPackageVersion2415228739/001/elastic-agent-8.12.0-SNAPSHOT-linux-arm64/elastic-agent version --binary-only --yaml]
    fixture.go:520: >> running binary with: [/tmp/TestUpgradeBrokenPackageVersion2415228739/002/elastic-agent-8.11.2-SNAPSHOT-linux-arm64/elastic-agent version --binary-only --yaml]
    upgrader.go:167: Installing version "8.12.0-SNAPSHOT"
    fixture_install.go:99: [test TestUpgradeBrokenPackageVersion] Inside fixture install function
    fixture.go:520: >> running binary with: [/tmp/TestUpgradeBrokenPackageVersion2415228739/001/elastic-agent-8.12.0-SNAPSHOT-linux-arm64/elastic-agent install --force --non-interactive]
    fixture.go:520: >> running binary with: [/opt/Elastic/Agent/elastic-agent status --output json]
    upgrade_broken_package_test.go:83: package version files found: [data/elastic-agent-e3c069/package.version]
    upgrade_broken_package_test.go:88: removing package version file "/opt/Elastic/Agent/data/elastic-agent-e3c069/package.version"
    upgrader.go:202: Upgrading from version "8.12.0-SNAPSHOT" to version "8.11.2-SNAPSHOT"
    fixture.go:520: >> running binary with: [/opt/Elastic/Agent/elastic-agent upgrade 8.11.2-SNAPSHOT --source-uri file:///home/ubuntu/agent/.agent-testing/artifact --skip-verify]
    upgrade_broken_package_test.go:62: 
        	Error Trace:	/home/ubuntu/agent/testing/integration/upgrade_broken_package_test.go:62
        	Error:      	Received unexpected error:
        	            	failed to start agent upgrade to version "8.11.2": exit status 1
        	            	Error initializing version information: reading package version from file "/opt/Elastic/Agent/data/elastic-agent-e3c069/package.version": open /opt/Elastic/Agent/data/elastic-agent-e3c069/package.version: no such file or directory
        	            	Error: Failed trigger upgrade of daemon: rpc error: code = Unavailable desc = error reading from server: EOF
        	            	For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.13/fleet-troubleshooting.html
        	Test:       	TestUpgradeBrokenPackageVersion
    fixture_install.go:127: [test TestUpgradeBrokenPackageVersion] Inside fixture cleanup function
    fixture_install.go:142: collecting diagnostics; test failed
    fixture.go:520: >> running binary with: [/opt/Elastic/Agent/elastic-agent diagnostics -f /home/ubuntu/agent/build/diagnostics/TestUpgradeBrokenPackageVersion-diagnostics-2023-12-08T13:47:58Z.zip]
    fixture_install.go:268: failed to collect diagnostics to /home/ubuntu/agent/build/diagnostics/TestUpgradeBrokenPackageVersion-diagnostics-2023-12-08T13:47:58Z.zip (exit status 1): [WARNING]: failed to fetch agent diagnostics: error in DiagnosticAgent RPC call: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/elastic-agent.sock: connect: connection refused"[WARNING]: failed to fetch unit diagnostics: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/elastic-agent.sock: connect: connection refused"[WARNING]: failed to fetch component diagnostics: error in DiagnosticComponents RPC call: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/elastic-agent.sock: connect: connection refused"Error: failed collecting diagnostics: no diags could be fetched
        For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.11/fleet-troubleshooting.html
    fixture_install.go:274: retrying in 15 seconds due to connection error; possible Elastic Agent was not fully started
    fixture.go:520: >> running binary with: [/opt/Elastic/Agent/elastic-agent diagnostics -f /home/ubuntu/agent/build/diagnostics/TestUpgradeBrokenPackageVersion-diagnostics-2023-12-08T13:47:58Z.zip]
    fixture_install.go:277: failed to collect diagnostics a second time at /home/ubuntu/agent/build/diagnostics/TestUpgradeBrokenPackageVersion-diagnostics-2023-12-08T13:47:58Z.zip (%!s()): Created diagnostics archive "/home/ubuntu/agent/build/diagnostics/TestUpgradeBrokenPackageVersion-diagnostics-2023-12-08T13:47:58Z.zip"
        ***** WARNING *****
        Created archive may contain plain text credentials.
        Ensure that files in archive are redacted before sharing.
        *******************
    fixture.go:520: >> running binary with: [/opt/Elastic/Agent/elastic-agent uninstall --force]
--- FAIL: TestUpgradeBrokenPackageVersion (100.75s)

However the upgrade had actually started looking at the agent logs:

{"log.level":"info","@timestamp":"2023-12-08T13:47:25.250Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":500},"message":"updated upgrade details","log":{"source":"elastic-agent"},"upgrade_details":{"target_version":"8.11.2-SNAPSHOT","state":"UPG_REQUESTED","metadata":{}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:25.250Z","log.origin":{"file.name":"upgrade/upgrade.go","file.line":136},"message":"Upgrading agent","log":{"source":"elastic-agent"},"version":"8.11.2-SNAPSHOT","source_uri":"file:///home/ubuntu/agent/.agent-testing/artifact","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:25.250Z","log.origin":{"file.name":"upgrade/cleanup.go","file.line":22},"message":"Cleaning up non-matching downloaded versions","log":{"source":"elastic-agent"},"version":"8.12.0","downloads.path":"/opt/Elastic/Agent/data/elastic-agent-e3c069/downloads","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:25.250Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":500},"message":"updated upgrade details","log":{"source":"elastic-agent"},"upgrade_details":{"target_version":"8.11.2-SNAPSHOT","state":"UPG_DOWNLOADING","metadata":{}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:25.250Z","log.origin":{"file.name":"upgrade/step_download.go","file.line":85},"message":"Using local upgrade artifact","log":{"source":"elastic-agent"},"version":"8.11.2-SNAPSHOT","drop_path":"/home/ubuntu/agent/.agent-testing/artifact","target_path":"/opt/Elastic/Agent/data/elastic-agent-e3c069/downloads","install_path":"/opt/Elastic/Agent/data/elastic-agent-e3c069/install","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:25.697Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":500},"message":"updated upgrade details","log":{"source":"elastic-agent"},"upgrade_details":{"target_version":"8.11.2-SNAPSHOT","state":"UPG_EXTRACTING","metadata":{}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.888Z","log.origin":{"file.name":"upgrade/step_unpack.go","file.line":42},"message":"Unpacked upgrade artifact","log":{"source":"elastic-agent"},"version":"8.11.2-SNAPSHOT","file.path":"/opt/Elastic/Agent/data/elastic-agent-e3c069/downloads/elastic-agent-8.11.2-SNAPSHOT-linux-arm64.tar.gz","hash":"1c21b0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.888Z","log.origin":{"file.name":"upgrade/upgrade.go","file.line":280},"message":"Copying action store","log":{"source":"elastic-agent"},"new_home_path":"/opt/Elastic/Agent/data/elastic-agent-1c21b0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.888Z","log.origin":{"file.name":"upgrade/upgrade.go","file.line":284},"message":"Copying action store path","log":{"source":"elastic-agent"},"from":"/opt/Elastic/Agent/data/elastic-agent-e3c069/action_store.yml","to":"/opt/Elastic/Agent/data/elastic-agent-1c21b0/action_store.yml","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.888Z","log.origin":{"file.name":"upgrade/upgrade.go","file.line":284},"message":"Copying action store path","log":{"source":"elastic-agent"},"from":"/opt/Elastic/Agent/data/elastic-agent-e3c069/state.yml","to":"/opt/Elastic/Agent/data/elastic-agent-1c21b0/state.yml","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.888Z","log.origin":{"file.name":"upgrade/upgrade.go","file.line":284},"message":"Copying action store path","log":{"source":"elastic-agent"},"from":"/opt/Elastic/Agent/data/elastic-agent-e3c069/state.enc","to":"/opt/Elastic/Agent/data/elastic-agent-1c21b0/state.enc","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.888Z","log.origin":{"file.name":"upgrade/upgrade.go","file.line":306},"message":"Copying run directory","log":{"source":"elastic-agent"},"new_run_path":"/opt/Elastic/Agent/data/elastic-agent-1c21b0/run","old_run_path":"/opt/Elastic/Agent/data/elastic-agent-e3c069/run","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.891Z","log.origin":{"file.name":"upgrade/upgrade.go","file.line":402},"message":"ignoring error: failed to copy \"/opt/Elastic/Agent/data/elastic-agent-e3c069/run\" to \"/opt/Elastic/Agent/data/elastic-agent-1c21b0/run\": lstat /opt/Elastic/Agent/data/elastic-agent-e3c069/run: no such file or directory","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.891Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":500},"message":"updated upgrade details","log":{"source":"elastic-agent"},"upgrade_details":{"target_version":"8.11.2-SNAPSHOT","state":"UPG_REPLACING","metadata":{}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.891Z","log.origin":{"file.name":"upgrade/step_relink.go","file.line":42},"message":"Changing symlink","log":{"source":"elastic-agent"},"symlink_path":"/opt/Elastic/Agent/elastic-agent","new_path":"/opt/Elastic/Agent/data/elastic-agent-1c21b0/elastic-agent","prev_path":"/opt/Elastic/Agent/elastic-agent.prev","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.893Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":500},"message":"updated upgrade details","log":{"source":"elastic-agent"},"upgrade_details":{"target_version":"8.11.2-SNAPSHOT","state":"UPG_WATCHING","metadata":{}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.893Z","log.origin":{"file.name":"upgrade/step_mark.go","file.line":131},"message":"Writing upgrade marker file","log":{"source":"elastic-agent"},"file.path":"/opt/Elastic/Agent/data/.update-marker","hash":"1c21b0","prev_hash":"e3c069","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.893Z","log.origin":{"file.name":"upgrade/step_mark.go","file.line":146},"message":"Updating active commit","log":{"source":"elastic-agent"},"file.path":"/opt/Elastic/Agent/.elastic-agent.active.commit","hash":"1c21b0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.894Z","log.origin":{"file.name":"upgrade/rollback.go","file.line":126},"message":"Starting upgrade watcher","log":{"source":"elastic-agent"},"path":"/opt/Elastic/Agent/elastic-agent","args":["/opt/Elastic/Agent/elastic-agent","watch","--path.config","/opt/Elastic/Agent","--path.home","/opt/Elastic/Agent"],"env":[],"dir":"","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.894Z","log.origin":{"file.name":"upgrade/rollback.go","file.line":133},"message":"Upgrade Watcher invoked","log":{"source":"elastic-agent"},"agent.upgrade.watcher.process.pid":24106,"agent.process.pid":24072,"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.894Z","log.origin":{"file.name":"upgrade/rollback.go","file.line":121},"message":"releasing watcher 24106","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.894Z","log.origin":{"file.name":"upgrade/upgrade.go","file.line":217},"message":"Removing downloads directory","log":{"source":"elastic-agent"},"file.path":"/opt/Elastic/Agent/data/elastic-agent-e3c069/downloads","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.913Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":500},"message":"updated upgrade details","log":{"source":"elastic-agent"},"upgrade_details":{"target_version":"8.11.2-SNAPSHOT","state":"UPG_RESTARTING","metadata":{}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-12-08T13:47:58.914Z","log.origin":{"file.name":"cmd/run.go","file.line":303},"message":"reexec shutdown channel triggered","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}

If we look at the log timestamps in Buildkite for when the failure occurred:

2023-12-08 08:58:19 EST | >>> (linux-arm64-ubuntu-2204-upgrade) Test output (sudo) (stdout): upgrade_broken_package_test.go:62:
-- | --
  | 2023-12-08 08:58:19 EST | >>> (linux-arm64-ubuntu-2204-upgrade) Test output (sudo) (stdout): Error Trace:	/home/ubuntu/agent/testing/integration/upgrade_broken_package_test.go:62
  | 2023-12-08 08:58:19 EST | >>> (linux-arm64-ubuntu-2204-upgrade) Test output (sudo) (stdout): Error:      	Received unexpected error:
  | 2023-12-08 08:58:19 EST | >>> (linux-arm64-ubuntu-2204-upgrade) Test output (sudo) (stdout): failed to start agent upgrade to version "8.11.2": exit status 1
  | 2023-12-08 08:58:19 EST | >>> (linux-arm64-ubuntu-2204-upgrade) Test output (sudo) (stdout): Error initializing version information: reading package version from file "/opt/Elastic/Agent/data/elastic-agent-e3c069/package.version": open /opt/Elastic/Agent/data/elastic-agent-e3c069/package.version: no such file or directory
  | 2023-12-08 08:58:19 EST | >>> (linux-arm64-ubuntu-2204-upgrade) Test output (sudo) (stdout): Error: Failed trigger upgrade of daemon: rpc error: code = Unavailable desc = error reading from server: EOF

We see the agent reexec-ed at 2023-12-08T13:47:58.914 and the elastic-agent upgrade error occurred later at 2023-12-08 08:58:19 (the timezones don't match the the minutes and second are accurate).

It seems as if the agent re-execed before responding to the elastic-agent upgrade command. We can either fix that or make elastic-agent upgrade asynchronous and expect users and our test to watch upgrade progress through the elastic-agent status command.

Metadata

Metadata

Assignees

Labels

Team:Elastic-AgentLabel for the Agent teambugSomething isn't workingflaky-testUnstable or unreliable test cases.

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions