
Update elastic-agent-client for elastic-agent in 7.x#39800

Merged
michel-laterman merged 7 commits into elastic:7.17 from michel-laterman:7.17-client
Jun 20, 2024

Conversation

@michel-laterman
Contributor

@michel-laterman michel-laterman commented Jun 4, 2024

Proposed commit message

Cherry-pick #39586 which was the PR to update elastic-agent-libs and rename the control proto internally to avoid a namespace collision. The PR was reverted because the agent did not start when deployed in cloud.

Added mage cloud:* targets to determine whether the issue was due to the agent running in a container or to how our containers are provisioned; it turned out to be due to the container mode start.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

1. Build a local image with mage cloud:image
2. Start a 7.17.22-SNAPSHOT cloud deployment
3. Create a policy with only the fleet-server integration
4. Start a container with:

docker run \
  --env FLEET_SERVER_ENABLE=true \
  --env FLEET_SERVER_ELASTICSEARCH_HOST=$ES_HOST \
  --env FLEET_SERVER_SERVICE_TOKEN=$SERVICE_TOKEN \
  --env FLEET_SERVER_POLICY_ID=$POLICY_ID \
  --env FLEET_SERVER_PORT=8220 -p 8220:8220 \
  --rm docker.elastic.co/beats/elastic-agent-complete:7.17.22-SNAPSHOT elastic-agent container

Related issues

Fixes elastic/fleet-server#3592

michel-laterman and others added 2 commits June 4, 2024 10:28
elastic#39586)

Update elastic-agent-client to a tagged release (v7.8.1), and rename control proto package to cproto so it does not conflict with elastic-agent-client import
@michel-laterman michel-laterman added enhancement Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Jun 4, 2024
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 4, 2024
@michel-laterman
Contributor Author

michel-laterman commented Jun 4, 2024

With debug enabled, the enrollment output (from container mode) is:

Policy selected for enrollment:  991ef9e0-22af-11ef-8aee-fde4163b8652
2024-06-04T22:21:19.524Z	DEBUG	cmd/enroll_cmd.go:285	verifying communication with running Elastic Agent daemon
2024-06-04T22:21:19.525Z	INFO	cmd/enroll_cmd.go:386	Generating self-signed certificate for Fleet Server
2024-06-04T22:21:25.977Z	INFO	cmd/enroll_cmd.go:571	Spawning Elastic Agent daemon as a subprocess to complete bootstrap process.
2024-06-04T22:21:26.133Z	ERROR	cmd/run.go:122	failed to invoke rollback watcher: fork/exec /usr/share/elastic-agent/state/data/data/elastic-agent-361b3a/elastic-agent: no such file or directory
2024-06-04T22:21:26.133Z	INFO	cmd/run.go:126	Artifact has been built with security disabled. Elastic Agent will not verify signatures of the artifacts.
2024-06-04T22:21:26.136Z	INFO	application/application.go:67	Detecting execution mode
2024-06-04T22:21:26.136Z	INFO	application/application.go:76	Agent is managed locally
2024-06-04T22:21:26.137Z	INFO	capabilities/capabilities.go:59	capabilities file not found in /usr/share/elastic-agent/state/capabilities.yml
2024-06-04T22:21:26.344Z	DEBUG	emitter/emitter.go:24	Supported programs: Fleet Server, Heartbeat, Metricbeat, Osquerybeat, Packetbeat, APM-Server, Endpoint Security, Filebeat
2024-06-04T22:21:26.344Z	DEBUG	[composable]	kubernetessecrets/kubernetes_secrets.go:88	Kubernetes_secrets provider skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
2024-06-04T22:21:26.344Z	DEBUG	[composable]	kubernetesleaderelection/kubernetes_leaderelection.go:55	Kubernetes leaderelection provider skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
2024-06-04T22:21:26.446Z	DEBUG	[composable.providers.kubernetes]	kubernetes/kubernetes.go:84	Kubernetes provider for resource pod skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
2024-06-04T22:21:26.447Z	DEBUG	[composable.providers.kubernetes]	kubernetes/kubernetes.go:84	Kubernetes provider for resource node skipped, unable to connect: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
2024-06-04T22:21:26.447Z	DEBUG	[docker]	docker/client.go:49	Docker client will negotiate the API version on the first request.
2024-06-04T22:21:26.447Z	INFO	[composable.providers.docker]	docker/docker.go:43	Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2024-06-04T22:21:26.447Z	DEBUG	application/local_mode.go:148	Reloading of configuration is on, frequency is set to 10s
2024-06-04T22:21:26.447Z	INFO	[api]	api/server.go:62	Starting stats endpoint
2024-06-04T22:21:26.448Z	INFO	application/local_mode.go:176	Agent is starting
2024-06-04T22:21:26.448Z	INFO	[api]	api/server.go:64	Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2024-06-04T22:21:26.448Z	INFO	application/local_mode.go:186	Agent is stopped
2024-06-04T22:21:26.448Z	DEBUG	application/periodic.go:63	Adding 1 file to watch
2024-06-04T22:21:26.449Z	INFO	application/periodic.go:79	Configuration changes detected
2024-06-04T22:21:26.449Z	DEBUG	application/periodic.go:85	Updated 1 files: /usr/share/elastic-agent/state/elastic-agent.yml
2024-06-04T22:21:26.451Z	DEBUG	status/reporter.go:200	'capabilities-55d8119c' has status 'online'
2024-06-04T22:21:26.451Z	DEBUG	emitter/controller.go:155	Converting single configuration into specific programs configuration
2024-06-04T22:21:26.451Z	DEBUG	application/periodic.go:29	Failed to read configuration, error: could not emit configuration: fail to extract program configuration: invalid configuration missing outputs configuration
2024-06-04T22:21:26.548Z	DEBUG	emitter/controller.go:155	Converting single configuration into specific programs configuration
2024-06-04T22:21:26.548Z	ERROR	emitter/controller.go:126	Failed to render configuration with latest context from composable controller: fail to extract program configuration: invalid configuration missing outputs configuration

This is an issue with parsing the elastic-agent.yml file on startup.
EDIT
I tried to add a bit of context locally to the error:

Failed to read configuration, error: could not emit configuration: [/usr/share/elastic-agent/state/elastic-agent.yml]: programs error, AST: map[fleet:map[enabled:true] path:map[config:/usr/share/elastic-agent/state data:/usr/share/elastic-agent/state/data home:/usr/share/elastic-agent/state/data logs:/usr/share/elastic-agent] runtime:map[arch:amd64 os:linux osinfo:map[family:debian major:20 minor:4 patch:6 type:linux version:20.04.6 LTS (Focal Fossa)]]]: fail to extract program configuration: invalid configuration missing outputs configuration

The error above is emitted from:

programsToRun, err := program.Programs(e.agentInfo, ast)
if err != nil {
	return err
}

but the ast has been converted into a map with a call to ast.Map

EDIT 2
It looks like with the update, container mode does not properly detect it should start the fleet-server bootstrap process. Here's the start of the log output when a working 7.17 container starts with the same (docker) env vars:

Policy selected for enrollment:  573d02e0-2365-11ef-b465-19c5d0fc1685
2024-06-05T19:15:43.586Z	DEBUG	cmd/enroll_cmd.go:285	verifying communication with running Elastic Agent daemon
2024-06-05T19:15:43.587Z	INFO	cmd/enroll_cmd.go:386	Generating self-signed certificate for Fleet Server
2024-06-05T19:15:44.209Z	INFO	cmd/enroll_cmd.go:571	Spawning Elastic Agent daemon as a subprocess to complete bootstrap process.
2024-06-05T19:15:44.367Z	ERROR	cmd/run.go:122	failed to invoke rollback watcher: fork/exec /usr/share/elastic-agent/state/data/data/elastic-agent-51edb8/elastic-agent: no such file or directory
2024-06-05T19:15:44.367Z	INFO	cmd/run.go:126	Artifact has been built with security disabled. Elastic Agent will not verify signatures of the artifacts.
2024-06-05T19:15:44.369Z	INFO	application/application.go:67	Detecting execution mode
2024-06-05T19:15:44.371Z	INFO	application/application.go:88	Agent is in Fleet Server bootstrap mode
2024-06-05T19:15:44.470Z	INFO	[api]	api/server.go:62	Starting stats endpoint
2024-06-05T19:15:44.471Z	INFO	application/fleet_server_bootstrap.go:130	Agent is starting
2024-06-05T19:15:44.471Z	INFO	[api]	api/server.go:64	Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
2024-06-05T19:15:44.472Z	INFO	application/fleet_server_bootstrap.go:140	Agent is stopped
2024-06-05T19:15:44.476Z	DEBUG	router/router.go:83	Creating stream: default
2024-06-05T19:15:45.219Z	DEBUG	cmd/enroll_cmd.go:739	Waiting for Elastic Agent to start Fleet Server: no fleet-server application running
2024-06-05T19:15:45.219Z	INFO	cmd/enroll_cmd.go:743	Waiting for Elastic Agent to start Fleet Server
2024-06-05T19:15:45.550Z	DEBUG	router/router.go:98	Streams default need to run config with ID PIgyp30D and programs: Fleet Server

@michel-laterman
Contributor Author

I've added more logging around the container and enroll_cmd entrypoints, as well as application/application.go.
In the container command (for both a successful 7.17 image and an image built with the changes in this PR), /usr/share/elastic-agent/state/fleet.yml does not exist on startup or when the container command calls enroll.Start.

When enroll executes, the config it holds in memory indicates that fleet is enabled (again for both images):

2024-06-07T19:27:30.551Z	INFO	cmd/enroll_cmd.go:330	enroll bootstrap
agent-config: map[id: monitoring.http:0xc0003540e0]
fleet: &{Enabled:true AccessAPIKey: Client:{Protocol:http SpaceID: Username: Password: Path: Host:localhost:5601 Hosts:[] Transport:{TLS:<nil> Timeout:10m0s Proxy:{URL:<nil> Headers:map[] Disable:true}}} Reporting:0xc0005d0ba0 Info:0xc000288210 Server:0xc0004a0820}

However, in application/application.go (during application creation) the config loaded by the two images differs. For the image based on this PR, nothing is loaded via rawConfig, so we get default values:

2024-06-07T21:26:44.262Z	INFO	application/application.go:51	Application loading config: /usr/share/elastic-agent/state/fleet.yml
mapErr=<nil>
contents=map[agent:map[headers:<nil> id:83beaf1b-d1fc-4356-bd85-24b51e564db2 logging:map[level:debug] monitoring:map[http:map[enabled:false host: port:6791]]]]

However for a working 7.17 image it loads from a file:

2024-06-07T19:27:30.817Z	INFO	application/application.go:51	Application loading config: /usr/share/elastic-agent/state/fleet.yml
mapErr=<nil>
contents=map[agent:map[id: monitoring:map[http:map[enabled:false host: port:6791]]] fleet:map[access_api_key: agent:map[id:] enabled:true host:localhost:5601 protocol:http proxy_disable:true reporting:map[check_frequency_sec:30 threshold:10000] server:map[bootstrap:true 
...
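
To make the difference between the two loaded configs concrete, here is a minimal sketch of how an execution mode could be derived from such a map. It is an assumption about the shape of the decision, not the code in application/application.go; `detectMode`, `nestedBool`, and the mode names are hypothetical.

```go
package main

import "fmt"

type mode string

const (
	modeLocal     mode = "managed locally"
	modeFleet     mode = "managed by Fleet"
	modeBootstrap mode = "Fleet Server bootstrap"
)

// nestedBool walks cfg along keys and returns the bool stored there,
// or false if any level is missing or has an unexpected type.
func nestedBool(cfg map[string]interface{}, keys ...string) bool {
	var cur interface{} = cfg
	for _, k := range keys {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return false
		}
		cur = m[k]
	}
	b, _ := cur.(bool)
	return b
}

// detectMode is a hypothetical reduction of the mode detection step.
func detectMode(cfg map[string]interface{}) mode {
	switch {
	case nestedBool(cfg, "fleet", "server", "bootstrap"):
		return modeBootstrap
	case nestedBool(cfg, "fleet", "enabled"):
		return modeFleet
	default:
		return modeLocal
	}
}

func main() {
	// Shape of the config seen with the image from this PR: no fleet section.
	defaults := map[string]interface{}{
		"agent": map[string]interface{}{"id": "83beaf1b"},
	}
	// Shape of the config seen with the working 7.17 image.
	fromFile := map[string]interface{}{
		"fleet": map[string]interface{}{
			"enabled": true,
			"server":  map[string]interface{}{"bootstrap": true},
		},
	}
	fmt.Println(detectMode(defaults)) // managed locally
	fmt.Println(detectMode(fromFile)) // Fleet Server bootstrap
}
```

Under these assumptions, a config without a fleet section falls through to the local mode, which matches the "Agent is managed locally" line in the failing log.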

  backExp.Wait()
  err = storeAgentInfo(s, reader)
- if err != filelock.ErrAppAlreadyRunning {
+ if !stderror.Is(err, filelock.ErrAppAlreadyRunning) {
Contributor Author

I originally forgot to invert this check, which was causing the elastic-agent container bootstrap to fail.
It now works.

@michel-laterman michel-laterman marked this pull request as ready for review June 17, 2024 18:02
@michel-laterman michel-laterman requested review from a team as code owners June 17, 2024 18:02
@michel-laterman michel-laterman requested review from leehinman and pchila and removed request for a team June 17, 2024 18:02
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@michel-laterman michel-laterman requested review from fearful-symmetry and removed request for a team June 17, 2024 18:02
Member

@pchila pchila left a comment

No major blockers. I added a few suggestions: handle errors instead of adding nolint directives (even in defer statements), and either change the alias for the standard library errors package or remove our own errors package.

  func (c *enrollCmd) stopAgent() {
  	if c.agentProc != nil {
- 		c.agentProc.StopWait()
+ 		c.agentProc.StopWait() //nolint:errcheck // no error check here
Member

Could we at least check the error and log it instead of adding the nolint directive?

  import (
  	"bytes"
  	"context"
+ 	stderror "errors"
Member

[Naming] stderror gave me os.Stderr vibes when I was reading the code... can we use a different alias, like goerrors or stdliberrors?

Even better: we could get rid of the "github.com/elastic/beats/v7/x-pack/elastic-agent/pkg/agent/errors" import and use only the standard library errors package.

  	return err
  }
- defer fileLock.Unlock()
+ defer fileLock.Unlock() //nolint:errcheck // defered call
Member

Maybe we can handle the error here, for example with a closure:

Suggested change
- defer fileLock.Unlock() //nolint:errcheck // defered call
+ defer func(fileLock *filelock.AppLocker) {
+ 	unlockErr := fileLock.Unlock()
+ 	if unlockErr != nil {
+ 		// log error here or join it with the internal error returned
+ 	}
+ }(fileLock)

}

- res, err := client.Get("http://" + endpoint + "/")
+ req, err := http.NewRequestWithContext(ctx, "GET", "http://"+endpoint+"/", nil)
Member

We can use the method constants from the http package here

Suggested change
- req, err := http.NewRequestWithContext(ctx, "GET", "http://"+endpoint+"/", nil)
+ req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://"+endpoint+"/", nil)

  if ok := certPool.AppendCertsFromPEM(s.ca.Crt()); !ok {
  	return errors.New("failed to append root CA", errors.TypeSecurity)
  }
+ //nolint:gosec // G402: TLS MinVersion too low.
Member

Not sure if it breaks backward compatibility but it would be nice if we stopped accepting insecure TLS versions... Although this also silences the linter.

@cmacknz maybe we want to address this in a separate issue?

@michel-laterman michel-laterman enabled auto-merge (squash) June 20, 2024 16:10
@michel-laterman michel-laterman merged commit 7b05df8 into elastic:7.17 Jun 20, 2024
@michel-laterman michel-laterman deleted the 7.17-client branch June 20, 2024 19:51
@pchila pchila mentioned this pull request Jun 26, 2024
1 task
Development

Successfully merging this pull request may close these issues.

[Deployment]: Hosted fleet server is not available under Agents tab for 7.17.22

3 participants