v2 attempt #364

Closed
eyalkraft wants to merge 3 commits into elastic:main from eyalkraft:agent-v2

@eyalkraft
Contributor

@eyalkraft eyalkraft commented Aug 29, 2022

To check quickly whether v2 works, I did a short POC.

My changes only include

With local checkouts of cloudbeat and elastic-agent on the required branches, build the image and load it into kind by running:

BEAT_VERSION=8.5.0 make PackageAgent
docker tag docker.elastic.co/beats/elastic-agent-complete:8.5.0-SNAPSHOT docker.elastic.co/elastic-agent/elastic-agent-complete:8.5.0-SNAPSHOT
kind load docker-image docker.elastic.co/elastic-agent/elastic-agent-complete:8.5.0-SNAPSHOT

To set everything up:

eval "$(elastic-package stack shellinit)"
elastic-package stack up --version=8.5.0-SNAPSHOT -d -v

Then, from integrations/packages/cloud_security_posture, run:

elastic-package service up

Current status:
The built agent daemonset is in a CrashLoopBackOff:

Error: syncing download directory to STATE_PATH(/usr/share/elastic-agent/state) failed: lstat /usr/share/elastic-agent/data/elastic-agent-112580/downloads: no such file or directory
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.5/fleet-troubleshooting.html
Stream closed EOF for kube-system/elastic-agent-m5759 (elastic-agent)

@mergify
Contributor

mergify bot commented Aug 29, 2022

This pull request does not have a backport label. Could you fix it @eyalkraft? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit
    NOTE: backport-skip has been added to this pull request.

@cmacknz
Member

cmacknz commented Aug 30, 2022

I'm just getting set up to try to reproduce the problem you ran into. My first attempt to build with BEAT_VERSION=8.5.0 make PackageAgent on my M1 Mac failed. Edit: I think I tracked this down to something Mac-specific in the agent magefile.

One thing missing from this PR is handling of the configuration transformations in the V1 Cloudbeat spec file, specifically the inject_index and index_stream_processor transformations, which set the target data stream index and add an add_fields processor that attaches the data stream name fields to each event.

In https://github.com/elastic/beats/pull/32673/files you can see an example of how Filebeat does this:

  1. In x-pack/filebeat/cmd/root.go there is a call to register a configuration transformation: management.ConfigTransform.SetTransform(filebeatCfg).
  2. The registered transformations are defined in x-pack/filebeat/cmd/agent.go
  3. The Go code to add the index and processors is defined here: https://github.com/fearful-symmetry/beats/blob/a176bd95d8841e8d766b11f36da0ef4f9dcfbabc/x-pack/libbeat/management/generate.go#L63

@eyalkraft
Contributor Author

@cmacknz Thanks for trying this!

I knew there were configuration transformations to handle (and we'll definitely have to add them later), but since the agent crashed before it even attempted to start cloudbeat, I figured the problem was in an area unrelated to cloudbeat.

@fearful-symmetry

So, I'm currently trying to build and reproduce this, but the error makes me think the problem is one of a set of issues that are fixed here: elastic/elastic-agent#1061

This is before the beater interface gets started, so it's probably not an issue with the V2 client?

@cmacknz
Member

cmacknz commented Sep 1, 2022

@eyalkraft two of us have tried to build this, on an M1 Mac and on a Linux machine, and the build commands consistently fail while building the Docker containers with an error like:

chmod: cannot access '/usr/share/elastic-agent/data/elastic-agent-*/components/*beat': No such file or directory
chmod: cannot access '/usr/share/elastic-agent/data/elastic-agent-*/components/*beat': No such file or directory

That path comes from the agent Dockerfile template. We're wondering whether there is some additional setup for building Cloudbeat that we might be missing; otherwise, I have been trying to trace this into the agent build. I managed to get a successful build by editing the PLATFORMS variable hard-coded in the cloudbeat Makefile and the platform list in the magefile, but I haven't tested the result yet.
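A note on why the message contains literal `*` characters: when a glob pattern matches nothing, POSIX shells leave the word unchanged, so chmod is asked for a file literally named `.../elastic-agent-*/components/*beat`. That means no `*beat` binary existed under any `elastic-agent-*/components` directory at that step of the image build (presumably because the packaging step that copies component binaries did not run for the target platform). The bash sketch below reproduces this in a temporary directory. The `count_component_beats` helper and the `elastic-agent-abc123` directory name are hypothetical, and bash's `nullglob` option is used only to show how a missing binary could be detected explicitly; it is not what the agent Dockerfile does.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the "chmod: cannot access '...*beat'" failure in a temp dir.
# The layout mirrors /usr/share/elastic-agent; all names here are hypothetical.

# Hypothetical helper: count component binaries matching the Dockerfile pattern.
# $1 is the agent root directory (e.g. /usr/share/elastic-agent).
count_component_beats() {
  local root=$1
  local -a beats
  # nullglob (bash-only): an unmatched pattern expands to nothing instead of
  # being passed through literally, so an empty array means "no binaries".
  shopt -s nullglob
  beats=("$root"/data/elastic-agent-*/components/*beat)
  shopt -u nullglob
  echo "${#beats[@]}"
}

demo_root=$(mktemp -d)
# The data directory exists, but no *beat binary was copied into components/.
mkdir -p "$demo_root/data/elastic-agent-abc123/components"

# Default globbing: nothing matches, so chmod receives the literal pattern
# and fails with "cannot access '.../elastic-agent-*/components/*beat'".
if chmod +x "$demo_root"/data/elastic-agent-*/components/*beat 2>/dev/null; then
  literal_failed=no
else
  literal_failed=yes
fi

echo "chmod on unmatched pattern failed: $literal_failed"
echo "component binaries found: $(count_component_beats "$demo_root")"
```

Once a binary such as `cloudbeat` is present under `components/`, the same chmod succeeds, which points the investigation at whatever populates that directory rather than at the chmod line itself.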

@cmacknz
Member

cmacknz commented Sep 1, 2022

It does seem like an issue with the agent itself and not with the libbeat changes. APM was tested against the libbeat V2 branch without issue today. Perhaps this is something specific to running as a container that we haven't hit yet. Hopefully we can get past the build issue here.

@cmacknz
Member

cmacknz commented Sep 2, 2022

@fearful-symmetry I confirmed I can build this on an M1 Mac for linux/arm64 using the main branch of agent, but not on the feature-arch-v2 branch. We must have broken the agent build somehow, possibly between when this was first tested and now.

@eyalkraft if you get a chance, can you try running make PackageAgent again from the latest feature-arch-v2 branch? If it breaks for you as well, that will give us a narrow time range in which to search for when this likely broke. Monday is a holiday in the US and Canada, so we'll pick this back up afterwards.

I have some suspicion that the Docker base image versions are stale on the agent V2 branch.

@eyalkraft
Contributor Author

@cmacknz @fearful-symmetry Big thanks for trying to validate the problem on your machines.
I'm using an M1 machine as well.

I've tried building using elastic/elastic-agent@5f1e54f for the agent and I got the same error you did:

#11 0.579 chmod: cannot access '/usr/share/elastic-agent/data/elastic-agent-*/components/*beat': No such file or directory

The previous version of the feature-arch-v2 branch I used was elastic/elastic-agent@43ad01d, so it seems the build broke in one of the following commits:
https://github.com/elastic/elastic-agent/commits/feature-arch-v2?since=2022-08-30&until=2022-08-31

@fearful-symmetry

Yeah, it would make sense that this is an issue in Elastic-Agent, since there have been a lot of build changes in V2. Possibly elastic/elastic-agent@5f1e54f? It's the only PR that seems even vaguely related.

Also, the beats V2 PR has been merged, so you should be safe to work against the feature-arch-v2 branch on elastic/beats.

@cmacknz
Member

cmacknz commented Sep 7, 2022

The build is now fixed after merging elastic/elastic-agent#1105. I'll try to reproduce the problem and test against the bug-fix branch we already have.

@cmacknz
Member

cmacknz commented Sep 7, 2022

Alright, I can reproduce the issue: elastic-package stack up --version=8.5.0-SNAPSHOT -d -v fails with the same error when using the local 8.5.0 agent tag created after the build with docker tag docker.elastic.co/beats/elastic-agent-complete:8.5.0-SNAPSHOT docker.elastic.co/elastic-agent/elastic-agent-complete:8.5.0-SNAPSHOT.

elastic-package stack up -d -v exits because Fleet Server and the agent are unhealthy, and looking at the logs of the exited containers (after finding their IDs with docker ps -a) shows:

Error: syncing download directory to STATE_PATH(/usr/share/elastic-agent/state) failed: lstat /usr/share/elastic-agent/data/elastic-agent-f95c9e/downloads: no such file or directory

Now we just need to figure out what is wrong.

Edit: I think just running docker run docker.elastic.co/elastic-agent/elastic-agent-complete:8.5.0-SNAPSHOT is enough to reproduce this after tagging it.

@cmacknz
Member

cmacknz commented Sep 7, 2022

It appears the downloads directory was removed in v2, and we just need to update this part of the agent. Fix coming shortly.
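The actual change belongs in the agent's Go code and is tracked in elastic/elastic-agent#1159. As one possible approach, and purely an assumption, container startup could treat a missing downloads directory as nothing to sync instead of failing on `lstat`. The shell sketch below only illustrates that guard; the `sync_downloads` function and its arguments are hypothetical and do not correspond to an agent API.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "skip if missing" guard for the downloads sync.
# The real logic lives in the agent's Go code; names here are illustrative.

# sync_downloads SRC DEST: copy SRC's contents into DEST when SRC exists.
sync_downloads() {
  local src=$1 dest=$2
  if [ ! -d "$src" ]; then
    # The downloads directory appears to have been removed in v2, so a missing
    # source is treated as "nothing to sync" rather than as a fatal error.
    echo "skipping downloads sync: $src does not exist" >&2
    return 0
  fi
  mkdir -p "$dest" || return 1
  # Copy the directory's contents (including dotfiles) into DEST.
  cp -R "$src"/. "$dest"/
}
```

Logging to stderr rather than skipping silently keeps a genuinely broken image visible; whether the agent should instead drop the sync entirely in v2 is a decision for the real fix.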

@kruskall
Member

Hey @cmacknz 👋, is there an ETA for this? We've bumped into this issue while trying to integrate v2 into APM Server. We're using the agent image for system tests (see https://github.com/elastic/apm-server/blob/fbeb582b18f755da5bbc0f75c1eb4e383d4f66da/docker-compose.yml#L53)

@cmacknz
Member

cmacknz commented Sep 12, 2022

We've had to pause on v2 development for a short while to deal with some urgent customer escalations unfortunately.

The tracking issue for the bug is here elastic/elastic-agent#1159. This will be one of the first things we fix once we start up again.

@kruskall
Member

Thank you for the update! 🙇

@mergify
Contributor

mergify bot commented Sep 18, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b agent-v2 upstream/agent-v2
git merge upstream/main
git push upstream agent-v2

@eyalkraft
Contributor Author

The @elastic/cloudbeat team will handle https://github.com/elastic/security-team/issues/4792 as part of Sprint 17.
