ci: fix CI failures (gvm v0.6.0 + skip broken cgroup tests)#273

Merged
orestisfl merged 15 commits into elastic:main from orestisfl:update-gvm
Dec 11, 2025

Conversation

@orestisfl
Contributor

@orestisfl orestisfl commented Dec 10, 2025

What does this PR do?

This PR fixes multiple CI failures:

1. Update gvm to v0.6.0 (fixes Go download 403 errors)

The CI builds were failing with HTTP 403 errors when downloading Go:

gvm: error: failed downloading from https://storage.googleapis.com/golang/go1.24.7.windows-amd64.zip: download failed with http status 403

gvm v0.6.0 migrated to the official Go downloads API at go.dev/dl.

See: andrewkroh/gvm#117
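
To make the change of download location concrete, here is a minimal sketch that builds the archive URL for both hosts. It is not gvm's actual implementation: `archiveName` and `downloadURL` are hypothetical helpers, and treating `https://go.dev/dl` as a plain base path for the archive file is an assumption made for illustration.

```go
package main

import "fmt"

// archiveName returns the Go release archive file name for a platform.
// Hypothetical helper: Windows releases ship as .zip, others as .tar.gz.
func archiveName(version, goos, goarch string) string {
	ext := "tar.gz"
	if goos == "windows" {
		ext = "zip"
	}
	return fmt.Sprintf("go%s.%s-%s.%s", version, goos, goarch, ext)
}

// downloadURL joins a base URL and the archive name.
// Hypothetical helper, not part of gvm.
func downloadURL(base, version, goos, goarch string) string {
	return base + "/" + archiveName(version, goos, goarch)
}

func main() {
	// Old location, as seen in the failing CI log.
	fmt.Println(downloadURL("https://storage.googleapis.com/golang", "1.24.7", "windows", "amd64"))
	// New location, assuming archives are served directly under go.dev/dl.
	fmt.Println(downloadURL("https://go.dev/dl", "1.24.7", "windows", "amd64"))
}
```

The first printed URL matches the one in the 403 error above; the second shows where the same archive would be requested under the assumption stated in the lead-in.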

Updated gvm version in:

  • .buildkite/pipeline.yml (SETUP_GVM_VERSION env var)
  • .buildkite/scripts/run-win-tests.ps1 (hardcoded URL)

Tested locally:

$ /tmp/gvm-0.6.0 --version
0.6.0
$ eval "$(/tmp/gvm-0.6.0 1.24.7)" && go version
go version go1.24.7 linux/amd64

This aligns with the beats repository configuration.

2. Skip flaky cgroup tests (issue #270)

Container tests fail when cgroups are unavailable due to:

  • Private cgroup namespace: cgroup paths contain /../.. which can't be resolved
  • Non-root user: permission denied accessing cgroup files

These are treated as non-fatal errors in production code. Our test suite was designed to predict exactly when these failures would happen, but something has gone wrong since then; more investigation is needed.

Changes:

  • Skip cgroup assertion when stats.Cgroup == nil in TestContainerMonitoringFromInsideContainer and TestSelfMonitoringFromInsideContainer
  • Pass CGROUPNSMODE env var from test framework to inner tests
  • Only assert cgroups in validateProcResult when cgroupNSMode == "host" && userID == 0
  • Filter "Non-fatal error" messages from FatalLogMessages check
  • Improve test logging (replace the Verbose field with t.Logf); Go automatically prints these logs with -v or on failure.
  • Improve assertions in validateProcResult. The function no longer fails immediately; it keeps going to maximize the context available for a failing test.
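
The gating and filtering rules in the list above can be sketched roughly as follows. This is an illustration, not the code from this PR: `shouldAssertCgroups` and `filterNonFatal` are hypothetical names, and representing the user ID as an `int` and log output as a slice of lines are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldAssertCgroups reports whether a test can expect cgroup data.
// It mirrors the condition from the description:
// cgroupNSMode == "host" && userID == 0.
// Hypothetical helper; the real check lives in validateProcResult.
func shouldAssertCgroups(cgroupNSMode string, userID int) bool {
	return cgroupNSMode == "host" && userID == 0
}

// filterNonFatal drops log lines containing "Non-fatal error" so that
// they are not counted by a fatal-log-message check.
// Hypothetical helper; the slice-of-lines input is an assumption.
func filterNonFatal(lines []string) []string {
	kept := make([]string, 0, len(lines))
	for _, line := range lines {
		if strings.Contains(line, "Non-fatal error") {
			continue
		}
		kept = append(kept, line)
	}
	return kept
}

func main() {
	fmt.Println(shouldAssertCgroups("host", 0))    // root in the host cgroup namespace
	fmt.Println(shouldAssertCgroups("private", 0)) // private namespace: skip assertion
	fmt.Println(shouldAssertCgroups("host", 1000)) // non-root user: skip assertion

	logs := []string{
		"Non-fatal error fetching cgroups: permission denied", // made-up sample line
		"panic: something broke",                              // made-up sample line
	}
	fmt.Println(filterNonFatal(logs))
}
```

In a real test, the gate would wrap the cgroup assertions, and reporting failures with t.Errorf rather than t.Fatalf is one way to let the remaining checks keep running.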

Why is it important?

CI is completely broken - no PRs can be merged.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.md

@orestisfl orestisfl self-assigned this Dec 10, 2025
@orestisfl orestisfl requested a review from a team as a code owner December 10, 2025 08:16
@orestisfl orestisfl added bug Something isn't working Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Dec 10, 2025
@orestisfl orestisfl requested review from AndersonQ and VihasMakwana and removed request for a team December 10, 2025 08:16
@mauri870 mauri870 requested review from mauri870 and removed request for VihasMakwana December 10, 2025 11:18
@orestisfl orestisfl marked this pull request as draft December 10, 2025 11:25
@pierrehilbert

Good catch on the gvm version. I did it on Beats and Ingest-dev but didn't think about it here!

@orestisfl orestisfl changed the title ci: update gvm to v0.6.0 to fix Go download failures ci: fix CI failures (gvm v0.6.0 + skip flaky cgroup tests) Dec 10, 2025
@orestisfl orestisfl added the flaky-test Unstable or unreliable test cases. label Dec 10, 2025
@orestisfl orestisfl marked this pull request as ready for review December 10, 2025 16:52
@orestisfl orestisfl requested a review from mauri870 December 10, 2025 16:52
@orestisfl orestisfl changed the title ci: fix CI failures (gvm v0.6.0 + skip flaky cgroup tests) ci: fix CI failures (gvm v0.6.0 + skip broken cgroup tests) Dec 10, 2025
@orestisfl orestisfl merged commit d104506 into elastic:main Dec 11, 2025
5 checks passed
@orestisfl orestisfl deleted the update-gvm branch December 11, 2025 10:25

Labels

bug Something isn't working flaky-test Unstable or unreliable test cases. Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team


3 participants