GitLab CI Configuration Changes
| Removed | Modified | Added | Renamed |
|---|---|---|---|
| 0 | 0 | 8 | 0 |
ℹ️ Diff available in the job log.
Regression Detector Results
Baseline: febb18c
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +1.89 | [-1.24, +5.01] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +1.89 | [-1.24, +5.01] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | +0.60 | [+0.39, +0.80] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | +0.27 | [-2.51, +3.05] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.16 | [+0.12, +0.20] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.15 | [-0.48, +0.78] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | +0.12 | [-0.02, +0.27] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.06 | [-0.56, +0.68] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.01 | [-0.03, +0.06] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.01 | [-0.62, +0.64] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.01 | [-0.04, +0.02] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.02 | [-0.08, +0.04] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | -0.08 | [-0.67, +0.51] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -0.09 | [-0.20, +0.02] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | -0.18 | [-0.37, +0.01] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | -0.26 | [-0.29, -0.23] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_metrics | memory utilization | -0.32 | [-0.49, -0.14] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | -0.51 | [-0.54, -0.47] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs | memory utilization | -0.55 | [-0.87, -0.23] | 1 | Logs bounds checks dashboard |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -2.43 | [-2.48, -2.38] | 1 | Logs |
Bounds Checks: ❌ Failed
| perf | experiment | bounds_check_name | replicates_passed | links |
|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | |
| ❌ | docker_containers_memory | simple_check_run | 9/10 | |
| ✅ | file_to_blackhole_0ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we flag a change in performance as a "regression" (a change worth investigating further) if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
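The three criteria above can be sketched as a small predicate. This is only an illustration, not the detector's actual implementation: the `Experiment` class, the `is_regression` function, and the `erratic` field defaulting to `False` are hypothetical names and assumptions; the numbers are copied from the table rows above.

```python
from dataclasses import dataclass

# Threshold taken from the report ("Effect size tolerance: |Δ mean %| ≥ 5.00%").
EFFECT_SIZE_TOLERANCE = 5.00


@dataclass(frozen=True)
class Experiment:
    """One row of the results table (hypothetical container for this sketch)."""
    name: str
    delta_mean_pct: float
    ci_low: float
    ci_high: float
    erratic: bool = False  # assumption: not erratic unless configured otherwise


def is_regression(exp: Experiment, tolerance: float = EFFECT_SIZE_TOLERANCE) -> bool:
    """Return True only if all three criteria from the explanation hold."""
    big_enough = abs(exp.delta_mean_pct) >= tolerance
    ci_excludes_zero = exp.ci_low > 0 or exp.ci_high < 0
    return big_enough and ci_excludes_zero and not exp.erratic


# CI excludes zero, but the effect size is below 5%: not flagged.
syslog = Experiment("tcp_syslog_to_blackhole", -2.43, -2.48, -2.38)
# CI contains zero and the effect size is below 5%: not flagged.
cpu = Experiment("docker_containers_cpu", 1.89, -1.24, 5.01)

print(is_regression(syslog))  # False
print(is_regression(cpu))     # False
```

Under these assumptions, `tcp_syslog_to_blackhole` is shown with ➖ even though its interval excludes zero, because its |Δ mean %| stays below the 5.00% tolerance.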
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
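One way to read "Bounds Checks: ❌ Failed" alongside "All Quality Gates passed" is sketched below. It assumes, without confirmation from the report, that only experiments whose name starts with `quality_gate_` count toward the CI decision and that a gate passes when every replicate passes; the helper names are hypothetical and the rows are a subset copied from the bounds checks table.

```python
def replicates_ok(cell: str) -> bool:
    """Parse a 'passed/total' cell such as '9/10' and check that all replicates passed."""
    passed, total = (int(part) for part in cell.split("/"))
    return passed == total


def is_gate(experiment: str) -> bool:
    # Assumption for this sketch: gating experiments are identified by name prefix.
    return experiment.startswith("quality_gate_")


# (experiment, bounds_check_name, replicates_passed): subset of the table above.
bounds_checks = [
    ("docker_containers_memory", "simple_check_run", "9/10"),
    ("docker_containers_memory", "memory_usage", "10/10"),
    ("quality_gate_logs", "lost_bytes", "10/10"),
    ("quality_gate_idle", "memory_usage", "10/10"),
    ("quality_gate_metrics_logs", "cpu_usage", "10/10"),
]

all_bounds_ok = all(replicates_ok(cell) for _, _, cell in bounds_checks)
gates_ok = all(replicates_ok(cell) for exp, _, cell in bounds_checks if is_gate(exp))

print(all_bounds_ok)  # False: docker_containers_memory / simple_check_run is 9/10
print(gates_ok)       # True: every quality_gate_* row is 10/10
```

With that reading, the single 9/10 row fails the overall bounds checks summary without affecting the pass/fail decision, since it is not among the gates listed above.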
Static quality checks ✅
Please find below the results from static quality gates.
Successful checks
On the one hand, `/tmp` is a problematic (ambiguous) path under Windows (`c:/tmp`, `c:\\tmp` and even `\\tmp` would all denote absolute paths).
On the other hand, keeping the default disk cache path makes the cache persistent:
- across reboots on developer machines,
- in GitLab CI. (#40153)
In case someone asks: disk and remote caching work well together: `bazel` forms a combined cache, reading from and writing to both, which reduces remote load and speeds up builds, see: https://www.buildbuddy.io/blog/bazels-remote-caching-and-remote-execution-explained/#flags
| when: on_success | ||
| - key: bazel-$CI_RUNNER_DESCRIPTION | ||
| paths: | ||
| - .cache/bazel/cache |
Do we really want to have this? The CI Infra team is working on persistent runners this sprint, so we should be able to reuse directories on the workers. Sharing the disk cache via GitLab will actually add to the runner's startup time.
> Do we really want to have this?
Yes.
tl;dr: I approach my PRs with a “stage objective” in mind.
This probably comes from my habit of working in environments where it’s not acceptable to deliver intermediate results that aren’t self-contained, even if we expect them to be replaced soon. In my experience, “soon” often stretches out when we rely on other teams who naturally have different priorities.
That’s why I prefer to avoid situations where, even for a few days, the CI experience is degraded (for example, downloading dependencies on every single run). To me, it’s worth having this temporary stage in place while waiting for the final solution.
| @@ -122,3 +122,41 @@ build_processed_btfhub_archive: | |||
| - dda inv -- -e system-probe.process-btfhub-archive --branch $BTFHUB_ARCHIVE_BRANCH | |||
I would rather put this in a separate YAML file and include it, to keep the current `deps_build.yml` clean. Something like `deps_bazel_build.yml`.
✅ Done in second commit.
> to keep the current `deps_build.yml` clean.

... not only is said file kept clean, but the extraction also means fewer reviewers need to be bothered.
.gitlab/deps_build/deps_build.yml
Outdated
| variables: | ||
| ARCH: x64 | ||
| SCRIPT: |- | ||
| bazel build @bzip2//... |
Why is the target different for Windows? I would at least write a comment or add a TODO hinting that Windows support isn't complete yet.
> Possible Drawbacks / Trade-offs
> - on Windows, compilation errors made me descope the dependencies being built to only `bzip2`. This will of course be addressed in a subsequent PR,
✅ Answered in 3rd commit, which adds the following message: (better than a comment to raise awareness)
🟡 TODO(regis): compilation errors remain to be addressed - limiting to a working subset for the time being
What does this PR do?
Now
bazeliskis available on all our CI executors (macOS runners, Linux & Windows containers), this change secures an initial part of ourbazelsetup while providing reusable job templates to ease caching in CI.Practically, it adds a handful of jobs focusing on building
bazeldependencies in the corresponding GitLabdeps_buildstage.Overall, it consists in:
bazeliskproperly bootstrapsbazelacross all platforms, (primary scope of the PR)bazel:-prefixed CI job templates and dogfooding them, (i.e. by building deps)bazelisk/bazelcaches to GitLab caching capabilities (runner-based as of now): installed binaries, "repository cache", "repo contents cache", and "disk cache" are all saved/restored correctly to/from GitLab while honoring OS/architecture boundaries,.bazelignoreand.gitignore,Motivation
Securing some initial
bazelconfiguration in CI :Possible Drawbacks / Trade-offs
bzip2. This will be of course addressed in a subsequent PR,Additional Notes
Main addressed challenges:
tools/bazel*wrappers:--output_user_roottriggers the repo contents cache error, even though the target directory is excluded with.bazelignorebazelbuild/bazel#26384bzip2dependency: fetch from a reachable source:bazelretrievebzip2on AWS-hosted runners #40219bazelspawn strategy: fallback to a permissive strategy sincesandboxedis unsupportedbazelspawn strategy on Windows #40328tools/bazelwrapper: fallback to batch (tools/bazel.bat), sincebashis discouraged bybazelin this case andtools/bazel.ps1poses detection problems,robocopyinstead ofcopy/move/xcopy.Footnotes