go tools bootstrap#1
Merged
Merged
Conversation
58a7c0c to
a0e70c7
Compare
a0e70c7 to
2d7232c
Compare
Member
|
Cool, getting started ! 👍 |
Contributor
Author
|
merging that so 👍 |
3 tasks
6 tasks
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
that referenced
this pull request
Jan 27, 2026
### What does this PR do? This PR adds support for a `WithMaxQuerySize` method to the MongoDB package that enables truncation on the `mongodb.query` span tag. I also ported this behavior to the new `mongo-driver.v2` package which supports the v2 release of the Mongo drivers. Opening this PR here first for feedback from our team and then will open on the upstream repo. The default is to not truncate query span tags, to avoid breaking changes. I'll ask in the upstream PR about whether we could truncate by default instead. ### Motivation This reduces the memory footprint of tracing Mongo applications that write large documents.
gh-worker-dd-mergequeue-cf854d Bot
pushed a commit
that referenced
this pull request
May 19, 2026
…SpanOperation (#4766) ### What does this PR do? Fixes a memory leak in AppSec when running under orchestrion, plus the related listener-cleanup asymmetry that compounds it. Two small, independent changes in two files: 1. **`ServiceEntrySpanOperation.Finish()` now calls `dyngo.FinishOperation`.** The op was pushed onto orchestrion's per-goroutine GLS stack by `StartAndRegisterOperation`, but its `Finish()` only flushed JSON tags and never invoked `dyngo.FinishOperation` — and `FinishOperation` is the **only** caller of `orchestrion.GLSPopValue`. On reused goroutines (HTTP keep-alive workers, gRPC handlers, worker pools), the parent op accumulated on the GLS stack one entry per request, retaining the span and the data-listener closures registered by `AppsecSpanTransport.OnServiceEntryStart`. The new call is **`defer`-ed**, not appended, so the GLS pop and `disable()` still run if a `TagSetter.SetTag` implementation panics during the tag-flush loop — otherwise the panic path would silently re-introduce the same leak. 2. **`operation.disable()` now also clears `dataBroadcaster.listeners`.** `disable()` cleared `eventRegister.listeners` but left its sibling `dataBroadcaster.listeners` populated — an asymmetry between two structurally identical listener maps, both populated by the same `addXListener` family. On ops that are properly finished, this only delays reclamation; combined with the GLS retention above, it amplifies the leak by keeping data-listener closures alive for every retained parent op. ### Motivation Fixes #4763. That issue covers fix (1) — the `ServiceEntrySpanOperation` GLS leak — and includes the same `MockGLS()`-based reproduction approach used in this PR's verification. This PR adds fix (2) on top: the `dataBroadcaster` cleanup asymmetry was found while tracing the same retention path and is responsible for amplifying the heap impact of (1). The two are independent — (1) removes the leak on the orchestrion path; (2) removes the asymmetry that would amplify any future regression of the same shape. ### Root cause walkthrough **Start path — both ops are registered:** ``` waf.StartContextOperation ├── trace.StartServiceEntrySpanOperation │ └── dyngo.StartAndRegisterOperation(entrySpanOp) → GLS push #1 (parent) └── dyngo.StartAndRegisterOperation(contextOp) → GLS push #2 (child) ``` **Finish path (before this PR) — only the child is unwound:** ```go // internal/appsec/emitter/waf/context.go func (op *ContextOperation) Finish() { dyngo.FinishOperation(op, ContextRes{}) // pops child from GLS op.ServiceEntrySpanOperation.Finish() // flushes tags only — no FinishOperation } ``` ```go // instrumentation/appsec/trace/service_entry_span.go (before) func (op *ServiceEntrySpanOperation) Finish() { // ranges over op.jsonTags and calls span.SetTag — does NOT call dyngo.FinishOperation } ``` Per request on the same goroutine, the GLS stack grows by one. `waf.ContextOperation.Finish()` is **unchanged** by this PR — its existing call sequence (pop child, then call parent's `Finish`) now correctly drives both lifecycles. ### Reproduction Two reproduction tests are not bundled in the PR (kept the diff minimal at 2 files / +16 / -1), but both pass at HEAD and fail when the corresponding fix is reverted. **Test 1 — GLS stack returns to baseline.** A note for reviewers running this locally: `orchestrion.Enabled()` is flipped by the orchestrion source-transform tool at build time, not by `-tags orchestrion`. Under plain `go test`, `CtxWithValue` therefore skips the GLS push branch and `GLSStackDepth()` always returns 0 — a test relying on the build tag alone would pass vacuously. The repo's `orchestrion.MockGLS()` (`internal/orchestrion/mock_gls.go`) installs a real `contextStack` and flips `enabled=true` for the duration of the test, which is what #4763's reproduction also uses. ```go package waf_test import ( "context" "testing" "github.com/DataDog/dd-trace-go/v2/instrumentation/appsec/trace" "github.com/DataDog/dd-trace-go/v2/internal/appsec/emitter/waf" "github.com/DataDog/dd-trace-go/v2/internal/orchestrion" ) func TestGLSStackReturnsToBaseline(t *testing.T) { defer orchestrion.MockGLS()() ctx := context.Background() baseline := orchestrion.GLSStackDepth() const iterations = 100 for i := 0; i < iterations; i++ { op, _ := waf.StartContextOperation(ctx, trace.NoopTagSetter{}) op.Finish() } if got := orchestrion.GLSStackDepth(); got != baseline { t.Errorf("GLS leaked: started at %d, now %d after %d cycles", baseline, got, iterations) } } ``` Observed when commit 1 of this PR is reverted: `GLS leaked: started at 0, now 100 after 100 cycles` — exactly +1 entry per request, matching the predicted leak rate. Stable at baseline with the fix applied. **Test 2 — `disable()` must clear `dataBroadcaster`.** ```go package dyngo import "testing" type repEvt struct{ val string } func TestDataBroadcasterNotClearedOnDisable(t *testing.T) { op := &operation{} addDataListener(&op.dataBroadcaster, DataListener[repEvt](func(e repEvt) {})) addDataListener(&op.dataBroadcaster, DataListener[repEvt](func(e repEvt) {})) addDataListener(&op.dataBroadcaster, DataListener[repEvt](func(e repEvt) {})) op.disable() op.dataBroadcaster.mu.RLock() defer op.dataBroadcaster.mu.RUnlock() total := 0 for _, ls := range op.dataBroadcaster.listeners { total += len(ls) } if total != 0 { t.Errorf("dataBroadcaster has %d listeners after disable(), want 0", total) } } ``` Observed when commit 2 is reverted: `dataBroadcaster has 3 listeners after disable(), want 0`. Passes at HEAD. Happy to include either or both tests in the PR if reviewers want — they were left out only to keep the diff narrowly scoped to the fix. ### Memory evidence On a long-lived process with AppSec + orchestrion enabled and steady HTTP traffic: - AppSec on, before fix: monotonic heap growth that scales with request count. - AppSec off: flat heap. - AppSec on, with both fixes: flat heap. Heap-live-size profile retention path matches the GLS analysis: ``` dyngo.addDataListener ← dyngo.OnData ← AppsecSpanTransport.OnServiceEntryStart ← dyngo.StartAndRegisterOperation (called once per HTTP request) ``` The path terminates in `StartAndRegisterOperation` with no matching `FinishOperation` — consistent with the GLS-pin hypothesis. ### Files touched - `instrumentation/appsec/trace/service_entry_span.go` — +9 / -1 - `instrumentation/appsec/dyngo/operation.go` — +7 / -0 Total: 2 files, +16 / -1. ### Reviewer's Checklist - [ ] Changed code has unit tests for its functionality at or near 100% coverage. - [ ] [System-Tests](https://github.com/DataDog/system-tests/) covering this feature have been added and enabled with the va.b.c-dev version tag. - [ ] There is a benchmark for any new code, or changes to existing code. - [ ] If this interacts with the agent in a new way, a system test has been added. - [x] New code is free of linting errors. You can check this by running `make lint` locally. - [x] New code doesn't break existing tests. You can check this by running `make test` locally. - [ ] Add an appropriate team label so this PR gets put in the right place for the release notes. - [x] All generated files are up to date. You can check this by running `make generate` locally. - [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running `make fix-modules` locally. Co-authored-by: e-n-0 <flavien.darche@datadoghq.com> Co-authored-by: dario.castane <dario.castane@datadoghq.com>
hannahkm
pushed a commit
that referenced
this pull request
May 22, 2026
…SpanOperation (#4766) ### What does this PR do? Fixes a memory leak in AppSec when running under orchestrion, plus the related listener-cleanup asymmetry that compounds it. Two small, independent changes in two files: 1. **`ServiceEntrySpanOperation.Finish()` now calls `dyngo.FinishOperation`.** The op was pushed onto orchestrion's per-goroutine GLS stack by `StartAndRegisterOperation`, but its `Finish()` only flushed JSON tags and never invoked `dyngo.FinishOperation` — and `FinishOperation` is the **only** caller of `orchestrion.GLSPopValue`. On reused goroutines (HTTP keep-alive workers, gRPC handlers, worker pools), the parent op accumulated on the GLS stack one entry per request, retaining the span and the data-listener closures registered by `AppsecSpanTransport.OnServiceEntryStart`. The new call is **`defer`-ed**, not appended, so the GLS pop and `disable()` still run if a `TagSetter.SetTag` implementation panics during the tag-flush loop — otherwise the panic path would silently re-introduce the same leak. 2. **`operation.disable()` now also clears `dataBroadcaster.listeners`.** `disable()` cleared `eventRegister.listeners` but left its sibling `dataBroadcaster.listeners` populated — an asymmetry between two structurally identical listener maps, both populated by the same `addXListener` family. On ops that are properly finished, this only delays reclamation; combined with the GLS retention above, it amplifies the leak by keeping data-listener closures alive for every retained parent op. ### Motivation Fixes #4763. That issue covers fix (1) — the `ServiceEntrySpanOperation` GLS leak — and includes the same `MockGLS()`-based reproduction approach used in this PR's verification. This PR adds fix (2) on top: the `dataBroadcaster` cleanup asymmetry was found while tracing the same retention path and is responsible for amplifying the heap impact of (1). The two are independent — (1) removes the leak on the orchestrion path; (2) removes the asymmetry that would amplify any future regression of the same shape. ### Root cause walkthrough **Start path — both ops are registered:** ``` waf.StartContextOperation ├── trace.StartServiceEntrySpanOperation │ └── dyngo.StartAndRegisterOperation(entrySpanOp) → GLS push #1 (parent) └── dyngo.StartAndRegisterOperation(contextOp) → GLS push #2 (child) ``` **Finish path (before this PR) — only the child is unwound:** ```go // internal/appsec/emitter/waf/context.go func (op *ContextOperation) Finish() { dyngo.FinishOperation(op, ContextRes{}) // pops child from GLS op.ServiceEntrySpanOperation.Finish() // flushes tags only — no FinishOperation } ``` ```go // instrumentation/appsec/trace/service_entry_span.go (before) func (op *ServiceEntrySpanOperation) Finish() { // ranges over op.jsonTags and calls span.SetTag — does NOT call dyngo.FinishOperation } ``` Per request on the same goroutine, the GLS stack grows by one. `waf.ContextOperation.Finish()` is **unchanged** by this PR — its existing call sequence (pop child, then call parent's `Finish`) now correctly drives both lifecycles. ### Reproduction Two reproduction tests are not bundled in the PR (kept the diff minimal at 2 files / +16 / -1), but both pass at HEAD and fail when the corresponding fix is reverted. **Test 1 — GLS stack returns to baseline.** A note for reviewers running this locally: `orchestrion.Enabled()` is flipped by the orchestrion source-transform tool at build time, not by `-tags orchestrion`. Under plain `go test`, `CtxWithValue` therefore skips the GLS push branch and `GLSStackDepth()` always returns 0 — a test relying on the build tag alone would pass vacuously. The repo's `orchestrion.MockGLS()` (`internal/orchestrion/mock_gls.go`) installs a real `contextStack` and flips `enabled=true` for the duration of the test, which is what #4763's reproduction also uses. ```go package waf_test import ( "context" "testing" "github.com/DataDog/dd-trace-go/v2/instrumentation/appsec/trace" "github.com/DataDog/dd-trace-go/v2/internal/appsec/emitter/waf" "github.com/DataDog/dd-trace-go/v2/internal/orchestrion" ) func TestGLSStackReturnsToBaseline(t *testing.T) { defer orchestrion.MockGLS()() ctx := context.Background() baseline := orchestrion.GLSStackDepth() const iterations = 100 for i := 0; i < iterations; i++ { op, _ := waf.StartContextOperation(ctx, trace.NoopTagSetter{}) op.Finish() } if got := orchestrion.GLSStackDepth(); got != baseline { t.Errorf("GLS leaked: started at %d, now %d after %d cycles", baseline, got, iterations) } } ``` Observed when commit 1 of this PR is reverted: `GLS leaked: started at 0, now 100 after 100 cycles` — exactly +1 entry per request, matching the predicted leak rate. Stable at baseline with the fix applied. **Test 2 — `disable()` must clear `dataBroadcaster`.** ```go package dyngo import "testing" type repEvt struct{ val string } func TestDataBroadcasterNotClearedOnDisable(t *testing.T) { op := &operation{} addDataListener(&op.dataBroadcaster, DataListener[repEvt](func(e repEvt) {})) addDataListener(&op.dataBroadcaster, DataListener[repEvt](func(e repEvt) {})) addDataListener(&op.dataBroadcaster, DataListener[repEvt](func(e repEvt) {})) op.disable() op.dataBroadcaster.mu.RLock() defer op.dataBroadcaster.mu.RUnlock() total := 0 for _, ls := range op.dataBroadcaster.listeners { total += len(ls) } if total != 0 { t.Errorf("dataBroadcaster has %d listeners after disable(), want 0", total) } } ``` Observed when commit 2 is reverted: `dataBroadcaster has 3 listeners after disable(), want 0`. Passes at HEAD. Happy to include either or both tests in the PR if reviewers want — they were left out only to keep the diff narrowly scoped to the fix. ### Memory evidence On a long-lived process with AppSec + orchestrion enabled and steady HTTP traffic: - AppSec on, before fix: monotonic heap growth that scales with request count. - AppSec off: flat heap. - AppSec on, with both fixes: flat heap. Heap-live-size profile retention path matches the GLS analysis: ``` dyngo.addDataListener ← dyngo.OnData ← AppsecSpanTransport.OnServiceEntryStart ← dyngo.StartAndRegisterOperation (called once per HTTP request) ``` The path terminates in `StartAndRegisterOperation` with no matching `FinishOperation` — consistent with the GLS-pin hypothesis. ### Files touched - `instrumentation/appsec/trace/service_entry_span.go` — +9 / -1 - `instrumentation/appsec/dyngo/operation.go` — +7 / -0 Total: 2 files, +16 / -1. ### Reviewer's Checklist - [ ] Changed code has unit tests for its functionality at or near 100% coverage. - [ ] [System-Tests](https://github.com/DataDog/system-tests/) covering this feature have been added and enabled with the va.b.c-dev version tag. - [ ] There is a benchmark for any new code, or changes to existing code. - [ ] If this interacts with the agent in a new way, a system test has been added. - [x] New code is free of linting errors. You can check this by running `make lint` locally. - [x] New code doesn't break existing tests. You can check this by running `make test` locally. - [ ] Add an appropriate team label so this PR gets put in the right place for the release notes. - [x] All generated files are up to date. You can check this by running `make generate` locally. - [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running `make fix-modules` locally. Co-authored-by: e-n-0 <flavien.darche@datadoghq.com> Co-authored-by: dario.castane <dario.castane@datadoghq.com>
Merged
7 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What it does
Bootstrap the project with some initial stuff such as:
glidepackage managerRakefileto do almost anything