
pprof: support mutex contention and blocked goroutine profiling #41154

Merged
pchaigno merged 1 commit into cilium:main from DataDog:ai/pprof-mutex-block-profiles
Aug 21, 2025

Conversation

@antonipp
Contributor

Description

We support exporting profiling data using pprof, but at the moment it is limited to CPU, memory, and goroutine profiles. This PR adds support for two more profile types: mutex contention profiles and blocked goroutine (block) profiles.

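This data is not exposed by default. To enable it, the corresponding Go runtime settings must be set explicitly: the mutex profile fraction (`runtime.SetMutexProfileFraction`) and the block profile rate (`runtime.SetBlockProfileRate`).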

This PR adds two new flags to enable these profilers. The blocked goroutine profiler comes with performance overhead so I added some callouts about that as well.

Testing

Tested on the Operator with the following flags enabled:

--operator-pprof-mutex-profile-fraction=1
--operator-pprof-block-profile-rate=1

I then enabled mutex and block pprof scraping with Datadog (these profiles are compatible with any monitoring tool; I used Datadog only for convenience):

"go_pprof_scraper": {
  "instances": [
    {
      "pprof_url": "http://127.0.0.1:6061/debug/pprof/",
      "profiles": ["cpu","heap","goroutine","mutex","block"],
      "service":"cilium-operator"
    }
  ]
}

and validated that the profiles were now properly collected (two screenshots).
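For readers without a scraper at hand, the same endpoints can be checked with a few lines of Go. The sketch below does not model the Datadog scraper: it starts a throwaway in-process HTTP server with the standard `net/http/pprof` index handler and fetches the `mutex` and `block` profiles in text form (`?debug=1`). The helpers `newPprofServer` and `fetchProfile` are hypothetical names for this example. Against a real deployment, the base URL would be the `pprof_url` from the configuration above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPprofServer starts a local test server that serves the standard
// pprof index, which also dispatches to named profiles such as
// /debug/pprof/mutex and /debug/pprof/block.
func newPprofServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	return httptest.NewServer(mux)
}

// fetchProfile downloads one named profile in human-readable form and
// returns the HTTP status code together with the response body.
func fetchProfile(baseURL, name string) (int, []byte, error) {
	resp, err := http.Get(baseURL + name + "?debug=1")
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

func main() {
	srv := newPprofServer()
	defer srv.Close()

	for _, name := range []string{"mutex", "block"} {
		status, body, err := fetchProfile(srv.URL+"/debug/pprof/", name)
		if err != nil {
			fmt.Println("fetch failed:", err)
			return
		}
		fmt.Printf("%s: HTTP %d, %d bytes\n", name, status, len(body))
	}
}
```

Note that the endpoints respond even when the profilers are disabled. In that case the profiles simply contain no samples, which is why the runtime settings have to be enabled first.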

pprof: support mutex contention and blocked goroutine profiling

@antonipp antonipp requested review from a team as code owners August 14, 2025 15:25
@antonipp antonipp requested review from bimmlerd and joamaki August 14, 2025 15:25
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label label ("The author needs to describe the release impact of these changes.") Aug 14, 2025
@antonipp antonipp requested a review from hemanthmalla August 14, 2025 15:25
Member

@bimmlerd bimmlerd left a comment


I think that's fine. For the agent at least I think gops already provides this capability; possibly even with starting it at runtime instead of having to have it configured at startup - would that also work for you or is this needed?

@antonipp antonipp force-pushed the ai/pprof-mutex-block-profiles branch from d6944fa to e7b8894 Compare August 18, 2025 09:52
@antonipp antonipp requested review from a team as code owners August 18, 2025 09:52
@antonipp antonipp requested a review from gandro August 18, 2025 09:52
Contributor Author

@antonipp antonipp left a comment


> For the agent at least I think gops already provides this capability; possibly even with starting it at runtime instead of having to have it configured at startup - would that also work for you or is this needed?

Unless I'm missing something, I don't think that gops provides this functionality. It does have an integration with some pprof profile types, namely CPU (with gops pprof-cpu) and memory (with gops pprof-heap) but I don't really see any way for it to extract mutex or blocked goroutine profiles. And even if it could, it wouldn't get any data, because as far as I can tell, the functionality needs to be enabled in the code by calling the appropriate functions in the runtime package. Here's a similar example from the Istio project: istio/istio#44688 (comment)
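The point that no data is recorded unless the runtime setting is enabled can be checked with a small standalone Go program. The sketch below is an illustration only, unrelated to the Cilium or gops code bases: `contend` and `contendAndCount` are hypothetical helpers that create lock contention and then report how many distinct stack records the `mutex` profile holds.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// contend makes several goroutines compete for a single mutex so that
// waiters actually block and contention events are generated.
func contend() {
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 20; j++ {
				mu.Lock()
				time.Sleep(time.Millisecond) // hold the lock so others queue up
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
}

// contendAndCount sets the mutex profile fraction, generates contention,
// and returns the number of stack records in the mutex profile afterwards.
// The previous fraction is restored before returning.
func contendAndCount(fraction int) int {
	prev := runtime.SetMutexProfileFraction(fraction)
	defer runtime.SetMutexProfileFraction(prev)
	contend()
	return pprof.Lookup("mutex").Count()
}

func main() {
	// Run the disabled case first: in a fresh process the profile should
	// stay empty, because the runtime does not record contention events.
	fmt.Println("records with profiling disabled:", contendAndCount(0))
	// With fraction 1 every contention event is eligible for recording.
	fmt.Println("records with profiling enabled: ", contendAndCount(1))
}
```

The profile is cumulative for the lifetime of the process, so the order of the two calls matters: records collected while profiling was enabled remain visible after it is turned off again.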

@antonipp
Contributor Author

/test

Member

@gandro gandro left a comment


Helm looks good!

Member

@bimmlerd bimmlerd left a comment


Yep you're right, I confused the memory profiling things of gops with this; LGTM!

@antonipp
Contributor Author

Hmm, I realized that I also need to add the options to clustermesh and kvstoremesh. I'll update the PR.

Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
@antonipp antonipp force-pushed the ai/pprof-mutex-block-profiles branch from e7b8894 to 8f763b3 Compare August 18, 2025 11:31
@antonipp antonipp requested a review from a team as a code owner August 18, 2025 11:31
@antonipp antonipp requested a review from MrFreezeex August 18, 2025 11:31
@antonipp
Contributor Author

/test

@HadrienPatte HadrienPatte added the release-note/misc label ("This PR makes changes that have no direct user impact.") Aug 20, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label label ("The author needs to describe the release impact of these changes.") Aug 20, 2025
@pchaigno pchaigno added this pull request to the merge queue Aug 21, 2025
Merged via the queue into cilium:main with commit 3f34d20 Aug 21, 2025
74 of 75 checks passed
@cilium-release-bot cilium-release-bot bot moved this to Released in cilium v1.19.0 Feb 3, 2026

Labels

release-note/misc This PR makes changes that have no direct user impact.


7 participants