@dolio (Contributor) commented Sep 2, 2025

This adds a couple of commands that record and output profiling information after running a Unison program.

  • run.profiled prints a relatively short report directly in ucm when the program completes. It computes the top 25 most costly functions and filters the recorded call tree so that only those functions appear as leaves, which hopefully keeps the output reasonably small. This is roughly the information you'd otherwise be digging through the full trace for.
  • run.profiled.full takes an additional file name argument and writes all the profiling information to that file. By default this is a table of the top 25 most costly functions plus a full breakdown of every recorded call in a tree, showing local and inherited time costs for each function. However, if the file name ends in .ticks or .folded, an alternate format is used that just records the entry counts. That file can be fed directly into flamegraph.pl to produce an SVG flame graph of the profile (an example of the format follows this list).
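For reference, the folded format consumed by flamegraph.pl is one line per distinct stack, with frames semicolon-separated from root to leaf, followed by the sample count for that stack. The frame names below are made up purely to show the shape:

```
main;outer;inner 842
main;outer 97
main;other 12
```

Rendering is then just `flamegraph.pl profile.folded > profile.svg`.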

The implementation of the profiler got a little tricky. I concluded that to get good results, the profiler needs to be checked on almost every iteration through the interpreter, which has performance impacts. So, I created a type class that allows specializing the interpreter loop to the profiler. For the non-profiling loop, the specialized implementation is essentially identical to what we had before, with no performance loss; running with profiling enabled, by contrast, is noticeably slower.
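As a rough sketch of the specialization trick (all names here are hypothetical, not the runtime's actual identifiers): the loop takes the profiler as a class-constrained parameter, and SPECIALIZE/INLINE pragmas let GHC compile a separate copy of the loop per mode, so the no-op ticks in the non-profiling copy are erased entirely.

```haskell
import Data.IORef (IORef, modifyIORef')

data Stack = Stack  -- stand-in for the interpreter's call stack

class Profiler p where
  -- consulted on (almost) every iteration of the interpreter loop
  tick :: p -> Stack -> IO ()

data NoProfile = NoProfile

instance Profiler NoProfile where
  {-# INLINE tick #-}
  tick _ _ = pure ()  -- optimized away in the specialized loop

newtype Sampling = Sampling (IORef Int)

instance Profiler Sampling where
  tick (Sampling ref) _stk = modifyIORef' ref (+ 1)  -- check timers, record, etc.

-- Each specialization is its own compiled copy of the loop, so the
-- NoProfile version is the same code as an unprofiled interpreter.
eval :: Profiler p => p -> Stack -> Int -> IO ()
eval _ _   0 = pure ()
eval p stk n = tick p stk >> eval p stk (n - 1)
{-# SPECIALIZE eval :: NoProfile -> Stack -> Int -> IO () #-}
{-# SPECIALIZE eval :: Sampling  -> Stack -> Int -> IO () #-}
```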

I also tried to use some GHC-internal mechanisms to get better sampling. When available, the profiler works by setting timeouts and recording a tick when a timeout fires. This way, the actual expense of instructions determines how likely a stack is to be recorded in the profile (so, e.g., Unison functions that make expensive foreign calls should show up proportionally more in the profile).
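To illustrate one way this can work (a minimal sketch, not the PR's actual code; GHC.Event is the only real API used here, and the helper names are invented): a repeating timeout sets a shared flag, and the interpreter's tick does a cheap flag read, recording a stack only when the timer has fired.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import GHC.Event (getSystemTimerManager, registerTimeout)

-- One shared flag: the timer sets it, the interpreter polls it.
newtype SampleFlag = SampleFlag (IORef Bool)

newSampleFlag :: IO SampleFlag
newSampleFlag = SampleFlag <$> newIORef False

-- Arm a repeating timeout that requests a sample every `usec` microseconds.
armSampler :: Int -> SampleFlag -> IO ()
armSampler usec flag@(SampleFlag ref) = do
  mgr <- getSystemTimerManager
  _ <- registerTimeout mgr usec $ do
    writeIORef ref True   -- request a sample
    armSampler usec flag  -- re-arm for the next period
  pure ()

-- In the interpreter's tick: cheap read on the common path; a stack is
-- recorded only when the timer fired, so samples land in proportion to
-- elapsed time rather than iteration count.
checkSample :: SampleFlag -> IO () -> IO ()
checkSample (SampleFlag ref) recordStack = do
  fire <- readIORef ref
  if fire
    then writeIORef ref False >> recordStack
    else pure ()
```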

However, it seems that this functionality doesn't exist on Windows. So instead, I implemented a profiler that periodically calls getCPUTime to decide whether to record a tick. It also uses a counter, because calling getCPUTime every iteration proved too expensive (another order of magnitude slower than the timeout-based profiler). I'm uncertain how this methodology is affected by the relative cost of basic operations, so I used conditional compilation to fall back to it only where the timeout mechanism doesn't exist.
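A sketch of that counter-plus-clock scheme (helper names are hypothetical; getCPUTime from System.CPUTime is the real API, returning picoseconds of CPU time):

```haskell
import Data.IORef (IORef, readIORef, writeIORef)
import System.CPUTime (getCPUTime)  -- CPU time in picoseconds

data CpuSampler = CpuSampler
  { countdown :: IORef Int      -- iterations left until the next clock check
  , lastTick  :: IORef Integer  -- CPU time of the last recorded sample
  }

-- Check the clock only every `stride` iterations; calling getCPUTime on
-- every pass through the interpreter is far too expensive.
cpuTick :: Int -> Integer -> CpuSampler -> IO () -> IO ()
cpuTick stride periodPs s recordStack = do
  n <- readIORef (countdown s)
  if n > 0
    then writeIORef (countdown s) (n - 1)  -- cheap path, most iterations
    else do
      writeIORef (countdown s) stride
      now  <- getCPUTime
      prev <- readIORef (lastTick s)
      if now - prev >= periodPs
        then writeIORef (lastTick s) now >> recordStack
        else pure ()
```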

For functions introduced by compilation, no names are currently shown, just hashes. I'll work on improving that in a follow-up PR.

dolio added 8 commits August 26, 2025 19:06

When profiling is turned on, the runtime will record a stack snapshot
approximately every 500 applications. Then after evaluation is complete,
the information will be processed and displayed, either on screen or to a
file.

The new approach uses type class specialization to switch between
a normal interpreter and a profiled interpreter. In the normal
interpreter, all 'profiling' calls are just optimized away.

This removes the periodic yielding from the interpreter, as using
the same system to yield would incur slowdowns in normal operation.

This makes non-allocating loops preemptible, so that they can't starve
other threads. This should make tuning the thread time slice via RTS
flags more reliable, so that manual yielding in the interpreter is no
longer necessary.

dolio commented Sep 2, 2025

Oh, one other notable thing about this PR...

Because of the more invasive nature of the profiling ticks, I've removed the manual yield calls from the interpreter. I'm reasonably sure that all they actually did was reduce the time slices of threads running Unison code, while also allowing tight loops to be preempted.

Instead, I've added an option (-fno-omit-yields) to the build configuration that makes GHC always insert potential yield points into code. Normally it omits these from non-allocating loops, but with this flag they're emitted anyway, so arithmetic loops can still be preempted by GHC's scheduler (and it doesn't seem to significantly lower performance). Together with the -C RTS option, this should give behavior similar to the old manual yielding: set a smaller time slice (the default is 20ms), and the more reliable preemption should ensure that threads actually get only that slice.
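Concretely, the relevant knobs look something like this (a hypothetical stanza, not the repository's actual cabal file; -threaded, -rtsopts, and -fno-omit-yields are real flags):

```cabal
-- keep yield checks in non-allocating loops
ghc-options: -threaded -rtsopts -fno-omit-yields
```

With -rtsopts enabled, running the program with `+RTS -C0.005 -RTS` then shrinks the scheduler's time slice from the default 20ms to 5ms.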

@aryairani aryairani merged commit ae336e4 into trunk Sep 2, 2025
1 check passed
@aryairani aryairani deleted the topic/profiling branch September 2, 2025 17:38
sellout added a commit to sellout/unison that referenced this pull request Oct 14, 2025
It was previously based only on the cabal.project.freeze file, which
changes very rarely. However, a change to build flags (as happened
in unisonweb#5856) can require dependencies to be rebuilt, and if the
cache key was an exact match, the updated dependency builds would
never be cached.

This adds a hash for all Cabal files to the key, so any change that
would affect dependencies causes a cache miss. It also adds the old key
(which is a prefix of the new one) as a restore key, so even when cabal
files change, we load a cache for the correct LTS (GHC version, etc.)
which can prevent cache churn in the period after LTS updates.
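A hypothetical actions/cache step showing the shape of that change (the path and key names are illustrative, not the repository's actual workflow; hashFiles and restore-keys are real GitHub Actions features):

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cabal/store
    # exact key: the freeze file plus a hash over every .cabal file
    key: cabal-${{ hashFiles('cabal.project.freeze') }}-${{ hashFiles('**/*.cabal') }}
    # the old key is a prefix of the new one, so it still restores the
    # nearest cache for the same LTS when only cabal files change
    restore-keys: |
      cabal-${{ hashFiles('cabal.project.freeze') }}-
```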