Skip to content

Optimize short-circuitable folds in MCP#570

Merged
sim642 merged 3 commits intomasterfrom
mcp-short-circuit
Feb 1, 2022
Merged

Optimize short-circuitable folds in MCP#570
sim642 merged 3 commits intomasterfrom
mcp-short-circuit

Conversation

@sim642
Copy link
Copy Markdown
Member

@sim642 sim642 commented Jan 26, 2022

This PR is an attempt to optimize the following MCP domain/context/access functions:

  • equal
  • compare
  • leq
  • is_top
  • is_bot
  • may_race

The previous implementations used folds, which inevitably had to iterate over the entire list each time. Even though the folding functions themselves used the short-circuiting binary operators, this didn't short-circuit much. Moreover, all the inefficient List.assoc calls were still happening just to pass the corresponding arguments to the folding function (which in the short-circuiting case didn't even use them).
These assocs are a pain because as the fold goes down the list, each corresponding assoc needs to go through more of the other list. Overall, this means O(n²) complexity.

This PR implements alternative short-circuitable folds for MCP, which abort the list iteration via local exception to short-circuit. It not only avoids the unnecessary list tail iteration, but also all the assoc calls that would unnecessarily happen there.

This is quite relevant because while profiling slow race warning performance that @vesalvojdani noticed, it turns out that 30% of race warning time is just spent on MCPAccess.A.compare while manipulating sets of accesses.
I'm speculating that this might even have general performance improvement since path-sensitivity also uses sets of MCP.D values.

TODO

  • Benchmark.

@sim642 sim642 added the performance Analysis time, memory usage label Jan 26, 2022
@michael-schwarz
Copy link
Copy Markdown
Member

michael-schwarz commented Jan 27, 2022

If it is really the calls to assoc that are biting us here, one could also think about at least reducing the complexity of assoc_dom calls by introducing some dedicated data structure for it that makes accesses O(1).

As the list of all available analyses is typically small, does not change during analysis, and numbered starting at zero, one could think about generating a (module Bla.S) option array where the index corresponds to the number of the analysis?

@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Jan 27, 2022

If it is really the calls to assoc that are biting us here, one could also think about at least reducing the complexity of assoc_dom calls by introducing some dedicated data structure for it that makes accesses O(1).

As the list of all available analyses is typically small, does not change during analysis, and numbered starting at zero, one could think about generating a (module Bla.S) option array where the index corresponds to the number of the analysis?

I've been thinking about this as well, but also the fact that the whole assoc list representation in MCP might need overhauling, for example to use Map instead. Another possibility would be to ensure that analyses in those lists have well-defined and consistent order (e.g. sorted), so they could simply be zipped. I'll open a new issue about this.

@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Jan 27, 2022

On sv-benchmarks there's not much of a difference, maybe ~2% improvement:
image
Full results table

But there also isn't big multithreaded benchmarks there, so the differences aren't as pronounced as in #571.

@sim642 sim642 added the pr-dependency Depends or builds on another PR, which should be merged before label Jan 28, 2022
@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Jan 28, 2022

I would merge this after #578 because there are major conflicts between the two.

@sim642 sim642 force-pushed the mcp-short-circuit branch from 05f3885 to 958e2a0 Compare January 31, 2022 14:14
@sim642 sim642 changed the base branch from master to mcp-no-assoc January 31, 2022 14:14
@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Jan 31, 2022

I now rebased this on top of #578 and also simplified the implementations: instead of using exceptions to escape the fold, I'm now just using for_all* functions on lists, which also stop iterating on first false.

I also benchmarked this new version and it's much more impressive than the previous version.

This compared to #578

There's an additional ~10% speedup:
image

#578 and this compared to master

Together the two give ~27% speedup:
image

Full results table

@michael-schwarz
Copy link
Copy Markdown
Member

Nice! I'm always amazed at how much time we seem to waste in non-obvious places.

Base automatically changed from mcp-no-assoc to master February 1, 2022 08:46
@sim642 sim642 removed the pr-dependency Depends or builds on another PR, which should be merged before label Feb 1, 2022
@sim642 sim642 merged commit f01914a into master Feb 1, 2022
@sim642 sim642 deleted the mcp-short-circuit branch February 1, 2022 08:48
@sim642 sim642 added this to the v2.0.0 milestone Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Analysis time, memory usage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants