
Optimize large CPAs #164

Merged

sim642 merged 15 commits into traces from traces-cpa-optimize on Apr 8, 2021

Conversation

@sim642
Member

@sim642 sim642 commented Feb 3, 2021

In the traces branch, some new privatization implementations (mutex-oplus, mine, mine-nothread) keep a copy of everything in the local domain. Timing shows that this causes an unreasonable amount of time spent on equal, compare and hash of CPA, which are very naïve. We have physical equality and hashcons optimizations for the entire Spec, but this does nothing to improve the performance of Set operations in path-sensitivity, which I suspect is the culprit here.

Right now I made two changes, which notably improved the performance of those privatization implementations on some large benchmarks:

  1. Add physical equality check to equal.
  2. Add caching of hash. I did this by storing a lazy hash computation with the map.
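The two changes can be sketched roughly like this; `OptHash` here is an illustrative reconstruction based on the description above, not the actual code in this PR:

```ocaml
(* Hypothetical sketch of the two optimizations described above. *)
module OptHash (M : Hashtbl.HashedType) = struct
  type t = { value: M.t; hash: int Lazy.t }

  (* wrap a value, deferring the (expensive) hash computation *)
  let lift (v : M.t) : t =
    { value = v; hash = lazy (M.hash v) }

  (* 1. physical equality short-circuits the slow structural equal *)
  let equal x y = x == y || M.equal x.value y.value

  (* 2. the hash is computed at most once per lifted value *)
  let hash x = Lazy.force x.hash
end
```

The `lazy` thunk means values that are never hashed never pay for hashing, while repeated `hash` calls on the same lifted value are free after the first.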

Initially I thought using another hashcons more inside would leak a lot of memory by keeping lots of intermediate CPAs in the hashcons hashtable, so I didn't go that route. I later looked into BatHashcons and realized it uses Weak to avoid leaking memory that way, so it might be worth going that way after all.
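For reference, weak-table hashconsing in the style of BatHashcons looks roughly like the following sketch (illustrative only; names like `hobj` are my own):

```ocaml
(* Minimal weak-table hashconsing sketch, BatHashcons-style. *)
module Hashcons (H : Hashtbl.HashedType) = struct
  type hobj = { obj: H.t; tag: int; hcode: int }

  module W = Weak.Make (struct
    type t = hobj
    let equal a b = H.equal a.obj b.obj
    let hash a = a.hcode
  end)

  let table = W.create 1024
  let counter = ref 0

  (* Return the canonical representative. Because the table holds its
     entries weakly, unreferenced intermediate values (e.g. intermediate
     CPAs) can still be collected by the GC instead of leaking. *)
  let hashcons (x : H.t) : hobj =
    let tentative = { obj = x; tag = !counter; hcode = H.hash x } in
    let canonical = W.merge table tentative in
    if canonical == tentative then incr counter;
    canonical
end
```

After `hashcons`, structurally equal values are physically equal, so `equal` becomes `(==)` and `hash` can just read the precomputed `hcode` or `tag`.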

For now, this is just a draft PR on top of the traces branch, where it really matters. The current implementation is ugly and changes all of MapDomain actually, not just CPA. A cleaner implementation would be to have the hash caching/hashconsing as a MapDomain functor to just apply it to CPA, but the leaky abstractions and lacking interfaces in MapDomain make it difficult.

@sim642
Member Author

sim642 commented Feb 3, 2021

I now enforced MapDomain signatures and extracted OptHash into a functor applied just to CPA. To my surprise, this can make CPA.hash a few times slower, which isn't great news.

Considering our heavy use of functors, maybe flambda would help a lot, especially if one level of functor can make such a difference.

Commit: This is very slow because every add/remove in a fold-loop relifts.
@sim642
Member Author

sim642 commented Feb 4, 2021

I now tried hashconsing instead of my hash-caching. Obviously it makes equal and hash instant even on large maps, but all the slow hash computation is just shifted to the lift. Overall, this is much slower due to patterns like using add in a fold-loop to construct a new map: every intermediate add recalculates the hash slowly to lift the result. So hashconsing isn't actually the answer here!

Well, it may be the answer if we didn't use the add in a fold-loop pattern. I think we have often done that to combine filtering and mapping while avoiding the intermediate result between filter and map, whereas filter_map (if it were available on MapDomain) might be more appropriate. Although changes like this likely won't solve all the cases: sometimes such loops are over other data structures.
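To illustrate the pattern in question (a made-up example, not code from this PR): each `add` in the fold produces a fresh intermediate map, and under hashconsing each one would be re-lifted, i.e. re-hashed:

```ocaml
(* Combining filter + map via fold-with-add: every add rebuilds (and,
   under hashconsing, re-lifts) an intermediate map. *)
module M = Map.Make (String)

let keep_positive_doubled (m : int M.t) : int M.t =
  M.fold
    (fun k v acc -> if v > 0 then M.add k (2 * v) acc else acc)
    m M.empty
  (* With OCaml >= 4.11, a filter_map on the map type expresses this in
     one pass without the explicit fold:
       M.filter_map (fun _ v -> if v > 0 then Some (2 * v) else None) m *)
```

A `filter_map` on MapDomain could rebuild the map wholesale and lift only the final result once, instead of once per `add`.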

Another idea might be to combine the two approaches to get some kind of lazy hashconsing, which would only hashcons maps that hash is calculated on. It would avoid the slowdown of add in fold-loops as the map itself isn't hashed in such case intermediately. But it might further improve equal performance by replacing the still slow equal comparison (when not physically equal) with a comparison of tags. There are more cases there though, e.g. when calling equal with one hashconsed and one non-hashconsed map.
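A very rough sketch of that lazy-hashconsing idea (hypothetical; a plain Hashtbl stands in for the Weak table for brevity, so this version would leak):

```ocaml
(* Values are interned only when hash is first requested, so add-in-fold
   loops never pay for interning intermediate maps. *)
module LazyHC (H : Hashtbl.HashedType) = struct
  type t = { value: H.t; mutable tag: int option }

  module T = Hashtbl.Make (H)
  let table : int T.t = T.create 64
  let next = ref 0

  let lift v = { value = v; tag = None }  (* no interning, no hashing yet *)

  (* assign a tag on first use; structurally equal values share a tag *)
  let intern x =
    match x.tag with
    | Some t -> t
    | None ->
      let t =
        match T.find_opt table x.value with
        | Some t -> t
        | None -> let t = !next in incr next; T.add table x.value t; t
      in
      x.tag <- Some t;
      t

  let hash x = intern x  (* the tag doubles as a cheap hash *)

  (* tags are comparable only when both sides are interned; otherwise
     fall back to the (still slow) structural equality *)
  let equal x y =
    x == y
    || match x.tag, y.tag with
       | Some tx, Some ty -> tx = ty
       | _ -> H.equal x.value y.value
end
```

This shows both behaviours mentioned above: fast tag comparison once both sides have been hashed, and the structural fallback when one side is not yet interned.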

@sim642
Member Author

sim642 commented Apr 6, 2021

This has been sitting on the sidelines for now, but I think I'll have to see how this affects the newer heavy privatizations. I suppose they can still keep quite big CPAs around.

My biggest worry with this PR is that the stricter interfacing of MapDomains broke the compilation of PathSensitive3 in witness generation, because it used the internals of a lifted map. Although I fixed it to at least compile, it's very probable that things there no longer behave correctly and witness generation would be broken. And for the time being, I don't really want to spend time trying to test and possibly fix it.

@sim642 sim642 marked this pull request as ready for review April 6, 2021 14:26
@sim642 sim642 added the performance Analysis time, memory usage label Apr 6, 2021
@sim642
Member Author

sim642 commented Apr 6, 2021

Here's a comparison on some benchmarks from goblint/bench that shows how more involved privatizations (global-history, mine-W, mine-lazy, mine-global) get around 50% time improvement just from this change.

Before

[Screenshot: Benchmarks on goblint-new]

After

[Screenshot: Benchmarks on goblint-new (1)]

@sim642 sim642 merged commit a6d0a0b into traces Apr 8, 2021
@sim642 sim642 deleted the traces-cpa-optimize branch April 8, 2021 12:48