Add support for path exploration to CBMC/JBMC, in addition to full bounded model checking#1641
Conversation
|
Thanks for this patch set; it looks really exciting. Paging @theyoucheng who has a similar patch set and @bjowac who was the original author of symex. Also cross linking against #749 for my last attempt at getting everyone who cared about per-path symex talking to each other so we didn't duplicate work. |
3050f65 to
0924fa7
Compare
|
@martin-cs thanks for the comment. This is a complete implementation and is working well, but I'd like to braindump a few future refinements here for the record:
cheers! |
There was a problem hiding this comment.
Some comments. Mostly doxy comments, style and some inconsistency in how the arguments are described. Most substantively I'd appreciate comments where appropriate about why namespaces are being re-initialised to point at different symbol tables, and at what stage they point to different places (therefore, if a future coder is editing goto_symext and adds an ns.lookup, what tables do they think they're referring to? Is it outright invalid (uninitialised) at any point?)
src/cbmc/bmc.cpp
Outdated
src/cbmc/bmc.cpp
Outdated
There was a problem hiding this comment.
Recommend making a "should" statement here, as otherwise it's often confusing whether the statement says what should have been or what went wrong.
src/cbmc/bmc.cpp
Outdated
There was a problem hiding this comment.
Recommend throwing structured errors rather than plain strings wherever possible
There was a problem hiding this comment.
@smowton this catch is to deal with a throw from cbmc_solverst, i.e. it's code that predates this patchset. As far as I can see, that code only ever throws the number 0 (and it prints a message to messaget::error() just before it throws). I'm not familiar with cprover's error handling, what sort of error would you recommend that I throw from cbmc_solvert instead of 0? Or should I just leave that there?
There was a problem hiding this comment.
"Pretty crap," in brief ;) If it's old code you're catching from, I guess you intend to catch everything? In that case I'd catch const char * and const std::string & (perhaps mergable if exception catching permits implicit construction to catch, but I doubt it) and then catch(...) to catch other errors. If you specifically intend to catch some things but let others get by then you should at least document your intent in comments, and probably change the throwing code to throw something more specific.
There was a problem hiding this comment.
#include "martin_comment_about_catch_all_vs_invariants.h"
If you are going to catch (...) please exit and / or throw upwards; many exceptions are non-recoverable.
src/cbmc/bmc.cpp
Outdated
There was a problem hiding this comment.
If there are this many args, probably clearest to break after ( and then list one argument per line
src/cbmc/bmc.cpp
Outdated
There was a problem hiding this comment.
Please add function-level doxy comments
There was a problem hiding this comment.
The prevalent style seems to be having the doxy comments in header files, right? I already have one there, but it's a bit brief so I'll expand it.
src/goto-symex/symex_goto.cpp
Outdated
There was a problem hiding this comment.
If this is a debug message, guard with #ifdef DEBUG_GOTO_SYMEX or something (and the same for iostream include above). Otherwise, use a messaget or get one passed in.
There was a problem hiding this comment.
If #1625 gets merged in before this, symext will have a messaget mutable member as you suggested so I'll rebase and use that
There was a problem hiding this comment.
FWIW I'll merge #1625 as soon as CI completes.
src/goto-symex/symex_goto.cpp
Outdated
src/goto-symex/symex_goto.cpp
Outdated
There was a problem hiding this comment.
As above, suggest using an enum type
src/goto-symex/symex_goto.cpp
Outdated
There was a problem hiding this comment.
Another use of std streams
src/goto-symex/symex_main.cpp
Outdated
There was a problem hiding this comment.
As above, could you comment why namespaces are being re-initialised to point at different tables like this? What tables will ns refer to and at what stage in proceedings?
|
Also edited title to the usual "voice" for such things :) |
|
@smowton thanks so much for the prompt review, I'll try and get this done within the next few days. |
|
@karkhaz Multi-threading certainly is exciting but may I add a small note of caution. SAT solvers are very sensitive to memory latency and cache sizes. Given parts of the memory subsystem are shared between cores, running a SAT solver in one core can adversely affect the performance of a SAT solver running in another. Before you start work, I'd recommend trying the following experiment. *. Pick a benchmark that takes at least 30 seconds and uses at least at least 50 MB of memory in the SAT solver. If total time is flat and work-rate peaks at N; brilliant, my warning no longer applies. If the peak is elsewhere and the lower it is, consider if it is worth the investment of time to multi-thread stuff. ( Also, a seeded pseudo-random queuing policy, run as a portfolio could also give a lower bound / estimate on the performance of the multi-threading.) The other exciting topic, with @DanielNeville can provide input on is when to merge and when to queue. I'm told that "merge forwards branches and queue backwards ones" does interesting things... |
|
@martin-cs thanks for this information! I'd certainly be interested in better merging heuristics. Something that I have working (not in this patchset) is the ability to put Hopefully there's a lot of experimenting to be done here: what are the best heuristics for deciding when to merge, and what are the best heuristics for deciding on which path to explore next. I'm planning to gather some amount of empirical data about this, and would be very interested to read data gathered by others also. |
b4b8520 to
892b7ae
Compare
martin-cs
left a comment
There was a problem hiding this comment.
[2/8] YES.
[3/8] Sure.
[4/8] Yes; we should have done this when JBMC was first forked.
[5/8] Yes, MUCH better with explanations.
[6/8] Maybe? It seems like this is adding names for two particular search heuristics, "depth-first-take-goto" and "depth-first-ignore-goto". To be honest I'd rather see a more general option / infrastructure that can easily be extended or otherwise I fear we will see a profusion of options like this that do similar-but-not-the-same things.
[7/8] Thanks for fixing.
[8/8] YES.
src/cbmc/bmc.h
Outdated
There was a problem hiding this comment.
This feels a bit clunky and not really in keeping with how the rest of the code base does extensions of functionality. Is there a simpler way of doing this? Also the apparent use-case for this (loop unwinding for the Java front-end) may be going away / getting transformed, maybe @mgudemann or @smowton might be able to say more.
There was a problem hiding this comment.
I think this should be cleaned up in the course of #1869
src/cbmc/bmc.cpp
Outdated
src/goto-symex/goto_symex.h
Outdated
There was a problem hiding this comment.
Should this be guard coded? I realise it was before...
There was a problem hiding this comment.
I don't understand the question, sorry: do you mean hard-coded? If so, what would be the alternative, is there some kind of symbolic constant for this?
There was a problem hiding this comment.
Why are you removing the output method?
There was a problem hiding this comment.
Because I removed the ns member, which is needed for that function. This was all part of my effort to clarify what classes own what objects, but I'm afraid I don't recall exactly why ns was a member of this class (perhaps @tautschnig could elaborate? I think we discussed this and I forgot...)
When I wanted to create the branch_pointt class (consisting of an equation and a state), I tried to remove all members that would point to objects that might become stale, since we need to resume the equation and state later and stale/out-of-scope references led to segfaults. I think the ns member was one of the victims of this purge, and so was the dirtyt member (that's why I changed it from a pointer to a member)...
There was a problem hiding this comment.
I think we weren't able to figure out why ns should possibly be a member here. Indeed @martin-cs is right that output needs review. We should have a method that is explicitly passed a namespacet as an argument. That implies that the operator<< declarations should go away as well (their definitions have been removed!). I'd suggest to re-instantiate the output method exactly as it was defined before, except that it should take an additional namespacet parameter.
There was a problem hiding this comment.
OK, will push that in a few minutes
|
I realise this has been going on for some while and fear we are trying @karkhaz 's considerable patience ... BUT ... can I say "accept everything apart from 6/8"? My objection to 6/8 isn't ideological, if it gets accepted, it gets accepted. I just think it will get changed pretty soon and we could easily get it right now and save hassle later. Symbolic execution heuristics (just "which path next, not even 'should I merge' or 'when should I feasibility check') are (somehow still?) an active research topic (@DanielNeville @cesaro @theyoucheng and others may be able to comment). It's one of the obvious low-hanging fruit with this. Many of these will require non-orthogonal changes to the branch taken behaviour "ignore forwards branches, follow backwards branches", "follow backwards branches but only the first 2 times", "take everything that looks like an error path because no-one checks those", "pick at random because I want to pretend I'm fuzzier than I am" which feel kind of hard to square with the proposed 6/8 solution. So what I would /suggest/ (I'm not going to demand) is that you swap it out for a more general "heuristic" option that not only picks which way to go but also sorts the work queue. The two options would then be depth-first-follow-goto and depth-first-ignore-goto or something. |
|
@martin-cs there's two separate "sites" where we chose what to do next. The first site is deciding which path to put onto the workqueue (this is what is controlled by the command line option in 6/8), and the second site is deciding what to take off the workqueue (which is what I believe you're talking about). You're totally correct that the choice of what to take off the workqueue will change soon, and can be decided by advanced heuristics. I'm working on this now, and that code will go into I wonder whether you're advocating for heuristics to control both pushing onto the queue and popping off? I suppose I could try that as well, if that's what you mean... |
|
@karkhaz From an implementation point of view, yes it makes complete sense. However I think it might be good to think about things a little more theoretically. Per-path symex generates a (likely infinite) tree. Any heuristic (including parallel exploration!) is then an ordering of nodes in the tree (i.e. an ordering with the tree as a sub-order). In implementation terms, it's like you add both new states to the work queue, and then pick the next state from the work queue. If you're working depth-first then it will be one of the two (well... assuming a step-based notion of depth) but maybe you want to be able to explore breadth first or some kind of beam search. I'm not going to block things over this but it seems like any of the more advanced heuristics, even depth-first ones, will want to control both of the locations you describe, so having a flag that fixes the behaviour at one seems a bit limiting. |
The operators &= and |= can now be used between two safety_checkert::resultt objects. They return the "best" or "worst" of two results, and additionally assign that result to the LHS. The assignment operators are useful when repeatedly performing a safety check and wanting to return a single error code that either describes the best or worst outcome of all the safety checks combined.
The CBMC and JBMC frontends included identical code for running BMC after the language-specific preprocessing. This commit moves that common code into a static method of bmct. Parse options and help text related to bounded model checking are defined in bmc.h.
|
@karkhaz Just state whether you'd like to review the code and make changes right away in light of @martin-cs' comments. @martin-cs & @karkhaz: I'd like to merge this relatively soon to have a first implementation in place for everyone to use and then iterate to improve it. |
|
@tautschnig sorry, I pushed right before I saw your comment. @martin-cs I've removed 6/8, and I'll add a more general flag once I have my heuristics ready and it's more clear what the implementation will look like. Does that work? |
|
@tautschnig I think my question about what's happened to the output method is probably more of a blocker than the 6/8 question. |
|
@karkhaz Thank you that is most considerate of you. |
This commit introduces the --paths flag to CBMC, which makes CBMC model-check each individual path of the program rather than merging paths and model-checking the entire program. The overall strategy involves allowing the "state" of symbolic execution---i.e., a pair of goto_symex_statet and symex_target_equationt ---to be saved during an execution, and for an execution to be resumed from a saved pair. By saving the state at every branch point and disabling path merging, symbolic execution only runs along one path but populates a worklist of saved states that should be executed later. At the top level, CBMC or JBMC loops while the worklist of saved states is non-empty, creating a new bmct object to resume executing each saved path. This commit includes the following supporting changes: - goto_symex_statet now owns a symbol table, separate from the one supporting the goto_program, which is used for dynamically created objects during symbolic execution. goto_symex was previously using a symbol table that was passed to it by reference for this purpose, but that symbol table is needed when resuming symbolic execution from a saved point and so ought properly to be part of the symbolic execution state. - goto_symex_statet now has a pointer to a symex_target_equationt, which can be updated with an equation from a previously-saved symbolic execution run. While equations are also conceptually part of the state of symbolic execution, they are heavily used after symbolic execution has completed (and the symex state has been deallocated) and so the equation is not owned by the state. An explicit copy constructor has been added to goto_symex_statet that initializes the equation member, so that symbolic execution can proceed either with an empty equation or with an equation that was earlier saved. - goto_symex_statet no longer has a pointer to a dirtyt, as this was hindering it from being copied.
cpplint now complains if a derived-class declaration is formatted as class derived: base rather than complaining if it is formatted as class derived : base (with a space on both sides of the colon). The latter style is the one enforced by clang-format, meaning that the two linters were inconsistent.
|
Just pushed: removed the formatting fix, and reinstated the |
|
Also, going to sleep now, I have a flight tomorrow that I need to wake up for...should have time to make any additional changes tomorrow early morning if needed. |
|
@karkhaz Thanks a lot for all those updates! I'm inclined to merge once CI passes, unless @martin-cs tells me otherwise? |
|
I don't understand the question, sorry: do you mean hard-coded?
Yes; sorry.
If so, what would be the alternative, is there some kind of symbolic constant for this?
Not yet but have a look at util/cprover_prefix.h for the current
direction of travel of this sort of thing.
This one isn't urgent or blocking as it's not your change.
|
martin-cs
left a comment
There was a problem hiding this comment.
@tautschnig Nah; I think we're good here.
|
CI has passed successfully, just need @martin-cs to give the green light. |
|
Thanks so much everyone for the detailed reviews and suggestions! Looking forward to continuing work on this. |
| message_handlert &_message_handler, | ||
| prop_convt &_prop_conv) | ||
| prop_convt &_prop_conv, | ||
| goto_symext::branch_worklistt &_branch_worklist) |
There was a problem hiding this comment.
For more backwards compatibility, would it have been possible to provide a constructor that initialises branch_worklist to a default value here?
There was a problem hiding this comment.
I'm not sure default values and references would be compatible?
|
@karkhaz just coming to rebase my own work on top of this; a couple of questions:
|
|
@smowton: not sure if I have a good answer for these questions, I'm not actually that familiar with many of CBMC's command line options.
|
|
Re: #1, it would potentially be useful if you wanted to see the equation before each run, which is indeed what would currently happen, but in such a case we should somehow prepend each report with a header identifying a path. Re #2, the three options 'cover', 'fault-localization' and the default all-properties mode all build a list of all the assertions they're trying to satisfy, then make repeated solver runs to try to satisfy different goals. I guess with --paths we should probably stop trying to target a particular goal once it has been found using a particular path? So the goal list would be maintained outside the twin iterations over goals and paths, and goals incrementally reported either hit or covered as we go, before finally reporting ones we couldn't reach? Or am I getting the wrong end of the stick, and it's actually interesting to check if I can violate goal A via both paths X and Y? |
|
( Unsolicited opinions on what the options should do) --paths --show-vcc prints each formula instead of running the solver |
CBMC can now model-check each path of a program one at a time, displaying the verification result for that path before continuing to symbolically execute other paths. This functionality is similar to what the
symextool provides, but uses thegoto-symexcore that is already used by CBMC rather than having a disjoint symbolic execution codebase. This functionality is enabled by passing the--pathsflag to CBMC or JBMC.When
--pathsis passed, CBMC does not perform path merging when it encounters a program join point. Instead, whenever it encounters a conditionalgotoinstruction, it saves the state of the symbolic executor onto a workqueue and executes one of the branches. Once symbolic execution has finished and the verification result printed out, CBMC pops the workqueue and resumes executing the untaken branch.This is all integrated with the existing
bmcandgoto_symextclasses, with no change in CBMC's original model-checking functionality. Without the--pathsflag, CBMC's full-program model-checking behaviour is an edge-case of path exploration, whereby no goto branch-points are saved and so the entire program is only symbolically executed once.The main patch in this patchset is
[path explore 5/6] Support path-based exploration. Other commits are fairly small and introduce features that are necessary for path exploration, but which are also desirable changes by themselves.