Skip to content

Extract base analysis flag to separate analysis, refactor thread spawning#130

Merged
sim642 merged 50 commits intomasterfrom
thread/flag
Nov 27, 2020
Merged

Extract base analysis flag to separate analysis, refactor thread spawning#130
sim642 merged 50 commits intomasterfrom
thread/flag

Conversation

@sim642
Copy link
Copy Markdown
Member

@sim642 sim642 commented Nov 10, 2020

This is my ongoing effort to extract the flag (containing a "boolean" of single-/multithreaded mode and current thread ID) from base analysis to a separate analysis.

Additionally, I'm trying to refactor thread spawning, which multiple analyses currently try to do in parallel in special in a very similar way. The idea is to add additional transfer functions (currently threadenter and threadspawn) which make other analyses aware of a thread spawned by base analysis without needing to duplicate the logic.

This should make it easier to handle thread joins (pthread_join) and in simpler cases go back to singlethreaded mode, avoiding races that come from the end of main thread after other threads have been joined.

@sim642 sim642 added the cleanup Refactoring, clean-up label Nov 10, 2020
@michael-schwarz
Copy link
Copy Markdown
Member

It is definitely a lot nicer to have these things separate now, good job!

@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Nov 18, 2020

I think separately from this PR, we should probably at some point cleanup the regression tests that specify which specific analyses to run explicitly to:

* use the default ones, and add any additionally needed ones

* and only specify exactly which ones to run where necessary

This will make future diffs smaller.

Absolutely! I was thinking the same when i had to fix the merge conflicts in 100 regression test PARAMs.

@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Nov 18, 2020

The idea is to add additional transfer functions (currently threadenter and threadspawn) which make other analyses aware of a thread spawned by base analysis without needing to duplicate the logic.

Could you elaborate a bit on when these transfer functions would be called specifically?
And also how does this relate to the otherstate which was somehow related to this if I see correctly?

The new threadenter is basically the old otherstate on steroids: it gets much more information as arguments than just the function varinfo, notably the access to ctx.local and ctx.ask. As before, this transfer function is used when a new thread's initial state is to be calculated after ctx.spawn has been done somewhere.

The new threadspawn is a bit like combine (I originally had it named threadcombine but it was misleading) but it combines the new thread's initial state (after all analyses have done their threadenter to get that) with the current thread's state. This has two main goals:

  1. The thread analysis is made aware of a new thread that was spawned from the current thread, thus it can add that to its local created threads set. Previously the thread analysis had to duplicate and match base analysis logic to know when that happens. Now only base calls ctx.spawn and via these transfer functions everyone else is aware.
  2. In Handle thread joins #137 the base analysis can ask for the created thread ID (which is calculated by threadid analysis' threadenter) in order to set the local pthread_t variable to point to that.

The biggest peculiarity with threadspawn (or multiple) is that it happens in parallel with a normal transfer function and those are joined, like how ctx.split works. This is because if base's special does a ctx.spawn, then thread's special can't be made aware of the fact that something spawned (unless we use ctx.postsub I guess, but that's so ugly). Also, since in some kernel cases base's assign may do ctx.spawn, this parallel threadspawn is nicely compatible with threads being spawned in every weird place or condition.

If only we had somewhere to nicely document this sort of things...

@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Nov 26, 2020

Here's a screenshot of this PRs differences on SoftwareSystems:
Screenshot_2020-11-26 results 2020-11-26_15-44-36 differences – BenchExec results

Notably ldv-consumption/32_7a_cilled_linux-3.8-rc1-32_7a-sound--core--snd-rawmidi.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.i turns from true to unknown which worries me.

@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Nov 27, 2020

ldv-consumption/32_7a_cilled_linux-3.8-rc1-32_7a-sound--core--snd-rawmidi.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.i

Something quite funny is happening that this turns unknown. Apparently we end up in multithreaded mode which obviously makes us less precise. It's a SoftwareSystems program and doesn't use any threading though, so what gives? Apparently we're spawning a function through a function pointer given to an special unknown function (in this case the standard free). The program isn't doing something stupid like freeing a function pointer though but instead freeing a struct which happens to contain function pointers. The unknown function spawning logic uses collect_funargs to find all reachable function pointers from that and that's how snd_rawmidi_input_event_work ends up being spawned by free.

Why didn't this happen before? Probably because thanks to mallocWrapper the freed blob is more precise (a struct containing a function pointer, not top).

Why didn't this happen immediately when mallocWrapper was added but just now with thread flag changes? I didn't bother looking into it that deep but my guess is that the thread spawning logic, which previously was duplicated in multiple analyses, had weaker implementations in some (not using reachability for unknown specials). Now that this happens centrally in base where the logic is the most powerful, it now arises.

Obviously this is silly behavior, so I see two ways to avoid this from happening:

  1. Add a new config option to control whether unknowns specials spawn. This can be enabled by default, keeping our current logic, but disabled for SV-COMP because there programs should be complete and not call unknown functions which may spawn threads.
    EDIT: Turns out our defaults already define the option exp.unknown_funs_spawn but it's not used anywhere... That's an easy fix!
  2. Make our LibraryFunctions invalidate actions mechanism more fine-grained. Right now free is specified as writesAll because it mutates the pointed memory but that doesn't mean it may spawn threads from it. Even the unknown specials spawning logic has a concerned comment:
     (* why do we only spawn arguments that are written?? *)
    We'd probably be better off having a another invalidation action to specify which arguments of which special functions may spawn.

It already existed but was completely unused.
@sim642
Copy link
Copy Markdown
Member Author

sim642 commented Nov 27, 2020

The pull_request CI checks now seem to be failing because the added regression test 29/21 in master doesn't enable threadid and threadflag analyses, which would be a simple enough fix. Actually adding threadid is enough to fix it.

But it's weird because base analysis doesn't depend on threadid at all. The crash is

Fatal error: exception Failure("BUG: Empty set of start variables; may happen if enter_func of any analysis returns an empty list.")

What makes is weird that adding analyses fixes the start variables, not removing problematic analyses.

…sound--core--snd-rawmidi.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.i
Now without threadid enabled, morphstate gets called with bot, that reduce removes. This is unintentional.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cleanup Refactoring, clean-up

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants