Refactor and fix race access distribution#1084
Conversation
Currently the order of messages may depend on varinfo IDs in hashtables or whatnot. Since these differ on Linux and OSX, so does the message order.
| module Memo = | ||
| struct | ||
| include Printable.StdLeaf | ||
| type t = [`Var of CilType.Varinfo.t | `Type of CilType.Typ.t] * Offset.Unit.t [@@deriving eq, ord, hash] |
There was a problem hiding this comment.
Why is this a polymorphic variant type?
There was a problem hiding this comment.
This could be changed, but I don't want to do it here right now since #1089 and more follow-up already build on this, so it's best to avoid massive conflicts there.
Once all the big usability improvements to the race analysis are done, we'll see which types remain at all.
michael-schwarz
left a comment
There was a problem hiding this comment.
Definitely looks a lot more reasonable now. I am a bit unhappy with the number of TODO comments we're adding here though! I don't think we'll ever be in a better situation to address these than now.
None of the added TODOs should be new problems, they're existing (possible) problems, so it's better to mark them than completely ignore them and I cannot fix all of them in a single PR. |
Co-authored-by: Michael Schwarz <michael.schwarz93@gmail.com>
CHANGES: * Add `setjmp`/`longjmp` analysis (goblint/analyzer#887, goblint/analyzer#970, goblint/analyzer#1015, goblint/analyzer#1019). * Refactor race analysis to lazy distribution (goblint/analyzer#1084, goblint/analyzer#1089, goblint/analyzer#1136, goblint/analyzer#1016). * Add thread-unsafe library function call analysis (goblint/analyzer#723, goblint/analyzer#1082). * Add mutex type analysis and mutex API analysis (goblint/analyzer#800, goblint/analyzer#839, goblint/analyzer#1073). * Add interval set domain and string literals domain (goblint/analyzer#901, goblint/analyzer#966, goblint/analyzer#994, goblint/analyzer#1048). * Add affine equalities analysis (goblint/analyzer#592). * Add use-after-free analysis (goblint/analyzer#1050, goblint/analyzer#1114). * Add dead code elimination transformation (goblint/analyzer#850, goblint/analyzer#979). * Add taint analysis for partial contexts (goblint/analyzer#553, goblint/analyzer#952). * Add YAML witness validation via unassume (goblint/analyzer#796, goblint/analyzer#977, goblint/analyzer#1044, goblint/analyzer#1045, goblint/analyzer#1124). * Add incremental analysis rename detection (goblint/analyzer#774, goblint/analyzer#777). * Fix address sets unsoundness (goblint/analyzer#822, goblint/analyzer#967, goblint/analyzer#564, goblint/analyzer#1032, goblint/analyzer#998, goblint/analyzer#1031). * Fix thread escape analysis unsoundness (goblint/analyzer#939, goblint/analyzer#984, goblint/analyzer#1074, goblint/analyzer#1078). * Fix many incremental analysis issues (goblint/analyzer#627, goblint/analyzer#836, goblint/analyzer#835, goblint/analyzer#841, goblint/analyzer#932, goblint/analyzer#678, goblint/analyzer#942, goblint/analyzer#949, goblint/analyzer#950, goblint/analyzer#957, goblint/analyzer#955, goblint/analyzer#954, goblint/analyzer#960, goblint/analyzer#959, goblint/analyzer#1004, goblint/analyzer#558, goblint/analyzer#1010, goblint/analyzer#1091). * Fix server mode for abstract debugging (goblint/analyzer#983, goblint/analyzer#990, goblint/analyzer#997, goblint/analyzer#1000, goblint/analyzer#1001, goblint/analyzer#1013, goblint/analyzer#1018, goblint/analyzer#1017, goblint/analyzer#1026, goblint/analyzer#1027). * Add documentation for configuration JSON schema and OCaml API (goblint/analyzer#999, goblint/analyzer#1054, goblint/analyzer#1055, goblint/analyzer#1053). * Add many library function specifications (goblint/analyzer#962, goblint/analyzer#996, goblint/analyzer#1028, goblint/analyzer#1079, goblint/analyzer#1121, goblint/analyzer#1135, goblint/analyzer#1138). * Add OCaml 5.0 support (goblint/analyzer#1003, goblint/analyzer#945, goblint/analyzer#1162).
This is the result of reverse-engineering the existing race access distribution code. That code is cleaned up, simplified and commented.
This does not yet do anything clever to avoid distributing accesses to many memory locations, but makes the rules for doing so a lot clearer, hopefully simplifying the redesign.
However, the number of accesses, memory locations and races may change as the result of this PR, in particular due to some included soundness fixes.
Changes
Define a new
Access.Memotype for precisely representing memory locations as used by the race analysis.The previous representation used two
options, of which not all four combinations were possible, and both of them could contain offsets, which were duplicated.The new representation has a
varinfoortypas root, followed by unit-indexed offsets. This also allows reusing features from Refactor offsets, lvals and addresses #1067.Completely rewrite
Access.add_propagateasadd_distribute_outer.Its previous logic was very confusing and completely unexplained. In some ways it had an overly specific special case, while the general case was both too precise or too imprecise, depending on the situation. The new logic generalizes this to two general principles according to which type-based accesses are distributed:
The new logic fixes both soundness and precision issues that existed previously.
Fix unsoundness regarding distribution to struct fields with array type.
Previously accesses via
int*didn't account for the possibility that it might point to and thus race with an element of a struct field with typeint[2]. Now arbitrarily deep multi-arrays are unwrapped.Rename and flip
ana.mutex.disjoint_typestoana.race.direct-arithmetic.This should somewhat clarify the meaning of the option and avoid double negation (
not !unsound). Disjoint types is rather nondescriptive since it really just controls the behavior of direct (i.e. non-field) accesses to arithmetic-type–based memory locations.Moves the handling of
ana.race.direct-arithmeticto just one place.Previously two partial checks were in two different places, obfuscating the logic.
Exclude accesses to some anonymous pthread type internals, which show up as
__anonstructand__anonunion.Existing exclusions are based on
TNamed, so simplytypedefnames. But anonymous structs are precisely the ones that don't have such names.Remove polymorphic
Hashtblusage fromAccess.Although it didn't cause issues as the hashtables were with
typsigkeys, it's still nicer to avoid the possibility explicitly.typsigs are still used to make sure that all type-based accesses are collected regardless of anytypedefs.Add
accesstracing.Add option
warn.deterministic.This makes the order of printed messages deterministic, by delaying the printing and later printing them in sorted (by text, etc) order. This is useful for writing cram tests such that the test doesn't spuriously break (e.g. on OSX) due to different
varinfoIDs being in different order in some hashtables that get iterated.TODO