Skip to content

Variable hashes have too many collisions #1594

@arkocal

Description

@arkocal

Summary: With about 60k variables encountered, we have about 1.04 entries per hash. With an ideal hash function the chances for even a single collision would have been very low.

The experiment can be repeated using the topdown solver on arkocal/hash_collision_demo branch, which includes a minimal module for counting how often each hash value appears.

In particular, variables that differ by a single string are mapped to the same hash.

The following changes already significantly reduce collisions (1.0008 keys per hash, no obvious similarities between colliding keys):
In cdomains/stringDomain.ml (https://github.com/goblint/analyzer/blob/master/src/cdomain/value/cdomains/stringDomain.ml) :

let hash x =
  if get_string_domain () = Disjoint then
    hash x
  else
    (* 13859 <- old value*)
    hash x (* new value *)

In cdomains/intDomain.ml, add to line 3680, after let to_yojson ...

let hash = Hashtbl.hash

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions