-
Notifications
You must be signed in to change notification settings - Fork 87
Redesign of LibraryFunctions specification #719
Description
Recently during interactive benchmarking, it has really bothered me how messy and inflexible our LibraryFunctions specifications are
Problems
- There are two completely separate specification mechanisms in
LibraryFunctions:classifyis for classifying some functions into particular categories for uniform handling by analyses:analyzer/src/analyses/libraryFunctions.ml
Lines 8 to 98 in fb54fa5
type categories = [ | `Malloc of exp | `Calloc of exp * exp | `Realloc of exp * exp | `Assert of exp | `Lock of bool * bool * bool (* try? * write? * return on success *) | `Unlock | `ThreadCreate of exp * exp * exp (* id * f * x *) | `ThreadJoin of exp * exp (* id * ret_var *) | `Unknown of string ] let osek_renames = ref false let classify' fn exps = let strange_arguments () = M.warn "%s arguments are strange!" fn; `Unknown fn in match fn with | "pthread_create" -> begin match exps with | [id;_;fn;x] -> `ThreadCreate (id, fn, x) | _ -> strange_arguments () end | "pthread_join" -> begin match exps with | [id; ret_var] -> `ThreadJoin (id, ret_var) | _ -> strange_arguments () end | "malloc" | "kmalloc" | "__kmalloc" | "usb_alloc_urb" | "__builtin_alloca" -> begin match exps with | size::_ -> `Malloc size | _ -> strange_arguments () end | "ZSTD_customMalloc" -> (* only used with extraspecials *) begin match exps with | size::_ -> `Malloc size | _ -> strange_arguments () end | "kzalloc" -> begin match exps with | size::_ -> `Calloc (Cil.one, size) | _ -> strange_arguments () end | "calloc" -> begin match exps with | n::size::_ -> `Calloc (n, size) | _ -> strange_arguments () end | "ZSTD_customCalloc" -> (* only used with extraspecials *) begin match exps with | size::_ -> `Calloc (Cil.one, size) | _ -> strange_arguments () end | "realloc" -> begin match exps with | p::size::_ -> `Realloc (p, size) | _ -> strange_arguments () end | "assert" -> begin match exps with | [e] -> `Assert e | _ -> M.warn "Assert argument mismatch!"; `Unknown fn end | "_spin_trylock" | "spin_trylock" | "mutex_trylock" | "_spin_trylock_irqsave" | "down_trylock" -> `Lock(true, true, true) | "pthread_mutex_trylock" | "pthread_rwlock_trywrlock" -> `Lock (true, true, false) | "GetSpinlock" -> `Lock (false, true, true) | "ReleaseSpinlock" -> `Unlock | "LAP_Se_WaitSemaphore" (* TODO: only handle those when arinc analysis is enabled? *) | "_spin_lock" | "_spin_lock_irqsave" | "_spin_lock_bh" | "down_write" | "mutex_lock" | "mutex_lock_interruptible" | "_write_lock" | "_raw_write_lock" | "pthread_rwlock_wrlock" | "GetResource" | "_raw_spin_lock" | "_raw_spin_lock_flags" | "_raw_spin_lock_irqsave" | "_raw_spin_lock_irq" | "_raw_spin_lock_bh" | "spin_lock_irqsave" | "spin_lock" -> `Lock (get_bool "sem.lock.fail", true, true) | "pthread_mutex_lock" | "__pthread_mutex_lock" -> `Lock (get_bool "sem.lock.fail", true, false) | "pthread_rwlock_tryrdlock" | "pthread_rwlock_rdlock" | "_read_lock" | "_raw_read_lock" | "down_read" -> `Lock (get_bool "sem.lock.fail", false, true) | "LAP_Se_SignalSemaphore" | "__raw_read_unlock" | "__raw_write_unlock" | "raw_spin_unlock" | "_spin_unlock" | "spin_unlock" | "_spin_unlock_irqrestore" | "_spin_unlock_bh" | "_raw_spin_unlock_bh" | "mutex_unlock" | "ReleaseResource" | "_write_unlock" | "_read_unlock" | "_raw_spin_unlock_irqrestore" | "pthread_mutex_unlock" | "__pthread_mutex_unlock" | "spin_unlock_irqrestore" | "up_read" | "up_write" | "up" -> `Unlock | x -> `Unknown x invalidate_actionsis for specifying what base invalidation and race detection do with arguments:analyzer/src/analyses/libraryFunctions.ml
Lines 192 to 518 in fb54fa5
let invalidate_actions = [ "GetResource", readsAll; "ReleaseResource", readsAll; "GetSpinlock", readsAll; "ReleaseSpinlock", readsAll; "atoi", readsAll; (*safe*) "__builtin_ctz", readsAll; "__builtin_ctzl", readsAll; "__builtin_ctzll", readsAll; "__builtin_clz", readsAll; "bzero", writes [1]; (*keep 1*) "__builtin_bzero", writes [1]; (*keep [1]*) "explicit_bzero", writes [1]; "__explicit_bzero_chk", writes [1]; "connect", readsAll; (*safe*) "fclose", readsAll; (*safe*) "fflush", writesAll; (*unsafe*) "fopen", readsAll; (*safe*) "fdopen", readsAll; (*safe*) "setvbuf", writes[1;2]; (* TODO: if this is used to set an input buffer, the buffer (second argument) would need to remain TOP, *) (* as any future write (or flush) of the stream could result in a write to the buffer *) "fprintf", writes [1]; (*keep [1]*) "__fprintf_chk", writes [1]; (*keep [1]*) "fread", writes [1;4]; "__fread_alias", writes [1;4]; "__fread_chk", writes [1;4]; "utimensat", readsAll; "free", frees [1]; (*unsafe*) "fwrite", readsAll;(*safe*) "getopt", writes [2];(*keep [2]*) "localtime", readsAll;(*safe*) "memcpy", writes [1];(*keep [1]*) "__builtin_memcpy", writes [1];(*keep [1]*) "mempcpy", writes [1];(*keep [1]*) "__builtin___memcpy_chk", writes [1]; "__builtin___mempcpy_chk", writes [1]; "memset", writes [1];(*unsafe*) "__builtin_memset", writes [1];(*unsafe*) "__builtin___memset_chk", writes [1]; "printf", readsAll;(*safe*) "__printf_chk", readsAll;(*safe*) "printk", readsAll;(*safe*) "perror", readsAll;(*safe*) "pthread_mutex_lock", readsAll;(*safe*) "pthread_mutex_trylock", readsAll; "pthread_mutex_unlock", readsAll;(*safe*) "__pthread_mutex_lock", readsAll;(*safe*) "__pthread_mutex_trylock", readsAll; "__pthread_mutex_unlock", readsAll;(*safe*) "__mutex_init", readsAll;(*safe*) "mutex_init", readsAll;(*safe*) "mutex_lock", readsAll;(*safe*) "mutex_lock_interruptible", readsAll;(*safe*) "mutex_unlock", readsAll;(*safe*) "_spin_lock", readsAll;(*safe*) "_spin_unlock", readsAll;(*safe*) "_spin_lock_irqsave", readsAll;(*safe*) "_spin_unlock_irqrestore", readsAll;(*safe*) "pthread_mutex_init", readsAll;(*safe*) "pthread_mutex_destroy", readsAll;(*safe*) "pthread_mutexattr_settype", readsAll;(*safe*) "pthread_mutexattr_init", readsAll;(*safe*) "pthread_self", readsAll;(*safe*) "read", writes [2];(*keep [2]*) "recv", writes [2];(*keep [2]*) "scanf", writesAllButFirst 1 readsAll;(*drop 1*) "send", readsAll;(*safe*) "snprintf", writes [1];(*keep [1]*) "__builtin___snprintf_chk", writes [1];(*keep [1]*) "sprintf", writes [1];(*keep [1]*) "sscanf", writesAllButFirst 2 readsAll;(*drop 2*) "strcmp", readsAll;(*safe*) "strftime", writes [1];(*keep [1]*) "strlen", readsAll;(*safe*) "strncmp", readsAll;(*safe*) "strncpy", writes [1];(*keep [1]*) "strstr", readsAll;(*safe*) "strdup", readsAll;(*safe*) "toupper", readsAll;(*safe*) "tolower", readsAll;(*safe*) "time", writesAll;(*unsafe*) "vfprintf", writes [1];(*keep [1]*) "__vfprintf_chk", writes [1];(*keep [1]*) "vprintf", readsAll;(*safe*) "vsprintf", writes [1];(*keep [1]*) "write", readsAll;(*safe*) "__builtin_va_arg", readsAll;(*safe*) "__builtin_va_end", readsAll;(*safe*) "__builtin_va_start", readsAll;(*safe*) "__ctype_b_loc", readsAll;(*safe*) "__errno", readsAll;(*safe*) "__errno_location", readsAll;(*safe*) "sigfillset", writesAll; (*unsafe*) "sigprocmask", writesAll; (*unsafe*) "uname", writesAll;(*unsafe*) "__builtin_strcmp", readsAll;(*safe*) "getopt_long", writesAllButFirst 2 readsAll;(*drop 2*) "__strdup", readsAll;(*safe*) "strtoul__extinline", readsAll;(*safe*) "strtol", writes [2]; "geteuid", readsAll;(*safe*) "opendir", readsAll; (*safe*) "readdir_r", writesAll;(*unsafe*) "atoi__extinline", readsAll;(*safe*) "getpid", readsAll;(*safe*) "fgetc", writesAll;(*unsafe*) "getc", writesAll;(*unsafe*) "_IO_getc", writesAll;(*unsafe*) "closedir", writesAll;(*unsafe*) "setrlimit", readsAll;(*safe*) "chdir", readsAll;(*safe*) "pipe", writesAll;(*unsafe*) "close", writesAll;(*unsafe*) "setsid", readsAll;(*safe*) "strerror_r", writesAll;(*unsafe*) "pthread_attr_init", writesAll; (*unsafe*) "pthread_attr_setdetachstate", writesAll;(*unsafe*) "pthread_attr_setstacksize", writesAll;(*unsafe*) "pthread_attr_setscope", writesAll;(*unsafe*) "pthread_cond_init", readsAll; (*safe*) "pthread_cond_wait", readsAll; (*safe*) "pthread_cond_signal", readsAll;(*safe*) "pthread_cond_broadcast", readsAll;(*safe*) "pthread_cond_destroy", readsAll;(*safe*) "__pthread_cond_init", readsAll; (*safe*) "__pthread_cond_wait", readsAll; (*safe*) "__pthread_cond_signal", readsAll;(*safe*) "__pthread_cond_broadcast", readsAll;(*safe*) "__pthread_cond_destroy", readsAll;(*safe*) "pthread_key_create", writesAll;(*unsafe*) "sigemptyset", writesAll;(*unsafe*) "sigaddset", writesAll;(*unsafe*) "pthread_sigmask", writesAllButFirst 2 readsAll;(*unsafe*) "raise", writesAll;(*unsafe*) "_strlen", readsAll;(*safe*) "__builtin_alloca", readsAll;(*safe*) "dlopen", readsAll;(*safe*) "dlsym", readsAll;(*safe*) "dlclose", readsAll;(*safe*) "dlerror", readsAll;(*safe*) "stat__extinline", writesAllButFirst 1 readsAll;(*drop 1*) "lstat__extinline", writesAllButFirst 1 readsAll;(*drop 1*) "__builtin_strchr", readsAll;(*safe*) "strcpy", writes [1];(*keep [1]*) "__builtin___strcpy", writes [1];(*keep [1]*) "__builtin___strcpy_chk", writes [1];(*keep [1]*) "strcat", writes [2];(*keep [2]*) "getpgrp", readsAll;(*safe*) "umount2", readsAll;(*safe*) "memchr", readsAll;(*safe*) "memmove", writes [2;3];(*keep [2;3]*) "__builtin_memmove", writes [2;3];(*keep [2;3]*) "__builtin___memmove_chk", writes [2;3];(*keep [2;3]*) "waitpid", readsAll;(*safe*) "statfs", writes [1;3;4];(*keep [1;3;4]*) "mkdir", readsAll;(*safe*) "mount", readsAll;(*safe*) "open", readsAll;(*safe*) "__open_alias", readsAll;(*safe*) "__open_2", readsAll;(*safe*) "fcntl", readsAll;(*safe*) "ioctl", writesAll;(*unsafe*) "fstat__extinline", writesAll;(*unsafe*) "umount", readsAll;(*safe*) "rmdir", readsAll;(*safe*) "strrchr", readsAll;(*safe*) "scandir", writes [1;3;4];(*keep [1;3;4]*) "unlink", readsAll;(*safe*) "sched_yield", readsAll;(*safe*) "nanosleep", writesAllButFirst 1 readsAll;(*drop 1*) "sigdelset", readsAll;(*safe*) "sigwait", writesAllButFirst 1 readsAll;(*drop 1*) "setlocale", readsAll;(*safe*) "bindtextdomain", readsAll;(*safe*) "textdomain", readsAll;(*safe*) "dcgettext", readsAll;(*safe*) "syscall", writesAllButFirst 1 readsAll;(*drop 1*) "sysconf", readsAll; "fputs", readsAll;(*safe*) "fputc", readsAll;(*safe*) "fseek", writes[1]; "fileno", readsAll; "ferror", readsAll; "ftell", readsAll; "putc", readsAll;(*safe*) "putw", readsAll;(*safe*) "putchar", readsAll;(*safe*) "feof", readsAll;(*safe*) "__getdelim", writes [3];(*keep [3]*) "vsyslog", readsAll;(*safe*) "gethostbyname_r", readsAll;(*safe*) "__h_errno_location", readsAll;(*safe*) "__fxstat", readsAll;(*safe*) "getuid", readsAll;(*safe*) "strerror", readsAll;(*safe*) "readdir", readsAll;(*safe*) "openlog", readsAll;(*safe*) "getdtablesize", readsAll;(*safe*) "umask", readsAll;(*safe*) "socket", readsAll;(*safe*) "clntudp_create", writesAllButFirst 3 readsAll;(*drop 3*) "svctcp_create", readsAll;(*safe*) "clntudp_bufcreate", writesAll;(*unsafe*) "authunix_create_default", readsAll;(*safe*) "writev", readsAll;(*safe*) "clnt_broadcast", writesAll;(*unsafe*) "clnt_sperrno", readsAll;(*safe*) "pmap_unset", writesAll;(*unsafe*) "bind", readsAll;(*safe*) "svcudp_create", readsAll;(*safe*) "svc_register", writesAll;(*unsafe*) "sleep", readsAll;(*safe*) "usleep", readsAll; "svc_run", writesAll;(*unsafe*) "dup", readsAll; (*safe*) "__builtin_expect", readsAll; (*safe*) "vsnprintf", writesAllButFirst 3 readsAll; (*drop 3*) "__builtin___vsnprintf", writesAllButFirst 3 readsAll; (*drop 3*) "__builtin___vsnprintf_chk", writesAllButFirst 3 readsAll; (*drop 3*) "syslog", readsAll; (*safe*) "strcasecmp", readsAll; (*safe*) "strchr", readsAll; (*safe*) "getservbyname", readsAll; (*safe*) "__error", readsAll; (*safe*) "__maskrune", writesAll; (*unsafe*) "inet_addr", readsAll; (*safe*) "gethostbyname", readsAll; (*safe*) "setsockopt", readsAll; (*safe*) "listen", readsAll; (*safe*) "getsockname", writes [1;3]; (*keep [1;3]*) "getenv", readsAll; (*safe*) "execl", readsAll; (*safe*) "select", writes [1;5]; (*keep [1;5]*) "accept", writesAll; (*keep [1]*) "getpeername", writes [1]; (*keep [1]*) "times", writesAll; (*unsafe*) "timespec_get", writes [1]; "fgets", writes [1;3]; (*keep [3]*) "__fgets_alias", writes [1;3]; (*keep [3]*) "__fgets_chk", writes [1;3]; (*keep [3]*) "strtoul", readsAll; (*safe*) "__tolower", readsAll; (*safe*) "signal", writesAll; (*unsafe*) "strsignal", readsAll; "popen", readsAll; (*safe*) "BF_cfb64_encrypt", writes [1;3;4;5]; (*keep [1;3;4,5]*) "BZ2_bzBuffToBuffDecompress", writes [3;4]; (*keep [3;4]*) "uncompress", writes [3;4]; (*keep [3;4]*) "stat", writes [2]; (*keep [1]*) "__xstat", writes [3]; (*keep [1]*) "__lxstat", writes [3]; (*keep [1]*) "remove", readsAll; "BZ2_bzBuffToBuffCompress", writes [3;4]; (*keep [3;4]*) "compress2", writes [3]; (*keep [3]*) "__toupper", readsAll; (*safe*) "BF_set_key", writes [3]; (*keep [3]*) "memcmp", readsAll; (*safe*) "sendto", writes [2;4]; (*keep [2;4]*) "recvfrom", writes [4;5]; (*keep [4;5]*) "srand", readsAll; (*safe*) "rand", readsAll; (*safe*) "gethostname", writesAll; (*unsafe*) "fork", readsAll; (*safe*) "realloc", writesAll;(*unsafe*) "setrlimit", readsAll; (*safe*) "getrlimit", writes [2]; (*keep [2]*) "sem_init", readsAll; (*safe*) "sem_destroy", readsAll; (*safe*) "sem_wait", readsAll; (*safe*) "sem_post", readsAll; (*safe*) "PL_NewHashTable", readsAll; (*safe*) "__assert_fail", readsAll; (*safe*) "assert_failed", readsAll; (*safe*) "htonl", readsAll; (*safe*) "htons", readsAll; (*safe*) "ntohl", readsAll; (*safe*) "htons", readsAll; (*safe*) "munmap", readsAll;(*safe*) "mmap", readsAll;(*safe*) "clock", readsAll; "pthread_rwlock_wrlock", readsAll; "pthread_rwlock_trywrlock", readsAll; "pthread_rwlock_rdlock", readsAll; "pthread_rwlock_tryrdlock", readsAll; "pthread_rwlockattr_destroy", writesAll; "pthread_rwlockattr_init", writesAll; "pthread_rwlock_destroy", readsAll; "pthread_rwlock_init", readsAll; "pthread_rwlock_unlock", readsAll; "__builtin_object_size", readsAll; "__builtin_bswap16", readsAll; "__builtin_bswap32", readsAll; "__builtin_bswap64", readsAll; "__builtin_bswap128", readsAll; "__builtin_va_arg_pack_len", readsAll; "__open_too_many_args", readsAll; "usb_submit_urb", readsAll; (* first argument is written to but according to specification must not be read from anymore *) "dev_driver_string", readsAll; "dev_driver_string", readsAll; "__spin_lock_init", writes [1]; "kmem_cache_create", readsAll; "pthread_create", onlyWrites [0; 2]; (* TODO: onlyWrites/keep is 0-indexed now, WTF? *) "__builtin_prefetch", readsAll; "idr_pre_get", readsAll; "zil_replay", writes [1;2;3;5]; "__VERIFIER_nondet_int", readsAll; (* no args, declare invalidate actions to prevent invalidating globals when extern in regression tests *) (* no args, declare invalidate actions to prevent invalidating globals *) "__VERIFIER_atomic_begin", readsAll; "__VERIFIER_atomic_end", readsAll; (* prevent base from spawning ARINC processes early, handled by arinc/extract_arinc *) (* "LAP_Se_SetPartitionMode", writes [2]; *) "LAP_Se_CreateProcess", writes [2; 3]; "LAP_Se_CreateErrorHandler", writes [2; 3]; "isatty", readsAll; "setpriority", readsAll; "getpriority", readsAll; (* ddverify *) "spin_lock_init", readsAll; "spin_lock", readsAll; "spin_unlock", readsAll; "spin_unlock_irqrestore", readsAll; "spin_lock_irqsave", readsAll; "sema_init", readsAll; "down_trylock", readsAll; "up", readsAll; "ZSTD_customFree", frees [1]; (* only used with extraspecials *) ]
- There is duplication between the two and to avoid confusion there must be duplication. If a function is in
classify, but notinvalidate_actions, then analyses can handle it according to the classification, but base analysis still prints an error about missing function definition and starts invalidating things. - Specifying
classifycases using pattern matching is relatively verbose. Five lines of code are used to match one function and its arguments:analyzer/src/analyses/libraryFunctions.ml
Lines 27 to 31 in fb54fa5
| "pthread_create" -> begin match exps with | [id;_;fn;x] -> `ThreadCreate (id, fn, x) | _ -> strange_arguments () end invalidate_actionsspecifications likereadsAll,writesAll, etc. are too broad to specify behavior sufficiently precisely. For example:Writeis also used to decide, which arguments need to be spawned as threads. This is almost always unnecessary because C standard library functions don't spawn threads (except for C11 threads, which we don't support yet anyway).Freewas added in Add option for ignoring races fromfree#695 to be able to distinguish memory freeing fromWrite.- All accesses are assumed to be transitive/deep, i.e. they follow pointers. Again, this is almost never the case for C standard library functions (e.g.
freedoesn't recursively first free data pointed to inside the outer struct, it only deallocates the direct memory and it's the programmer's job to free inner data structures first). - Shallow accessing is currently enforced by another duplicate match in an analysis (separately):
analyzer/src/analyses/accessAnalysis.ml
Lines 190 to 197 in fb54fa5
(* TODO: per-argument reach *) let reach = match f.vname with | "memset" | "__builtin_memset" | "__builtin___memset_chk" -> false | "bzero" | "__builtin_bzero" | "explicit_bzero" | "__explicit_bzero_chk" -> false | "__builtin_object_size" -> false | _ -> true in - Shallow (direct) vs deep (indirect) accessing cannot be specified per-argument.
- More
invalidate_actionsspecifications use inconsistent argument numbering:- Some use 1-based indexing:
analyzer/src/analyses/libraryFunctions.ml
Line 219 in fb54fa5
"free", frees [1]; (*unsafe*) - Others use 0-based indexing:
analyzer/src/analyses/libraryFunctions.ml
Line 493 in fb54fa5
"pthread_create", onlyWrites [0; 2]; (* TODO: onlyWrites/keep is 0-indexed now, WTF? *)
- Some use 1-based indexing:
- More precise
invalidate_actionsspecifications are unintuitively per access kind, not per argument:analyzer/src/analyses/libraryFunctions.ml
Line 461 in a8fd1c5
"realloc", readsFrees [0; 1] [0]; (* read+free first argument, read second argument *) - Argument numbering is error prone with multiple similar arguments. For example: 7e5ab0a.
- Since
classifyis a pattern match, all of the possible cases must be hard-coded together in one place. So all Linux kernel, etc specific library functions must also be there.
Goals
To address all of the above, a new approach should fulfill the following goals:
- All specifications relating to one function should be together in one place to avoid duplication and need for synchronization.
- Matching the arguments for constructing the
classifyresult shouldn't require any fallback boilerplate per-function. (First-class patterns can achieve this) - Invalidate actions should be specified in the same "pattern" which matches the arguments for classification, essentially "annotating" the corresponding arguments being matched. This avoids any need for specification by argument indices.
- Invalidate actions per-argument should be specified by a set of access kinds.
- A
Spawnaccess kind should be added. - All access kinds should be usable together with a shallow vs deep flag.
- Arguments should be (optionally) nameable to allow distinguishing similar arguments at glance without having to look up the function every time.
- A
- Allow custom attributes per-functions, e.g. for Detect calls of thread-unsafe functions as races #723.