log: Fix threading races#396
log: Fix threading races#396chrissie-c merged 1 commit intoClusterLabs:masterfrom chrissie-c:logthread-fix
Conversation
|
Note on the reproducibility of this. Currently I can only reproduce it on FreeBSD-devel by running the log.test repeatedly. Roughly 1 in 100 (but it can take longer) will fail in test_log_threads. Two threads both in qb_array_index() but at the different places in qb_log_dcs_get(). |
It's possible that cs->filename or cs->format could be read in the 'fast' path while the 'slow' path is still constructing the object. So we need to lock arr_next_lock before copying them out for the caller. Also wthread_should_exit was unprotected.
|
Update (the previous update was from last week but I posted it in the wrong PR). The bug seems to be the possibility of using a 'cs' object that was not fully constructed. I've moved the lock line down one because the problem isn't actually in qb_array_index() but below that where we copy the values into safe_filename and safe_format. cs might be still being constructed in _log_dcs_new_cs() (which is protected by the lock). |
jfriesse
left a comment
There was a problem hiding this comment.
ACK, not even sure how it is possible that code was working without locking.
|
Just good luck I suspect! Thanks for the review. |
It seems we can't avoiding locking in qb_log_dcs_get because
qb_array_get() isn't thread-safe.
Alternatives to this would be to add a - new - implementation of
the arrays or to stick the lock higher up (as in this patch).
We're logging here, so doing I/O, plus all the formatting
that goes on, I think the locking is a minimal intrusion on the CPU
relatively speaking.
This bug was seen in the test suite on BSD so it's not just theoretical
also wthread_should_exit was unprotected.