Fix some memory bugs in runtime_events_consumer.c#13091
Fix some memory bugs in runtime_events_consumer.c#13091eutro wants to merge 4 commits intoocaml:trunkfrom
runtime_events_consumer.c#13091Conversation
Test cases for some recent bugs in lib-runtime-events
|
It looks like, perhaps unsurprisingly, not all platforms work with the new test (though MSVC just had a quite disappointing network error), so I intend to remove it later, since I can't think of a better way to force failure. |
|
|
||
| struct caml_runtime_events_cursor *cursor = | ||
| caml_stat_alloc_noexc(sizeof(struct caml_runtime_events_cursor)); | ||
| int should_free_events_loc; |
There was a problem hiding this comment.
I'd rather see that variable initialized to zero at the declaration point, so that future changes in this routine do not risk using an uninitialized value.
There was a problem hiding this comment.
Following in that style should int ret; get initialised to something since it is being returned at the end of the function?
There was a problem hiding this comment.
I made ret uninitialised specifically so the compiler would emit a warning if it isn't assigned before jumping; and should_free_events_loc uninitialised because it's to be initialised where runtime_events_loc is.
A change that would accidentally use these with the default initialisation is already wrong, so my reasoning is that it's better to have an obviously invalid value that the compiler would yell at us about.
There was a problem hiding this comment.
This is relying upon the C compiler being always correct in figuring out whether a variable is initialized or not. It turns out they aren't always, with both false positives and false negatives. I wouldn't put too much trust in the compiler.
There was a problem hiding this comment.
Drive-by comment: do we need a boolean to tell us to free runtime_events_loc, or could we just free exactly when the variable is not NULL?
There was a problem hiding this comment.
Drive-by comment: do we need a boolean to tell us to free runtime_events_loc, or could we just free exactly when the variable is not NULL?
It's not the NULL case we're worried about, but the case where it's initialised to caml_runtime_events_current_location(), which isn't a copy.
There was a problem hiding this comment.
I see. I looked out of curiosity, wondering if just making a copy in that case would make things nicer. My quick skim suggests that the function is sprawling and somewhat of a mess, and I think that this is going to bite us again in the future. So I would encourage you to refactor it as you see fit -- for example maybe the system-dependent bits could be factored out into their own helper functions -- to make the code nicer, and not just safer. Your current change is okay but it does not make the code nicer to read.
There was a problem hiding this comment.
I've refactored caml_runtime_events_create_cursor and made runtime_events_loc be a new string every time to make reasoning easier.
| Runtime_events.pause () | ||
|
|
||
| (* workaround for finding the events file even on Windows, where | ||
| [Unix.getpid] doesn't match the one used to open the file *) |
There was a problem hiding this comment.
Can you explain what the difference is on Windows? The header file says:
[pid] is the process id (or equivalent) of the startup OCaml process.
If there is a known difference it would be useful to state that in the header file.
There was a problem hiding this comment.
The issue is not with Windows or Runtime_events, but with the way the Unix library handles PIDs on Windows (as Windows HANDLEs casted to int). Unix.getpid() doesn't actually return the PID of the process (see #4034), whereas the ring buffer file does use the actual PID. I mention possible fixes in this footnote on #13089.
- Introduce `format_runtime_events_loc`, allocating a new string each time - Introduce `cursor_map_ring_file`, which now also closes `ring_fd`, and closes handles on Windows on failure - Use `memset` to zero-initialise cursor callbacks
|
I've spotted and fixed another bug, notably On trunk this crashes after opening too many file descriptors, causing let () =
Runtime_events.start ();
try
for _ = 1 to 1024 (* or whatever your [ulimit -n] is *) do
Runtime_events.(create_cursor None |> free_cursor)
done
with _ ->
Runtime_events.(create_cursor None |> free_cursor)I also perform cleanup on the Windows handles if the mapping fails. |
|
I'll review this. |
|
Should the ring file be also marked non-inheritable on Windows and close-on-exec on Unix? |
NickBarnes
left a comment
There was a problem hiding this comment.
Clearly an improvement on the previous code. A few stylistic quibbles.
| struct caml_runtime_events_cursor *cursor = | ||
| caml_stat_alloc_noexc(sizeof(struct caml_runtime_events_cursor)); | ||
| /** Return a new string with the path of the ring file */ | ||
| static runtime_events_error format_runtime_events_loc( |
There was a problem hiding this comment.
Note that (because C is antediluvian) enumeration constants such as E_PATH_FAILURE are of type int, so here you are forcing an implicit conversion into the enum type, and then at the call site there's an implicit conversion back to int. So maybe this return type should be int? For this reason, I never name enum types (so I never have variables or slots of enum types).
| } | ||
| failed2: | ||
| CloseHandle(cursor->ring_handle); | ||
| failed1: |
There was a problem hiding this comment.
I suggest using fail labels which reflect the failure, e.g. fail_file_mapping and fail_map_view here.
| if (cursor->ring_file_handle == INVALID_HANDLE_VALUE) { | ||
| caml_stat_free(cursor); | ||
| caml_stat_free(runtime_events_loc); | ||
| return E_OPEN_FAILURE; |
There was a problem hiding this comment.
For consistency, please set ret and then goto fail_create_file here (putting that label right before the return ret).
| char_os *runtime_events_loc) { | ||
| int ret = 0; | ||
| #ifdef _WIN32 | ||
| cursor->ring_file_handle = CreateFile( |
There was a problem hiding this comment.
Strong preference for not setting any fields in cursor until we know we're succeeding. Use local variables and then copy them in right before return E_SUCCESS.
| failed1: | ||
| CloseHandle(cursor->ring_file_handle); | ||
| return ret; | ||
| #else |
There was a problem hiding this comment.
The stylistic distinction between the Windows and non-Windows sides of this function is quite jarring.
| } | ||
|
|
||
| ret = E_SUCCESS; | ||
| /* fallthrough */ |
There was a problem hiding this comment.
There is no fall-through.
| *cursor_res = cursor; | ||
| ret = E_SUCCESS; | ||
| /* fallthrough */ | ||
| failed2: |
There was a problem hiding this comment.
Same remarks as for previous functions:
- send all failure cases to the failure section;
- use distinct failure labels for each failure case (even if there is no undo action between a pair of failure labels);
- name each failure label according to the action which has failed;
- only initialize local variables until you know you have succeeded.
| if (ret != E_SUCCESS) goto failed2; | ||
|
|
||
| cursor->current_positions = | ||
| caml_stat_alloc(cursor->metadata->max_domains * sizeof(uint64_t)); |
There was a problem hiding this comment.
If this allocation fails, you get an exception and all this careful failure-path code is stymied. Use caml_stat_alloc_noexc and then handle the failure case in the same way as all the others.
| failed2: | ||
| caml_stat_free(runtime_events_loc); | ||
| failed1: | ||
| if (ret != E_SUCCESS) { |
There was a problem hiding this comment.
If ret is E_SUCCESS here then surely something has gone badly wrong?
|
|
||
| *cursor_res = cursor; | ||
| ret = E_SUCCESS; | ||
| /* fallthrough */ |
|
I believe that this PR is waiting for someone, presumably @eutro if available, to take into account the feedback of @NickBarnes. This should be good to merge afterwards. |
|
In @eutro's absence, I have rebased this here: NickBarnes@e990bed and will make a further commit on that branch to reflect my review comments here. |
|
This PR was adopted by @NickBarnes in #13419 and merged. Thanks @eutro (and everyone) for the nice work! |
This PR fixes a few memory bugs surrounding
runtime_events_locinruntime_events_consumer.c, as mentioned in #13089, specifically incaml_runtime_events_create_cursor.These bugs are:
Runtime_events.create_cursor None(thepid < 0branch in the C code) allocatesruntime_events_locbut reassigns it, making the old pointer unreachable without deallocating it:ocaml/otherlibs/runtime_events/runtime_events_consumer.c
Lines 108 to 124 in 4c6a384
runtime_events_locafter it is introduced if and only if the function returns with an error.1 This is incorrect, both because:pid < 0code path, it shouldn't be freed at all, sincecaml_runtime_events_current_location()does not return a new stringpid >= 0code path, it should always be freed, sinceruntime_events_locdoes not escape the functionThis PR:
runtime_events_loc's allocation into the relevant branch, rewriting the function to be single-exit after cursor allocation succeeds, and by having both frees state their explicit guards:ocaml/otherlibs/runtime_events/runtime_events_consumer.c
Lines 237 to 252 in 1c337f6
path-freeing if-blocks in thecaml_ml_runtime_events_create_cursorwrapper which calls it, for clarity. This does not change the behaviour of the program.chmod 000s the ring buffer file in order to cause subsequent read attempts to fail..eventsfile, so it works on Windows (rather than skipping), because I wanted to actually test #13089.Footnotes
This is the correct behaviour for
cursor, since the caller takes ownership of the returned cursor iff the function returns successfully ↩