Skip to content

Write heartbeat thread output to safe crash log#153

Merged
kpamnany merged 3 commits intov1.10.2+RAIfrom
kp-hb-safe-log
May 27, 2024
Merged

Write heartbeat thread output to safe crash log#153
kpamnany merged 3 commits intov1.10.2+RAIfrom
kp-hb-safe-log

Conversation

@kpamnany
Copy link
Copy Markdown

@kpamnany kpamnany commented May 24, 2024

PR Description

#140 added the --safe-crash-log-file command line argument. When this is specified, and when in signal handler context, jl_safe_printf additionally writes whatever it is printing to the file in JSON format.

With this PR, jl_safe_printf writes to the file in JSON format from the heartbeat thread's context also.

This PR also refactors the JSON printing function to use a single write call rather than multiple, to reduce the possibility of output from multiple threads being interleaved.

Checklist

Requirements for merging:

kpamnany added 2 commits May 24, 2024 12:11
To allow `jl_safe_printf()` to determine whether it's being called
from the heartbeat thread.
In `jl_safe_printf()`, we're already writing to the safe crash log
if we're in signal handler context (in addition to writing to
`stderr`). Now we do the same if we're in the heartbeat thread.
@github-actions github-actions bot added port-to-v1.10 port-to-v1.12 This change should apply to Julia v1.12 builds labels May 24, 2024
@kpamnany kpamnany requested review from Drvi, NHDaly and d-netto May 24, 2024 18:43
Copy link
Copy Markdown
Member

@NHDaly NHDaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Lgtm except for some buffer math questions. Would be good to print what we can if possible, and best if we could print it all!

Concurrent `write()` calls to the same file descriptor should be
thread-safe, but can result in interleaving. Refactor the JSON
printing code used from `jl_safe_printf()` to assemble the message
into a buffer and use a single `write()` call.
@kpamnany
Copy link
Copy Markdown
Author

Improved the "JSONification" implementation based on @NHDaly's comments.

My first stab at this was to use alloca for a sufficiently large buffer, but that didn't work well. I then moved to using a large but fixed-size buffer and hoping it was big enough -- terrible approach. But I didn't want to add the overhead of a malloc and a free for every call, so I'm now following Nathan's suggestion of truncating the message if it is too big for the fixed-size buffer. I've also updated the buffer math to accommodate the JSON preamble, etc.

@kpamnany
Copy link
Copy Markdown
Author

I've tested this locally, starting Julia with --safe-crash-log-file:

  • Sending a signal: the output is correctly duplicated into the safe crash log file as before.
  • Starting the heartbeat mechanism and forcing it to timeout: the heartbeat loss messages and the task backtraces are correctly duplicated into the safe crash log file.

@kpamnany kpamnany merged commit 6479c07 into v1.10.2+RAI May 27, 2024
@kpamnany kpamnany deleted the kp-hb-safe-log branch May 27, 2024 18:12
@kpamnany kpamnany restored the kp-hb-safe-log branch May 27, 2024 18:13
@kpamnany kpamnany deleted the kp-hb-safe-log branch May 27, 2024 19:39
@NHDaly
Copy link
Copy Markdown
Member

NHDaly commented May 28, 2024

I chose your first suggestion -- to truncate if the message is too large -- mainly because I'm pretty sure that it will almost never happen because of the way jl_safe_printf is used.

Ohhhhhh too bad - So we're still going to get one json message per line in the safe printf in the julia task stacktraces? I was incorrectly believing this PR would also be combining the whole report into one json message, which would be nicer to consume, but I guess would be annoying to implement.

So that will have to wait for a future PR, yeah?

@kpamnany
Copy link
Copy Markdown
Author

So we're still going to get one json message per line in the safe printf in the julia task stacktraces? I was incorrectly believing this PR would also be combining the whole report into one json message, which would be nicer to consume, but I guess would be annoying to implement.

Yes, it's one JSON message per line. To do what you're thinking would require changing how jl_safe_printf is used in the stack trace collection code and I think that would be a lot more complicated.

@d-netto d-netto removed the port-to-v1.12 This change should apply to Julia v1.12 builds label Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants