Replace --log-errors-to-tty with --report-errors-to-stderr #3428

sporksmith merged 7 commits into shadow:main
cc @robgjansen to join in on the UX bikeshedding ;)
My primary shadow use case is running it in the terminal without any file redirection, so I think the duplicated error messages aren't the greatest, but the new message formatting looks good and I think makes that much better. My main concern is that they'll be interleaved at a non-line boundary. Since stderr isn't buffered (and even the buffered stdout is unlikely to be written in a single write syscall), I don't see why we wouldn't occasionally get a stderr message injected mid-way through a stdout log line. And since stdout is buffered and delayed, the stderr message can also land well before the stdout lines it logically follows, which makes it seem like the error occurred much earlier than it did relative to the logs.
Huh; I almost never do this. If this is a common use-case for others though, another idea is to turn this config option into a ternary of
I'll think about this a bit and see if I can make it less likely with a bit more explicit flushing on error messages...
I added a stdout flush before we write to stderr. Also an explicit stderr flush before we write to stdout again, though there was already an implicit one. I think this addresses the mid-line interleaving. I also moved the
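The flushing described above boils down to something like this (a minimal sketch with a hypothetical helper, not Shadow's actual function): flush stdout before writing the error to stderr, and flush stderr before the stdout logger resumes, so a mirrored error can't land mid-way through a buffered log line.

```rust
use std::io::{self, Write};

// Hypothetical helper: mirror an error message to stderr without
// mid-line interleaving when stdout and stderr share a destination.
fn report_error_to_stderr(msg: &str) -> io::Result<()> {
    // Push any partially written log line out of the stdout buffer first,
    // so the stderr message starts on a line boundary.
    io::stdout().flush()?;
    writeln!(io::stderr(), "Error: {msg}")?;
    // stderr is typically unbuffered, but flush explicitly anyway before
    // the stdout logger continues.
    io::stderr().flush()
}

fn main() -> io::Result<()> {
    print!("00:00:01.000000000 [DEBUG] a log line in progress"); // sits in the stdout buffer
    report_error_to_stderr("process exited with unexpected status")?;
    println!(" ...stdout resumes");
    Ok(())
}
```

This only fixes ordering within this process; as noted below, the terminal can still interleave the two streams if they are separate devices.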
OK you summoned me, so here is my take :) My main use case is to always redirect stdout to a log file, because I don't want to lose that information to a terminal buffer. I want that log file to contain all log messages. I think this is how things work both now and after this PR.

Then, I can see how it would be useful to duplicate errors to stderr, so that those are not missed by users. And I could see how that would be useful even if running in a terminal and logging stdout to a file. IIUC, when running in a terminal, those errors would not currently be printed to stderr, but after this PR they will be. So I think this PR is an improvement. But for Steve's use case, errors injected mid-line into stdout sure would be annoying.

Maybe a good solution would be to have two log streams. The user could set the log level independently on each, and could set the destination (file, stderr, stdout) independently on each. With that, you could set one stream as the small firehose: a sort of progress indicator about how the simulation is going (similar to Tor's notice log level), and the other stream as the big firehose: containing all messages. Or you could configure them as you wish, and print them where you wish. That's a bit more work though. Also, I don't think it necessarily solves Steve's problem of wanting the failed processes printed out again at the end of the simulation, does it?
Same
Right
Prior to this PR, what we tried to do was duplicate error-level logs to stderr if stdout is not a terminal but stderr is, i.e. when stdout is being redirected to a file and stderr isn't, we write to stderr. But terminal detection is tricky, especially in CI web pseudo-terminal situations. In the Tor CI I think we end up deciding that neither stdout nor stderr is a terminal, so we don't mirror errors to stderr, and we don't get to see the errors in the main job output.
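For reference, the removed heuristic amounts to something like this (a sketch, not Shadow's actual code; `is_terminal` is the standard `isatty`-style check from Rust's std):

```rust
use std::io::{stderr, stdout, IsTerminal};

// The old heuristic: mirror error logs to stderr only when stdout is
// redirected (e.g. to a file) while stderr still points at a terminal.
fn should_mirror(stdout_is_tty: bool, stderr_is_tty: bool) -> bool {
    !stdout_is_tty && stderr_is_tty
}

fn main() {
    // In a CI web pseudo-terminal, both checks can come back false,
    // so errors are never mirrored -- the failure mode described above.
    let mirror = should_mirror(stdout().is_terminal(), stderr().is_terminal());
    println!("mirroring errors to stderr: {mirror}");
}
```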
I think I've ensured now that there won't be mid-line interleaving, but yes the merged output might still be a bit ugly/annoying. It can still be explicitly disabled though with
I could see this being potentially useful, but it implies using the full logging format for both outputs, which reintroduces the issue of full-on-duplication in the case where stdout and stderr are merged. (vs. this MR changes the stderr version to be more of a human-readable message than a "log line")
Oh good point, I missed that #3426 wanted them both to be printed to stderr and repeated in the summary at the end. This PR only addresses the first part of printing them to stderr. I think I'd rather keep this PR to making that happen; if we decide to also print them again at the end, that could be a separate PR.
Oh, I missed this part. So I guess that would probably ensure that we weren't accidentally writing the same log line multiple times to the same destination. But I think it'd be more difficult to have a nice default behavior for the use-case of redirecting stdout to a file but not stderr; the user would have to explicitly configure the stderr logger to get their summary. (Or else we go back to trying to be clever and figure out the right thing dynamically)
I think the default would be: Stream 1:
Stream 2:
But in my world the log messages are formatted as log lines in both of the above, which is arguably more useful than the "simpler" messages with the IP address, process, filename:line, etc. prefix removed, as we do for the stderr messages now.
In fact, you should be able to just add as many log streams as you want, where each has a log level and a destination. This is how c-tor logging works too. |
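A sketch of that per-stream routing (hypothetical types, not an existing Shadow or c-tor API): each stream pairs a maximum log level with a destination, and a record is delivered to every stream whose level admits it.

```rust
// Hypothetical multi-stream logger: each stream has a maximum log level
// and a destination; a record is written to every stream that admits it.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

#[derive(Debug, PartialEq)]
enum Destination {
    Stdout,
    Stderr,
    File(&'static str),
}

struct LogStream {
    max_level: Level,
    dest: Destination,
}

// Which destinations would a record at `level` be written to?
fn route(streams: &[LogStream], level: Level) -> Vec<&Destination> {
    streams
        .iter()
        .filter(|s| level <= s.max_level)
        .map(|s| &s.dest)
        .collect()
}

fn main() {
    let streams = [
        LogStream { max_level: Level::Info, dest: Destination::Stdout },
        LogStream { max_level: Level::Error, dest: Destination::Stderr },
        LogStream { max_level: Level::Trace, dest: Destination::File("shadow.log") },
    ];
    // An error reaches all three streams; a debug record only the trace file.
    println!("{:?}", route(&streams, Level::Error));
    println!("{:?}", route(&streams, Level::Debug));
}
```

With a list like this in the config, "as many streams as you want" is just more entries in the slice.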
I'd be happy with something like this, with "stream 1" going to something like

Looking at https://shadow.github.io/docs/guide/semver.html, we do allow ourselves to create new files in the data directory. So we could create a new log file there. But we also promise not to change default values of existing non-experimental options, which I think means we couldn't change the default logging to

I suppose we could just do a major version bump, though it feels a little silly for just this.
Oh, actually the default level is already

So I think without breaking semver we could start logging (e.g. at debug) to a file in the data directory, and add new options for disabling it, changing that log level, etc. Likewise we could keep logging at info to stdout by default. info is a bit more verbose than I'd generally want to
On the other (other other other) hand, while this doesn't technically break semver, it could result in surprising changes in storage usage for large simulations.
Right. Here is the current behavior that we would need to maintain until a major version bump:

If we had the "add a log stream" mechanism, in addition to the two I noted above, we could add a third:

Stream 3:
And then on the next major version bump we could remove Stream 3 and the

Or something. But I'm not sure that anyone is asking for this. So maybe this is just a brainstorm for now and we file the idea away in an issue? Unless there is motivation to handle it now while we're thinking about it :)
Yeah, I like this idea, but I'd be in favor of getting something like this PR in for now to solve some of our pain in the Tor CI, and continuing to think about this multi-logging feature a bit more.
Okay, there's a lot for me to read here, so I will make a proper comment shortly :) But I just wanted to mention that I don't think we should log at "debug" level anywhere by default. My opinion is that shadow is already too noisy at the default info level, and it's not a good experience for new users when shadow dumps a lot of irrelevant info by default. I think "debug" should only be enabled when people are debugging something, for both disk and performance reasons (lots of logging will significantly slow down your simulation).
Thanks, that looks good to me!
Thinking about this some more, I think the mid-line interleaving is more complicated than just application buffering. The terminal just gets a stream of bytes for each of stdout/stderr, and shadow could write to both of those before the terminal wakes up to read them. In that case it wouldn't know whether to display stdout first or stderr. And it would have to chunk the reads from stdout/stderr, and possibly interleave the reads at some point so that stderr doesn't get "left behind" (I can't think of the right word here). But I'm okay with ignoring this. Hopefully it will be rare in practice, and if it ever causes noticeable problems then we can revisit.
I think #3426 was meant to be mostly that we have a "X managed processes in unexpected final state" at the end of the simulation, but we don't actually say which processes those were. If the simulation fails, those processes are probably the most important information for the user. And since we've already got some tracking set up, since we know the number X that failed, it would be good to show more details here. For example if it showed you that curl exited with status "3", then you wouldn't even need to open the log. You'd know exactly which processes' stdout to look at, or maybe you'd go right to the man page and see that it was a "URL malformed" error. So I don't think it would hurt to put them in the summary, even if they're written earlier to stderr.
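A sketch of what that summary could look like (hypothetical types; #3426's actual requirements may differ): keep the details of each unexpected exit alongside the count we already track, and render them at the end.

```rust
// Hypothetical record of a managed process that ended in an unexpected state.
struct FailedProcess {
    name: String,
    status: i32,
}

// Render the end-of-simulation summary with per-process details, instead of
// only the "N managed processes in unexpected final state" count.
fn render_summary(failed: &[FailedProcess]) -> String {
    let mut out = format!(
        "** {} managed processes in unexpected final state\n",
        failed.len()
    );
    for p in failed {
        out.push_str(&format!("**   '{}' exited with status {}\n", p.name, p.status));
    }
    out
}

fn main() {
    let failed = vec![FailedProcess {
        name: "testnode.curl.1000".into(),
        status: 3, // curl's "URL malformed" exit code
    }];
    print!("{}", render_summary(&failed));
}
```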
I would be good with this, but don't have strong feelings either way.
I agree it feels silly, but also it's just a number that's encoding some semantics. Maybe if we quickly get to version >10, it won't feel as bad to make these changes :)
I'd be a fan of a major version bump for that kind of change. The "MAJOR version increases (ex: 1.2.2 to 2.0.0) are intended for incompatible changes" is ambiguous, so I think we should go with whatever version bump best reflects the changes that we make. And since this would have a high probability of breaking someone's workflow, my vote would be for breaking change.
I'd vote for no intermediate stage and just jump directly to major version bump and implement the final version that we want.
I don't have strong motivation to work on it, but the idea sounds good to me if anyone else thinks it will be useful.
stevenengler left a comment
I'm good with these changes, especially since there's a real-world use case for them (the Tor CI). And the Tor CI probably wants to see other error messages anyway and not just the "process failed" ones, so this PR helps with that too.
I don't think this PR completely fixes #3426 in the way that I had pictured it, so I think I'd like to keep it open. Mainly for the case where stdout and stderr are both printed to the terminal. And so we don't have to say "see the log for details", and we can just show the relevant details right there regardless of how shadow's output is being redirected (or not redirected).
I think the multiple output streams idea isn't impacted by this PR, and we don't need to block this PR while we decide on that.
I think that's not (typically) true, and in the common merged case it's a single stream. I suppose it could depend on the shell, though. E.g. on my machine, using the script:

With no redirection they're the exact same device and inode, which I think means the shell is reading back from a single pipe. vs. other cases:
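The device/inode comparison above can also be done programmatically; on Linux, a sketch via `/proc` (assumes a procfs mount; the function name is made up for illustration):

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;

// Linux-only sketch: stat the files behind fd 1 (stdout) and fd 2 (stderr)
// and compare device and inode numbers. Equal numbers mean the shell has
// wired both fds to the same open file or pipe.
fn stdout_stderr_share_destination() -> io::Result<bool> {
    let out = fs::metadata("/proc/self/fd/1")?;
    let err = fs::metadata("/proc/self/fd/2")?;
    Ok(out.dev() == err.dev() && out.ino() == err.ino())
}

fn main() -> io::Result<()> {
    println!(
        "stdout and stderr share a destination: {}",
        stdout_stderr_share_destination()?
    );
    Ok(())
}
```

Whether the result is `true` depends on how the shell set up the redirections, matching the experiment described above.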
Trying to detect whether stdout and stderr are terminals is tricky. In particular, in CI environments where job output is being logged but is also displayed in some web pseudo-terminal, it's difficult to say what the right behavior is. It's probably better to have consistent behavior here that doesn't quietly change under you based on what type of device we think stdout and stderr are. Effectively reverts 223aa4a. Progress on shadow#3426.
I think conventionally stdout should be more of a standardized/machine-parseable format (as our log records are), and stderr is intended for human consumption, e.g. on a terminal. Reformatting the log lines this way when mirroring them to stderr makes them a little friendlier, and makes it a little easier to understand the duplication in cases where stdout and stderr are merged.

With an error injected into our sleep test, and stdout redirected to a file:

```
$ (cd build/src/test/sleep/ && rm -rf sleep-shadow.data && /home/jnewsome/projects/shadow/build/src/main/shadow --data-directory=sleep-shadow.data --log-level=trace --use-cpu-pinning false --strace-logging-mode standard --parallelism 1 /home/jnewsome/projects/shadow/src/test/sleep/sleep.yaml > shadow.log)
** Starting Shadow 3.2.0
Error: process 'testnode.test_sleep.1000' exited with status Normal(101); expected end state was exited: 0 but was exited: 101
Error: Failed to run the simulation
Error:
Error: Caused by:
Error:     1 managed processes in unexpected final state
** Shadow did not complete successfully: Failed to run the simulation
**   1 managed processes in unexpected final state
** See the log for details
```

And in merged output:

```
$ (cd build/src/test/sleep/ && rm -rf sleep-shadow.data && /home/jnewsome/projects/shadow/build/src/main/shadow --data-directory=sleep-shadow.data --log-level=trace --use-cpu-pinning false --strace-logging-mode standard --parallelism 1 /home/jnewsome/projects/shadow/src/test/sleep/sleep.yaml)
...
00:00:00.016603 [507267:shadow-worker] 00:00:01.000000000 [DEBUG] [testnode:11.0.0.1] [process.rs:1486] [shadow_rs::host::process] process 'testnode.test_sleep.1000' has completed or is otherwise no longer running
Error: process 'testnode.test_sleep.1000' exited with status Normal(101); expected end state was exited: 0 but was exited: 101
00:00:00.016660 [507267:shadow-worker] 00:00:01.000000000 [ERROR] [testnode:11.0.0.1] [process.rs:1571] [shadow_rs::host::process] process 'testnode.test_sleep.1000' exited with status Normal(101); expected end state was exited: 0 but was exited: 101
00:00:00.016710 [507267:shadow-worker] 00:00:01.000000000 [TRACE] [testnode:11.0.0.1] [process.rs:750] [shadow_rs::host::process] Not notifying parent of exit: no signal specified
...
```

The duplication in the merged case is still a little weird, but at least identifiable.
This seems a little more accurate since we're no longer writing the full log line to stderr.
Since we're using a buffered writer to write multiple lines to stdout, previously it may have been possible to be in the state of having flushed only part of a line to stdout when writing to stderr, which could result in confusing interleaving when stdout and stderr are the same destination.
Ah good point, I didn't realize they used the same pipe, but makes sense. I also see that with
Tried to capture the multi-logging "feature request" in #3430. I'm +1 that it seems nice but I don't intend to work on it in the near future.
The changes in shadow/shadow#3428 should add more details to the CI output of Shadow simulations.
Trying to detect whether stdout and stderr are terminals, and deduplicating based on that, can be tricky, e.g. in CI environments.
Instead:
Make the stderr version more of a human-consumable message instead of the raw log line, which also makes the duplication easier to understand in the case where stdout and stderr are merged.

New output on a process failure in the unmerged case (where stdout is written to a file and stderr is the terminal):
The merged case is a little ugly, but at least it's harder to mistake the duplicated message on stderr for a second distinct log line:
Progress on #3426.