Skip to content

Change os.Stderr reassign for Windows service#7023

Merged
fuweid merged 1 commit intocontainerd:mainfrom
dcantah:fix-panicfile-shimlogging
Aug 2, 2022
Merged

Change os.Stderr reassign for Windows service#7023
fuweid merged 1 commit intocontainerd:mainfrom
dcantah:fix-panicfile-shimlogging

Conversation

@dcantah
Copy link
Member

@dcantah dcantah commented Jun 6, 2022

Previously we were reassigning os.Stderr to the panic.log file we create
when getting asked to run Containerd as a Windows service. The panic.log
file was used as a means to easily collect panic stacks as Windows
services don't have regular standard IO, and the usual recommendation
is to either write to the event log or just to a file in the case of
running as a service.

One place where this panic.log flow was biting us was with shim logging,
which is forwarded from the shim and copied to os.Stderr directly which was
causing shim logs to get forwarded to this panic.log file instead of just
panics. We expose an additional --log-file flag if you ask to run a
windows service which is the main way you'd get Containerd logs, and with
this change all of the shim logging which would today end up in panic.log
will now also go to this log file.

@dcantah
Copy link
Member Author

dcantah commented Jun 6, 2022

This is intended to resolve #6990

Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously not part of this CR, but should the file be handled by some sort of log rotate utility like we do for CRI instead of a flat file handle? Then even if it grows over time the caller has the ability to remove the older ones. Even in this case they would hit the same issue just not on the panic.log file now it would be on the actual log file right?

@dcantah
Copy link
Member Author

dcantah commented Jun 7, 2022

@jterry75 Right, I think that's the ideal but like you kind of alluded to I just wanted to get shim logs out of panic.log for now with this change

@jterry75
Copy link
Contributor

jterry75 commented Jun 7, 2022

Np sounds good

@dcantah
Copy link
Member Author

dcantah commented Jun 15, 2022

@kevpar ping

@dcantah
Copy link
Member Author

dcantah commented Jul 29, 2022

ping @kevpar

@kevpar
Copy link
Member

kevpar commented Jul 30, 2022

Sorry, been meaning to get to this one...

I created a table to summarize the behavior here, both before and after the change:
image

I think we could do with a comment in service_windows.go explaining overall what happens with output when run as a service (e.g. what's in this table), since it seems kind of unintuitive.

One thing I'm wondering: When run as a service I don't think stderr goes anywhere useful by default. Would it make this code easier to read to explicitly set os.Stderr = io.Discard in the "no log file" case, so then it would match how we set up log and logrus output?

Otherwise I think this seems reasonable. This will keep panic.log isolated[1] to just actual panics. If users want logging they can set a log file to be used (or collect via ETW, though that won't work for shim logs).

[1] Of course, if any other code in containerd specifically queries GetStdHandle(STD_ERROR_HANDLE) and writes to it, that will still end up in panic.log. I don't think there's anything we can do about that unless Go chooses to support explicit panic log redirection, though.

@dcantah
Copy link
Member Author

dcantah commented Jul 30, 2022

I think we could do with a comment in service_windows.go explaining overall what happens with output when run as a service (e.g. what's in this table), since it seems kind of unintuitive.

Will add

One thing I'm wondering: When run as a service I don't think stderr goes anywhere useful by default. Would it make this code easier to read to explicitly set os.Stderr = io.Discard in the "no log file" case, so then it would match how we set up log and logrus output?

Sure that's a good idea, better to be explicit. Edit: Well to be clear, we can't assign os.Stderr = io.Discard directly as os.Stderr is an os.File and io.Discard is just some Writer. We could open os.DevNull which just points to NUL on windows and assign it to os.Stderr I guess

@dcantah dcantah force-pushed the fix-panicfile-shimlogging branch from c952620 to af22760 Compare August 1, 2022 23:33
@dcantah
Copy link
Member Author

dcantah commented Aug 1, 2022

@kevpar Updated! PTAL

Previously we were reassigning os.Stderr to the panic.log file we create
when getting asked to run Containerd as a Windows service. The panic.log
file was used as a means to easily collect panic stacks as Windows
services don't have regular standard IO, and the usual recommendation
is to either write to the event log or just to a file in the case of
running as a service.

One place where this panic.log flow was biting us was with shim logging,
which is forwarded from the shim and copied to os.Stderr directly which was
causing shim logs to get forwarded to this panic.log file instead of just
panics. We expose an additional `--log-file` flag if you ask to run a
windows service which is the main way you'd get Containerd logs, and with
this change all of the shim logging which would today end up in panic.log
will now also go to this log file.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>
@dcantah dcantah force-pushed the fix-panicfile-shimlogging branch from af22760 to ee0f2e9 Compare August 1, 2022 23:36
Copy link
Member

@kevpar kevpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (pending green CI)

@dcantah
Copy link
Member Author

dcantah commented Aug 2, 2022

Linux CI failures are definitely unrelated and the node-e2e failed on some preamble for fetching package lists

@dcantah
Copy link
Member Author

dcantah commented Aug 2, 2022

/retest

Edit: Nice they passed

@fuweid fuweid merged commit e2069e9 into containerd:main Aug 2, 2022
Copy link
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants