daemon/logger/journald: quit waiting when the logger closes by corhere · Pull Request #47256 · moby/moby

corhere · 2024-01-30T23:58:26Z

Follow-up to logger/journald: fix tailing logs with systemd 255 #47019

- What I did
Fixed a regression in the journald log reader which manifested as rare test failures. The root cause was that it waited for nothing after the logger is closed. In the process I found and fixed some issues with the tests that were getting in the way of debugging. One of the fixes made the Follow tests consistently catch the bug without needing to write a new regression test.

- How I did it
I ran the journald tests repeatedly (e.g. using -count=100) in an attempt to replicate the TestLogRead/Follow/Tail=0 failure seen once in CI. While that test did not fail locally, TestLogRead/Follow/Concurrent would fail ~1% of the time with test output that provided almost no useful information. I ran the tests again with the race detector enabled and discovered that there was a data race in the journald reader harness. I fixed the data race to stop it from obscuring any data races in the journald code (there were no other data races detected) and to make sure that the race was not causing the tests to flake out. Fixing it had no effect on the TestConcurrent failure rate. So I made some improvements to TestConcurrent to make it easier to debug and to address deficiencies I had suspected of being the source of flakiness. One of the interim improvements was to add an explicit flush before closing the logger. It did fix the flakiness, but not in the way I had expected: the test was now consistently failing! That allowed me to run the tests under a debugger and identify the reader bug.

- How to verify it
go test -race ./daemon/logger/journald passes

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

The Go race detector was detecting a data race when running the TestLogRead/Follow/Concurrent test against the journald logging driver. The race was in the test harness, specifically syncLogger. The waitOn field would be reassigned each time a log entry is sent to the journal, which is not concurrency-safe. Make it concurrency-safe using the same patterns that are used in the log follower implementation to synchronize with the logger. Signed-off-by: Cory Snider <csnider@mirantis.com>

Writing the systemd-journal-remote command output directly to os.Stdout and os.Stderr makes it nearly impossible to tell which test case the output is related to when the tests are not run in verbose mode. Extend the journald sender fake to redirect output to the test log so they interleave with the rest of the test output. Signed-off-by: Cory Snider <csnider@mirantis.com>

- Check the return value when logging messages - Log the stream (stdout/stderr) and list of messages that were not read - Wait until the logger is closed before returning early (panic/fatal) Signed-off-by: Cory Snider <csnider@mirantis.com>

The journald reader test harness injects an artificial asynchronous delay into the logging pipeline: a logged message won't be written to the journal until at least 150ms after the Log() call returns. If a test returns while log messages are still in flight to be written, the logs may attempt to be written after the TempDir has been cleaned up, leading to spurious errors. The logger read tests which interleave writing and reading have to include explicit synchronization points to work reliably with this delay in place. On the other hand, tests should not be required to sync the logger explicitly before returning. Override the Close() method in the test harness wrapper to wait for in-flight logs to be flushed to disk. Signed-off-by: Cory Snider <csnider@mirantis.com>

If a reader has caught up to the logger and is waiting for the next message, it should stop waiting when the logger is closed. Otherwise the reader will unnecessarily wait the full closedDrainTimeout for no log messages to arrive. This case was overlooked when the journald reader was recently overhauled to be compatible with systemd 255, and the reader tests only failed when a logical race happened to settle in such a way to exercise the bugged code path. It was only after implicit flushing on close was added to the journald test harness that the Follow tests would repeatably fail due to this bug. (No new regression tests are needed.) Signed-off-by: Cory Snider <csnider@mirantis.com>

corhere added 5 commits January 30, 2024 14:34

d/logger/loggertest: improve TestConcurrent

39c5c16

- Check the return value when logging messages - Log the stream (stdout/stderr) and list of messages that were not read - Wait until the logger is closed before returning early (panic/fatal) Signed-off-by: Cory Snider <csnider@mirantis.com>

corhere added area/logging area/systemd kind/bugfix PR's that fix bugs labels Jan 30, 2024

corhere added this to the 26.0.0 milestone Jan 30, 2024

corhere requested review from cpuguy83, thaJeztah and vvoland January 30, 2024 23:58

thaJeztah mentioned this pull request Jan 31, 2024

[25.0 backport] logger/journald: fix tailing logs with systemd 255 #47243

Merged

vvoland approved these changes Jan 31, 2024

View reviewed changes

corhere added the process/cherry-pick/25.0 label Feb 1, 2024

corhere mentioned this pull request Feb 1, 2024

api: remove deprecated API versions (API < v1.24) #47155

Merged

3 tasks

cpuguy83 approved these changes Feb 2, 2024

View reviewed changes

cpuguy83 merged commit 7c4828f into moby:master Feb 2, 2024

corhere deleted the journald/quit-when-youre-ahead branch February 3, 2024 00:10

thaJeztah added process/cherry-picked and removed process/cherry-pick/25.0 labels Feb 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

daemon/logger/journald: quit waiting when the logger closes#47256

daemon/logger/journald: quit waiting when the logger closes#47256
cpuguy83 merged 5 commits intomoby:masterfrom
corhere:journald/quit-when-youre-ahead

corhere commented Jan 30, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

corhere commented Jan 30, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants