Skip to content

Conversation

@lmb
Copy link
Contributor

@lmb lmb commented Oct 14, 2019

journald assumes that getsockopt(SO_PEERCRED) correctly identifies the
process on the remote end of the socket. However, this is incorrect
according to man 7 socket:

The returned  credentials  are  those that were in effect at the
time of the call to connect(2) or socketpair(2).

This becomes a problem when a new process inherits the stdout stream
from a parent. First, log messages from the child process will
be attributed to the parent. Second, the struct ucred used by journald
becomes invalid as soon as the parent exits. Further sendmsg calls then
fail with ENOENT. Logs for the child process then vanish from the journal.

Fix this by using recvmsg on the stdout stream, and refreshing the cached
struct ucred if SCM_CREDENTIALS indicate a new process.

Fixes #13708

@yuwata yuwata added the journal label Oct 14, 2019
@lmb
Copy link
Contributor Author

lmb commented Oct 22, 2019

@yuwata is there something I can do to help with reviewing?

@poettering
Copy link
Member

hmm, does this actually work? have you verified this? I am a bit surprised that this is supposed to work, I wasn't aware the SO_PASSCRED had any effect on SOCK_STREAM sockets...

@lmb
Copy link
Contributor Author

lmb commented Oct 23, 2019 via email

@lmb
Copy link
Contributor Author

lmb commented Oct 24, 2019

I realised that I should also force flush the existing buffer if the PID changes, so I've amended the commit.

I'd like to add some sort of integration test if you can point me at some documentation / examples how to do that. It should be straight forward to use systemd-cat and a small helper program that logs and calls execve to test this.

@lmb
Copy link
Contributor Author

lmb commented Oct 25, 2019

I've added an integration test which checks that this works.

@lmb
Copy link
Contributor Author

lmb commented Oct 30, 2019

Ping @poettering and @yuwata. Not sure how to make the builders happy, it seems like they timed out? Otherwise this "works", but I'd appreciate your feedback.

@yuwata
Copy link
Member

yuwata commented Oct 30, 2019

FAILED TESTS:  TEST-04-JOURNAL
autopkgtest [17:21:13]: test upstream: -----------------------]
autopkgtest [17:21:13]: test upstream:  - - - - - - - - - - results - - - - - - - - - -
upstream             FAIL non-zero exit status 1

The failure seems to be related to this PR.

@keszybz keszybz added this to the v244 milestone Oct 30, 2019
@lmb
Copy link
Contributor Author

lmb commented Oct 30, 2019

@yuwata I don't think so. I just re-ran the test on my machine, and it passed:

$ sudo make -C test/TEST-04-JOURNAL KERNEL_APPEND="cryptopts=" clean setup run
...
[  OK  ] Reached target Power-Off.
Container nspawn-root has been shut down.
-rw-r-----+ 1 root systemd-journal 8388608 Oct 30 16:35 /var/tmp/systemd-test.8af7t2/journal/54c00a0fab604a5a98bdd29c2fd1b90d/system.journal
-rw-r-----+ 1 root systemd-journal 5607424 Oct 30 16:35 /var/tmp/systemd-test.8af7t2/journal/b22d29db53b34d78bfc1315707de1714/system.journal
TEST RUN: Journal-related tests [OK]
make: Leaving directory '/home/lorenz/dev/systemd/test/TEST-04-JOURNAL'

Note also that the autopkgtest for i386 and s390x pass.

@evverx
Copy link
Contributor

evverx commented Oct 30, 2019

TEST-04 has never been flaky as far as I can remember so it looks like with this patch applied the test is less stable than usual, which most likely means that either something doesn't work as expected or the test itself is flaky. Could you force-push the code to trigger the CI once again? I think if Ubuntu CI passes, say, 5 times in a row, it will be safe to say that it's no longer flaky. I'll go ahead and add the "ci-fails" label. Could you also try running journalctl --sync before journalctl -b ... to make sure all the logs are written?

@evverx evverx added the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Oct 30, 2019
@lmb
Copy link
Contributor Author

lmb commented Oct 31, 2019

Thanks, I've added the journalctl --sync invocation. Now only 2 instead of 3 CI runs fail! I've added a commit on top which makes the test print itself to the system console. This is a hack, but I couldn't find a way to access this info via the CI UI.

@evverx
Copy link
Contributor

evverx commented Oct 31, 2019

FWIW I managed to reproduce the failure with TEST_NO_KVM (which slows down VMs to somewhat emulate Ubuntu CI). The test failed because grep couldn't find logs starting with PID. journalctl --sync fixes that but on my machine it seems to be failing to connect to journald via varlink (somewhat unpredictably). I'm not sure but maybe it has something to do with SIGKILL we send in the previous substest :-)

This is a hack, but I couldn't find a way to access this info via the CI UI.

There's no way to look at what's going on there as far as I know (apart from downloading the artifacts from Ubuntu CI and searching for the journal files from VMs QEMU runs). In fact, at some point I think it would be great to start using `journal+console': #13719 (comment) :-)

@evverx
Copy link
Contributor

evverx commented Oct 31, 2019

On my machine the test keeps failing intermittently with

[  135.094035] test-journal.sh[417]: + sleep 3
[  135.103719] test-journal.sh[417]: + [[ ! -f /i-lose-my-logs ]]
[  135.130417] test-journal.sh[552]: ++ sed -n 2p
[  135.156334] test-journal.sh[551]: ++ journalctl --new-id128
[  135.173876] test-journal.sh[417]: + ID=cf4f06648442480fab65d63dfc7ce62b
[  135.182686] test-journal.sh[417]: + PID=554
[  135.186328] test-journal.sh[417]: + wait %%
[  135.206088] test-journal.sh[554]: + systemd-cat -t cf4f06648442480fab65d63dfc7ce62b bash -c 'echo parent; (echo child) & wait'
[  135.224238] test-journal.sh[417]: + journalctl --sync
[  135.241017] test-journal.sh[556]: Failed to connect to /run/systemd/journal/io.systemd.journal: Connection refused

Though I don't think it has anything to do with this PR. Maybe for the time being it would make sense to move the new subtest before the part where we make sure that journald doesn't lose data. Another option would be to keep running journalctl --sync until it works (on average it seems to take about a minute for it to succeed there).

@lmb
Copy link
Contributor Author

lmb commented Nov 1, 2019

4/5 autopkgtest passed, which I'm taking as an indication that your fix works @evverx. Thanks for helping me out. I've removed the WIP commit and pushed again, hopefully this means I'll get a clean bill of health over the weekend. CI taking more than 7 hours to run is kind of insane.

@yuwata yuwata removed the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 3, 2019
Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice. I'm a bit worried if the additional call to stdout_stream_scan doesn't cause confusion, but I don't think so.

cmp /expected /output

# https://github.com/systemd/systemd/issues/13708
ID=$(journalctl --new-id128 | sed -n 2p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just systemd-id128 new.

PID=$!
wait %%
journalctl --sync
journalctl -b -o export -t "$ID" --output-fields=_PID | grep '^_PID=' >/output
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this should not be handled here, but isn't this a bug in journalctl? It outputs fields like __CURSOR and __REALTIME which it shouldn't. @poettering opinions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean when you say "this"? FWIW __CURSOR etc. have been output by the export format for a long time according to git blame.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW __CURSOR etc. have been output by the export format for a long time according to git blame.

Yeah, and I think this is a bug. They are written in a separate part of the code than the other fields (because they are based on the entry header, not data), and I think this was overlooked when --output-fields= was being implemented. But I don't think we can just change this at this point, because of backwards compatibility concerns.

Maybe we should add --only-fields= that would only write the specified fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to create an issue I can add a comment which references it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we already have an issue: #5697.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, #5697 is about a different issue.
I opened #13937 now.

.msg_control = buf,
.msg_controllen = sizeof(buf),
.msg_name = NULL,
.msg_namelen = 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to iniatialize .msg_name and .msg_namelen here.

cmsg->cmsg_type == SCM_CREDENTIALS &&
cmsg->cmsg_len == CMSG_LEN(sizeof(struct ucred)))
ucred = (struct ucred *)CMSG_DATA(cmsg);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{} not needed.

Please add break after ucred = is assigned. There should be only one, and we can short-circuit the loop then.

if (r < 0)
goto terminate;

memcpy(&s->ucred, ucred, sizeof(s->ucred));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just s->ucred = *ucred?

* in the meantime.
*/
if (ucred && ucred->pid != s->ucred.pid) {
r = stdout_stream_scan(s, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here? i.e.

/* force out any previously half-written lines from a different process, before we switch to the new ucred structure for everything we just added */

@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed good-to-merge/with-minor-suggestions labels Nov 4, 2019
@poettering
Copy link
Member

two minor issues. look good to merge otherwise

@poettering poettering added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Nov 4, 2019
journald assumes that getsockopt(SO_PEERCRED) correctly identifies the
process on the remote end of the socket. However, this is incorrect
according to man 7 socket:

    The returned  credentials  are  those that were in effect at the
    time of the call to connect(2) or socketpair(2).

This becomes a problem when a new process inherits the stdout stream
from a parent. First, log messages from the child process will
be attributed to the parent. Second, the struct ucred used by journald
becomes invalid as soon as the parent exits. Further sendmsg calls then
fail with ENOENT. Logs for the child process then vanish from the journal.

Fix this by using recvmsg on the stdout stream, and refreshing the cached
struct ucred if SCM_CREDENTIALS indicate a new process.

Fixes #13708
@keszybz
Copy link
Member

keszybz commented Nov 4, 2019

[coredumpctl_collect] Gathering coredumps for '/build/build/test-journal-flush'
           PID: 2419 (test-journal-fl)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Mon 2019-11-04 18:40:55 CET (1h 2min ago)
  Command Line: /build/build/test-journal-flush
    Executable: /build/build/test-journal-flush
 Control Group: /user.slice/user-1000.slice/session-3.scope
          Unit: session-3.scope
         Slice: user-1000.slice
       Session: 3
     Owner UID: 1000 (vagrant)
       Boot ID: 55e7bd01087c4bb09dc6d35d29c71447
    Machine ID: 125938f1dabc4d79bd2b0d8e77d2fbc1
      Hostname: arch.localdomain
       Storage: /var/lib/systemd/coredump/core.test-journal-fl.0.55e7bd01087c4bb09dc6d35d29c71447.2419.1572889255000000000000.lz4
       Message: Process 2419 (test-journal-fl) of user 0 dumped core.
                
                Stack trace of thread 2419:
                #0  0x00007f52b2ddbf25 raise (libc.so.6 + 0x3bf25)
                #1  0x00007f52b2dc5897 abort (libc.so.6 + 0x25897)
                #2  0x00007f52b305cf8e log_assert_failed_realm (libsystemd-shared-243.so + 0xf4f8e)
                #3  0x00005598dc26d5d2 main (test-journal-flush + 0x15d2)
                #4  0x00007f52b2dc7153 __libc_start_main (libc.so.6 + 0x27153)
                #5  0x00005598dc26d18e _start (test-journal-flush + 0x118e)

@keszybz keszybz added ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR and removed good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed labels Nov 4, 2019
@evverx
Copy link
Contributor

evverx commented Nov 4, 2019

FWIW test-journal-flush is flaky and has been failing sporadically since it was introduced as far as I know so I don't think it has anything to do with this PR. I would also add that it's not exactly a unit test and should probably be moved elsewhere: #13246 (comment).

@keszybz keszybz added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR labels Nov 4, 2019
@keszybz
Copy link
Member

keszybz commented Nov 4, 2019

OK. Let's wait for the other checks to run though.

@keszybz
Copy link
Member

keszybz commented Nov 5, 2019

CentOS CI (Arch in KVM) failed in test-journal-flush again.

@keszybz keszybz added ci-failure-appears-unrelated and removed good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed labels Nov 5, 2019
@keszybz keszybz merged commit 09d0b46 into systemd:master Nov 5, 2019
@lmb lmb deleted the recvmsg branch November 5, 2019 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Changing process PID causing journal logs to show old PID then logs are missing _SYSTEMD_ tags after a restart of systemd-journald

5 participants