[shepherd] Test failures on GNU/Hurd

  • Done
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • yelninei
Owner
unassigned
Submitted by
yelninei
Severity
normal

Debbugs page

yelninei wrote 11 months ago
(address . bug-guix@gnu.org)
ONJP9RV--F-9@tutamail.com
guix --version
guix (GNU Guix) 812f972f046e521eabc3ddd76e790d7a69d426b5

fibers test suite is hanging so I am skipping it for now.

Also I had to cheat a bit to get past automakes "t/output-order.sh" test where the guile build driver is leaking "GC Warning: Repeated allocation of very large block" onto stderr confusing the test (similar problem to https://issues.guix.gnu.org/73181).I added env["GC_LARGE_ALLOC_WARN_INTERVAL"] = "100" to DerivationGoal::startBuilder() and used the slightly patched daemon to build automake without changing the derivation.

guix build shepherd --without-tests=guile-fibers -s i586-gnu

These 3 tests are failing:
FAIL: tests/logging-failure.sh
FAIL: tests/forking-service.sh
FAIL: tests/services/system-log.sh

Some others are being skipped, looking at the shepherd code this seems to be expected.
SKIP: tests/logging-ENOSPC.sh
SKIP: tests/close-on-exec.sh
SKIP: tests/daemonize.sh

Here is the output of the failing ones:

FAIL: tests/logging-failure
===========================

+ shepherd --version
System lacks support for 'signalfd'; using fallback mechanism.
shepherd (GNU Shepherd) 1.0.3
Copyright (C) 2025 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ herd --version
herd (GNU Shepherd) 1.0.3
Copyright (C) 2025 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ log_directory=/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783
+ socket=t-socket-8783
+ conf=t-conf-8783
+ log=t-log-8783
+ pid=t-pid-8783
+ herd='herd -s t-socket-8783'
+ trap 'cat t-log-8783 || true; rm -f t-log-8783 t-socket-8783 t-conf-8783 t-pid-8783; rm -rf /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783;
      kill $main_pid || true' EXIT
+ cat
+ rm -f t-pid-8783
+ main_pid=8788
+ test -f t-pid-8783
+ shepherd -I -s t-socket-8783 -c t-conf-8783 -l t-log-8783 --pid=t-pid-8783
+ sleep 0.3
System lacks support for 'signalfd'; using fallback mechanism.
GNU Shepherd 1.0.3 (Guile 3.0.9, i586-unknown-gnu)
Starting service root...
Service root started.
Service root running with value #<<process> id: 8788 command: #f>.
Service root has been started.
Configuration successfully loaded from 't-conf-8783'.
+ test -f t-pid-8783
+ herd -s t-socket-8783 status
Started:
+ root
Stopped:
- log-directory-does-not-exist
- log-directory-not-writable
+ test -f /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783
+ herd -s t-socket-8783 start log-directory-does-not-exist
Service log-directory-does-not-exist has been started.
+ herd -s t-socket-8783 status log-directory-does-not-exist
+ grep running
  It is running since 07:53:13 (0 seconds ago).
+ herd -s t-socket-8783 status log-directory-does-not-exist
+ grep /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783/service.log
  Log file: /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783/service.log
+ grep logging /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783/service.log
2025-04-08 07:53:13 logging
+ herd -s t-socket-8783 start log-directory-not-writable
++ cat t-log-8783
2025-04-08 07:53:13 GNU Shepherd 1.0.3 (Guile 3.0.9, i586-unknown-gnu)
2025-04-08 07:53:13 Starting service root...
2025-04-08 07:53:13 Service root started.
2025-04-08 07:53:13 Service root running with value #<<process> id: 8788 command: #f>.
2025-04-08 07:53:13 Service root has been started.
2025-04-08 07:53:13 Configuration successfully loaded from 't-conf-8783'.
2025-04-08 07:53:13 Starting service log-directory-does-not-exist...
2025-04-08 07:53:13 Service log-directory-does-not-exist has been started.
2025-04-08 07:53:13 Service log-directory-does-not-exist started.
2025-04-08 07:53:13 Service log-directory-does-not-exist running with value #<<process> id: 8792 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "while true; do echo logging; sleep 0.2; done")>.
2025-04-08 07:53:13 Starting service log-directory-not-writable...
Terminated
++ rm -f t-log-8783 t-socket-8783 t-conf-8783 t-pid-8783
++ rm -rf /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783
++ kill 8788
FAIL tests/logging-failure.sh (exit status: 124)

FAIL: tests/forking-service
===========================

+ shepherd --version
System lacks support for 'signalfd'; using fallback mechanism.
shepherd (GNU Shepherd) 1.0.3
Copyright (C) 2025 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ herd --version
herd (GNU Shepherd) 1.0.3
Copyright (C) 2025 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ socket=t-socket-10595
+ conf=t-conf-10595
+ log=t-log-10595
+ pid=t-pid-10595
+ service_pid=t-service-pid-10595
+ service_nofiles=t-service-nofiles-10595
+ service2_pid=t-service2-pid-10595
+ service2_started=t-service2-starts-10595
+ herd='herd -s t-socket-10595'
+ trap cleanup EXIT
+ cat
+ cat t-conf-10595
;; Leave a timeout long enough for slow machines.
(default-pid-file-timeout 10)

(define %command
  '("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595"))

(register-services
(list (service
;; A service that forks into a different process.
'(test)
#:start (make-forkexec-constructor
                   %command
                   #:pid-file "/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595"
                   #:resource-limits '((nofile 1567 1567)))
#:stop  (make-kill-destructor)
#:respawn? #f)))

(define %command2
  '("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "echo started >> /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service2-starts-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service2-pid-10595"))

(register-services
(list (service
;; A service that forks into a different process.
'(test2)
#:start (make-forkexec-constructor %command2
   #:pid-file "/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service2-pid-10595")
#:stop  (make-kill-destructor)
         #:respawn-delay 0
#:respawn? #t)))

(define %command3
  '("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "sleep 600"))

(register-services
(list (service
;; A service that forks into a different process.
'(test3)
#:start (make-forkexec-constructor %command3)
#:stop  (make-kill-destructor)
#:respawn? #t)))

(define %command4
  '("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "trap 'echo ignoring SIGTERM' SIGTERM; while true ; do : ; done"))

(register-services
(list (service
;; A service that ignores SIGTERM.
'(test4)
#:requirement '(test3)
#:start (make-forkexec-constructor %command4)
#:stop  (make-kill-destructor SIGTERM #:grace-period 3))))
+ rm -f t-pid-10595
+ test -f t-pid-10595
+ shepherd -I -s t-socket-10595 -c t-conf-10595 -l t-log-10595 --pid=t-pid-10595
+ sleep 0.3
System lacks support for 'signalfd'; using fallback mechanism.
GNU Shepherd 1.0.3 (Guile 3.0.9, i586-unknown-gnu)
Starting service root...
Service root started.
Service root running with value #<<process> id: 10602 command: #f>.
Service root has been started.
Configuration successfully loaded from 't-conf-10595'.
+ test -f t-pid-10595
++ cat t-pid-10595
+ shepherd_pid=10602
+ herd -s t-socket-10595 start test
Starting service test...
Service test has been started.
+ herd -s t-socket-10595 start test2
Starting service test2...
Service test2 started.
Service test2 running with value #<<process> id: 10623 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "echo started >> /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service2-starts-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service2-pid-10595")>.
Service test2 has been started.
+ herd -s t-socket-10595 status test
+ grep running
  It is running since 07:56:16 (1 second ago).
+ test -f t-service-pid-10595
++ cat t-service-pid-10595
+ service_pid_value=10610
+ kill 10610
+ kill -0 10610
./tests/forking-service.sh: line 133: kill: (10610) - No such process
+ within_seconds 10 'herd -s t-socket-10595 status test | grep stopped'
+ seconds=10
+ '[' 10 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
Uncaught exception in essential task process-monitor: (wrong-type-arg "=" "Wrong type argument in position ~A: ~S" (1 #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")>) (#<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")>))
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 10 - 1
+ seconds=9
+ '[' 9 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 9 - 1
+ seconds=8
+ '[' 8 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 8 - 1
+ seconds=7
+ '[' 7 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 7 - 1
+ seconds=6
+ '[' 6 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 6 - 1
+ seconds=5
+ '[' 5 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 5 - 1
+ seconds=4
+ '[' 4 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 4 - 1
+ seconds=3
+ '[' 3 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 3 - 1
+ seconds=2
+ '[' 2 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 2 - 1
+ seconds=1
+ '[' 1 -gt 0 ']'
+ /gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash -c 'herd -s t-socket-10595 status test | grep stopped'
+ sleep 1
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
++ expr 1 - 1
+ seconds=0
+ cleanup
+ cat t-log-10595
2025-04-08 07:56:14 GNU Shepherd 1.0.3 (Guile 3.0.9, i586-unknown-gnu)
2025-04-08 07:56:14 Starting service root...
2025-04-08 07:56:14 Service root started.
2025-04-08 07:56:14 Service root running with value #<<process> id: 10602 command: #f>.
2025-04-08 07:56:14 Service root has been started.
2025-04-08 07:56:14 Configuration successfully loaded from 't-conf-10595'.
2025-04-08 07:56:15 Starting service test...
2025-04-08 07:56:16 Service test has been started.
2025-04-08 07:56:16 Service test started.
2025-04-08 07:56:16 Service test running with value #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")>.
2025-04-08 07:56:16 Starting service test2...
2025-04-08 07:56:17 Service test2 has been started.
2025-04-08 07:56:17 Service test2 started.
2025-04-08 07:56:17 Service test2 running with value #<<process> id: 10623 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "echo started >> /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service2-starts-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service2-pid-10595")>.
2025-04-08 07:56:17 PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
2025-04-08 07:56:17 Uncaught exception in essential task process-monitor: (wrong-type-arg "=" "Wrong type argument in position ~A: ~S" (1 #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepher
This message was truncated. Download the full message here.
Ludovic Courtès wrote 11 months ago
(name . yelninei--- via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 77634@debbugs.gnu.org)(address . yelninei@tutamail.com)
8734ebdb0s.fsf@gnu.org
Hello,

Thanks for the bug report!

yelninei--- via Bug reports for GNU Guix <bug-guix@gnu.org> skribis:

Toggle quote (2 lines)
> Also I had to cheat a bit to get past automakes "t/output-order.sh" test where the guile build driver is leaking "GC Warning: Repeated allocation of very large block" onto stderr confusing the test (similar problem to https://issues.guix.gnu.org/73181).I added env["GC_LARGE_ALLOC_WARN_INTERVAL"] = "100" to DerivationGoal::startBuilder() and used the slightly patched daemon to build automake without changing the derivation.

Uh, sounds terrible.

Toggle quote (19 lines)
> + herd -s t-socket-8783 start log-directory-not-writable
> ++ cat t-log-8783
> 2025-04-08 07:53:13 GNU Shepherd 1.0.3 (Guile 3.0.9, i586-unknown-gnu)
> 2025-04-08 07:53:13 Starting service root...
> 2025-04-08 07:53:13 Service root started.
> 2025-04-08 07:53:13 Service root running with value #<<process> id: 8788 command: #f>.
> 2025-04-08 07:53:13 Service root has been started.
> 2025-04-08 07:53:13 Configuration successfully loaded from 't-conf-8783'.
> 2025-04-08 07:53:13 Starting service log-directory-does-not-exist...
> 2025-04-08 07:53:13 Service log-directory-does-not-exist has been started.
> 2025-04-08 07:53:13 Service log-directory-does-not-exist started.
> 2025-04-08 07:53:13 Service log-directory-does-not-exist running with value #<<process> id: 8792 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "while true; do echo logging; sleep 0.2; done")>.
> 2025-04-08 07:53:13 Starting service log-directory-not-writable...
> Terminated
> ++ rm -f t-log-8783 t-socket-8783 t-conf-8783 t-pid-8783
> ++ rm -rf /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783
> ++ kill 8788
> FAIL tests/logging-failure.sh (exit status: 124)

Apparently the ‘shepherd’ process was terminated by the ‘timeout’
command, meaning that the test hanged for more than 3 minutes.

I failed to reproduce it on Linux (without ‘signalfd’ and without
‘PR_SET_CHILD_SUBREAPER’), and I have yet to build dependencies in my
childhurd to try it natively…

Toggle quote (2 lines)
> 2025-04-08 07:56:27 PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!

[...]

Toggle quote (2 lines)
> FAIL tests/forking-service.sh (exit status: 1)

This one is a bug fixed in Shepherd commit
82a333f8e83b4abbd19dbc6c9bab1b7d62cced0c.

Toggle quote (15 lines)
> + herd -s t-socket-12440 start logger
> Starting service system-log...
> Service system-log has been started.
> Service system-log started.
>
> ;;; (message-destination-> ("/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-syslog-12440"))
> Service system-log running with value #<system-log 39f82c0>.
> Starting service logger...
> Service logger has been started.
> + herd -s t-socket-12440 status logger
> + grep stopped
> ++ zcat 't-log-12440.*'
> gzip: t-log-12440.*.gz: No such file or directory
> Terminated

[...]

Toggle quote (2 lines)
> FAIL tests/services/system-log.sh (exit status: 124)

This one also hanged more than 3m it seems.

To be continued…

Ludo’.
yelninei wrote 11 months ago
(name . Ludovic Courtès)(address . ludo@gnu.org)(name . 77634)(address . 77634@debbugs.gnu.org)
ONnpPHg--F-9@tutamail.com
Hello Ludo,


Apr 13, 2025, 21:09 by ludo@gnu.org:


Toggle quote (27 lines)
>> + herd -s t-socket-8783 start log-directory-not-writable
>> ++ cat t-log-8783
>> 2025-04-08 07:53:13 GNU Shepherd 1.0.3 (Guile 3.0.9, i586-unknown-gnu)
>> 2025-04-08 07:53:13 Starting service root...
>> 2025-04-08 07:53:13 Service root started.
>> 2025-04-08 07:53:13 Service root running with value #<<process> id: 8788 command: #f>.
>> 2025-04-08 07:53:13 Service root has been started.
>> 2025-04-08 07:53:13 Configuration successfully loaded from 't-conf-8783'.
>> 2025-04-08 07:53:13 Starting service log-directory-does-not-exist...
>> 2025-04-08 07:53:13 Service log-directory-does-not-exist has been started.
>> 2025-04-08 07:53:13 Service log-directory-does-not-exist started.
>> 2025-04-08 07:53:13 Service log-directory-does-not-exist running with value #<<process> id: 8792 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "while true; do echo logging; sleep 0.2; done")>.
>> 2025-04-08 07:53:13 Starting service log-directory-not-writable...
>> Terminated
>> ++ rm -f t-log-8783 t-socket-8783 t-conf-8783 t-pid-8783
>> ++ rm -rf /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783
>> ++ kill 8788
>> FAIL tests/logging-failure.sh (exit status: 124)
>>
>
> Apparently the ‘shepherd’ process was terminated by the ‘timeout’
> command, meaning that the test hanged for more than 3 minutes.
>
> I failed to reproduce it on Linux (without ‘signalfd’ and without
> ‘PR_SET_CHILD_SUBREAPER’), and I have yet to build dependencies in my
> childhurd to try it natively…
>
If there is something you want me to try let me know.

I was able to reproduce this by extracting the shepherd conf from the test and and attempting to run the test manually on the childhurd with the cross compiled shepherd (so you should not need to have a native toolchain available)


Might be the attempt to write to /proc.

Starting the log-directory-not-writable service causes the entire shepherd to become completely unresponsive causing the timeout failure.

A similar behavior occurs with

(call-with-output-file "/proc/1/something"
  (lambda (port)
    (display "Hello" port)))

which hangs my guile completely.

When I change the log file to a more normal unwritable path the test passes.


Toggle quote (11 lines)
>> 2025-04-08 07:56:27 PID #<<process> id: 10610 command: ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" "-c" "ulimit -n >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595; sleep 600 & echo $! > /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> (test) is dead!
>>
>
> [...]
>
>> FAIL tests/forking-service.sh (exit status: 1)
>>
>
> This one is a bug fixed in Shepherd commit
> 82a333f8e83b4abbd19dbc6c9bab1b7d62cced0c.
>
Can confirm that this is fixed in 1.0.4rc1.


Toggle quote (23 lines)
>> + herd -s t-socket-12440 start logger
>> Starting service system-log...
>> Service system-log has been started.
>> Service system-log started.
>>
>> ;;; (message-destination-> ("/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-syslog-12440"))
>> Service system-log running with value #<system-log 39f82c0>.
>> Starting service logger...
>> Service logger has been started.
>> + herd -s t-socket-12440 status logger
>> + grep stopped
>> ++ zcat 't-log-12440.*'
>> gzip: t-log-12440.*.gz: No such file or directory
>> Terminated
>>
>
> [...]
>
>> FAIL tests/services/system-log.sh (exit status: 124)
>>
>
> This one also hanged more than 3m it seems.
>
The shepherd syslog seems to be extremely slow:

I extracted the logger.scm and conf.scm from the test, removed the $ from the shell variables variables, started shepherd, echoed the test msg into  the kmsg file and then tried to start the shepherd syslog.

It took multiple minutes to see the "Service system-log running with value" in the log and then another couple of minutes for "herd start syslogd" to return. Afterwards trying to query the syslog status (or trying to interact with shepherd in any other way) takes forever to complete while syslogd is running.

I did this with the 1.0.4rc1 shepherd to work around the blocking socket problem with guix-daemon (#77610, it still fails on the first connection now but idk if this is a problem with shepherd or the guix-daemon)

With the 1.0.3 shepherd syslogd seems to be a lot quicker to start initally ( see the log from the first message) but still extremely slow to interact with afterwards.

Even after removing the test timeout the test will fail eventually after ~900 seconds because $syslog_auth_file did not show up (yet?).


Toggle quote (5 lines)
> To be continued…
>
> Ludo’.
>

Thank you for your hard work on shepherd.
Ludovic Courtès wrote 11 months ago
(address . yelninei@tutamail.com)(name . 77634)(address . 77634@debbugs.gnu.org)
87tt6qg9ep.fsf@gnu.org
Hi yelninei,

yelninei@tutamail.com writes:

Toggle quote (2 lines)
>>> FAIL tests/logging-failure.sh (exit status: 124)

[...]

Toggle quote (20 lines)
> I was able to reproduce this by extracting the shepherd conf from the
> test and and attempting to run the test manually on the childhurd with
> the cross compiled shepherd (so you should not need to have a native
> toolchain available)
>
>
> Might be the attempt to write to /proc.
>
> Starting the log-directory-not-writable service causes the entire shepherd to become completely unresponsive causing the timeout failure.
>
> A similar behavior occurs with
>
> (call-with-output-file "/proc/1/something"
>   (lambda (port)
>     (display "Hello" port)))
>
> which hangs my guile completely.
>
> When I change the log file to a more normal unwritable path the test passes.

Ah, interesting. So /something instead of /proc/1/something would work,
right? (I picked /proc/1/something because ‘guix shell -C’ currently
gives a writable root file system… Going to be fixed with

You could also report the procfs issue to bug-hurd I guess. :-)

Toggle quote (26 lines)
>>> FAIL tests/services/system-log.sh (exit status: 124)
>>>
>>
>> This one also hanged more than 3m it seems.
>>
> The shepherd syslog seems to be extremely slow:
>
> I extracted the logger.scm and conf.scm from the test, removed the $
> from the shell variables variables, started shepherd, echoed the test
> msg into  the kmsg file and then tried to start the shepherd syslog.
>
> It took multiple minutes to see the "Service system-log running with
> value" in the log and then another couple of minutes for "herd start
> syslogd" to return. Afterwards trying to query the syslog status (or
> trying to interact with shepherd in any other way) takes forever to
> complete while syslogd is running.
>
> I did this with the 1.0.4rc1 shepherd to work around the blocking
> socket problem with guix-daemon (#77610, it still fails on the first
> connection now but idk if this is a problem with shepherd or the
> guix-daemon)
>
> With the 1.0.3 shepherd syslogd seems to be a lot quicker to start
> initally ( see the log from the first message) but still extremely
> slow to interact with afterwards.

Weird. Could you bisect between 1.0.3 and HEAD to try and find the
source of slowness?

I wonder if f1a82845eaf8851af9a811e5a1d185b68b1c363f might explain it,
but that’s pre-1.0.3.

Alternatively, you can try running ‘shepherd’ under ‘rpctrace’ for the
syslog slowness issue, so we get an idea of what it’s doing.

Thanks a lot for helping out!

Ludo’.
yelninei wrote 11 months ago
(name . Ludovic Courtès)(address . ludo@gnu.org)(name . 77634)(address . 77634@debbugs.gnu.org)
ONomdc_--F-9@tutamail.com
Hello Ludo,
Apr 14, 2025, 13:28 by ludo@gnu.org:

Toggle quote (38 lines)
> Hi yelninei,
>
> yelninei@tutamail.com writes:
>
>>>> FAIL tests/logging-failure.sh (exit status: 124)
>>>>
>
> [...]
>
>> I was able to reproduce this by extracting the shepherd conf from the
>> test and and attempting to run the test manually on the childhurd with
>> the cross compiled shepherd (so you should not need to have a native
>> toolchain available)
>>
>>
>> Might be the attempt to write to /proc.
>>
>> Starting the log-directory-not-writable service causes the entire shepherd to become completely unresponsive causing the timeout failure.
>>
>> A similar behavior occurs with
>>
>> (call-with-output-file "/proc/1/something"
>>   (lambda (port)
>>     (display "Hello" port)))
>>
>> which hangs my guile completely.
>>
>> When I change the log file to a more normal unwritable path the test passes.
>>
>
> Ah, interesting. So /something instead of /proc/1/something would work,
> right? (I picked /proc/1/something because ‘guix shell -C’ currently
> gives a writable root file system… Going to be fixed with
> <https://issues.guix.gnu.org/77638>.)
>
> You could also report the procfs issue to bug-hurd I guess. :-)
>

Yes.
I think reproducing this on the latest mach/hurd first would be better but I have been a bit busy trying to understand what went wrong with glibc2.41


Toggle quote (40 lines)
>>>> FAIL tests/services/system-log.sh (exit status: 124)
>>>>
>>>
>>> This one also hanged more than 3m it seems.
>>>
>> The shepherd syslog seems to be extremely slow:
>>
>> I extracted the logger.scm and conf.scm from the test, removed the $
>> from the shell variables variables, started shepherd, echoed the test
>> msg into  the kmsg file and then tried to start the shepherd syslog.
>>
>> It took multiple minutes to see the "Service system-log running with
>> value" in the log and then another couple of minutes for "herd start
>> syslogd" to return. Afterwards trying to query the syslog status (or
>> trying to interact with shepherd in any other way) takes forever to
>> complete while syslogd is running.
>>
>> I did this with the 1.0.4rc1 shepherd to work around the blocking
>> socket problem with guix-daemon (#77610, it still fails on the first
>> connection now but idk if this is a problem with shepherd or the
>> guix-daemon)
>>
>> With the 1.0.3 shepherd syslogd seems to be a lot quicker to start
>> initally ( see the log from the first message) but still extremely
>> slow to interact with afterwards.
>>
>
> Weird. Could you bisect between 1.0.3 and HEAD to try and find the
> source of slowness?
>
> I wonder if f1a82845eaf8851af9a811e5a1d185b68b1c363f might explain it,
> but that’s pre-1.0.3.
>
> Alternatively, you can try running ‘shepherd’ under ‘rpctrace’ for the
> syslog slowness issue, so we get an idea of what it’s doing.
>
> Thanks a lot for helping out!
>
> Ludo’.
>
It is still slow on 1.0.3 to do anything once its started but with 1.0.4rc1 even starting is extremely painful.
I tried to manually add features to the system-log-service and the thing that causes everything to slow down drastically is the addition of the #:kernel-log-file (even if it is empty).

Setting that to #f instead makes the test now fail considerable faster (1.8s rather than 900).

Running the logger manually only the syslog_remote_file seems to be created. This is consistent with the log output from the test.
I also added a pk for the message-sender to investigate.
Difference I can see is that on Linux most of them are #f but here are #(1 \x00") instead
FAIL: tests/services/system-log
===============================

+ shepherd --version
System lacks support for 'signalfd'; using fallback mechanism.
shepherd (GNU Shepherd) 1.0.4rc1
Copyright (C) 2025 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ herd --version
herd (GNU Shepherd) 1.0.4rc1
Copyright (C) 2025 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ socket=t-socket-25332
+ conf=t-conf-25332
+ log=t-log-25332
+ pid=t-pid-25332
+ logger=/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-logger-25332.scm
+ kmsg=/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-kmsg-25332
+ syslog_file=/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-25332
+ syslog_auth_file=/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-auth-25332
+ syslog_debug_file=/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-debug-25332
+ syslog_remote_file=/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332
+ syslog_socket=/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-socket-25332
+ herd='herd -s t-socket-25332'
+ trap 'zcat t-log-25332.* || true; cat t-log-25332 || true;
      rm -f t-socket-25332 t-conf-25332 t-log-25332* /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-logger-25332.scm /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-kmsg-25332 /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-socket-25332;
      rm -f /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-25332* /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-auth-25332* /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-debug-25332* /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332*;
      test -f t-pid-25332 && kill `cat t-pid-25332` || true; rm -f t-pid-25332' EXIT
+ cat
+ cat
+ chmod +x /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-logger-25332.scm
+ cat
+ rm -f t-pid-25332
+ test -f t-pid-25332
+ shepherd -I -s t-socket-25332 -c t-conf-25332 -l t-log-25332 --pid=t-pid-25332
+ sleep 0.3
System lacks support for 'signalfd'; using fallback mechanism.
GNU Shepherd 1.0.4rc1 (Guile 3.0.9, i586-unknown-gnu)
Starting service root...
Service root started.
Service root running with value #<<process> id: 25340 command: #f>.
Service root has been started.
Configuration successfully loaded from 't-conf-25332'.
+ test -f t-pid-25332
+ herd -s t-socket-25332 eval root '(gc)'
Evaluating user expression (gc).
"#<unspecified>"
++ file_descriptor_count
++ wc -l
+++ cat t-pid-25332
++ ls -l '/proc/25340/fd/[0-9]*'
ls: cannot access '/proc/25340/fd/[0-9]*': No such file or directory
+ initial_fd_count=0
+ herd -s t-socket-25332 start logger
Starting service system-log...
Service system-log has been started.
Service system-log started.
Service system-log running with value #<system-log 39c5290>.
Starting service logger...
Service logger has been started.
+ herd -s t-socket-25332 status logger
+ grep stopped
  It is stopped since 14:39:57 (0 seconds ago).
+ grep 'starting logger' t-log-25332
2025-04-14 14:39:57 t-syslog-logger-25332.scm[25348] starting logger
+ grep 'sudo:.* session opened' /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-auth-25332
grep: /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-auth-25332: No such file or directory
+ zcat 't-log-25332.*'
gzip: t-log-25332.*.gz: No such file or directory
+ true
+ cat t-log-25332
2025-04-14 14:39:57 GNU Shepherd 1.0.4rc1 (Guile 3.0.9, i586-unknown-gnu)
2025-04-14 14:39:57 Starting service root...
2025-04-14 14:39:57 Service root started.
2025-04-14 14:39:57 Service root running with value #<<process> id: 25340 command: #f>.
2025-04-14 14:39:57 Service root has been started.
2025-04-14 14:39:57 Configuration successfully loaded from 't-conf-25332'.
2025-04-14 14:39:57 Evaluating user expression (gc).
2025-04-14 14:39:57 Starting service system-log...
2025-04-14 14:39:57 Service system-log has been started.
2025-04-14 14:39:57 Service system-log started.
2025-04-14 14:39:57 Service system-log running with value #<system-log 39c5290>.
2025-04-14 14:39:57 Starting service logger...
2025-04-14 14:39:57 Service logger has been started.
2025-04-14 14:39:57 Service logger started.
2025-04-14 14:39:57 Service logger running with value #<<process> id: 25348 command: ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-logger-25332.scm")>.
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(1 "\x00"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(2 2130706433 52204))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 t-syslog-logger-25332.scm[25348] starting logger
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(1 "\x00"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(1 "\x00"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(1 "\x00"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(1 "\x00"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(1 "\x00"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(1 "\x00"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(1 "\x00"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (system-log-message-sender-> #(1 "\x00"))
2025-04-14 14:39:57 
2025-04-14 14:39:57 ;;; (message-destination-> ("/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332"))
2025-04-14 14:39:57 Service logger has been disabled.
+ rm -f t-socket-25332 t-conf-25332 t-log-25332 /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-logger-25332.scm /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-kmsg-25332 /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-socket-25332
+ rm -f '/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-25332*' '/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-auth-25332*' '/tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-debug-25332*' /tmp/guix-build-shepherd-libevent-1.0.99-git.drv-0/source/t-syslog-remote-25332
+ test -f t-pid-25332
++ cat t-pid-25332
+ kill 25340
+ rm -f t-pid-25332
FAIL tests/services/system-log.sh (exit status: 2)
Stopping service root...
Exiting shepherd...
Service log-rotation is not running.
Service logger is not running.
Exiting.
Ludovic Courtès wrote 11 months ago
(address . yelninei@tutamail.com)(name . 77634)(address . 77634@debbugs.gnu.org)
87ecxtfpk1.fsf@gnu.org
Hey,

yelninei@tutamail.com writes:

Toggle quote (7 lines)
> It is still slow on 1.0.3 to do anything once its started but with 1.0.4rc1 even starting is extremely painful.
> I tried to manually add features to the system-log-service and the
> thing that causes everything to slow down drastically is the addition
> of the #:kernel-log-file (even if it is empty).
>
> Setting that to #f instead makes the test now fail considerable faster (1.8s rather than 900).

Interesting.

I cross-compiled current ‘main’ to test it in my childhurd, as an
unprivileged user, like so:

echo '(use-modules (shepherd service system-log)) (register-services (list (system-log-service (list (endpoint (make-socket-address AF_UNIX "/home/ludo/syslog.sock") #:style SOCK_DGRAM)) #:kernel-log-file "t.conf")))' > t.conf
./shep/bin/shepherd -c t.conf &

I can reproduce the unresponsiveness as soon as I run ‘herd start
system-log’.

Unfortunately, rpctrace is unhelpful. Even ‘rpctrace -o log
./shep/bin/shepherd --version’ hangs:

Toggle snippet (55 lines)
ludo@childhurd ~$ rpctrace -s 10 -o log ./shep/bin/shepherd --version
Terminated
ludo@childhurd ~$ tail -50 log ; echo END
task13(pid283)-> 3206 (pn{ 35}) = 0
78<--79(pid283)->io_map_request () = 0 80<--77(pid283) (null)
task13(pid283)-> 2089 (48619520 4096 0 0 80<--77(pid283) 4096 1 5 7 1) = 0x3 ((os/kern) no space available)
task13(pid283)-> 2023 (48619520 4096) = 0
task13(pid283)-> 2089 (48619520 4096 0 0 80<--77(pid283) 4096 1 5 7 1) = 0 48619520
task13(pid283)-> 3206 (pn{ 35}) = 0
78<--79(pid283)->io_map_request () = 0 80<--81(pid283) (null)
task13(pid283)-> 2089 (48623616 4096 0 0 80<--81(pid283) 8192 1 1 7 1) = 0x3 ((os/kern) no space available)
task13(pid283)-> 2023 (48623616 4096) = 0
task13(pid283)-> 2089 (48623616 4096 0 0 80<--81(pid283) 8192 1 1 7 1) = 0 48623616
task13(pid283)-> 3206 (pn{ 35}) = 0
78<--79(pid283)->io_map_request () = 0 80<--77(pid283) (null)
task13(pid283)-> 2089 (48627712 4096 0 0 80<--77(pid283) 8192 1 3 7 1) = 0x3 ((os/kern) no space available)
task13(pid283)-> 2023 (48627712 4096) = 0
task13(pid283)-> 2089 (48627712 4096 0 0 80<--77(pid283) 8192 1 3 7 1) = 0 48627712
task13(pid283)-> 3206 (pn{ 35}) = 0
task13(pid283)-> 3206 (pn{ 34}) = 0
17<--33(pid283)->dir_lookup ("gnu/store/" 4194305 0) = 0 1 "" 78<--81(pid283)
78<--81(pid283)->term_getctty () = 0xfffffed1 ((ipc/mig) bad request message ID)
78<--81(pid283)->io_read_request (-1 512) = 0 "\x7fELF\x01\x01\x01\0\0\0"
78<--81(pid283)->io_stat_request () = 0 {23 7 0 80111 0 1744699086 0 33133 1 0 0 375340 0 1744691226 270000000 1 0 1744691169 440000000 8192 744 0 0 0 0 0 0 0 0 0 0 0}
78<--81(pid283)->io_map_request () = 0 80<--79(pid283) (null)
task13(pid283)-> 2089 (0 379380 0 1 80<--79(pid283) 0 1 1 7 1) = 0 48631808
task13(pid283)-> 3206 (pn{ 35}) = 0
78<--81(pid283)->io_map_request () = 0 80<--77(pid283) (null)
task13(pid283)-> 2089 (48676864 204800 0 0 80<--77(pid283) 45056 1 5 7 1) = 0x3 ((os/kern) no space available)
task13(pid283)-> 2023 (48676864 204800) = 0
task13(pid283)-> 2089 (48676864 204800 0 0 80<--77(pid283) 45056 1 5 7 1) = 0 48676864
task13(pid283)-> 3206 (pn{ 35}) = 0
78<--81(pid283)->io_map_request () = 0 80<--79(pid283) (null)
task13(pid283)-> 2089 (48881664 122880 0 0 80<--79(pid283) 249856 1 1 7 1) = 0x3 ((os/kern) no space available)
task13(pid283)-> 2023 (48881664 122880) = 0
task13(pid283)-> 2089 (48881664 122880 0 0 80<--79(pid283) 249856 1 1 7 1) = 0 48881664
task13(pid283)-> 3206 (pn{ 35}) = 0
78<--81(pid283)->io_map_request () = 0 80<--77(pid283) (null)
task13(pid283)-> 2089 (49004544 8192 0 0 80<--77(pid283) 368640 1 3 7 1) = 0x3 ((os/kern) no space available)
task13(pid283)-> 2023 (49004544 8192) = 0
task13(pid283)-> 2089 (49004544 8192 0 0 80<--77(pid283) 368640 1 3 7 1) = 0 49004544
task13(pid283)-> 3206 (pn{ 35}) = 0
task13(pid283)-> 3206 (pn{ 34}) = 0
task13(pid283)-> 2089 (48435200 65536 0 0 (null) 0 1 3 7 1) = 0x3 ((os/kern) no space available)
task13(pid283)-> 2089 (48435200 65536 0 1 (null) 0 1 3 7 1) = 0 49012736
task13(pid283)-> 2012 (1 22) = 0 {0 25 183304192 6975488 0 0 0 0 1744693029 80000 0 0 0 0 0 0 0 0 1744693029 0 80000000 0}
task13(pid283)-> 2012 (3 12) = 0 {0 0 0 10000 0 0 0 0 0 0 10000000 0}
task13(pid283)-> 2012 (1 22) = 0 {0 25 183304192 6975488 0 0 0 0 1744693029 80000 0 0 0 0 0 0 0 0 1744693029 0 80000000 0}
task13(pid283)-> 2012 (3 12) = 0 {0 0 0 10000 0 0 0 0 0 0 10000000 0}
65<--64(pid283)->io_write_request ("\0" -1) = 0 1
18... = 0 "\0"
task13(pid283)-> 2089 (49078272 471040 0 0 (null) 0 1 3 7 1) ...18
task13(pid283)-> 4204 (18820768 2 0 0 0)END

Toggle quote (4 lines)
> Running the logger manually only the syslog_remote_file seems to be created. This is consistent with the log output from the test.
> I also added a pk for the message-sender to investigate.
> Difference I can see is that on Linux most of them are #f but here are #(1 \x00") instead

Yeah that’s the data coming from ‘recvfrom!’ where 1 = AF_UNIX and Linux
returns nothing.

Thanks,
Ludo’.
yelninei wrote 11 months ago
(name . Ludovic Courtès)(address . ludo@gnu.org)(name . 77634)(address . 77634@debbugs.gnu.org)
ONtJ5V1--F-9@tutamail.com
Hi Ludo,


Apr 14, 2025, 14:53 by yelninei@tutamail.com:

Toggle quote (45 lines)
>
>
>>> The shepherd syslog seems to be extremely slow:
>>>
>>> I extracted the logger.scm and conf.scm from the test, removed the $
>>> from the shell variables variables, started shepherd, echoed the test
>>> msg into  the kmsg file and then tried to start the shepherd syslog.
>>>
>>> It took multiple minutes to see the "Service system-log running with
>>> value" in the log and then another couple of minutes for "herd start
>>> syslogd" to return. Afterwards trying to query the syslog status (or
>>> trying to interact with shepherd in any other way) takes forever to
>>> complete while syslogd is running.
>>>
>>> I did this with the 1.0.4rc1 shepherd to work around the blocking
>>> socket problem with guix-daemon (#77610, it still fails on the first
>>> connection now but idk if this is a problem with shepherd or the
>>> guix-daemon)
>>>
>>> With the 1.0.3 shepherd syslogd seems to be a lot quicker to start
>>> initally ( see the log from the first message) but still extremely
>>> slow to interact with afterwards.
>>>
>>
>> Weird. Could you bisect between 1.0.3 and HEAD to try and find the
>> source of slowness?
>>
>> I wonder if f1a82845eaf8851af9a811e5a1d185b68b1c363f might explain it,
>> but that’s pre-1.0.3.
>>
>> Alternatively, you can try running ‘shepherd’ under ‘rpctrace’ for the
>> syslog slowness issue, so we get an idea of what it’s doing.
>>
>> Thanks a lot for helping out!
>>
>> Ludo’.
>>
> It is still slow on 1.0.3 to do anything once its started but with 1.0.4rc1 even starting is extremely painful.
>
> I tried to manually add features to the system-log-service and the thing that causes everything to slow down drastically is the addition of the #:kernel-log-file (even if it is empty).
>
> Setting that to #f instead makes the test now fail considerable faster (1.8s rather than 900).
>
>
>
I think the issue is that shepherd constantly reads the eof object from the input port (i guess the port is always readable to fibers but with nothing there). Trying to add a print to that case leads to similar unresponsiveness on linux as well.
So my theory is that that fibers is struggling to suspend the syslog loop.
When i added a (sleep 1) to "wait-for-input-or-message" the issue went away.

I am actually impressed that this seems to be less of a problem on linux.
I then tried to fix the test by adjusting the first case of the tests message destination to 

(and=> (system-log-message-sender message)
              (lambda (addr) (not (= AF_UNIX (sockaddr:fam addr)))))

After adjusting some of the sleep times in the test it is now passing, but ofc this is not really a fix.
Some minor other things:
-  Currently the NULL byte from the sender is added into the log message, might be worth filtering out
- "herd status syslogd" does not show the endpoints and kernel-log-file the syslog is reading from
- the shepherd package from the shepherd channel in 'set-fibers-directory sets fibers ccache dir also to the fibers source dir
- I am unable to cross compile shepehrd from git with guix because of help2man
- the guix shepherd-syslog service currently does not expose setting the endpoints and kernel-log-file (currently on my childhurd this would be /dev/klog instead of /dev/kmsg)
Ludovic Courtès wrote 11 months ago
(address . yelninei@tutamail.com)(name . 77634)(address . 77634@debbugs.gnu.org)
878qnvg9zb.fsf@gnu.org
Hi again,

yelninei@tutamail.com writes:

Toggle quote (13 lines)
> Some minor other things:
> -  Currently the NULL byte from the sender is added into the log
> message, might be worth filtering out
> - "herd status syslogd" does not show the endpoints and
> kernel-log-file the syslog is reading from
> - the shepherd package from the shepherd channel in
> 'set-fibers-directory sets fibers ccache dir also to the fibers source
> dir
> - I am unable to cross compile shepehrd from git with guix because of help2man
> - the guix shepherd-syslog service currently does not expose setting
> the endpoints and kernel-log-file (currently on my childhurd this
> would be /dev/klog instead of /dev/kmsg)

All good points! Most of these are now fixed on the Shepherd side:

48e53a4 guix: Record the right Fibers .go file directory.
98878a7 system-log: Serialize information about sockets and the kernel log file.
be57f65 system-log: Close input file on end-of-file.
21bd4ec system-log: Print “localhost” for messages received over AF_UNIX sockets.
1ad90aa system-log: Read kernel messages from /dev/klog on the Hurd.

Regarding cross-compilation, here’s a protip:

guix build -L $PWD/.guix/modules shepherd@1.0.99-tarball \
--target=i586-pc-gnu

It works because it first builds the tarball, which includes
pre-generated man pages, and then uses it as the source of the Shepherd
being cross-compiled.

Thanks a lot for the detailed bug reports and suggestions; it’s much
appreciated!

Ludo’.
yelninei wrote 11 months ago
(name . Ludovic Courtès)(address . ludo@gnu.org)(name . 77634)(address . 77634@debbugs.gnu.org)
OOHsq6H--F-9@tutamail.com
Hello,

Apr 19, 2025, 20:42 by ludo@gnu.org:

Toggle quote (27 lines)
> Hi again,
>
> yelninei@tutamail.com writes:
>
>> Some minor other things:
>> -  Currently the NULL byte from the sender is added into the log
>> message, might be worth filtering out
>> - "herd status syslogd" does not show the endpoints and
>> kernel-log-file the syslog is reading from
>> - the shepherd package from the shepherd channel in
>> 'set-fibers-directory sets fibers ccache dir also to the fibers source
>> dir
>> - I am unable to cross compile shepehrd from git with guix because of help2man
>> - the guix shepherd-syslog service currently does not expose setting
>> the endpoints and kernel-log-file (currently on my childhurd this
>> would be /dev/klog instead of /dev/kmsg)
>>
>
> All good points! Most of these are now fixed on the Shepherd side:
>
> 48e53a4 guix: Record the right Fibers .go file directory.
> 98878a7 system-log: Serialize information about sockets and the kernel log file.
> be57f65 system-log: Close input file on end-of-file.
> 21bd4ec system-log: Print “localhost” for messages received over AF_UNIX sockets.
> 1ad90aa system-log: Read kernel messages from /dev/klog on the Hurd.
>

Great, thanks.

The last thing that still needs fixing is the message-destination procedure in the system-log.sh test. Currently It sends all messages to the $syslog_remote_file if the message has a sender, but arguably the (null) AF_UNIX sockets are not remote.

After I changed that I was able to finally build shepherd with all non skipped tests passing.

I then tried the shepherd syslogd on the childhurd but some things were not quite right:
- the /dev/klog messages are being logged but were missing the %kernel-prefix, i dont think this is a  problem with shepherd.
- I saw no syslog messages in /var/log/messages, i.e. from sshd or a user shepherd logging to syslog. It just seems to stop logging to the main log file entirely after finishing with /dev/klog.
Any ideas?

Toggle quote (9 lines)
> Regarding cross-compilation, here’s a protip:
>
> guix build -L $PWD/.guix/modules shepherd@1.0.99-tarball \
> --target=i586-pc-gnu
>
> It works because it first builds the tarball, which includes
> pre-generated man pages, and then uses it as the source of the Shepherd
> being cross-compiled.
>
This even works in the childhurd operating system record and I can just use that package instead of the manual "make dist" and  source override I did before.

Toggle quote (4 lines)
> Thanks a lot for the detailed bug reports and suggestions; it’s much
> appreciated!
>

Thank you for making shepherd very approachable. I had no issues understanding how the syslog service is supposed to work and hunt down the bug.


Toggle quote (2 lines)
> Ludo’.
>
Ludovic Courtès wrote 10 months ago
(address . yelninei@tutamail.com)(name . 77634)(address . 77634@debbugs.gnu.org)
87v7qun93n.fsf@gnu.org
Hey,

yelninei@tutamail.com writes:

Toggle quote (5 lines)
> The last thing that still needs fixing is the message-destination
> procedure in the system-log.sh test. Currently It sends all messages
> to the $syslog_remote_file if the message has a sender, but arguably
> the (null) AF_UNIX sockets are not remote.

Oh right. I think 2d70e2f3948910ab97b9d90d75aa0196596fa917 should fix
it.

Toggle quote (8 lines)
> I then tried the shepherd syslogd on the childhurd but some things were not quite right:
> - the /dev/klog messages are being logged but were missing the
> %kernel-prefix, i dont think this is a  problem with shepherd.
> - I saw no syslog messages in /var/log/messages, i.e. from sshd or a
> user shepherd logging to syslog. It just seems to stop logging to the
> main log file entirely after finishing with /dev/klog.
> Any ideas?

Yeah, I think that’s because of the streamio bug:


Toggle quote (4 lines)
> Thank you for making shepherd very approachable. I had no issues
> understanding how the syslog service is supposed to work and hunt down
> the bug.

Thanks for the kind words. I very much hope if feels approachable to as
many people as possible.

Ludo’.
yelninei wrote 10 months ago
(name . Ludovic Courtès)(address . ludo@gnu.org)(name . 77634)(address . 77634@debbugs.gnu.org)
OOcMOqN--F-9@tutamail.com
Hello

Apr 23, 2025, 20:58 by ludo@gnu.org:

Toggle quote (12 lines)
> Hey,
>
> yelninei@tutamail.com writes:
>
>> The last thing that still needs fixing is the message-destination
>> procedure in the system-log.sh test. Currently It sends all messages
>> to the $syslog_remote_file if the message has a sender, but arguably
>> the (null) AF_UNIX sockets are not remote.
>>
>
> Oh right. I think 2d70e2f3948910ab97b9d90d75aa0196596fa917 should fix
> it.
Hurray:
git rev-parse HEAD
2d70e2f3948910ab97b9d90d75aa0196596fa917

guix --version
guix (GNU Guix) 9f4bdb270e178eac636df3f4bbffbe4aa3b90a67

[this is including the automake hack I mentioned earlier]guix build -f guix.scm -s i586-gnu --without-tests=guile-fibers --no-grafts/gnu/store/4fjq66pr19q9p4ki49fp1d2ddv5g1809-shepherd-1.0.99-git

Can this be closed now as the streamio and the systemd activation issue are not shepherd related?

Thank you.
Ludovic Courtès wrote 10 months ago
(address . yelninei@tutamail.com)(name . 77634)(address . 77634-done@debbugs.gnu.org)
877c39jo9b.fsf@gnu.org
Hi,

yelninei@tutamail.com writes:

Toggle quote (2 lines)
> [this is including the automake hack I mentioned earlier]guix build -f

Would be nice to fix that Automake issue, sometime…

Toggle quote (2 lines)
> Can this be closed now as the streamio and the systemd activation issue are not shepherd related?

I think so. Closing! 🎉

Ludo’.
Closed
?
Your comment

This issue is archived.

To comment on this conversation send an email to 77634@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 77634
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch
You may also tag this issue. See list of standard tags. For example, to set the confirmed and easy tags
mumi command -t +confirmed -t +easy
Or, remove the moreinfo tag and set the help tag
mumi command -t -moreinfo -t +help