Endless loop when first started with empty filesystem (0.6.1) #93

@charles-dyfis-net

This was discovered when writing a NixOS system test for the bees module.

The test in https://github.com/NixOS/nixpkgs/blob/923a3e4970226293e4698e44e3e5d5ccf7487603/nixos/tests/bees.nix passes consistently. It first creates files on a new filesystem and only then starts the bees service. That passing test is roughly equivalent to the following shell script:

any_shared_space() {
  [[ $(btrfs fi du -s --raw "$@" | awk 'NR>1 { print $3 }' | grep -E '^0$' | wc -l) -eq 0 ]]
}
die() { echo "$*" >&2; exit 1; }

mkfs.btrfs -f -L aux /dev/vdb || die
mount /dev/vdb /home || die
dd if=/dev/urandom of=/home/dedup-me-1 bs=1M count=8 || die
cp --reflink=never /home/dedup-me-1 /home/dedup-me-2 || die
any_shared_space /home/dedup-me-* && die "ERROR: Detecting shared space before any deduplication has been done"
sync
systemctl start beesd@aux.service
sleep 10
any_shared_space /home/dedup-me-* || die "ERROR: No shared space detected even after bees is running"

By contrast, a test that starts the service after the filesystem is created and first mounted, but before any content has been created, fails consistently: bees runs in a loop that keeps trying to poll the status of a file descriptor referring to a file that doesn't exist. That failing test is roughly equivalent to the following:

any_shared_space() {
  [[ $(btrfs fi du -s --raw "$@" | awk 'NR>1 { print $3 }' | grep -E '^0$' | wc -l) -eq 0 ]]
}
die() { echo "$*" >&2; exit 1; }

mkfs.btrfs -f -L aux /dev/vdb || die
mount /dev/vdb /home || die
systemctl start beesd@aux.service
sleep 1
dd if=/dev/urandom of=/home/dedup-me-1 bs=1M count=8 || die
cp --reflink=never /home/dedup-me-1 /home/dedup-me-2 || die
sync
sleep 10
any_shared_space /home/dedup-me-* || die "ERROR: No shared space detected even after bees is running"
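
The any_shared_space helper used in both scripts is terse, so here is a minimal sketch of the same check split into two steps. The function names count_unshared and all_paths_share_space are hypothetical and not part of the nixpkgs test; the sketch assumes, as the original awk call does, that the first row of btrfs fi du -s --raw output is a header and that the third column holds the shared byte count.

```shell
# Hypothetical helper: reads `btrfs fi du -s --raw`-style output on stdin
# and prints how many data rows report exactly 0 in the third column.
# Assumption (taken from the original awk call): row 1 is a header and
# column 3 is the shared byte count.
count_unshared() {
  awk 'NR > 1 && $3 == "0" { n++ } END { print n + 0 }'
}

# Hypothetical equivalent of any_shared_space: succeeds only when none
# of the given paths reports zero shared bytes.
all_paths_share_space() {
  [ "$(btrfs fi du -s --raw "$@" | count_unshared)" -eq 0 ]
}
```

Read this way, the check succeeds only when every listed path reports some shared space, which is the condition both scripts expect to hold once bees has deduplicated the two files.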

The actual failing test can be found at https://github.com/charles-dyfis-net/nixpkgs/blob/bees-test-failing/nixos/tests/bees.nix. With the bees-test-failing branch of that nixpkgs fork checked out, it can be run from the root of the working tree with nix-build -I nixpkgs="$PWD" ./nixos/tests/bees.nix.

An strace of the loop bees runs in while in that failed state can be seen at https://gist.github.com/charles-dyfis-net/34ac2e4d2bada0c3a3c8632cab98c8d9
