logging: implement journald-based local log retention #1511
brianmcgillion merged 2 commits into tiiuae:main
Add configurable journal retention settings to reduce resource usage. Logs are retained locally in systemd journal instead of running Loki. Signed-off-by: juliuskoskela <julius.koskela@unikie.com>
Admin-vm also generates its own logs and needs local retention. Signed-off-by: juliuskoskela <julius.koskela@unikie.com>
Tested on Darter Pro. All good!
I set MaxRetentionSec to 300 (5 min) in both the client and the server: MaxRetentionSec=${toString (300)}
https://github.com/juliuskoskela/ghaf/blob/log-retention-pr-2/modules/common/logging/client.nix#L76
https://github.com/juliuskoskela/ghaf/blob/log-retention-pr-2/modules/common/logging/server.nix#L123
However, it seems the logs are still accessible even after the 5 min window:
[ghaf@net-vm:~]$ cat /etc/systemd/journald.conf
[Journal]
Storage=persistent
RateLimitInterval=30s
RateLimitBurst=10000
Audit=
MaxRetentionSec=300
SystemMaxUse=500M
SystemMaxFileSize=100M
Storage=persistent
[ghaf@net-vm:~]$ journalctl --list-boots
IDX BOOT ID FIRST ENTRY LAST ENTRY
0 3022cf1e481c47858d4069dfdb5c36c0 Thu 2025-10-30 09:10:47 UTC Thu 2025-10-30 09:24:34 UTC
[ghaf@net-vm:~]$ journalctl --directory=/var/log/journal/38ef38829fe94ed688be4d5b4c137aaf --no-pager -o short-iso | head -n1
2025-10-30T09:10:47+00:00 net-vm kernel: rtc_cmos 00:03: registered as rtc0
[ghaf@admin-vm:~]$ cat /etc/systemd/journald.conf
[Journal]
Storage=persistent
RateLimitInterval=30s
RateLimitBurst=10000
Audit=
MaxRetentionSec=300
SystemMaxUse=500M
SystemMaxFileSize=100M
Storage=persistent
[ghaf@admin-vm:~]$ journalctl --list-boots
IDX BOOT ID FIRST ENTRY LAST ENTRY
0 22b4559d156443a09fab3bf0de6fc269 Thu 2025-10-30 09:10:43 UTC Thu 2025-10-30 09:25:16 UTC
[ghaf@admin-vm:~]$ journalctl --directory=/var/log/journal/84a916f36de347dba1326cbf9bae595e/ --no-pager -o short-iso | head -n1
2025-10-30T09:10:43+00:00 admin-vm kernel: Linux version 6.17.3 (nixbld@localhost) (gcc (GCC) 14.3.0, GNU ld (GNU Binutils) 2.44) #1-NixOS SMP PREEMPT_DYNAMIC Wed Oct 15 10:04:23 UTC 2025
Is this expected behavior?
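The manual check above can be scripted. The following is a minimal sketch, not part of this PR: the function names are hypothetical, and it assumes the default journal location rather than an explicit --directory. It compares the age of the oldest readable entry against a retention window in seconds.

```shell
# Print the age, in seconds, of the oldest entry journalctl can still read.
# "-o short-unix" prefixes each line with an epoch timestamp (seconds.microseconds).
oldest_entry_age() {
    oldest=$(journalctl -q --no-pager -o short-unix | head -n1 | cut -d. -f1)
    now=$(date +%s)
    echo $((now - oldest))
}

# Succeed only if no readable entry is older than the given retention (seconds).
retention_respected() {
    [ "$(oldest_entry_age)" -le "$1" ]
}
```

For the test above, `retention_respected 300` would be expected to fail on both VMs, since the first entry is roughly 14 minutes older than the last one.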
Great testing, thank you. It seems like this is indeed how journald retention works. There are some subtleties here:
So in our case we could set up a cron job that runs once the configured retention time is reached and then rotates and vacuums the logs.
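A rough sketch of the rotate-then-vacuum step such a job could run, assuming it executes with enough privileges to manage the system journal. The function name is hypothetical and not part of this PR. The rotation matters because vacuuming only operates on archived journal files, so entries in the active file are left alone until it is rotated.

```shell
# Rotate first so the active journal file becomes an archived one,
# then remove archived files whose entries are all older than the cutoff.
enforce_retention() {
    retention_secs=$1
    journalctl --rotate &&
        journalctl --vacuum-time="${retention_secs}s"
}
```

With the 5 minute test setting this would be called as `enforce_retention 300`.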
mbssrc left a comment
For the log clearing, log rotation happens on (re)boot, so the edge case of needing to vacuum could be added with a systemd timer later?
Yeah sure!
Description of Changes
This PR implements local log retention using systemd-journald instead of running a resource-intensive Loki server in admin-vm.
Changes:
- Added journalRetention configuration options to the ghaf.logging module:
  - enable (default: true) - Enable/disable journald retention
  - maxRetentionDays (default: 30) - Days to retain logs locally
  - maxDiskUsage (default: "500M") - Maximum disk space for logs
- journald settings: MaxRetentionSec, SystemMaxUse, SystemMaxFileSize=100M, and Storage=persistent
Rationale:
The initial approach of running Loki server locally in admin-vm consumed too many CPU and memory resources. This implementation uses journald's built-in retention capabilities while Alloy continues to forward logs to the remote server. Alloy's WAL (Write-Ahead Log) ensures logs are synced even during network outages.
Type of Change
Related Issues / Tickets
Checklist
make-checks and it passes
Testing Instructions
Applicable Targets
aarch64 aarch64 x86_64 x86_64 x86_64
Installation Method
nixos-rebuild ... switch
Test Steps To Verify:
On ghaf-host:
Check journald retention configuration is applied:
Expected: Should show MaxRetentionSec=2592000, SystemMaxUse=500M, SystemMaxFileSize=100M, Storage=persistent
Verify disk usage is within limits:
Expected: Total usage should be under 500M
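One way to script the disk usage check is sketched below. This is an illustration rather than the command from the PR: the function name is hypothetical, and it measures the journal directory with du instead of parsing the human-readable output of `journalctl --disk-usage`.

```shell
# Succeed if the journal directory uses at most limit_mb megabytes.
# du -s summarizes the whole tree; -m reports the size in 1 MiB blocks.
journal_usage_ok() {
    dir=$1
    limit_mb=$2
    used_mb=$(du -sm "$dir" | cut -f1)
    echo "used=${used_mb}M limit=${limit_mb}M"
    [ "$used_mb" -le "$limit_mb" ]
}
```

For the default setting this would be `journal_usage_ok /var/log/journal 500`.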
Verify logs are retained across boots:
Expected: Should show multiple boot entries
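A small sketch for checking this non-interactively, assuming the `journalctl --list-boots` layout shown earlier in this conversation (an index followed by a 32-character boot ID). The function name is hypothetical.

```shell
# Count the boots journald knows about. Data rows of --list-boots start with
# an index (0, -1, -2, ...) followed by a 32-character hex boot ID; the
# header row does not match and is therefore not counted.
count_boots() {
    journalctl --list-boots --no-pager |
        grep -Ec '^ *-?[0-9]+ +[0-9a-f]{32} '
}
```

After at least one reboot, `count_boots` should print a number greater than 1 if logs persist across boots.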
Verify Alloy is still reading and forwarding logs:
Expected: Alloy service running, positions.yml file recently modified
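The "recently modified" part can be checked with a helper like the one below. This is a sketch with a hypothetical function name. The location of positions.yml depends on the Alloy configuration and is not stated here, so the path is passed in as a parameter.

```shell
# Succeed if the file exists and was modified within the last N minutes.
# find's "-mmin -N" matches files changed less than N minutes ago.
recently_modified() {
    file=$1
    minutes=$2
    [ -n "$(find "$file" -maxdepth 0 -mmin "-$minutes" 2>/dev/null)" ]
}
```

Usage would look like `recently_modified <path to positions.yml> 5`, alongside a check that the Alloy unit is active.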
Test log retention period:
Expected: Should only delete logs older than 30 days
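For reference, the 30 day default corresponds to the MaxRetentionSec=2592000 value expected above. The sketch below shows the conversion and a matching manual vacuum. The function names are hypothetical, and the vacuum needs privileges to manage the system journal.

```shell
# Convert a retention period in days to seconds (30 days -> 2592000).
days_to_secs() {
    echo $(( $1 * 24 * 60 * 60 ))
}

# Remove archived journal files whose entries are all older than the given
# number of days; entries in the active journal file are not touched.
vacuum_older_than_days() {
    journalctl --vacuum-time="$(days_to_secs "$1")s"
}
```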
On admin-vm:
Check journald retention configuration:
Expected: Same retention settings applied, Alloy receiving logs from clients
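The configuration check can be made scriptable along these lines. This is a sketch with hypothetical function names. It reads only the given file and assumes that, when a key such as Storage appears twice (as in the journald.conf output earlier in this conversation), the last assignment is the effective one.

```shell
# Print the value of one key from a journald.conf-style file.
# If the key is assigned more than once, the last assignment is printed.
conf_value() {
    file=$1
    key=$2
    grep -E "^${key}=" "$file" | tail -n1 | cut -d= -f2-
}

# Succeed if the file carries the expected retention settings.
check_retention_conf() {
    file=$1
    [ "$(conf_value "$file" MaxRetentionSec)" = "$2" ] &&
        [ "$(conf_value "$file" SystemMaxUse)" = "$3" ] &&
        [ "$(conf_value "$file" Storage)" = "persistent" ]
}
```

With the defaults this would be `check_retention_conf /etc/systemd/journald.conf 2592000 500M`.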
Verify admin-vm is forwarding to remote Loki server:
Expected: WAL directory present with recent activity