logging: prevent stalls on rotated journald entries #1712

Merged
brianmcgillion merged 1 commit into tiiuae:main from everton-dematos:pr_log_silence
Jan 27, 2026
@everton-dematos
Contributor

Description of Changes

This PR addresses cases where log forwarding could stall after journald rotation, resulting in no logs being sent to Grafana.

  • Fix journal access after rotation by enforcing persistent journal directory permissions via systemd-tmpfiles and adding the required SupplementaryGroups (systemd-journal, adm) to the Alloy service, as per the loki.source.journal documentation: https://grafana.com/docs/alloy/latest/reference/components/loki/loki.source.journal/

  • Limit replay of stale journald entries by setting max_age = "168h" on journal sources to avoid forwarding very old logs.

  • Add batching and timeout tuning (batch_size, max_backoff_period, remote_timeout) to improve resilience when the remote Loki endpoint is slow or unavailable, helping prevent pipeline stalls.

  • Introduce a short older_than = "15m" drop window for specific processing stages to keep ingestion moving under current remote Grafana/Loki constraints.
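
The four changes above can be sketched as a single NixOS module fragment. This is a minimal illustration, not the actual diff in this PR: only the group names, `max_age = "168h"`, and `older_than = "15m"` come from the description. The tmpfiles mode, the component labels (`journal`, `drop_stale`, `remote`), the endpoint URL, the config file location, and the batching and timeout values are all assumptions chosen for the example.

```nix
# Hypothetical module fragment; everything marked "assumed" is illustrative
# and not taken from the PR.
{ ... }:
{
  # Re-assert ownership and mode of the persistent journal directory so that
  # members of systemd-journal can still read files created after rotation.
  systemd.tmpfiles.rules = [
    "d /var/log/journal 2755 root systemd-journal - -" # assumed mode and path
  ];

  # Give the Alloy service the groups needed to read the journal.
  systemd.services.alloy.serviceConfig.SupplementaryGroups = [
    "systemd-journal"
    "adm"
  ];

  # Assumed config location; the real module may generate this file elsewhere.
  environment.etc."alloy/config.alloy".text = ''
    loki.source.journal "journal" {
      path       = "/var/log/journal"
      // From the PR: do not replay entries older than seven days.
      max_age    = "168h"
      forward_to = [loki.process.drop_stale.receiver]
    }

    loki.process "drop_stale" {
      forward_to = [loki.write.remote.receiver]

      // From the PR: drop entries older than 15 minutes at this stage.
      stage.drop {
        older_than          = "15m"
        drop_counter_reason = "line_too_old" // assumed label
      }
    }

    loki.write "remote" {
      endpoint {
        url                = "https://loki.example.invalid/loki/api/v1/push" // assumed
        batch_size         = "1MiB" // assumed value
        max_backoff_period = "30s"  // assumed value
        remote_timeout     = "10s"  // assumed value
      }
    }
  '';
}
```

The `max_age` and `older_than` limits work at different points: `max_age` bounds how far back the journal source reads, while the `stage.drop` window discards entries that are already stale by the time they reach the processing stage.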

Type of Change

  • New Feature
  • Bug Fix
  • Improvement / Refactor

Related Issues / Tickets

https://jira.tii.ae/browse/SSRCSP-7612

Checklist

  • Clear summary in PR description
  • Detailed and meaningful commit message(s)
  • Commits are logically organized and squashed if appropriate
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • Author has run make-checks and it passes
  • All automatic GitHub Action checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing Instructions

Applicable Targets

  • Orin AGX aarch64
  • Orin NX aarch64
  • Lenovo X1 x86_64
  • Dell Latitude x86_64
  • System 76 x86_64

Installation Method

  • Requires full re-installation
  • Can be updated with nixos-rebuild ... switch
  • Other:

Test Steps To Verify:

  1. Verify that logs are being sent to Grafana
  2. Confirm that logs keep arriving after journald rotation. This PR should resolve the "20 to 30 minutes issue", in which logs stopped reaching Grafana after rotation - https://jira.tii.ae/browse/SSRCSP-7612

- Fix journal access after rotation (tmpfiles + alloy supplementary groups).

- Drop old log entries on admin-vm to keep ingestion moving.

Signed-off-by: Everton de Matos <everton.dematos@tii.ae>
@brianmcgillion merged commit 4867c86 into tiiuae:main on Jan 27, 2026
32 checks passed
3 participants