fix(logging): stop losing admin-vm logs across offline reboots #1396

Merged
brianmcgillion merged 1 commit into tiiuae:main from everton-dematos:pr_log_forwarding on Sep 12, 2025

Conversation

@everton-dematos
Contributor

Description of Changes

Fixes the loss of admin-vm logs that are generated while the Internet connection is down when the VM reboots before connectivity returns.

The root cause was that Alloy’s state directory (/var/lib/private/alloy) was persisted on admin-vm, and that state includes the loki.source.journal cursor. Before the reboot, the source advanced its cursor even though some entries had not yet reached loki.write, where the WAL append happens. After the reboot, the persisted cursor pointed past those entries, so they were never read again and nothing was resent.
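
The failure mode described above can be illustrated with a small toy model. This is only a sketch, not Alloy code: the class `ToyPipeline`, its fields, and the helper `delivered_after_reboot` are hypothetical names, and the model assumes the journal itself survives the reboot while in-memory entries do not. The `keep_cursor=False` case only shows what a fresh cursor does in this model; the description does not state how the actual fix is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPipeline:
    """Toy model of a journal reader feeding a durable write stage.

    Hypothetical structure for illustration; not Alloy's implementation.
    """

    journal: list[str]                                   # survives reboots (assumption)
    cursor: int = 0                                      # persisted read position
    in_flight: list[str] = field(default_factory=list)   # memory only
    wal: list[str] = field(default_factory=list)         # durable write-ahead log

    def read(self) -> None:
        """Read new journal entries and advance the cursor immediately."""
        self.in_flight.extend(self.journal[self.cursor:])
        self.cursor = len(self.journal)

    def flush(self) -> None:
        """Append in-flight entries to the durable WAL."""
        self.wal.extend(self.in_flight)
        self.in_flight.clear()

    def reboot(self, keep_cursor: bool) -> None:
        """Drop in-memory entries; keep or reset the persisted cursor."""
        self.in_flight.clear()
        if not keep_cursor:
            self.cursor = 0


def delivered_after_reboot(keep_cursor: bool) -> list[str]:
    """Log one entry, reboot before it reaches the WAL, then log another."""
    pipeline = ToyPipeline(journal=["logtest0 - admin"])
    pipeline.read()                  # cursor moves past logtest0, WAL still empty
    pipeline.reboot(keep_cursor)     # reboot happens before flush()
    pipeline.journal.append("logtest1 - admin")
    pipeline.read()
    pipeline.flush()
    return pipeline.wal


if __name__ == "__main__":
    print("persisted cursor:", delivered_after_reboot(keep_cursor=True))
    print("fresh cursor:    ", delivered_after_reboot(keep_cursor=False))
```

With the persisted cursor, the entry logged before the reboot never reaches the WAL in this model; with a fresh cursor, both entries do.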

Type of Change

  • New Feature
  • Bug Fix
  • Improvement / Refactor

Related Issues / Tickets

https://jira.tii.ae/browse/SSRCSP-7024

Checklist

  • Clear summary in PR description
  • Detailed and meaningful commit message(s)
  • Commits are logically organized and squashed if appropriate
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • Author has run make-checks and it passes
  • All automatic GitHub Action checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing Instructions

Applicable Targets

  • Orin AGX aarch64
  • Orin NX aarch64
  • Lenovo X1 x86_64
  • Dell Latitude x86_64
  • System 76 x86_64

Installation Method

  • Requires full re-installation
  • Can be updated with nixos-rebuild ... switch
  • Other:

Test Steps To Verify:

  1. Boot a fresh Ghaf image on a Lenovo X1 with no Internet connection (Ethernet cable unplugged)
  2. Create user account, log in, open terminal
  3. cat /etc/common/device-id (and write down the id)
  4. ssh ghaf@admin-vm
  5. sudo logger --priority=user.info --tag=myjob "logtest0 - admin"
  6. Reboot the laptop
  7. Connect to the Internet and verify the connection by running 'ping google.com' from a terminal
  8. ssh ghaf@admin-vm
  9. sudo logger --priority=user.info --tag=myjob "logtest1 - admin"
  10. Log in to grafana https://ghaflogs.vedenemo.dev/explore
  11. Select filters: machine = device-id / host = admin-vm / Line contains: logtest
  12. Select a time frame (upper right corner) that covers everything from step 4 onward (e.g. Last 15 minutes)
  13. Hit "Run query"
  14. Grafana lists both log lines:
    logtest0 - admin
    logtest1 - admin
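
The filters from step 11 could also be written as a single query string. The sketch below assumes the Grafana instance is backed by Loki and that `machine` and `host` are stream labels; the filter names suggest this, but the description does not confirm it. The function `build_logql` is a hypothetical helper, not part of Ghaf.

```python
def build_logql(device_id: str, host: str = "admin-vm", needle: str = "logtest") -> str:
    """Build a LogQL query matching the filters used in the test steps.

    Assumes `machine` and `host` are Loki stream labels (not confirmed
    by the PR description).
    """

    def esc(value: str) -> str:
        # Escape backslashes and double quotes for a LogQL string literal.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    return f'{{machine="{esc(device_id)}", host="{esc(host)}"}} |= "{esc(needle)}"'


if __name__ == "__main__":
    # Replace the placeholder with the id read from /etc/common/device-id.
    print(build_logql("<device-id>"))
```

Pasting the resulting string into the code editor of the Explore view should be equivalent to selecting the filters by hand, under the assumptions stated above.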

brianmcgillion self-requested a review on September 10, 2025 13:56
Signed-off-by: Everton de Matos <everton.dematos@tii.ae>
brianmcgillion merged commit 188f586 into tiiuae:main on Sep 12, 2025
27 of 28 checks passed
everton-dematos deleted the pr_log_forwarding branch on January 23, 2026 07:08

3 participants