SnapRAID's built-in reports are great for humans but tricky to scrape. This collector wraps the native snapraid CLI, parses the important bits from smart, scrub, and sync, and emits Prometheus-compatible metrics via the node-exporter textfile collector.
SnapRAID ── smart/scrub/sync ──┐
│ stdout + exit codes
▼
snapraid_metrics_collector.sh
│ Prometheus text exposition
▼
node_exporter textfile collector directory
│ scrape
▼
Prometheus ➜ Grafana
You can find this dashboard here
- Opinionated CLI with per-command defaults, dry-run mode, and pluggable
snapraidbinary. - Deterministic logging: timestamped run logs in
logs/plussmart.log/scrub.log/sync.logsymlinks. - Optional live verbosity and label redaction for serial numbers/devices.
- Durable metrics buffer so failed commands still ship status/error gauges.
- Extended insight: runtime durations, byte-accurate throughput, per-operation summary counters, SMART warning totals.
- Batteries-included tooling: reproducible test fixtures, Makefile helpers, GitHub Actions CI, sample systemd unit, and container image.
git clone https://github.com/your-org/snapraid-collector.git
cd snapraid-collector
make test # optional but recommended
sudo make install # installs into /usr/local/bin by default
sudo /usr/local/bin/snapraid_metrics_collector.sh \
--textfile /var/lib/node_exporter/textfile_collector/snapraid.prom \
smart scrub syncThe script must run as root (SnapRAID requirement). Use --dry-run or SNAPRAID_COLLECTOR_SKIP_ROOT=true during local experimentation/tests to bypass the root guard.
sudo ./snapraid_metrics_collector.sh [options] command [args] [command [args] ...]Commands: smart, scrub, sync. Arguments following a command are passed through to SnapRAID until the next command token. Use -- in between to force literal command names into SnapRAID.
| Option | Purpose |
|---|---|
--textfile PATH |
Atomically write metrics to PATH (promotes textfile collector friendly output). |
--snapraid-bin PATH |
Point at an alternate snapraid binary (e.g. container bind mounts). |
--log-dir PATH |
Store logs under PATH (defaults to ./logs). Symlinks smart.log, scrub.log, sync.log always point to the latest run. |
--dry-run |
Skip executing SnapRAID and still exercise parsing/metrics (disables root requirement). |
--verbose |
Stream SnapRAID stdout/stderr to collector stderr in real-time. |
--redact-identifiers |
Hash disk, serial, and device labels before exporting them as Prometheus labels. |
--smart-defaults "ARGS" |
Default flags prepended to snapraid smart invocations. |
--scrub-defaults "ARGS" |
Default flags prepended to snapraid scrub. (-p 10 if unspecified.) |
--sync-defaults "ARGS" |
Default flags prepended to snapraid sync. |
Environment overrides:
SNAPRAID_BIN,SNAPRAID_LOG_DIR,SNAPRAID_SMART_DEFAULTS,SNAPRAID_SCRUB_DEFAULTS,SNAPRAID_SYNC_DEFAULTSSNAPRAID_COLLECTOR_REDACTto force identifier masking.SNAPRAID_COLLECTOR_SKIP_ROOTfor automated tests (true/1).
# Run SMART with verbose logs redirected to stderr
sudo ./snapraid_metrics_collector.sh --verbose smart
# Custom log directory and snapraid binary
sudo ./snapraid_metrics_collector.sh \
--log-dir /var/log/snapraid-collector \
--snapraid-bin /opt/snapraid/bin/snapraid \
smart sync
# Chain scrub + sync with bespoke defaults and explicit pass-through
sudo ./snapraid_metrics_collector.sh \
--scrub-defaults "-p 20 -a" \
--textfile /tmp/snapraid.prom \
scrub sync -- --force-zero
# Dry-run parsing with redacted identifiers from captured output (dev only)
SNAPRAID_COLLECTOR_SKIP_ROOT=true \
./snapraid_metrics_collector.sh --dry-run --redact-identifiers smart- Logs live under
logs/by default with per-run timestamps:logs/sync-20240101T013000.log. - Symlinks
smart.log,scrub.log,sync.logalways target the most recent run;logs/<command>.latest.logmirrors the same for convenience. --verbosemirrors the exact snapshots in real time to stderr (helpful when running ad-hoc).- Metrics are buffered before being flushed to the textfile path so even error exits export status/duration/error counters.
- Redaction hashes values deterministically (sha256 truncated to 12 chars) to preserve uniqueness without leaking serials.
Common metrics emitted per command (smart, scrub, sync):
snapraid_<cmd>_exit_status– last process exit code.snapraid_<cmd>_last_ran– millisecond epoch timestamp.snapraid_<cmd>_duration_seconds– wall-clock duration for the collector invocation.snapraid_<cmd>_file_errors,_io_errors,_data_errors– parsed from SnapRAID summaries.snapraid_<cmd>_completion_percent,_accessed_bytes,_completion_duration_seconds– normalized units, even for fractional output.snapraid_<cmd>_scan_time_seconds{disk="..."}– per disk scan timing.snapraid_<cmd>_items_{updated,removed,added,copied,restored,scrubbed,verified}– summary counters when SnapRAID prints them.
SMART-specific extras:
snapraid_smart_disk_temperature,_power_on_days,_error_count,_fail_probability(per disk labelsdisk,device,serial,size).snapraid_smart_total_fail_probability– aggregate failure probability (%).snapraid_smart_warning_count– case-insensitive count of warnings/alerts in the recent output.
Scrub & sync share the access/error/summary metrics above, giving you coverage for routine health checks and heavy sync churn.
Tip: Metrics are emitted even when SnapRAID returns non-zero, so alerting can fire on both
exit_statusand the underlying counters.
sudo install -m 0755 snapraid_metrics_collector.sh /usr/local/bin/
sudo install -d /var/lib/node_exporter/textfile_collector
sudo /usr/local/bin/snapraid_metrics_collector.sh \
--textfile /var/lib/node_exporter/textfile_collector/snapraid.prom \
smart scrub syncAdd the command to cron or systemd timers to suit your cadence.
make lint # shellcheck + shfmt (if available)
make test # runs fixture-backed parsing tests
sudo make install
sudo make install-systemdPREFIX/BINDIR/SYSTEMD_DIR can be overridden, e.g. make PREFIX=/opt install. make uninstall removes the installed script and sample unit.
The provided unit (contrib/snapraid-metrics-collector.service) runs the collector as a oneshot service:
sudo make install install-systemd
sudo systemctl daemon-reload
sudo systemctl enable --now snapraid-metrics-collector.serviceEdit /etc/systemd/system/snapraid-metrics-collector.service (or drop an override) to customise textfile paths, add EnvironmentFile=/etc/default/snapraid-collector, or tweak SnapRAID arguments.
Build the lightweight container for cron-style runs:
docker build -t snapraid-collector .
docker run --rm \
-v /etc/snapraid.conf:/etc/snapraid.conf:ro \
-v /var/lib/node_exporter/textfile_collector:/exporter \
-v /srv/snapraid-data:/srv/snapraid-data \
snapraid-collector \
--textfile /exporter/snapraid.prom smart scrub syncMount the host SnapRAID binary with -v /usr/bin/snapraid:/usr/bin/snapraid:ro if it is not already present in the image.
Node Exporter must collect textfiles from the directory used above:
# /etc/systemd/system/node_exporter.service (excerpt)
ExecStart=/usr/local/bin/node_exporter \
--collector.textfile.directory=/var/lib/node_exporter/textfile_collectorReload systemd (sudo systemctl daemon-reload && sudo systemctl restart node_exporter) after edits.
Example alerting rules:
- name: SnapRAID Alerts
rules:
- alert: SnapRAIDDiskFailureProbability
expr: snapraid_smart_disk_fail_probability > 15
for: 5m
labels:
severity: warning
annotations:
summary: "Disk failure probability {{ $labels.disk }} is {{ $value }}%"
- alert: SnapRAIDRunFailed
expr: snapraid_sync_exit_status != 0
for: 5m
labels:
severity: critical
annotations:
summary: "snapraid sync failed on {{ $labels.instance }}"
description: "Inspect snapraid_sync_exit_status and associated logs."Import the Grafana dashboard (grafana.png) or roll your own against the expanded metric set.
<<<<<<< HEAD
The script logs each SnapRAID command to a serperate file in the same directory a the script in smart.log, scrub.log, and sync.log files.
The repository ships with canned fixtures so parsing logic stays predictable:
make test # exercises smart/scrub/sync parsing end-to-end
make lint # shellcheck + shfmt (no-op if tools missing)tests/run.shdrives the collector againsttests/bin/fake_snapraid.- CI (
.github/workflows/ci.yml) installs shellcheck/shfmt and executesmake lint test. - Override
SNAPRAID_COLLECTOR_SKIP_ROOT=truelocally to bypass the root requirement while using fixtures.
- No metrics emitted: confirm the textfile path is writable by root and Node Exporter points at the same directory.
- Empty disk labels with
--redact-identifiers: redaction hashes only disk/device/serial values; ensure SnapRAID output includes them and that you're not redacting twice via environment overrides. - Large textfile diffs: SnapRAID emits carriage returns during progress updates. The collector strips them before parsing; if you notice odd counters open an issue with sample output.
- CI complaining about shfmt: install
shfmtlocally or runmake lintinside the provided container to reproduce.
- Native HTTP exporter mode to remove the node-exporter dependency.
- Optional persistent state cache to diff consecutive runs more efficiently.
- Alternate implementation in Go/Python for long-running daemon scenarios.
Contributions, bug reports, and sample SnapRAID outputs that break the parser are always welcome.
178acb3 (add project infrastructure: CI, Docker, Makefile, docs)
