As an experienced Linux systems engineer, I consider the SAR utility an indispensable part of my admin toolbox for monitoring overall system health and performance. With granular visibility into everything from CPU usage to paging rates, SAR provides the quantitative data points that serve as the foundation for informed optimizations.
In this comprehensive guide, I'll share the key SAR techniques and integrations that I rely on to keep complex systems humming. Whether you're looking to graduate from SAR basics or gain additional perspective from a seasoned Linux professional, read on!
A SAR Primer
Before diving into advanced functionality, let's quickly review SAR basics for those unfamiliar.
SAR stands for System Activity Reporter. It's included in the sysstat package. To start collecting data:
# Install sysstat
sudo apt install sysstat
# Enable background data collection (Debian/Ubuntu)
sudo vi /etc/default/sysstat
ENABLED="true"
# Restart the service so collection starts
sudo systemctl restart sysstat
With collection enabled, the sadc utility silently gathers critical system metrics at defined intervals, writing binary daily logs to /var/log/sa/ by default (/var/log/sysstat on Debian-based systems).
To generate reports:
sar [options] [interval] [count]
For example:
# CPU usage averaged over one hour (a single 3600-second sample)
sar -u 3600 1
# Memory usage: two reports taken 10 minutes apart
sar -r 600 2
With over 30 report types available, SAR offers deep insights into all aspects of system and resource utilization. For full capabilities, check out man sar.
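One primer-level feature worth calling out: sar reads historical data files with -f, and -s/-e restrict the report to a time window. A small sketch, assuming the default RHEL-style log directory (the 09:00–17:00 window is illustrative):

```shell
# Report yesterday's CPU usage between 09:00 and 17:00.
# Assumes the RHEL-style data directory; Debian-based systems
# use /var/log/sysstat instead.
SA_DIR=/var/log/sa
day=$(date -d yesterday +%d)     # daily files are named saDD
logfile="$SA_DIR/sa$day"

if command -v sar >/dev/null 2>&1 && [ -r "$logfile" ]; then
    sar -u -s 09:00:00 -e 17:00:00 -f "$logfile"
else
    echo "no SAR log found at $logfile"
fi
```

This is the pattern behind most retrospective analysis: pick the day's file, then narrow the window.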
Now let's dig deeper into additional functionality that unlocks SAR's immense potential.
Custom Data Collection Intervals
While the default 10 minute statistics collection interval is fine for high-level monitoring, many advanced use cases call for more frequent data points.
Customize the cron job in /etc/cron.d/sysstat to set your desired interval:
# Collect data every 30 seconds: run sa1 each minute, taking two 30-second samples
# (the script lives at /usr/lib/sysstat/debian-sa1 on Debian-based systems)
* * * * * root /usr/lib64/sa/sa1 30 2
Adjust this based on your needs and available storage for the increased quantity of logs. Just remember that reducing the interval increases system load from additional sadc runs.
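Retention goes hand in hand with interval: the sysstat configuration controls how many days of logs are kept before rotation. A hedged example (the value is illustrative; the default varies by distribution):

```shell
# /etc/sysstat/sysstat on Debian-based systems,
# /etc/sysconfig/sysstat on RHEL-based systems
# Keep 60 days of history (illustrative value)
HISTORY=60
```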
With more frequent measurements, SAR reports become much more useful for granular analysis like:
- Correlating performance dips to traffic spikes
- Benchmarking load times for batch jobs
- Pinpointing hourly usage patterns
Separate Disk Partitions for Data
As the quantity of SAR data grows substantially over time, it can help to allocate a dedicated partition for the logs rather than filling up the root.
The sa1 wrapper script writes to its compiled-in data directory, but the underlying sadc collector accepts an explicit output file, so the cron job can point it at the dedicated disk:
# Store SAR logs on dedicated disk (sadc accepts an output file argument)
*/10 * * * * root /usr/lib64/sa/sadc 1 1 /datastore/sarlogs/sa$(date +\%d)
You can also move the existing logs:
mv /var/log/sa /datastore/sarlogs
ln -s /datastore/sarlogs /var/log/sa
This keeps your critical monitoring data separate from the OS filesystem and enables easier long-term retention.
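Long-term retention also means the dedicated partition needs housekeeping. A minimal sketch, assuming the hypothetical /datastore/sarlogs path from above and arbitrary 7/90-day cutoffs:

```shell
# Compress SAR daily logs older than 7 days, drop compressed
# logs older than 90 days. Path and cutoffs are illustrative.
SAR_DIR=${SAR_DIR:-/datastore/sarlogs}

if [ -d "$SAR_DIR" ]; then
    # sa files are binary daily logs named saDD
    find "$SAR_DIR" -name 'sa[0-9][0-9]' -mtime +7 -exec xz -f {} \;
    find "$SAR_DIR" -name 'sa[0-9][0-9].xz' -mtime +90 -delete
fi
```

Dropped into /etc/cron.daily/, a script like this keeps months of history without the partition quietly filling up.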
Filtering SAR Reports
As systems grow in complexity, SAR reports can become verbose with unnecessary detail across hundreds of metrics. This is where filtering becomes indispensable.
Some examples:
# CPU stats for processor 0 only
sar -P 0
# Memory utilization
sar -r
# Paging statistics, including page faults
sar -B
# Network errors, narrowed to eth1 with grep
sar -n EDEV | grep -E 'IFACE|eth1'
# I/O stats for the sdb device only (--dev requires a recent sysstat)
sar -d --dev=sdb
Peruse man sar for additional filter options. Crafting targeted reports is key for efficient analysis at scale.
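When sar itself has no built-in filter for what you want, ordinary text tools on its output do the job. The sample below stands in for real `sar -d` output so the pipeline can be shown end to end; against a live system you would pipe the actual command the same way:

```shell
# Stand-in sample for `sar -d` output (columns abbreviated)
sample='12:00:01 AM  DEV   tps  rkB/s  wkB/s
12:10:01 AM  sda  4.12  10.50  80.20
12:10:01 AM  sdb  0.35   1.10   2.40'

# keep the header line plus rows mentioning sdb
filtered=$(printf '%s\n' "$sample" | awk 'NR == 1 || /sdb/')
printf '%s\n' "$filtered"
```

The same `awk 'NR == 1 || /pattern/'` idiom works for any sar report where you want one row of context plus the lines you care about.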
Monitoring System Errors & Faults
In addition to utilization metrics, SAR provides invaluable insight into system reliability via error and fault monitoring.
Two key options here are:
sar -n EDEV – Report network device errors such as overruns, collisions, and dropped packets.
sar -B – Report paging activity, including major page faults (majflt/s), an early indicator of memory pressure.
Example EDEV output:
10:59:30 AM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
11:00:01 AM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Monitoring these error counters can provide early warning of impending hardware issues. The visibility SAR provides into system faults is extremely valuable for stability and uptime.
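A crude watchdog over these counters can be sketched in a few lines of shell. The sample below stands in for live `sar -n EDEV` output (columns abbreviated), and the 1.0 err/s threshold is an arbitrary placeholder:

```shell
# Stand-in sample for `sar -n EDEV` output (columns abbreviated)
edev_sample='11:00:01 AM  IFACE  rxerr/s  txerr/s
11:00:01 AM  eth0  0.00  0.00
11:00:01 AM  eth1  3.50  0.20'

# flag any interface whose rx or tx error rate exceeds the threshold;
# $3 is the interface, $4/$5 the error columns (after the AM/PM field)
alerts=$(printf '%s\n' "$edev_sample" | awk -v max=1.0 '
    NR > 1 && ($4 > max || $5 > max) { print $3 " errors above threshold" }')
echo "$alerts"
```

Wired into cron with a mail or webhook call on non-empty output, this turns SAR's error counters into a basic early-warning system.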
Integrating SAR with Other Tools
While SAR provides unparalleled internal system visibility, integrating external context can offer additional actionable insights.
Some examples of complementary tools I routinely correlate with SAR:
- Nginx/Apache logs – Match performance dips to traffic
- App dashboards – Cross-reference usage metrics
- SNMP data – Aggregate metrics across devices
- Load balancer stats – Compare device utilization
An easy way to achieve this is by logging other tools to syslog and then graphing various data sources together in a tool like Grafana.
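For getting SAR's own data into such a tool, sysstat ships the sadf utility, which renders the binary logs in machine-readable formats. A sketch, assuming a hypothetical daily log file:

```shell
# Export CPU data as semicolon-separated records with timestamps,
# easy to load into a TSDB or a Grafana CSV data source.
# Options after `--` are ordinary sar options (-u for CPU here);
# the log file name is a hypothetical example.
datafile=/var/log/sa/sa01
if command -v sadf >/dev/null 2>&1 && [ -r "$datafile" ]; then
    sadf -d "$datafile" -- -u > cpu.csv
else
    echo "sadf or $datafile not available; skipping export"
fi
```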
Cross-tool data correlations shine a light on the bigger picture and underlying relationships between shifting variables that SAR alone can't provide.
Baseline Analysis & Alert Thresholds
When tasked with optimizing system efficiency and reliability, one of the most useful SAR techniques is to establish baselines for expected utilization.
The steps here are:
- Generate SAR reports over an extended period during normal operations
- Calculate average and peak usage for critical metrics like CPU, memory, network, disk, etc
- Flag recurring outlier spikes for investigation
- Set max threshold alerts at 20% above identified peaks
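Steps 2 and 4 above can be sketched with a little awk. The sample values are placeholders for numbers pulled from real sar reports (for example, the %user column of sar -u):

```shell
# Placeholder CPU-busy percentages extracted from SAR reports
samples='42.1
55.7
48.3
61.9'

# average, peak, and an alert threshold at 20% above the peak
baseline=$(printf '%s\n' "$samples" | awk '
    { sum += $1; if ($1 > peak) peak = $1 }
    END { printf "avg=%.1f peak=%.1f threshold=%.1f", sum / NR, peak, peak * 1.2 }')
echo "$baseline"
```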
Now whenever current SAR measurements breach those upper limits for sustained periods, you receive early warning of abnormal resource constraints.
Equally critical is periodically reviewing the baseline profiles themselves as needs evolve to prevent outdated assumptions.
Conclusion
I hope this guide has helped demonstrate advanced SAR strategies and integrations useful for unlocking additional value. From here, I suggest exploring the man pages for further capabilities I wasn't able to include.
The tool offers immense monitoring depth – it's only a matter of creatively applying it to your specific environment and challenges. Integrate, iterate, analyze!
Let me know if you have any other SAR best practices to share from your own admin experiences!


