Measuring network throughput and latency in Linux is a crucial troubleshooting skill for any administrator. As our reliance on networked services keeps growing, even minor hiccups can cause costly outages. By mastering basic network speed tests, you gain visibility into critical performance bottlenecks.
This guide equips you with practical methods for quantifying link quality. We first cover essential network performance terminology, then survey popular Linux tools for stress testing bandwidth and visualizing traffic. Detailed examples demonstrate how to optimize transfers and diagnose connectivity issues.
You will learn an evidence-driven methodology for:
- Baselining expected network capability
- Continuous traffic monitoring to identify bottlenecks
- Tuning Linux host configurations for maximum throughput
- Pinpointing multilayer performance deficiencies
Equipped with this holistic toolkit, you can decisively troubleshoot even the most stubborn connectivity lags.
Key Network Speed Metrics
While often used interchangeably in casual conversation, the concepts of bandwidth, throughput, and latency are quite distinct in networking:
Bandwidth describes the maximum carrying capacity of a link, essentially its peak speed. This hard limit is set by the technology standard. For example, Gigabit Ethernet ports support up to 1 Gbps transfers. However, various overheads mean actual throughput is lower.
Throughput measures how much data successfully arrives at its destination over time. So while your network may advertise a fast connection, real-world transfers reach only a fraction of total bandwidth. Heavy congestion on intermediate paths further restricts effective throughput.
Latency represents delay — specifically the round trip time for a packet to reach its destination and receive acknowledgement back. Latency is measured in milliseconds and depends largely on physical distance and processing times across routing hops. Unlike bandwidth limits where higher is better, lower latency enables more responsive communications.
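These metrics also interact: the bandwidth-delay product (BDP) tells you how much data must be in flight to keep a link full, which is why high-latency paths need larger TCP windows to reach their bandwidth ceiling. A minimal Python sketch (the link figures are illustrative examples):

```python
def bandwidth_delay_product(bandwidth_bps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to saturate a link: BDP = bandwidth * RTT."""
    return bandwidth_bps / 8 * (rtt_ms / 1000)

# A 1 Gbps LAN link with 0.5 ms RTT needs only ~62 KiB in flight...
lan_bdp = bandwidth_delay_product(1e9, 0.5)

# ...while the same 1 Gbps over an 80 ms WAN path needs ~10 MB.
wan_bdp = bandwidth_delay_product(1e9, 80)

print(f"LAN BDP: {lan_bdp / 1024:.1f} KiB, WAN BDP: {wan_bdp / 1e6:.1f} MB")
```

The same bandwidth with 160x the latency needs 160x the buffering – one reason WAN transfers often underperform despite "fast" links.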

Understanding these core metrics empowers you to accurately diagnose network degradations. Is poor performance due to:
- Insufficient bandwidth ceilings?
- Traffic congestion decreasing throughput?
- Latency lags extending round trip times?
Often issues arise from a combination of factors. Now let's explore Linux tools to quantify each element.
Iperf3 – Traffic Generation for Stress Testing
Iperf3 is the de facto standard for measuring network throughput in Linux environments. It works by establishing TCP or UDP data streams between hosts and reporting bandwidth, and for UDP tests, jitter and packet loss. Tests are customizable in duration, datagram size, and various other parameters.
Think of iperf3 as a water faucet – it controls a stream of network traffic to characterize piping capacity between two endpoints. Run tests under different configurations to baseline expected performance needs.
To install on Debian-based distributions:
```
sudo apt update
sudo apt install iperf3
```
Iperf3 operates in client/server mode. First launch it on the server with the -s flag to listen for incoming streams:
```
iperf3 -s
```

The key piece of information from the server output is the listening port – 5201 by default. Both TCP and UDP test traffic arrives on this single port.
Next invoke iperf3 in client mode on a remote host, targeting the server IP (add -p if the server listens on a non-default port):
```
iperf3 -c server_ip        # TCP test
iperf3 -c server_ip -u     # UDP test
```
Each command initiates a 10 second test towards the listening server. We can customize the duration (-t, in seconds) and the buffer or datagram length (-l) as well:
```
iperf3 -c server_ip -u -t 30 -l 1400
```
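For scripted baselining, iperf3 can also emit machine-readable output with its --json flag, which is easy to post-process. Here is a sketch that pulls headline numbers out of the summary section; the sample dict below is a trimmed stand-in for a TCP result, and while the field names follow iperf3's JSON schema, verify them against your installed version:

```python
import json

# Trimmed stand-in for `iperf3 -c server_ip --json` output.
sample = json.loads("""
{
  "end": {
    "sum_sent":     {"bits_per_second": 941000000.0, "retransmits": 12},
    "sum_received": {"bits_per_second": 938500000.0}
  }
}
""")

def summarize(result: dict) -> dict:
    """Extract headline throughput figures from an iperf3 JSON result."""
    end = result["end"]
    return {
        "sent_mbps": end["sum_sent"]["bits_per_second"] / 1e6,
        "recv_mbps": end["sum_received"]["bits_per_second"] / 1e6,
        "retransmits": end["sum_sent"].get("retransmits"),
    }

print(summarize(sample))
```

Feeding real runs through a summary like this makes it trivial to log results over time or compare environments.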

The client prints extensive details on the connection quality including:
- Bandwidth in Mbits/sec
- Packet loss percentage (UDP tests)
- Jitter in milliseconds (UDP tests)
- Datagrams received out of order
- TCP retransmission counts
On a healthy Gigabit Ethernet link, for example, TCP throughput typically lands around 940 Mbps, with UDP peaking only slightly lower – indicating nearly equivalent line quality in both modes.
Experiment with tuning the number of parallel streams (-P), durations, packet sizes, and buffers to baseline expected network capability between hosts. Run tests reflecting production application profiles – is traffic primarily large file transfers or interactive sessions?
Also try setting up benchmark streams across staged environments to quantify the impact of infrastructure changes. If the network takes an unexpected performance hit after a firewall upgrade, you can roll back to known-good configurations with confidence.
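A lightweight way to act on those staged benchmarks is to compare each new measurement against a stored baseline and flag regressions beyond a tolerance. A sketch, with illustrative thresholds and figures:

```python
def regression(baseline_mbps: float, measured_mbps: float,
               tolerance: float = 0.10) -> bool:
    """True if measured throughput fell more than `tolerance` below baseline."""
    return measured_mbps < baseline_mbps * (1 - tolerance)

baseline = 941.0  # Mbps recorded before the infrastructure change
for label, measured in [("post-upgrade", 612.0), ("after-rollback", 935.0)]:
    status = "REGRESSION" if regression(baseline, measured) else "ok"
    print(f"{label}: {measured} Mbps -> {status}")
```

Wiring a check like this into a pipeline turns "the network feels slow" into a concrete pass/fail signal after every change.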
Iperf3 is an invaluable tool for stress testing links and establishing performance benchmarks. Next let's explore utilities that provide continuous traffic monitoring.
Nload – Visually Monitoring Bandwidth Use
While iperf3 actively generates test traffic, nload passively monitors production network usage in real time. Where iftop breaks traffic down by connection, nload focuses purely on per-interface bandwidth.
Installation is again quite straight-forward on apt-based Linux distributions:
```
sudo apt update
sudo apt install nload
```
Invoke nload on the command line without any arguments:
```
nload
```

You'll see it immediately begin charting bandwidth utilization on the first network interface; use the arrow keys or Tab to cycle through the others, or launch with -m to display multiple interfaces at once.
The top graph depicts incoming traffic over time while the bottom shows outgoing traffic, each with current, average, minimum, and maximum rates. So at a glance we can visually identify peak bandwidth usage in either direction and spot interfaces disproportionately handling load.
Nload reports per-interface totals only; to see which hosts make up the charted traffic, pair it with a per-connection tool such as iftop, which might reveal, for example, that 10.0.0.20 and 10.0.0.10 are the primary consumers saturating links.

Nload renders directly in the console, so it's perfect for quick diagnostics. The visualization makes it fast and intuitive to spot-check for bottlenecks during performance troubleshooting.
For dedicated monitoring though, longer term logging and alerting capabilities are often necessary as well. This is where time-series tools like Prometheus excel.
Prometheus Node Exporter – Long-Term Metrics Storage
Prometheus has emerged as the de facto standard for monitoring in Linux environments. It reliably aggregates system-level metrics, stores time-series data efficiently, and integrates well with alerting utilities.
Prometheus Node Exporter is a purpose-built package for collecting host-level operating statistics. Once installed on servers, Prometheus can actively scrape exposed endpoints to harvest measurements.
I won't dive into full Prometheus stack installation, as that entails deploying the main aggregation server, configuring scrape jobs, expanding storage capacity via remote read/write endpoints, and so on. Please reference the Prometheus documentation to design an appropriately resilient monitoring architecture.
But I will focus specifically on enabling Prometheus Node Exporter on individual Linux hosts. This exposes a metrics endpoint over HTTP, allowing that server's data to be centrally harvested.
Installation is quite simple via package managers:
```
sudo apt update
sudo apt install prometheus-node-exporter
```
This starts the web service on port 9100:
```
curl http://localhost:9100/metrics
```
You'll see a wealth of operating statistics now available for scraping, including CPU, memory, disk, and crucially, network measurements. Pay particular attention to the node_network_* counters, which track cumulative bytes, packets, and errors per NIC – throughput is derived by taking their rate over time.
Now Prometheus can be pointed at host IP addresses on port 9100 to centrally aggregate these essential performance metrics. Graphing historical trends enables you to establish baselines and confidently assess infrastructure changes.
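A minimal scrape job in the aggregation server's prometheus.yml might look like the following sketch (the target addresses are placeholders):

```
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['10.0.0.10:9100', '10.0.0.20:9100']
```

With data flowing in, a PromQL expression converts the raw byte counters into per-interface throughput:

```
# Receive throughput per interface in bits/sec, averaged over 5 minutes
rate(node_network_receive_bytes_total[5m]) * 8
```

Swap in node_network_transmit_bytes_total for the outbound direction.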
For example, you might graph interface throughput over a 30-day period, broken out by receive and transmit direction. Patterns jump out instantly – such as recurring peaks every 7 days, potentially related to weekly business processes. Having this rich long-term context aids enormously in optimization and capacity planning.
Prometheus paired with intelligent alert rule definitions enables you to get ahead of issues before customers even notice degradations – for example, triggering an event when an interface's traffic spikes outside baseline norms.
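As a sketch, an alerting rule along those lines might look like this; the 800 Mbps threshold, durations, and labels are assumptions – derive yours from your own baselines:

```
groups:
  - name: network
    rules:
      - alert: InterfaceThroughputHigh
        expr: rate(node_network_receive_bytes_total{device!="lo"}[5m]) * 8 > 800e6
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} {{ $labels.device }} sustained above 800 Mbps"
```

The `for: 10m` clause suppresses alerts on momentary bursts, firing only on sustained saturation.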
Now let's shift gears to applying these monitors in some common troubleshooting scenarios.
Common Network Speed Diagnostics
We've covered techniques to actively benchmark network capability and monitor real-time traffic. How can those methodologies be applied to practical troubleshooting scenarios?
Slow Transfer Speeds Between Sites
Customers complain of degraded application performance between headquarters and a satellite office. Where do we start diagnosing?
- Profile LAN speeds first: Use iperf3 and nload to validate that clients experience expected local throughput when copying files between systems on the same site. This eliminates potential hosting infrastructure issues upfront.
- Quantify WAN capability: Establish an iperf3 profile reflecting production traffic patterns between sites. If achieved bandwidth matches advertised ISP links during controlled tests, investigate workstations specifically next.
- Inspect live application traffic: Install nload on affected client workstations to visualize production application network utilization. Check for abnormal bandwidth ceilings or loss across multiple protocols.
- Packet capture analysis: Use tcpdump or Wireshark while reproducing the issue to inspect traffic characteristics – high latency, loss, errors? This micro-level analysis pinpoints network stack shortfalls.
- Historical benchmarking: Trend interface throughput long-term with Prometheus. Compare recent degraded periods to past baselines to correlate issues with changes.
We deliberately profile bandwidth step by step from the end-client experience toward the core network links, measuring along the path to identify where slowness is introduced. Each tool fills gaps the others lack – metrics visualization, traffic generation, micro-inspection, historical records.
While network issues can appear opaque, methodically instrumenting with proven toolchains reliably surfaces the root cause.
Web Application Latency Spikes
Users are complaining of intermittent latency accessing your business web application. How to troubleshoot?
- First question: does the app server itself show any resource constraints during periods of lag? System metrics may highlight an obvious bottleneck like memory swapping, elevated CPU throttling, or maxed disk I/O. Resolve obvious shortfalls first.
- Otherwise, inspect the network path:
  - Use ping and traceroute to baseline connectivity to the app during normal operations. This quantifies the standard routing topology and round trip times.
  - Ping continuously from multiple vantage points during reported periods of latency. Check whether specific nodes show lagging responses.
  - Run iperf3 tests from affected clients toward an iperf3 server colocated with the web application. Check whether bandwidth caps or loss emerge over time.
  - Check switches, routers, and firewalls along the network path for rule changes, bursts in processing latency, or micro-reboots. Any intermediary could introduce lag.
- If the network infrastructure shows no clear culprits, perform packet analysis using tcpdump:
  - During reported latency, capture app traffic on the affected client systems while reproducing the issue.
  - Analyze the capture in Wireshark, inspecting for abnormal protocol behavior, stalled data flows, or repeated retransmissions indicating loss.
  - This may require tracing DNS, application-layer, and transport-level exchanges to pinpoint flaky response patterns. Extract hard metrics wherever possible.
- Finally, monitor your application with an end-user experience tool like ThousandEyes or Datadog APM during periods of slowness. The external viewpoint quantifies precisely where lag manifests across infrastructure tiers when accessing the app UI or APIs.
While specifics vary, the prescribed network diagnostics practice remains consistent – establish baseline health, actively test during events, continuously monitor with metrics, inspect packets for microscopic anomalies, and validate externally. Each lens contributes insights that others lack.
Tuning Linux for Speed
Beyond troubleshooting, optimizing network throughput involves tuning across all levels of the operating system and application stack. Parameter adjustments may provide substantial speedups for file transfers and interactive use cases alike:
| Configuration | Potential Optimization | Risks |
| --- | --- | --- |
| NIC interrupt handling | Bind adapter interrupts to dedicated CPUs for increased packet processing efficiency | Insufficient cores may constrain configurations |
| MTU size | Confirm MTU sufficiently large for network links, incrementally test gains up to 9000 bytes | Traffic blackholing if mismatch between endpoints |
| NIC offloads | Enable TCP segmentation, VLAN tagging, RX/TX checksum offloading to reduce server overhead | Compatibility issues on some adapters |
| Kernel sysctls | Increase Linux autotuning receive/send buffer limits for fat pipes | Memory overcommit impacting workloads |
| Congestion control | Set TCP algorithm like BBR to aggressively measure pipe capacity | Bufferbloat and jitter concerns |
| Network stack | Bypass kernel and leverage user-space stacks like DPDK for packet I/O optimizations | Stability risks and administrative overhead |
| Link aggregation | Distribute connectivity over multiple interfaces like LACP bonding | Throughput imbalances between member links |
Each optimization targets different bottlenecks from insufficient buffers to mismatched TCP windows. Test incrementally with iperf3 rather than piling on changes. Enhancements should align with production application traffic profiles and use cases – optimize for sustained large transfers versus interactive sessions for example.
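As a concrete illustration of the kernel-sysctl row above, here is a sketch of a sysctl fragment raising buffer limits and enabling BBR. The values are assumptions – size buffers to your bandwidth-delay product and validate each change with iperf3:

```
# /etc/sysctl.d/90-network-tuning.conf (illustrative values – tune to your BDP)
net.core.rmem_max = 67108864               # max receive buffer (64 MiB)
net.core.wmem_max = 67108864               # max send buffer (64 MiB)
net.ipv4.tcp_rmem = 4096 87380 67108864    # min / default / max TCP receive buffer
net.ipv4.tcp_wmem = 4096 65536 67108864    # min / default / max TCP send buffer
net.ipv4.tcp_congestion_control = bbr      # requires the tcp_bbr kernel module
net.core.default_qdisc = fq                # fair queuing pairs well with BBR
```

Apply with `sudo sysctl --system`, then re-run your iperf3 baseline to confirm the change actually helped rather than assuming it did.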
There exists an enormous range of low-level tuning knobs. Profile with tools like SystemTap to trace packet processing under the hood, or bypass the kernel network layers entirely with user-space stacks like DPDK. Measure often, validating improvements against previous baselines – evidence-driven optimization is key.
Conclusion
Network speed measurement serves many invaluable purposes:
- Controlled diagnostics during troubleshooting
- Validating infrastructure changes
- Disaster recovery planning
- Ongoing performance optimization
Approach profiling holistically across the full application delivery chain – endpoints, traffic patterns, network transport, routing topology, and physical NIC handling. Quantify bandwidth, throughput, latency, error rates and buffering at each point.
Choose tools like iperf3, nload, and the Prometheus ecosystem to provide well-rounded visibility. Apply them in conjunction rather than isolation for multifaceted insights.
By mastering this versatile troubleshooting toolkit, connectivity bottlenecks morph into highly visible outliers rather than obscure black boxes. Your Linux network stack will thank you!