Sometimes your plot is “wrong” even when your code is correct. I hit this the first time I tried to graph response times for an API: most requests were fast, a few were slow, and a tiny number were painfully slow. On a linear axis, the long tail crushed the rest of the data into a near-flat line, and the plot stopped telling the truth.
That’s the moment logarithmic axes start paying for themselves. A log scale doesn’t change your data; it changes how distances are measured on the axis. Equal spacing means equal ratios (like 10×), not equal differences (like +10). If your numbers span orders of magnitude—file sizes, frequencies, probabilities, runtimes, financial growth, sensor intensities—log scales often turn an unreadable smear into a plot you can reason about.
In this post I’ll show the Matplotlib patterns I actually use in day-to-day work: quick pyplot calls, object-oriented control with Axes, convenient “semi-log” helpers, and the details that matter in production code—ticks, labels, gridlines, zeros, negatives, and saving figures reproducibly.
What a log axis really means (and when I reach for it)
A linear axis treats +1 the same everywhere. The gap from 1 to 2 is the same width as the gap from 101 to 102. A log axis treats ×10 the same everywhere. The gap from 1 to 10 is the same width as the gap from 10 to 100, or from 0.1 to 1.
That simple change gives you a few immediate benefits:
- Large dynamic range becomes readable. If your y-values span from 1e-3 to 1e6, a linear axis will hide nearly all structure near the bottom.
- Multiplicative trends become straight lines. Exponential growth/decay often looks linear in log space, which helps you spot regimes and compare slopes.
- Relative error is easier to interpret. Many systems behave proportionally: “twice as slow,” “10× bigger,” “half the frequency.” A log axis matches that mental model.
But I also avoid log scales in a few common situations:
- You have zeros or negatives that are meaningful and you need them displayed directly. A plain log scale can’t represent ≤ 0.
- Your audience expects additive changes. For budgets, absolute temperature changes, or linear offsets, a log axis can confuse readers.
- The data is already tightly ranged. If values are between 80 and 120, log scaling usually adds noise rather than clarity.
When I’m unsure, I do a quick “A/B” check: render the same plot linear vs log and decide which one answers the question faster.
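One way to make that A/B check repeatable is a tiny helper that renders the same series twice. This is a sketch under my own naming (ab_scale_check is not a Matplotlib API), using the Agg backend so it also runs headless:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

def ab_scale_check(x, y):
    """Render the same series with linear and log y-axes, side by side."""
    fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
    for ax, title in ((ax_lin, "linear y"), (ax_log, "log y")):
        ax.plot(x, y)
        ax.set_title(title)
        ax.grid(True, which="both", alpha=0.4)
    ax_log.set_yscale("log")
    fig.tight_layout()
    return fig

x = np.linspace(0.1, 10, 200)
fig = ab_scale_check(x, np.exp(x))
```

Glancing at the two panels usually settles the question in a few seconds.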
The fastest path: plt.xscale("log") and plt.yscale("log")
If you already have a working plot and just want log scaling, the state-machine interface is the shortest path. I still use it for scratch work, one-off scripts, and quick notebook cells.
Here’s a complete runnable example with both axes set to log:
import numpy as np
import matplotlib.pyplot as plt
# Data spanning several orders of magnitude
x = np.linspace(0.1, 10, 200)
y = np.exp(x)
plt.figure(figsize=(8, 4))
plt.plot(x, y, label="y = exp(x)")
# Turn the axes logarithmic after plotting
plt.xscale("log")
plt.yscale("log")
plt.title("Log-log plot using plt.xscale / plt.yscale")
plt.xlabel("x (log)")
plt.ylabel("y (log)")
plt.grid(True, which="both", linewidth=0.6, alpha=0.6)
plt.legend()
plt.tight_layout()
plt.show()
A few notes from experience:
- Call order is flexible. You can set the scale before or after plotting; Matplotlib will transform the data when rendering.
- Don’t forget which="both" for grids. On a log axis, major ticks alone can make the plot feel “empty.” Minor ticks give your eyes reference points.
- Start x above zero. I used 0.1 intentionally. A plain log scale can’t show x <= 0.
If you only need one axis in log space, just set one:
plt.plot(x, y)
plt.yscale("log") # y-axis only
plt.show()
That’s often enough for growth curves, latency distributions, or power spectra where x is “time/frequency” and y spans a huge range.
The maintainable approach: ax.set_xscale("log") and ax.set_yscale("log")
Once a plot becomes part of a codebase—report generation, a library function, an automated experiment runner—I switch to Matplotlib’s object-oriented style. It makes it easier to control subplots, share axes, set consistent styles, and pass around an Axes handle.
Same idea, but through Axes:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0.1, 10, 200)
y = np.exp(x)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y, color="#1f77b4", label="y = exp(x)")
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_title("Log-log plot using ax.set_xscale / ax.set_yscale")
ax.set_xlabel("x (log)")
ax.set_ylabel("y (log)")
ax.grid(True, which="both", linewidth=0.6, alpha=0.6)
ax.legend()
fig.tight_layout()
plt.show()
Where this really shines is multi-panel figures. For example: same data, different scalings, shared y-axis limits, consistent labeling.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0.1, 10, 200)
y = np.exp(x)
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=False)
axes[0].plot(x, y)
axes[0].set_title("Linear")
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")
axes[0].grid(True, alpha=0.4)
axes[1].plot(x, y)
axes[1].set_title("Semilog-y")
axes[1].set_xlabel("x")
axes[1].set_yscale("log")
axes[1].grid(True, which="both", alpha=0.4)
axes[2].plot(x, y)
axes[2].set_title("Log-log")
axes[2].set_xlabel("x")
axes[2].set_xscale("log")
axes[2].set_yscale("log")
axes[2].grid(True, which="both", alpha=0.4)
fig.suptitle("Same data, different axis scaling")
fig.tight_layout()
plt.show()
In team settings, I recommend OO style because it reads like a structured description of the figure. It’s easier to review in a pull request, and it’s harder to accidentally apply settings to the wrong subplot.
One-call helpers: plt.loglog, plt.semilogx, plt.semilogy
Matplotlib also ships “helper” plot functions that set the scaling for you while plotting:
- plt.loglog(x, y) sets both axes to log.
- plt.semilogx(x, y) sets the x-axis to log only.
- plt.semilogy(x, y) sets the y-axis to log only.
I treat these like convenience wrappers. They’re great in notebooks and quick scripts. In larger code, I still prefer Axes because it’s explicit.
Example: log-log in one call.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0.1, 10, 200)
y = np.exp(x)
plt.figure(figsize=(8, 4))
plt.loglog(x, y, label="y = exp(x)")
plt.title("Using plt.loglog")
plt.xlabel("x (log)")
plt.ylabel("y (log)")
plt.grid(True, which="both", alpha=0.6)
plt.legend()
plt.tight_layout()
plt.show()
Example: semilog-x for cases where x spans decades (like frequency) but y is a linear quantity.
import numpy as np
import matplotlib.pyplot as plt
frequency_hz = np.logspace(0, 5, 400) # 1 Hz to 100 kHz
amplitude = 1 / np.sqrt(1 + (frequency_hz / 1000) ** 2)  # simple low-pass shape
plt.figure(figsize=(8, 4))
plt.semilogx(frequency_hz, amplitude)
plt.title("Semilog-x: frequency responses feel natural")
plt.xlabel("Frequency (Hz, log)")
plt.ylabel("Amplitude (linear)")
plt.grid(True, which="both", alpha=0.6)
plt.tight_layout()
plt.show()
I like this pattern because it matches how engineers and data folks already think: frequency often lives on a log axis, while the measured quantity might not.
Getting ticks, labels, and grids right (so the plot stays honest)
Log plots can mislead if you let Matplotlib choose defaults that don’t match your context. When I polish a figure for a blog post, a report, or a dashboard export, I pay attention to four things:
1) Major tick locations (the “decades”)
2) Minor ticks (subdivisions between decades)
3) Tick label formatting (plain numbers vs scientific notation)
4) Gridlines (major + minor)
Choosing a base (10 vs 2 vs e)
Default log scaling is base 10, which is what most readers expect. But base 2 is natural for memory sizes and some algorithmic complexity plots.
import numpy as np
import matplotlib.pyplot as plt
sizes_bytes = 2 ** np.arange(10, 31)  # 1 KiB-ish to ~1 GiB-ish
throughput = 5e7 / (1 + (sizes_bytes / (2 ** 20)))  # made-up curve
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(sizes_bytes, throughput)
ax.set_xscale("log", base=2)
ax.set_title("Base-2 log axis for byte sizes")
ax.set_xlabel("Block size (bytes, log base 2)")
ax.set_ylabel("Throughput (bytes/s)")
ax.grid(True, which="both", alpha=0.6)
fig.tight_layout()
plt.show()
If you use base 2, I strongly recommend labeling it explicitly. Many people assume base 10 when they see “log.”
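One way to make the base unmistakable is to label the ticks in units readers already know. This sketch uses matplotlib's FuncFormatter with a small byte-formatting helper of my own (format_bytes is not a library function):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def format_bytes(value, _pos):
    """Turn a tick value in bytes into a short B/KiB/MiB/GiB label."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024:
            return f"{value:.0f} {unit}"
        value /= 1024
    return f"{value:.0f} TiB"

sizes = 2.0 ** np.arange(10, 31)          # 1 KiB to 1 GiB
throughput = 5e7 / (1 + sizes / 2 ** 20)  # made-up curve

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(sizes, throughput)
ax.set_xscale("log", base=2)
ax.xaxis.set_major_formatter(FuncFormatter(format_bytes))
ax.set_xlabel("Block size")
ax.set_ylabel("Throughput (bytes/s)")
fig.tight_layout()
```

With ticks reading “4 KiB”, “1 MiB”, and so on, nobody has to guess the base.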
Using locators and formatters for clean tick labels
For publication-style plots, I often take control with LogLocator and LogFormatter.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import LogLocator, LogFormatter
x = np.logspace(-2, 3, 400) # 1e-2 to 1e3
y = 0.5 * x ** 2  # power law: a straight line on log-log axes
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y)
ax.set_xscale("log")
ax.set_yscale("log")
# Major ticks at decades, minor ticks at 2..9 within each decade
ax.xaxis.set_major_locator(LogLocator(base=10))
ax.xaxis.set_minor_locator(LogLocator(base=10, subs=np.arange(2, 10) * 0.1))
ax.yaxis.set_major_locator(LogLocator(base=10))
ax.yaxis.set_minor_locator(LogLocator(base=10, subs=np.arange(2, 10) * 0.1))
# Cleaner labels for major ticks
ax.xaxis.set_major_formatter(LogFormatter(base=10))
ax.yaxis.set_major_formatter(LogFormatter(base=10))
ax.grid(True, which="major", linewidth=0.8, alpha=0.6)
ax.grid(True, which="minor", linewidth=0.4, alpha=0.3)
ax.set_title("Controlled ticks on a log-log plot")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.tight_layout()
plt.show()
Why I bother:
- Default tick placement can be fine for quick looks, but it’s not always consistent across figure sizes.
- Minor ticks help readers estimate values between decades without guessing.
- Explicit locators mean your plot looks stable when you change DPI, fonts, or export format.
A practical rule for gridlines
I keep major gridlines more visible than minor gridlines. If minor gridlines are too strong, the plot turns into graph paper and the data stops standing out.
Zeros and negatives: log scale’s sharp edges (and what I do instead)
Plain log scaling has one hard rule: values must be positive. If you pass zeros or negatives, you’ll either get warnings, missing segments, or confusing results.
In real datasets, zeros and negatives happen all the time:
- A metric that can be exactly zero (cache misses, empty payloads)
- A signed quantity (error, residuals, profit/loss)
- Data that includes sentinel values (0 meaning “not measured”)
Here are the patterns I reach for.
1) Mask invalid values (when zero means “not valid for plotting”)
If zeros are placeholders, I’d rather drop them than pretend they’re tiny positives.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0.1, 10, 200)
y = np.sin(x) * 100
# Inject some zeros to simulate missing or clipped readings
y[::17] = 0
mask = y > 0 # only positive values are valid for a log y-axis
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x[mask], y[mask], marker="o", markersize=3, linewidth=1)
ax.set_yscale("log")
ax.set_title("Masking non-positive values for a log y-axis")
ax.set_xlabel("x")
ax.set_ylabel("y (positive only, log)")
ax.grid(True, which="both", alpha=0.6)
fig.tight_layout()
plt.show()
If dropping points could hide important behavior, I annotate it (“non-positive values omitted”) or choose a different scaling method.
2) Use symlog when you need to show negative and positive values
symlog (symmetric log) gives you a linear region around zero, then transitions to log scaling for larger magnitudes on both sides.
This is my go-to for signed error metrics that span from tiny to huge.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-1000, 1000, 2000)
y = x ** 3 / 1e6  # wide range, includes negatives
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y)
# linthresh controls the half-width of the linear region around 0
ax.set_yscale("symlog", linthresh=1)
ax.set_title("symlog: signed values with a log-like feel")
ax.set_xlabel("x")
ax.set_ylabel("y (symlog)")
ax.grid(True, which="both", alpha=0.6)
fig.tight_layout()
plt.show()
What I recommend:
- Pick
linthreshbased on domain meaning (noise floor, measurement resolution, or “small enough to be treated as ~0”). - Label the axis as
symlogor explain it in the caption; otherwise readers can misread distances near zero.
3) Use logit for probabilities (0..1) when that’s the right story
If you’re plotting probabilities, rates, or proportions, a log axis isn’t always the best fit. Matplotlib also has a logit scale (useful in calibration work).
I only use it when the audience expects it (statistics/ML calibration). If not, I stick to linear and focus on thoughtful limits.
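For completeness, here is a minimal logit-scale sketch with made-up calibration-style data (the values are invented for illustration; logit axes require values strictly between 0 and 1):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt

# Simulated predicted probabilities vs. observed frequencies (made-up data)
p = np.linspace(0.01, 0.99, 50)
observed = np.clip(p + 0.05 * np.sin(6 * p), 0.001, 0.999)

fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(p, observed, marker=".", linestyle="")
ax.plot([0.001, 0.999], [0.001, 0.999], linewidth=0.8)  # perfect calibration line
ax.set_xscale("logit")
ax.set_yscale("logit")
ax.set_xlabel("Predicted probability (logit)")
ax.set_ylabel("Observed frequency (logit)")
ax.set_title("Calibration plot on logit axes")
fig.tight_layout()
```

The logit scale stretches both tails, so miscalibration near 0.01 or 0.99 becomes visible instead of being squashed against the axis.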
A realistic workflow example: plotting latency percentiles on a log y-axis
A common production plot is request latency over time, plus percentiles. Latency often spans from sub-millisecond up to seconds. A log y-axis makes percentiles readable together.
Here’s a runnable example that simulates data and plots several percentiles:
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(7)
minutes = np.arange(0, 180)
# Simulated p50, p90, p99 latencies in milliseconds
p50 = 5 + rng.lognormal(mean=0.0, sigma=0.25, size=minutes.size)
p90 = 12 + rng.lognormal(mean=0.2, sigma=0.35, size=minutes.size)
# Make p99 occasionally spike (incident-like behavior)
spikes = rng.choice([0, 1], size=minutes.size, p=[0.95, 0.05])
p99 = 25 + rng.lognormal(mean=0.5, sigma=0.5, size=minutes.size) + spikes * rng.uniform(200, 1500, size=minutes.size)
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(minutes, p50, label="p50", linewidth=1.4)
ax.plot(minutes, p90, label="p90", linewidth=1.4)
ax.plot(minutes, p99, label="p99", linewidth=1.4)
ax.set_yscale("log")
ax.set_title("Service latency percentiles (log y-axis)")
ax.set_xlabel("Time (minutes)")
ax.set_ylabel("Latency (ms, log)")
ax.grid(True, which="both", alpha=0.5)
ax.legend(ncol=3)
# Helpful limits: avoid starting at 0 on a log axis
ax.set_ylim(1, 3000)
fig.tight_layout()
plt.show()
Why this works well:
- You can see “normal” behavior (p50/p90) and incident spikes (p99) on the same chart.
- Relative changes pop out. If p99 doubles, the movement is visually consistent across the scale.
One caveat: if stakeholders are not comfortable reading logs, I’ll add light annotations (“each grid step is 10×”) or provide a second linear plot in an appendix.
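The annotation itself can be as light as a text note pinned to the axes; this is one way to do it, using axes-fraction coordinates so the note stays put when limits change:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot([0, 180], [5, 2000])
ax.set_yscale("log")
ax.set_ylim(1, 3000)
ax.set_ylabel("Latency (ms, log)")
# A light reading aid for audiences new to log axes
ax.text(0.99, 0.02, "y-axis is logarithmic: each grid step is 10×",
        transform=ax.transAxes, ha="right", va="bottom",
        fontsize=8, alpha=0.7)
fig.tight_layout()
```

Keeping the note small and translucent means it helps first-time readers without distracting everyone else.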
Common mistakes I see in code reviews (and how I fix them)
These come up a lot when log scaling gets added late in the process.
Mistake 1: Plotting data that includes zeros without handling it
Symptom: warnings like “Data has no positive values” or missing line segments.
Fix: mask non-positive values, use symlog, or rethink the display.
Mistake 2: Forgetting axis labels that reveal scaling
If the label says “Latency (ms)” and you silently switch to log, readers can misread slopes and differences.
Fix: I label it as “Latency (ms, log)” or specify “log10” if base matters.
Mistake 3: Using log scaling to make two lines “look closer”
Log axes compress large values. If you’re trying to compare absolute differences, log scaling can hide real costs.
Fix: decide what question the plot answers.
- If the question is “how many times bigger?”, log is fine.
- If the question is “how many milliseconds did we add?”, stick to linear.
Mistake 4: Too much visual noise from minor ticks
Minor ticks and grids are useful, but heavy minor gridlines can overwhelm the data.
Fix: make minor gridlines faint and thin. Major gridlines should do most of the work.
Mistake 5: Mixing plt.* state changes with multiple subplots
In larger scripts, a stray plt.yscale("log") can affect whichever axes is currently active.
Fix: use ax.set_yscale("log") consistently once you have more than one axes.
Reproducible plotting in 2026: packaging, backends, and export habits
When I generate figures as part of a pipeline (benchmarks, model training reports, nightly analytics), I care less about the interactive window and more about consistent output.
A few habits that help:
- Pin dependencies. Use a pyproject.toml with locked versions (uv/pip-tools/poetry all work) so Matplotlib behavior stays stable across machines.
- Pick a non-interactive backend for CI. If you run plots in a headless environment, use the Agg backend explicitly.
- Export at known DPI and size. If your figure is used in docs and slides, decide on standards early.
Example: headless save with explicit settings.
import numpy as np
import matplotlib
matplotlib.use("Agg") # headless backend
import matplotlib.pyplot as plt
x = np.logspace(-3, 3, 800)
y = 1 / (1 + x)
fig, ax = plt.subplots(figsize=(8, 4), dpi=150)
ax.plot(x, y)
ax.set_xscale("log")
ax.set_title("Saved figure with log x-axis")
ax.set_xlabel("x (log)")
ax.set_ylabel("y")
ax.grid(True, which="both", alpha=0.6)
fig.tight_layout()
fig.savefig("log_axis_example.png", bbox_inches="tight")
If you plot millions of points, I also consider:
- Downsampling before plotting (keep the shape, drop redundant points)
- Rasterizing dense artists when exporting vector formats (PDF/SVG)
- Choosing scatter transparency carefully so density is visible without turning into a blob
These are not about fancy tricks; they’re about making sure a log plot remains readable and consistent when it moves from your laptop to CI to a published artifact.
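The first two points can be sketched like this. The stride-based downsampling here is deliberately naive (min/max decimation preserves spikes better), and the file name is my own:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.logspace(-3, 3, 200_000)                 # dense series
y = 1 / (1 + x) + rng.normal(0, 1e-3, x.size)   # noisy curve

# Naive stride downsampling: keep every k-th point, targeting ~5000 points
k = max(1, x.size // 5000)
xs, ys = x[::k], y[::k]

fig, ax = plt.subplots(figsize=(8, 4))
# rasterized=True keeps PDF/SVG exports small when the line is very dense
ax.plot(xs, ys, linewidth=0.8, rasterized=True)
ax.set_xscale("log")
fig.tight_layout()
fig.savefig("dense_log_plot.pdf", dpi=200)
```

The dpi argument matters here: it sets the resolution of the rasterized line inside the otherwise-vector PDF.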
Key takeaways and what I’d do next
When your data spans orders of magnitude, a linear axis can hide the story you need to see. A log axis is often the simplest fix, but it’s still a design choice—not a decoration. I recommend starting with the smallest possible change (ax.set_yscale("log") or plt.yscale("log")) and then polishing from there only if the plot is meant to be shared.
If you only remember a few rules, make them these:
- Use log scaling when ratios matter more than differences.
- Never ignore zeros and negatives—mask them, switch to
symlog, or change the chart. - Treat ticks and grids as part of correctness. On a log axis, minor ticks are often necessary for accurate reading.
- In code you expect to maintain, prefer the object-oriented API (
fig, ax = plt.subplots()andax.set_*) so your intent stays obvious.
My practical next step when building a plotting helper is to add a small “scale toggle” parameter (linear vs log vs symlog) and a validation check that warns when non-positive values are present. That one guardrail prevents the most common broken plots I see in real projects. If you want, tell me what kind of data you’re plotting (latency, file sizes, frequencies, training loss, distributions) and I’ll suggest the exact axis scaling and tick formatting I’d ship.
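That guardrail might look something like this. It is a sketch, not a finished library function; the name plot_with_scale and the warning text are my own invention:

```python
import warnings
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt

def plot_with_scale(ax, x, y, yscale="linear"):
    """Plot y vs x, warning if the requested scale can't show every point."""
    y = np.asarray(y, dtype=float)
    if yscale == "log" and (y <= 0).any():
        warnings.warn(
            f"{int((y <= 0).sum())} non-positive y values will not render on a "
            "log axis; consider masking them or using yscale='symlog'."
        )
    ax.plot(x, y)
    ax.set_yscale(yscale)
    return ax

fig, ax = plt.subplots()
plot_with_scale(ax, [1, 2, 3], [0.5, 5.0, 50.0], yscale="log")
```

Calling it with a zero in the data, e.g. plot_with_scale(ax, [1, 2, 3], [0.0, 1.0, 10.0], yscale="log"), emits the warning instead of silently dropping a point at render time.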


