numpy.interp(): Practical Linear Interpolation in Python (Deep Guide)

Last month I was debugging a vibration sensor pipeline for a manufacturing line. The device logged at irregular intervals, while our model expected a clean 100 ms grid. I could have pulled in a large interpolation package, but I only needed straight lines between known points. In moments like this, I reach for numpy.interp, a small function with big impact. It draws a straight segment between each adjacent pair of samples and gives me values at any x coordinate I choose, like stretching a string between fence posts and reading the height at a specific spot.

You should understand its rules before you put it in production. I will show you how the parameters interact, how boundary handling works, how periodic data changes the math, and where performance starts to matter. I will also cover common mistakes I see in code reviews, plus the alternatives I pick when the data is not a simple 1-D line.

Why linear interpolation still matters in 2026

Modern pipelines are packed with AI-assisted steps, but the raw signals feeding those steps are still messy. I see this in IoT telemetry, finance ticks, manufacturing sensors, and even app analytics. Most of those signals are 1-D sequences over time or distance, and the first thing I need is a consistent grid. Linear interpolation is the most direct way to get there. It is predictable, easy to audit, and simple to communicate to non-specialists.

I treat linear interpolation as a baseline that must be correct before I do anything fancy. In practice, I often wrangle data in pandas or Polars and then use numpy.interp for the numeric step. The function is reliable and does not require a heavy dependency chain. That matters when I am working on a small service, a CI environment, or a quick notebook with strict runtime limits.

There is also a human factor. A straight-line assumption is a reasonable story when you do not have evidence for a curved relationship. If the data changes smoothly, a line between two known points is a sensible estimate. If the data is jagged or discontinuous, linear interpolation makes the discontinuity obvious instead of hiding it. That clarity is exactly what I want early in a pipeline.

Understanding the signature and parameters

The signature is short, but every argument carries a rule I need to respect: numpy.interp(x, xp, fp, left=None, right=None, period=None).

  • x: The x-coordinates where you want results. It can be a scalar or any array-like structure. The output shape matches this input.
  • xp: The x-coordinates of your known data points. It must be 1-D and increasing if period is not set.
  • fp: The y-coordinates for each xp point. It must be the same length as xp and can be float or complex.
  • left: The value returned when x < xp[0]. Default is fp[0].
  • right: The value returned when x > xp[-1]. Default is fp[-1].
  • period: A period for circular data. If you set it, xp is normalized with xp % period and sorted. left and right are ignored because the data is treated as wrapping.

That last point matters a lot. Without period, xp must be strictly increasing, and x outside the range gets clamped to left or right. With period, the function wraps the input space so the edge becomes seamless; you are effectively interpolating on a circle.

A clear mental model (my go-to)

I picture numpy.interp as two steps:

1) Find which segment each x falls into.

2) Use a straight line between the segment endpoints to compute the value.

For a segment between (xp[i], fp[i]) and (xp[i+1], fp[i+1]), the interpolated value is:

fp[i] + (fp[i+1] - fp[i]) * (x - xp[i]) / (xp[i+1] - xp[i])

That is just the equation of a line, and numpy.interp does this for every x you provide. The function is vectorized, so it is very fast when x is an array. I remind myself that the method is linear inside each interval and discontinuous in slope at each xp breakpoint. That is a feature, not a bug: it makes the source data’s structure visible.
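To convince myself the formula matches the function, I sometimes compute one segment by hand and compare. A minimal sketch with made-up points:

```python
import numpy as np

# One segment between (1.0, 10.0) and (3.0, 20.0), queried at x = 2.5
xp = np.array([1.0, 3.0])
fp = np.array([10.0, 20.0])
x = 2.5

# The segment formula from above
manual = fp[0] + (fp[1] - fp[0]) * (x - xp[0]) / (xp[1] - xp[0])

# numpy.interp should agree exactly
auto = np.interp(x, xp, fp)
print(manual, auto)  # 17.5 17.5
```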

A minimal working example

This is the fastest way to remind myself how numpy.interp behaves:

import numpy as np

xp = np.array([0.0, 1.0, 2.0, 4.0])
fp = np.array([0.0, 1.0, 1.5, 3.0])

x = np.array([-0.5, 0.0, 0.5, 1.5, 3.0, 5.0])
y = np.interp(x, xp, fp)
print(y)

With defaults, x < 0 returns fp[0] (0.0) and x > 4 returns fp[-1] (3.0). Inside the range, the values fall on straight segments, so this prints 0.0, 0.0, 0.5, 1.25, 2.25, 3.0. If I need different boundary behavior, I set left or right explicitly.

Practical scenario: regularizing irregular sensor data

Here is a realistic pattern I use in production: resampling irregular time stamps onto a fixed grid.

import numpy as np

# Irregular time stamps (seconds)
t_raw = np.array([0.0, 0.15, 0.43, 1.02, 1.87, 2.40, 2.58])
v_raw = np.array([1.2, 1.4, 0.9, 1.1, 0.7, 0.8, 1.0])

# Regular grid every 0.1 seconds
t_grid = np.arange(0.0, 2.6 + 1e-9, 0.1)
v_grid = np.interp(t_grid, t_raw, v_raw)

This converts a ragged signal into a consistent grid that downstream models can consume. It does not create extra smoothness or hide discontinuities. When I later compute a moving average or feed a classifier, I have a stable time base.

Boundary handling: choosing left and right intentionally

The default clamping behavior can be either correct or misleading depending on your domain. I treat boundaries as a policy decision.

  • Hold value: set left=fp[0] and right=fp[-1] (default). Works well for sensors that stabilize near the edges.
  • Zero outside range: set left=0.0 and right=0.0. Useful for impulse-like signals.
  • Exclude outside range: mask them after interpolation with boolean filters.
  • Fill with NaN: set left=np.nan and right=np.nan so later stages can ignore them.

Example with NaN boundaries:

x = np.array([-1.0, 0.5, 1.5, 3.0])
xp = np.array([0.0, 1.0, 2.0])
fp = np.array([10.0, 20.0, 40.0])

y = np.interp(x, xp, fp, left=np.nan, right=np.nan)

That single change prevents accidental extrapolation. I use it in analytics pipelines where I would rather drop out-of-bounds values than silently clamp.
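The "exclude outside range" policy combines the NaN trick with a boolean mask. A small sketch, assuming fp itself contains no NaNs:

```python
import numpy as np

xp = np.array([0.0, 1.0, 2.0])
fp = np.array([10.0, 20.0, 40.0])
x = np.array([-1.0, 0.5, 1.5, 3.0])

# NaN boundaries mark out-of-range queries...
y = np.interp(x, xp, fp, left=np.nan, right=np.nan)

# ...then a boolean mask drops them instead of clamping
in_range = ~np.isnan(y)
x_kept, y_kept = x[in_range], y[in_range]
print(x_kept, y_kept)  # [0.5 1.5] [15. 30.]
```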

Periodic data: using period correctly

When data lives on a circle or wraps around, clamping is wrong. Think of angles (0° = 360°), times of day, or any measurement on a loop. The period parameter makes numpy.interp treat the x-axis as circular.

Key behaviors I keep in mind:

  • period must be a positive number.
  • xp is normalized with xp % period and then sorted.
  • left and right are ignored because boundaries wrap.

Example: interpolate wind direction (degrees). I want to interpolate across 350° to 10° without jumping the long way around.

import numpy as np

xp = np.array([350.0, 10.0, 90.0])
fp = np.array([2.0, 3.0, 5.0])  # some associated speed

x = np.array([355.0, 0.0, 5.0, 20.0])
y = np.interp(x, xp, fp, period=360.0)

Because period is set, xp is normalized and sorted internally, which gives a smooth transition across 360° -> 0°.

A warning I give teammates: only the xp axis is wrapped. If your data itself is circular (like angles in degrees), you should convert it to a linear representation first (for example, using sine and cosine) before interpolating. Otherwise you may get linear averages that are not physically meaningful.
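A sketch of that conversion: interpolate the sine and cosine components, then recover the angle with arctan2. The example values are invented for illustration.

```python
import numpy as np

# Wind direction samples (degrees) at two times; the short way from
# 350 to 10 crosses 0, which naive interpolation of the raw angle misses.
t = np.array([0.0, 1.0])
angles_deg = np.array([350.0, 10.0])
t_query = np.array([0.25])

# Interpolate the unit-vector components instead of the raw angle
rad = np.deg2rad(angles_deg)
s = np.interp(t_query, t, np.sin(rad))
c = np.interp(t_query, t, np.cos(rad))

# Recover the angle, wrapped back into [0, 360)
angle = np.rad2deg(np.arctan2(s, c)) % 360.0
print(angle)  # close to 355, a quarter of the short way from 350 to 10
```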

Input validation: what breaks and why

numpy.interp is optimized for speed, and it assumes you give it good input. I keep a quick checklist to avoid surprising results.

1) xp must be 1-D and strictly increasing (unless period is set). Duplicate or unsorted values can yield incorrect segments.

2) xp and fp must have the same length.

3) x can be any shape, but it must be numeric.

4) If xp has NaNs, behavior is undefined. Clean them first.

Here is a safe pattern I use when input might be messy:

import numpy as np

def safe_interp(x, xp, fp, left=np.nan, right=np.nan):
    xp = np.asarray(xp)
    fp = np.asarray(fp)
    x = np.asarray(x)
    if xp.ndim != 1:
        raise ValueError('xp must be 1-D')
    if xp.size != fp.size:
        raise ValueError('xp and fp must have same length')
    # Drop points where either coordinate is missing
    mask = ~(np.isnan(xp) | np.isnan(fp))
    xp = xp[mask]
    fp = fp[mask]
    if xp.size < 2:
        raise ValueError('xp must have at least 2 points')
    if np.any(np.diff(xp) <= 0):
        # For safety I sort here, or I can raise depending on policy
        idx = np.argsort(xp)
        xp = xp[idx]
        fp = fp[idx]
    return np.interp(x, xp, fp, left=left, right=right)

I do not always sort in production because sorting can hide data mistakes. When the data should already be ordered, I prefer raising a clear error.

Handling NaNs and missing values

Missing values are common in telemetry. numpy.interp itself does not ignore NaNs in fp; it will propagate them into any segment that includes them. That can be good or bad depending on your goal.

Strategies I use:

  • Drop NaNs by masking both xp and fp (as in safe_interp).
  • Forward-fill or backward-fill before interpolation if it is a signal with known behavior.
  • Keep NaNs and then clean the result afterward to highlight unreliable regions.

Example: forward-fill first, then interpolate.

fp_filled = fp.copy()
mask = np.isnan(fp_filled)
if mask.any():
    # simple forward fill
    for i in range(1, fp_filled.size):
        if mask[i]:
            fp_filled[i] = fp_filled[i-1]

y = np.interp(x, xp, fp_filled, left=np.nan, right=np.nan)

I keep the decision explicit because missing data is often a signal itself.

Working with complex values

numpy.interp supports complex fp. That is handy in signal processing where you represent phase and amplitude together. The interpolation is applied independently to the real and imaginary parts.

import numpy as np

xp = np.array([0.0, 1.0, 2.0])
fp = np.array([1+0j, 0+1j, -1+0j])

x = np.array([0.5, 1.5])
y = np.interp(x, xp, fp)

The result is still complex. I often use this in frequency-domain pipelines where I want a consistent grid across frequencies. But I remind myself that linear interpolation of complex numbers is not the same as interpolating phase angles. If the data represents angles, I switch to a vector form (cos and sin) to avoid discontinuities.

Shape behavior: x can be any shape

numpy.interp preserves the shape of x. I use this to avoid reshaping later.

x = np.linspace(0, 1, 12).reshape(3, 4)
y = np.interp(x, xp, fp)
# y.shape is (3, 4)

This matters when I interpolate many signals at once, for example a matrix of time grids for multiple channels.

Performance considerations: when it starts to matter

For moderate sizes, numpy.interp is fast enough that I rarely think about it. But at scale, some patterns help:

  • Sort xp once, then reuse it for many x calls.
  • Prefer float64 arrays for numeric stability; but if memory is tight, float32 can be acceptable.
  • Batch x values in large arrays instead of calling in a loop.

My rule of thumb: If I am interpolating more than a few million points repeatedly inside a tight loop, I will benchmark. The function is already optimized in C, so the big gains come from reducing Python loops and redundant preprocessing.

Here is a common performance pattern (many grids, same base data):

xp = np.asarray(xp)
fp = np.asarray(fp)
if np.any(np.diff(xp) <= 0):
    idx = np.argsort(xp)
    xp = xp[idx]
    fp = fp[idx]

grids = [np.linspace(0, 10, 5000) for _ in range(50)]
results = [np.interp(g, xp, fp) for g in grids]

If that list comprehension becomes a bottleneck, I combine grids or flatten and then reshape after interpolation to reduce overhead.
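A sketch of the flatten-and-reshape idea: stack the grids into one array, make a single np.interp call, and reshape afterward, trading 50 calls for one.

```python
import numpy as np

xp = np.linspace(0.0, 10.0, 20)
fp = np.sin(xp)

# 50 query grids of equal length: stack, interpolate once, reshape back
grids = np.stack([np.linspace(0.0, 10.0, 5000) for _ in range(50)])
flat = np.interp(grids.ravel(), xp, fp)
results = flat.reshape(grids.shape)
print(results.shape)  # (50, 5000)
```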

Common pitfalls I see in code reviews

These issues show up often enough that I have a standard checklist:

1) Unsorted xp: The most frequent cause of wrong results. Always verify np.all(np.diff(xp) > 0) unless you set period.

2) Duplicate xp: A zero-length interval leads to division by zero internally, causing nonsensical output. Deduplicate or aggregate.

3) Silent clamping: Default left and right can hide out-of-range issues. Use NaN or explicit bounds.

4) Mixing units: Interpolating on mismatched units (seconds vs milliseconds) results in subtle errors.

5) Interpolating angles: Linear interpolation on wrap-around data without period causes sharp jumps.

When I find these, I add a quick guard or a short unit test, because the function itself will not warn you.
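A minimal version of such a guard, with check_interp_inputs as a hypothetical helper name covering the first two pitfalls:

```python
import numpy as np

def check_interp_inputs(xp, fp, period=None):
    """Raise early on the checklist pitfalls (hypothetical guard helper)."""
    xp = np.asarray(xp, dtype=float)
    fp = np.asarray(fp)
    if xp.ndim != 1 or xp.size != fp.size:
        raise ValueError("xp must be 1-D and match fp in length")
    if np.isnan(xp).any():
        raise ValueError("NaN in xp: behavior is undefined")
    if period is None:
        d = np.diff(xp)
        if np.any(d == 0):
            raise ValueError("duplicate xp values")
        if np.any(d < 0):
            raise ValueError("xp is not sorted")
```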

When NOT to use numpy.interp

I rely on numpy.interp as a baseline, but there are times I choose a different method.

  • You need smooth derivatives: For gradients or physics simulations, linear segments can be too rough. I use cubic splines instead.
  • You need 2-D or 3-D interpolation: numpy.interp is strictly 1-D. For grids in higher dimensions, I use scipy.interpolate or structured grid methods.
  • You are extrapolating far outside the range: Linear extrapolation is risky. I prefer explicit models or domain-specific assumptions.
  • You need monotonic constraints: Linear interpolation preserves monotonicity between points, but if you want a shape-preserving spline, I use PCHIP.

The core idea: numpy.interp is for fast, transparent 1-D interpolation. If you need more structure, jump to specialized tools.

Alternatives I reach for (and why)

Here is the rough map I use in practice:

Problem | My default choice | Why
Simple 1-D, fast, no extra deps | numpy.interp | Minimal overhead
Need smooth first derivative | scipy.interpolate.CubicSpline | Smoothness
Need shape preservation | scipy.interpolate.PchipInterpolator | Avoids overshoot
Multi-dimensional grid | scipy.interpolate.RegularGridInterpolator | Works for N-D grids
Irregular scattered data | scipy.interpolate.griddata | Handles unstructured points
Piecewise constant | np.searchsorted + indexing | Fast step function

The key is to match the interpolation method to the physics of the signal. Over-smoothing can be as harmful as under-smoothing.
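The last row of the table needs no extra dependency, so here is a quick sketch of a previous-value ("step") lookup built from np.searchsorted:

```python
import numpy as np

xp = np.array([0.0, 1.0, 2.0, 4.0])
fp = np.array([10.0, 20.0, 15.0, 30.0])
x = np.array([0.5, 1.0, 3.9, 4.0])

# Index of the last breakpoint at or before each x (clipped for x < xp[0])
idx = np.clip(np.searchsorted(xp, x, side="right") - 1, 0, xp.size - 1)
y_step = fp[idx]
print(y_step)  # [10. 20. 15. 30.]
```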

Real-world case study: price ticks to bars

In finance, I often take irregular ticks and construct fixed-interval bars. I use numpy.interp as a quick baseline for price series that is not too noisy.

import numpy as np

tick_times = np.array([0.1, 0.3, 0.8, 1.7, 2.2, 2.9])  # seconds
tick_prices = np.array([100.0, 100.2, 100.5, 100.1, 99.9, 100.3])

bar_times = np.arange(0.0, 3.0, 0.5)
bar_prices = np.interp(bar_times, tick_times, tick_prices, left=np.nan, right=np.nan)

This gives me a continuous price series on a fixed grid. I still keep the original ticks for accuracy, but the interpolated series is great for fast feature computation. In production, I compare it against the last-known value method to ensure I do not introduce bias.

Case study: filling missing motion capture frames

Motion capture data often drops frames. If I only need simple reconstruction for visualization, numpy.interp is sufficient.

import numpy as np

t = np.array([0, 1, 2, 5, 6, 7])

x = np.array([0.0, 0.5, 1.0, 2.0, 2.4, 2.6])

t_full = np.arange(0, 8, 1)

xfull = np.interp(tfull, t, x)

For high-precision analysis I use spline smoothing, but for quick checks this is perfect. The straight-line gaps are obvious, which reminds me where the data actually dropped.

Comparison: traditional vs modern workflow

Even with modern tools, I still land on numpy.interp as the core primitive. The difference is how I integrate it.

Step | Traditional workflow | Modern workflow
Clean data | Manual loops | Vectorized masks and np.where
Interpolate | Custom loops | numpy.interp
Validate | Quick plot | Automated checks + plotting
Deploy | Static script | CI pipeline + monitoring

The function itself did not change, but my surrounding workflow did. I now treat interpolation as a stable building block rather than a one-off hack.

Production considerations: monitoring and validation

When I deploy interpolation in a pipeline, I want to detect when it produces bad output. These are the checks I include:

  • Out-of-range rate: Percentage of x outside xp range; if it spikes, I log it.
  • Duplicate xp rate: If duplicates are high, I halt the pipeline or reduce to unique points.
  • NaN propagation: Count NaNs before and after interpolation to detect data loss.
  • Value drift: Compare interpolated series statistics (mean, variance) to raw series.

I also sample a few interpolated points and store them in logs. If a downstream model starts behaving badly, I can inspect whether the interpolation step drifted.
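The first and third checks reduce to a couple of lines. A sketch with made-up numbers:

```python
import numpy as np

xp = np.array([0.0, 1.0, 2.0])
fp = np.array([1.0, 2.0, 4.0])
x = np.array([-0.5, 0.2, 1.5, 2.7, 3.1])

y = np.interp(x, xp, fp, left=np.nan, right=np.nan)

# Fraction of queries outside the known range, and NaNs after interpolation
out_of_range_rate = np.mean((x < xp[0]) | (x > xp[-1]))
nan_rate_after = np.mean(np.isnan(y))
print(out_of_range_rate, nan_rate_after)  # 0.6 0.6
```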

Debugging tips I use in practice

When results look off, I follow a short sequence:

1) Plot xp vs fp and overlay x vs interpolated y.

2) Check for unsorted or duplicate xp.

3) Inspect boundary behavior by testing a few x values outside range.

4) Verify units and scaling.

A tiny plot often reveals the issue in seconds. When I cannot plot (headless environment), I print small slices around the problematic region and compute the expected line manually to verify.

Frequently asked questions I hear

Does numpy.interp extrapolate?

It does not extrapolate by default. It clamps to left or right values. If you want extrapolation, you must implement it yourself or use other tools.
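If I really do need linear extrapolation, I extend the slopes of the edge segments myself. interp_extrap below is a hypothetical helper, not a NumPy API:

```python
import numpy as np

def interp_extrap(x, xp, fp):
    """np.interp plus linear extrapolation at both ends (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.interp(x, xp, fp)
    lo = x < xp[0]   # extend the first segment's slope to the left
    y[lo] = fp[0] + (fp[1] - fp[0]) * (x[lo] - xp[0]) / (xp[1] - xp[0])
    hi = x > xp[-1]  # extend the last segment's slope to the right
    y[hi] = fp[-1] + (fp[-1] - fp[-2]) * (x[hi] - xp[-1]) / (xp[-1] - xp[-2])
    return y

xp = np.array([0.0, 1.0, 2.0])
fp = np.array([0.0, 1.0, 3.0])
print(interp_extrap([-1.0, 0.5, 3.0], xp, fp))  # [-1.   0.5  5. ]
```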

Can I pass lists?

Yes. It accepts array-like inputs and converts them to arrays.

Is it stable for large values?

Numerically it is stable for most practical ranges, but extreme values can run into floating-point precision issues. Normalize if needed.

Is it thread-safe?

Yes, it is a pure function on input arrays.

Does it support 2-D interpolation?

No. Use a multi-dimensional interpolator from a scientific library.

A quick checklist before using it in production

This is my pre-flight list before I sign off on interpolation code:

  • xp is 1-D, strictly increasing (or period is set).
  • xp and fp lengths match.
  • Boundary policy is explicit (left, right, or NaN).
  • Missing data handling is documented.
  • Unit tests cover typical and edge cases.
  • Performance is acceptable at scale.

If I can check all of these, I am confident the interpolation step will behave predictably.

Closing thoughts

numpy.interp is one of those quiet functions that keeps pipelines moving. It is simple, fast, and honest about what it does. I keep it in my toolbox because it removes complexity and lets me focus on modeling or analysis instead of reinventing a line-drawing algorithm.

The core idea is still a straight line between points. The hard part is making sure that line makes sense for your data, your boundaries, and your workflow. When you use numpy.interp thoughtfully, it becomes a trustworthy building block for everything that comes after it.

