I keep seeing the same bug show up in code reviews: a model or dashboard looks "fine" on one dataset and completely off on the next. The root cause is almost always inconsistent scaling. One week you're feeding raw sensor volts, the next you're feeding converted milliamps, and your features have very different ranges. When that happens, distance-based methods drift, plots compress, and even simple threshold logic flips. Normalization is the boring fix that keeps everything stable. I use it in ETL scripts, analytics notebooks, and production scoring services because it makes the data comparable and predictable.
In this post I'm going to show you how I normalize arrays with NumPy in 2026-era Python. You'll see the classic min-max approach, per-feature scaling for 2D arrays, z-scores, and L1/L2 normalization for vectors. I'll also point out edge cases that bite in real systems: constant arrays, NaNs, integer overflow, and streaming updates with precomputed stats. By the end you should be able to pick the right approach, implement it in a few lines, and know exactly what it does to your data.
What normalization really means in NumPy
Normalization is just a rule that maps numbers from one scale to another. The most common goal is to map values to the range [0, 1]. If your values are 1, 2, 4, 8, and 10, min-max scaling turns them into 0.0, 0.125, 0.375, 0.875, and 1.0. Nothing magical happens; it's just linear scaling.
In practice I use three families of normalization, each for a different question:
- Min-max scaling: preserves relative distances and makes everything fit a fixed range. Good for plotting, image preprocessing, and any algorithm that assumes bounded input.
- Z-score standardization: shifts values to mean 0 and scales to unit variance. Good for models that expect centered features or when you compare features with different units.
- Vector normalization (L1/L2): scales each row or vector so its length is 1. Good for similarity searches and embeddings.
Here are the formulas I keep in mind:
- Min-max: x_norm = (x - min) / (max - min)
- Z-score: x_norm = (x - mean) / std
- L2 norm: x_norm = x / ||x||_2
All three can be done with plain NumPy in a vectorized way, which is both readable and fast. You should almost never loop in Python for these operations.
Min-max normalization with pure NumPy
I start with the simplest case: a 1D array. Convert to float, compute min and max once, and scale. If you keep the array as integers, you'll get integer division in older code or unintended casting. I always cast to float to make the intent clear.
import numpy as np
prices = np.array([1, 2, 4, 8, 10], dtype=float)
# Min-max scaling to [0, 1]
scaled = (prices - prices.min()) / (prices.max() - prices.min())
print(scaled.tolist())
Output:
[0.0, 0.125, 0.375, 0.875, 1.0]
If you want a different target range, say [a, b], stretch and shift:
import numpy as np
a = np.array([3, 6, 9, 12], dtype=float)
low, high = -1.0, 1.0
scaled = (a - a.min()) / (a.max() - a.min())
scaled = scaled * (high - low) + low
print(scaled.tolist())
Output:
[-1.0, -0.33333333333333337, 0.33333333333333326, 1.0]
In 2D, you need to decide whether you normalize the whole array as one distribution or per column. If you normalize the whole array, flatten, scale, and reshape:
import numpy as np
matrix = np.array([[1, 2], [3, 6], [8, 10]], dtype=float)
flat = matrix.ravel()
scaled = (flat - flat.min()) / (flat.max() - flat.min())
scaled = scaled.reshape(matrix.shape)
print(scaled.tolist())
Output:
[[0.0, 0.1111111111111111], [0.2222222222222222, 0.5555555555555556], [0.7777777777777778, 1.0]]
That treats the whole matrix as one distribution. If you want per-feature scaling, you should work along an axis, which I'll show next.
Per-feature scaling with axis and keepdims
When I normalize features for machine learning, I almost always scale each column independently. A column is a feature, a row is a sample. That means each column gets its own min and max. NumPy makes this clean with axis=0 and keepdims=True.
import numpy as np
samples = np.array([
[120.0, 0.3, 2000.0],
[150.0, 0.6, 1200.0],
[180.0, 0.9, 800.0],
])
col_min = samples.min(axis=0, keepdims=True)
col_max = samples.max(axis=0, keepdims=True)
scaled = (samples - col_min) / (col_max - col_min)
print(scaled.tolist())
Output:
[[0.0, 0.0, 1.0], [0.5, 0.5, 0.3333333333333333], [1.0, 1.0, 0.0]]
Notice what happened: the third column has values that go down as the row index increases, and the normalization preserves that order. The shape of col_min and col_max is (1, 3) because of keepdims=True, which means the subtraction broadcasts cleanly.
If you want to normalize each row instead (for example, when each row is a document vector), flip to axis=1 and keep dimensions:
import numpy as np
vectors = np.array([
[2.0, 3.0, 5.0],
[10.0, 0.0, 5.0],
])
row_min = vectors.min(axis=1, keepdims=True)
row_max = vectors.max(axis=1, keepdims=True)
scaled = (vectors - row_min) / (row_max - row_min)
print(scaled.tolist())
Output:
[[0.0, 0.3333333333333333, 1.0], [1.0, 0.0, 0.5]]
This axis choice is a key decision. If you pick the wrong axis, you normalize the wrong thing. In my experience, most bugs come from mixing up "per feature" and "per sample." I always annotate it in code comments when it's not obvious.
Z-score normalization for centered data
Min-max scaling keeps your original distribution shape, but it does not center it. Some algorithms behave better when data is centered around zero with a standard deviation of one, especially linear models and gradient-based training. Z-scores do that.
import numpy as np
scores = np.array([55, 70, 80, 90, 100], dtype=float)
mean = scores.mean()
std = scores.std() # population std; use ddof=1 for sample
z = (scores - mean) / std
print(z.round(3).tolist())
Output:
[-1.536, -0.576, 0.064, 0.704, 1.344]
A few practical notes from real projects:
- If you're scaling training and test sets, always compute `mean` and `std` on training data, then apply to test data. That keeps your evaluation honest.
- `std` can be zero if the feature is constant. Guard against division by zero (I show a safe pattern later).
- Use `ddof=1` when you treat the data as a sample, not the whole population. The difference is small but meaningful when samples are tiny.
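The first point is worth making concrete. Here's a minimal sketch of the fit-on-train, apply-on-test pattern for z-scores; the arrays are made up for illustration:

```python
import numpy as np

# Fit: stats come from training data only
train = np.array([10.0, 20.0, 30.0, 40.0])
train_mean = train.mean()
train_std = train.std(ddof=1)  # sample std

# Apply: the same stats scale the test data
test = np.array([15.0, 35.0])
z_test = (test - train_mean) / train_std

print(z_test.round(4).tolist())
```

If you instead recomputed mean and std on the test set, the two z-scores would be symmetric around the test data, not the training data, and your evaluation would quietly see information it shouldn't.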
For a 2D array with per-feature z-scores, it looks like this:
import numpy as np
features = np.array([
[1.0, 10.0, 100.0],
[2.0, 20.0, 100.0],
[3.0, 30.0, 100.0],
])
mean = features.mean(axis=0, keepdims=True)
std = features.std(axis=0, keepdims=True)
z = (features - mean) / std
print(z.tolist())
If a column is constant, the standard deviation is zero. That produces inf or nan. The safe approach is to replace zeros with 1.0 before dividing, which keeps constant columns at 0 after centering.
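A minimal sketch of that guard, reusing the same features array (its third column is constant):

```python
import numpy as np

features = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 100.0],
    [3.0, 30.0, 100.0],
])
mean = features.mean(axis=0, keepdims=True)
std = features.std(axis=0, keepdims=True)

# Replace zero std with 1.0 so constant columns become 0, not nan
std_safe = np.where(std == 0, 1.0, std)
z = (features - mean) / std_safe

print(z[:, 2].tolist())  # constant column -> all zeros
```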
L1 and L2 normalization for vectors
When you normalize by vector length, you are not trying to bound values between 0 and 1. Instead, you want each vector to have length 1 so that dot products behave like cosine similarity. I use this for embeddings, text features, and any similarity search where magnitude should not dominate.
L2 normalization scales each vector by its Euclidean norm:
import numpy as np
emb = np.array([
[3.0, 4.0],
[1.0, 0.0],
])
norms = np.linalg.norm(emb, axis=1, keepdims=True)
# Avoid division by zero for zero vectors
norms = np.where(norms == 0, 1.0, norms)
l2 = emb / norms
print(l2.tolist())
Output:
[[0.6, 0.8], [1.0, 0.0]]
L1 normalization uses the sum of absolute values instead of the Euclidean norm:
import numpy as np
data = np.array([
[1.0, -1.0, 2.0],
[0.0, 0.0, 0.0],
])
l1 = np.sum(np.abs(data), axis=1, keepdims=True)
# Keep zero vectors unchanged
l1 = np.where(l1 == 0, 1.0, l1)
l1_norm = data / l1
print(l1_norm.tolist())
Output:
[[0.25, -0.25, 0.5], [0.0, 0.0, 0.0]]
If you're comparing vectors, L2 normalization is the default choice. L1 is more common when you want sparse outputs or interpretability. I pick L2 unless I have a strong reason not to.
Precomputed stats for streaming and consistent scaling
In production, you rarely normalize in isolation. You usually normalize test data based on training stats, or you apply the same scaling across different batches in a pipeline. That's where precomputed stats matter.
Here's the pattern I use. The key is to separate the "fit" step from the "apply" step. Compute stats once, store them, and apply them consistently.
import numpy as np
# Fit step (training data)
train = np.array([10, 20, 40, 80, 100], dtype=float)
train_min = train.min()
train_max = train.max()
# Apply step (new data)
new = np.array([5, 15, 50, 120], dtype=float)
scaled = (new - train_min) / (train_max - train_min)
print(scaled.tolist())
You can do the same for z-scores with `train_mean` and `train_std`. In real systems, those stats often live in a model artifact, a JSON file, or a metadata table in your feature store.
A simple analogy I use with teams: think of normalization as choosing a ruler. Once you pick a ruler for the training data, you can't swap it later and expect your model to understand the units. The numbers might look "reasonable" but they mean different things.
Common mistakes and how I avoid them
Normalization looks easy, but there are a few traps that show up all the time. Here are the ones I see most often, plus the fix I use.
- Integer arrays silently truncate results: if you subtract and divide integers, you may get integer output in older code or unexpected casting. I always create arrays with `dtype=float` or call `astype(float)` at the start.
- Division by zero on constant features: a constant column makes `(max - min)` or `std` equal to zero. I replace zeros with 1.0 before dividing. That keeps the result stable and preserves the constant feature as zero after centering.
- Wrong axis: I see `axis=1` used where `axis=0` was intended, especially when the shape flips. I add a short comment like `# scale per feature` to make it clear.
- Normalizing the test set separately: this contaminates evaluation because you sneak information from the test distribution into your scaling. Always compute stats from training data only.
- NaNs and infs: if your data includes NaN or inf values, `min` and `max` will propagate them. Use `np.nanmin`, `np.nanmax`, and `np.nanmean` if you expect missing values.
Here's a robust min-max function I use when data is messy:
import numpy as np
def minmax_scale(x: np.ndarray, axis=None) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    min_val = np.nanmin(x, axis=axis, keepdims=True)
    max_val = np.nanmax(x, axis=axis, keepdims=True)
    denom = max_val - min_val
    # Avoid division by zero
    denom = np.where(denom == 0, 1.0, denom)
    return (x - min_val) / denom
values = np.array([[1, 2, np.nan], [1, 2, 3]], dtype=float)
print(minmax_scale(values, axis=0).tolist())
That function is easy to drop into a pipeline and use for both 1D and 2D arrays. It also makes the axis choice explicit.
When to use each method – and when not to
You should pick the normalization method that matches your goal. Here‘s how I make the call:
- Use min-max scaling when you need bounded values, especially for plots, images, or algorithms that expect inputs in a fixed range. If your data has heavy outliers, min-max can squeeze most values into a tiny band. In that case, I consider clipping or a robust scaler.
- Use z-scores when you need centered data or are feeding a model that expects roughly standard-normal features. It's a solid default for linear models and neural networks that behave better with centered inputs.
- Use L2 normalization when you care about direction, not magnitude. Embeddings and similarity searches are classic cases.
When should you avoid normalization? If your algorithm is scale-invariant already, normalization can add noise. Decision trees and random forests, for example, don't need normalized inputs. I still normalize when I'm mixing models or comparing feature importances across different pipelines, but I'm explicit about why.
Performance notes from real workloads
NumPy is fast because it pushes work into highly tuned C loops. The main performance cost is memory traffic, not CPU. The difference between a loop and a vectorized expression is night and day.
Here are practical guidelines I follow:
- Avoid Python loops for large arrays. A vectorized min-max on a 1 million element array typically finishes in the 10-20 ms range on a modern laptop, while a Python loop can be 50-100x slower.
- Reuse computed stats instead of recalculating min and max each time. That saves another pass through memory and makes results consistent.
- Be careful with copies. Expressions like `(x - min) / (max - min)` create temporaries. If you're memory-bound, consider using `out` parameters or in-place operations with care.
- Use `np.asarray` to avoid unnecessary copying when you don't need ownership changes.
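To make the first guideline concrete, here's a rough timing sketch. Absolute numbers depend heavily on the machine, so treat the printed times as illustrative rather than a benchmark:

```python
import numpy as np
import time

x = np.random.default_rng(0).random(1_000_000)

# Vectorized min-max: a couple of passes over the array in C
t0 = time.perf_counter()
scaled = (x - x.min()) / (x.max() - x.min())
vectorized_s = time.perf_counter() - t0

# Python loop over the same array: one interpreter round-trip per element
t0 = time.perf_counter()
lo, hi = x.min(), x.max()
looped = [(v - lo) / (hi - lo) for v in x]
loop_s = time.perf_counter() - t0

print(f"vectorized: {vectorized_s:.4f}s, loop: {loop_s:.4f}s")
```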
If you need even more speed and are already using 2026 tooling, you can offload normalization to GPU with CuPy or JAX for large batches. I only do that when I already have the data on the GPU; moving data back and forth can be slower than the CPU path for small arrays.
Traditional vs modern normalization workflows
I see two main styles in teams today. The "traditional" approach is script-first: normalize inside a notebook or script and move on. The "modern" approach treats normalization as a reusable step with stored stats, tracked metadata, and automated tests.
Here's a quick comparison I use in reviews:

| | Script-only min-max | Feature store pipeline |
| --- | --- | --- |
| Setup effort | Low | High |
| Reproducibility | Manual | Excellent |
| Time to implement | 5-10 minutes | 1-3 hours |
| Drift monitoring | None | Automated alerts |
| Best for | One-off analysis | Production ML |

The key point: normalization is not just a math step; it's a contract. If your values are scaled in training, they must be scaled the same way in production. The more serious the system, the more you want that contract encoded in code and metadata rather than in someone's memory.
A deeper look at broadcasting (why it matters)
Most normalization bugs I debug end up being broadcasting misunderstandings. Broadcasting is the mechanism NumPy uses to align arrays of different shapes during arithmetic. If you don’t keep track of shapes, your code might run without errors but normalize the wrong axis.
Here’s a quick mental model I use:
- Think of your data as `rows = samples` and `columns = features`.
- If you want per-feature stats, your min/max should have shape `(1, num_features)`.
- If you want per-sample stats, your min/max should have shape `(num_samples, 1)`.
In code, keepdims=True preserves the dimension so subtraction and division line up correctly.
import numpy as np
x = np.array([
[1.0, 10.0, 100.0],
[2.0, 20.0, 200.0],
])
# Per-feature min/max
col_min = x.min(axis=0, keepdims=True)  # shape (1, 3)
col_max = x.max(axis=0, keepdims=True)
# Per-sample min/max
row_min = x.min(axis=1, keepdims=True)  # shape (2, 1)
row_max = x.max(axis=1, keepdims=True)
print(col_min.shape, row_min.shape)
When I’m unsure, I print shapes before I scale. It saves me from silent errors.
Handling constant arrays and zero variance features
Constant arrays are common in real data: a device stuck at a default value, a missing sensor, or a feature that’s not populated yet. They’re also a normalization trap.
- For min-max, `max - min` becomes zero.
- For z-score, `std` becomes zero.
- For L1/L2, the norm becomes zero if the vector is all zeros.
The rule I follow is simple: if the denominator is zero, replace it with 1. This keeps the output stable and prevents inf or NaN. You still get meaningful results: a constant column becomes all zeros after normalization.
Here’s a safe z-score helper that follows that rule:
import numpy as np
def zscore(x: np.ndarray, axis=None) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    mean = np.nanmean(x, axis=axis, keepdims=True)
    std = np.nanstd(x, axis=axis, keepdims=True)
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std
values = np.array([[5, 5, 5], [5, 5, 5]], dtype=float)
print(zscore(values, axis=0).tolist())
This pattern is boring but dependable. It shows up in every production pipeline I’ve shipped.
Normalizing with missing values (NaN-safe patterns)
In real data, NaNs happen: gaps in sensor feeds, optional fields, failed parsing. Standard NumPy reductions propagate NaNs, which can wipe out your stats. That’s why NumPy provides np.nanmin, np.nanmax, np.nanmean, and np.nanstd.
I use them whenever missing values are possible. The trick is to combine them with safe denominator handling.
import numpy as np
def minmax_nan_safe(x: np.ndarray, axis=None) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    min_val = np.nanmin(x, axis=axis, keepdims=True)
    max_val = np.nanmax(x, axis=axis, keepdims=True)
    denom = max_val - min_val
    denom = np.where(denom == 0, 1.0, denom)
    return (x - min_val) / denom
x = np.array([1, 2, np.nan, 4], dtype=float)
print(minmax_nan_safe(x).tolist())
Be clear about how you want NaNs to behave. In most of my pipelines, I keep NaNs as NaNs after normalization so downstream steps can handle them consistently. If you want to impute missing values instead, do that before you normalize.
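If you go the imputation route instead, one common pattern is to fill with the mean of the observed values before scaling; the choice of fill value is yours:

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0])

# Impute NaNs with the mean of the observed values
filled = np.where(np.isnan(x), np.nanmean(x), x)

# Then normalize as usual
scaled = (filled - filled.min()) / (filled.max() - filled.min())
print(scaled.round(4).tolist())
```

Note the order matters: imputing after normalization would insert values on the new scale, which usually isn't what you want.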
Integer overflow and dtype gotchas
NumPy tries to be efficient, but that sometimes surprises people. If you do arithmetic on small integer dtypes (like int8 or uint8), it can overflow or wrap around before you cast to float. This is especially common with image data and sensor bytes.
Example: if you subtract 200 from a uint8 array, the result is not negative; it wraps around. The fix is to cast to float early.
import numpy as np
img = np.array([0, 128, 255], dtype=np.uint8)
# Wrong: arithmetic happens in uint8 and wraps around
wrong = img - np.uint8(200)
# Right: cast to float before arithmetic
img_f = img.astype(float)
right = img_f - 200.0
print(wrong.tolist())  # [56, 184, 55]
print(right.tolist())  # [-200.0, -72.0, 55.0]
If you work with images or byte-level data, I consider this one of the most important normalization safeguards.
Clipping and robust scaling when outliers dominate
Min-max scaling is sensitive to extreme outliers. If one value is huge, the rest of your data collapses into a tight band near zero. That might be accurate, but it can be unhelpful for visualization or model training.
Two practical strategies:
- Clip values to a percentile range before scaling.
- Use a robust scale like median and interquartile range (IQR).
Here’s a clipping pattern I use when I need stable min-max but can tolerate outlier caps:
import numpy as np
def minmax_with_clip(x: np.ndarray, low_q=1.0, high_q=99.0) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    lo = np.percentile(x, low_q)
    hi = np.percentile(x, high_q)
    clipped = np.clip(x, lo, hi)
    return (clipped - lo) / (hi - lo)
x = np.array([1, 2, 3, 4, 1000], dtype=float)
print(minmax_with_clip(x).round(3).tolist())
I keep this in mind for metrics like revenue or latency where the tail is heavy. It’s not always right, but it’s a useful knob.
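The second strategy, median/IQR scaling, can be sketched like this; the `robust_scale` helper name is my own, not a NumPy function:

```python
import numpy as np

def robust_scale(x, axis=None):
    """Center on the median and scale by the interquartile range."""
    x = np.asarray(x, dtype=float)
    median = np.nanmedian(x, axis=axis, keepdims=True)
    q1 = np.nanpercentile(x, 25, axis=axis, keepdims=True)
    q3 = np.nanpercentile(x, 75, axis=axis, keepdims=True)
    iqr = q3 - q1
    iqr = np.where(iqr == 0, 1.0, iqr)  # guard constant data
    return (x - median) / iqr

x = np.array([1, 2, 3, 4, 1000], dtype=float)
print(robust_scale(x).round(3).tolist())
```

Unlike min-max, the outlier stays large in the output while the bulk of the data keeps a sensible scale, which is often exactly what you want for heavy-tailed metrics.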
Normalizing images (special case with known ranges)
Images are a common special case because they often have a known range: 0-255 for uint8 or 0-1 for floats. In that case, you don’t need min-max at all—you already know the min and max. The simplest normalization is just divide by 255.
import numpy as np
img = np.array([[0, 128, 255]], dtype=np.uint8)
img_f = img.astype(float) / 255.0
print(img_f.tolist())
If you’re working with preprocessed images that are already float, check the range first. I’ve seen pipelines normalize twice, which compresses values into a tiny range and ruins contrast.
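One cheap safeguard against double normalization is to check the range before dividing. This is a heuristic (a genuinely float image whose max happens to be at most 1.0 would be misjudged), but it catches the common case:

```python
import numpy as np

def to_unit_range(img):
    img = np.asarray(img, dtype=float)
    # Heuristic: if values already look like [0, 1], don't divide again
    if img.max() <= 1.0:
        return img
    return img / 255.0

print(to_unit_range(np.array([0, 128, 255])).tolist())
print(to_unit_range(np.array([0.0, 0.5, 1.0])).tolist())
```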
Normalizing in place to reduce memory use
For very large arrays, creating multiple temporaries can blow up memory. NumPy lets you normalize in place with care, which can save a lot of RAM. The tradeoff is that your original data gets modified, so only do this if you intend to overwrite it.
Here’s an in-place min-max scaling example:
import numpy as np
x = np.array([10.0, 20.0, 40.0, 80.0])
min_val = x.min()
max_val = x.max()
# Subtract in place
x -= min_val
# Divide in place
denom = max_val - min_val
if denom == 0:
    denom = 1.0
x /= denom
print(x.tolist())
In-place operations are worth considering when your arrays are large and memory-bound, but I only use them when I’m sure I don’t need the original array.
Normalizing rows or columns with np.apply_along_axis (and why I avoid it)
You'll sometimes see np.apply_along_axis used for normalization. It works, but it's slower and more complex than pure vectorization. I almost always prefer broadcasting instead.
Here’s what I avoid:
import numpy as np
x = np.array([[1.0, 2.0], [3.0, 4.0]])
def minmax(v):
    return (v - v.min()) / (v.max() - v.min())
scaled = np.apply_along_axis(minmax, axis=0, arr=x)
This calls Python for every slice. For small arrays it’s fine, but for large arrays it’s a performance drag. Broadcasting with axis and keepdims is cleaner and faster.
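A quick way to convince yourself the two approaches agree is to run both on the same array and compare:

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])

def minmax(v):
    return (v - v.min()) / (v.max() - v.min())

# Slow: one Python call per column
slow = np.apply_along_axis(minmax, 0, x)

# Fast: vectorized equivalent via broadcasting
col_min = x.min(axis=0, keepdims=True)
col_max = x.max(axis=0, keepdims=True)
fast = (x - col_min) / (col_max - col_min)

print(np.allclose(slow, fast))
```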
Making normalization reusable (a tiny class pattern)
Sometimes it’s useful to package normalization into a small object so you can fit once and apply many times. I often do this when I don’t want to pull in a larger ML library but still need consistent scaling.
import numpy as np
class MinMaxScaler:
    def __init__(self):
        self.min_ = None
        self.max_ = None

    def fit(self, x: np.ndarray, axis=0):
        x = np.asarray(x, dtype=float)
        self.min_ = x.min(axis=axis, keepdims=True)
        self.max_ = x.max(axis=axis, keepdims=True)
        return self

    def transform(self, x: np.ndarray):
        x = np.asarray(x, dtype=float)
        denom = self.max_ - self.min_
        denom = np.where(denom == 0, 1.0, denom)
        return (x - self.min_) / denom

    def fit_transform(self, x: np.ndarray, axis=0):
        return self.fit(x, axis=axis).transform(x)
scaler = MinMaxScaler()
train = np.array([[1, 10], [2, 20], [3, 30]], dtype=float)
print(scaler.fit_transform(train).tolist())
new = np.array([[4, 15], [5, 25]], dtype=float)
print(scaler.transform(new).tolist())
This pattern is tiny but it prevents the biggest normalization bug: changing statistics between training and inference.
Streaming and incremental normalization (approximate strategies)
When data arrives in streams, you can’t always compute global min and max in one shot. There are a few options:
- Batch windowing: compute stats per window (hour/day) and accept some drift.
- Running min/max: update min and max as new data arrives. This is simple but can be skewed by early outliers.
- Running mean and variance: for z-scores, use a numerically stable online update (like Welford’s algorithm).
Here’s a simple running mean/variance snippet that updates as data streams in. It’s not a full normalization pipeline, but it shows the core idea:
import numpy as np
count = 0
mean = 0.0
M2 = 0.0
for x in [10, 20, 30, 40, 50]:
    count += 1
    delta = x - mean
    mean += delta / count
    delta2 = x - mean
    M2 += delta * delta2
variance = M2 / count
std = np.sqrt(variance)
print(mean, std)
If you do streaming normalization, be explicit about its limitations. Global min-max computed over a stream is only correct if you keep the true min and max; if you forget early data, the scale shifts.
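The running min/max option can be sketched for batched data like this; note it inherits the caveat above that early extremes pin the scale forever:

```python
import numpy as np

running_min = np.inf
running_max = -np.inf

# Update running stats as batches arrive (batch contents are illustrative)
for batch in [np.array([10.0, 20.0]), np.array([5.0, 40.0]), np.array([30.0])]:
    running_min = min(running_min, batch.min())
    running_max = max(running_max, batch.max())

# Scale a new batch with the stats seen so far
new = np.array([5.0, 40.0])
scaled = (new - running_min) / (running_max - running_min)
print(running_min, running_max, scaled.tolist())
```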
Normalizing with mixed data types (numeric + categorical)
Normalization is only for numeric features. A common pitfall is trying to normalize everything after one-hot encoding, which can be redundant or even counterproductive. If you one-hot encode categories, the features are already in a comparable range (0 or 1). Min-max scaling won’t change them, but z-scoring can introduce negative values that might be confusing.
My rule:
- Normalize continuous numeric features.
- Leave binary or one-hot features as-is.
- Be careful with count-based features; min-max is okay, z-score can be better if counts are heavy-tailed.
If you keep these categories separate, your pipeline stays easier to reason about.
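One way to keep them separate in a plain NumPy array is to scale only the continuous column indices and leave the rest untouched; the column layout here is invented for illustration:

```python
import numpy as np

# Columns: [age, income, is_premium (one-hot)] -- layout is made up
data = np.array([
    [25.0, 40000.0, 0.0],
    [35.0, 80000.0, 1.0],
    [45.0, 60000.0, 1.0],
])

continuous = [0, 1]  # indices of the columns to normalize
out = data.copy()
col_min = data[:, continuous].min(axis=0, keepdims=True)
col_max = data[:, continuous].max(axis=0, keepdims=True)
out[:, continuous] = (data[:, continuous] - col_min) / (col_max - col_min)

print(out.tolist())  # one-hot column passes through unchanged
```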
How to pick a normalization method in practice
Here’s the quick decision tree I use when I’m deciding between methods:
- Do I need a fixed range? If yes, min-max to [0, 1] or another range.
- Do I need centered features? If yes, z-score.
- Am I comparing vectors by angle? If yes, L2 normalization.
- Are outliers extreme? Consider clipping or robust scaling.
- Is the algorithm scale-invariant? If yes, you might skip normalization.
This isn’t perfect, but it works well enough that I rarely have to revisit the choice later.
Practical scenarios (what I actually do)
Here are real-world scenarios and how I normalize:
- Time-series sensor data: z-score per sensor column after cleaning NaNs. That keeps drift visible.
- Image preprocessing: divide by 255 or scale to [-1, 1] if the model expects it.
- Customer metrics for clustering: min-max per feature to keep ranges comparable.
- Embeddings for similarity search: L2 normalize per vector to use cosine similarity.
- Financial metrics with outliers: clip to percentiles, then min-max or z-score.
The method is less important than consistency. Once you choose, keep it stable across environments.
Debugging normalization (simple checks I run)
When a model or plot looks off, I use quick sanity checks:
- Print the min and max of the normalized array; they should match the target range.
- Print the mean and std for z-scores; they should be near 0 and 1.
- Check a constant column; it should become zeros, not NaNs.
- Check shape alignment for broadcasting; print shapes before and after scaling.
Here’s a tiny diagnostic helper I sometimes drop into notebooks:
import numpy as np
def describe(x: np.ndarray, name="x"):
    x = np.asarray(x, dtype=float)
    print(f"{name}: shape={x.shape}, min={np.nanmin(x):.4f}, max={np.nanmax(x):.4f}, mean={np.nanmean(x):.4f}, std={np.nanstd(x):.4f}")
x = np.array([1, 2, 3, 4, 5], dtype=float)
describe(x, "raw")
scaled = (x - x.min()) / (x.max() - x.min())
describe(scaled, "minmax")
It’s not fancy, but it catches obvious mistakes fast.
A note on reproducibility and metadata
If your normalization stats are computed once and then reused, you should store them alongside your model or dataset. I include:
- the method (min-max, z-score, L2)
- the axis (per feature, per sample)
- the stats (min, max, mean, std)
- the dtype and version of the pipeline
This is the difference between a reproducible pipeline and a fragile one. When debugging a production issue, having those stats saved is gold.
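A lightweight way to persist those fields is a small JSON artifact next to the model; the schema and version string below are illustrative, not a standard:

```python
import json
import numpy as np

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Fit once and record the stats alongside the method and axis
meta = {
    "method": "min-max",
    "axis": 0,  # per feature
    "min": train.min(axis=0).tolist(),
    "max": train.max(axis=0).tolist(),
    "dtype": "float64",
    "pipeline_version": "2026.1",  # illustrative
}
blob = json.dumps(meta)

# Later, in the serving path: reload and apply the exact same scaling
loaded = json.loads(blob)
lo = np.array(loaded["min"])
hi = np.array(loaded["max"])
new = np.array([[2.0, 15.0]])
scaled = (new - lo) / (hi - lo)
print(scaled.tolist())  # [[0.5, 0.25]]
```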
Alternative approaches (and why I still use NumPy)
There are lots of libraries that offer normalization—machine learning libraries, data pipelines, even database functions. They’re fine. But I still use NumPy in three cases:
- I want a lightweight dependency for a data script.
- I need a custom tweak (clipping, NaN handling, axis tricks).
- I want to understand exactly what the math is doing.
NumPy’s directness is a feature. You see the math, you see the shapes, and you can reason about it without extra abstraction.
Putting it all together: a practical normalization toolkit
When I’m working on a real project, I usually end up with a small set of helper functions that cover 90% of needs. Here’s a compact, practical toolkit you can paste into a notebook or module:
import numpy as np
def minmax_scale(x: np.ndarray, axis=None, clip=None) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    if clip is not None:
        lo, hi = clip
        x = np.clip(x, lo, hi)
    min_val = np.nanmin(x, axis=axis, keepdims=True)
    max_val = np.nanmax(x, axis=axis, keepdims=True)
    denom = max_val - min_val
    denom = np.where(denom == 0, 1.0, denom)
    return (x - min_val) / denom

def zscore_scale(x: np.ndarray, axis=None) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    mean = np.nanmean(x, axis=axis, keepdims=True)
    std = np.nanstd(x, axis=axis, keepdims=True)
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std

def l2_normalize(x: np.ndarray, axis=1) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)
    return x / norms

# Example usage
x = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
print(minmax_scale(x, axis=0).tolist())
print(zscore_scale(x, axis=0).round(3).tolist())
print(l2_normalize(x, axis=1).round(3).tolist())
This set covers most normalization needs without additional dependencies. You can wrap it in a class later if you need persistence.
Final takeaway
Normalization is simple, but it’s also one of the easiest places to introduce subtle bugs. The math is straightforward, but the choices around axis, dtype, missing values, and consistency across datasets are what matter in practice. If you pick a method intentionally, keep your stats stable, and guard against the classic edge cases, you’ll avoid most surprises.
If I had to reduce this to three rules, they would be:
- Choose the normalization method based on what you need (range, centering, or vector length).
- Compute stats once and reuse them across training and inference.
- Always guard against zeros, NaNs, and dtype pitfalls.
Do that, and your arrays will behave predictably—even when your data changes beneath you.


