Partial Derivatives Practice Problems: A Hands‑On, Engineer‑Friendly Guide

I keep meeting early‑career engineers who can wire up a machine learning pipeline yet freeze when a loss function involves two variables. The snag isn’t calculus itself—it’s confidence with partial derivatives under mild pressure. In this piece I’ll share how I practice and teach the topic today: start from the gradient intuition, grind through carefully chosen exercises, check answers with lightweight tooling, and finish with real scenarios like surface normals and regularized losses. If you stay with me for the next few minutes, you’ll leave with a repeatable set of practice problems, worked solutions, and habits that make multivariable differentiation feel as routine as writing a unit test.

Quick Refresher: What a Partial Derivative Tells Me

When a function depends on several variables, a partial derivative measures the instantaneous rate of change with respect to one variable while freezing the others. I picture a drone moving on a terrain height map z = f(x, y): ∂f/∂x is the slope felt when nudging purely east–west; ∂f/∂y is the slope north–south. Formally, ∂f/∂x = lim_{h→0} (f(x+h, y) − f(x, y)) / h, holding y constant. The collection of first‑order partials forms the gradient vector, and its direction points toward steepest ascent. Higher‑order partials (like ∂²f/∂x∂y) tell me how that slope itself is changing, which matters for curvature and optimization.

What I emphasize to students is that “freeze the other variables” is not a vague instruction; it’s a concrete operational rule. Treat the non‑target variables like constants. If you get the constant rule in one‑variable calculus, you already have 80% of partial derivatives down.
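The freeze rule is easy to check numerically. Here's a minimal sketch (the helper `partial_x` is my own, not a library function): nudge only x while y stays frozen.

```python
# Approximate f_x for f(x, y) = 3x^2 y + 4y^3 by nudging only x;
# y is held fixed, exactly as the freeze rule prescribes.
def f(x, y):
    return 3 * x**2 * y + 4 * y**3

def partial_x(f, x, y, h=1e-6):
    # Central difference in x with y frozen.
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

# Analytically f_x = 6xy, so at (2, 3) we expect 36.
print(partial_x(f, 2.0, 3.0))  # ≈ 36.0
```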

Notation and Small Rules Worth Memorizing

  • ∂f/∂x, fx, and Dx f all describe the same operation; pick one style and stay consistent in your notes.
  • Mixed partials: if f is smooth enough, ∂²f/∂x∂y = ∂²f/∂y∂x (Clairaut’s theorem). I still check both when learning.
  • Product rule: ∂/∂x (u·v) = ux v + u vx with other variables frozen.
  • Chain rule for compositions: if f(x, y) = g(h(x, y)), then fx = g′(h(x, y)) · hx.
  • Exponentials with variable exponents: treat the exponent carefully; for f = e^{x y}, fx = y e^{x y} and fy = x e^{x y}.

Two extra rules I always add to the cheat sheet:

  • Quotient rule: ∂/∂x (u/v) = (ux v − u vx)/v^2, again freezing the other variables.
  • Implicit differentiation for constraints: if F(x, y) = 0 defines y as a function of x, then dy/dx = −Fx/Fy, as long as F_y ≠ 0. This shows up in constraint optimization and level‑set problems.
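To sanity-check the implicit-differentiation rule, I sometimes let SymPy do the algebra. A sketch, assuming SymPy is installed; the unit circle is just an illustrative constraint:

```python
import sympy as sp

x, y = sp.symbols('x y')
F = x**2 + y**2 - 1  # constraint F(x, y) = 0 (the unit circle)

# dy/dx = -Fx/Fy wherever Fy != 0
dydx = -sp.diff(F, x) / sp.diff(F, y)
print(sp.simplify(dydx))  # -x/y
```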

Practice Set 1: Straightforward First‑Order Partials

These warm‑ups build muscle memory on polynomials, logs, and exponentials.

Problem A: f(x, y) = 3x^2 y + 4 y^3. Compute fx and fy.

  • f_x = 6 x y
  • f_y = 3 x^2 + 12 y^2

Problem B: f(x, y) = ln(x y). Compute fx and fy.

  • f_x = 1/x
  • f_y = 1/y

Problem C: f(x, y) = x e^{x y}. Compute fx and fy.

  • f_x = e^{x y} + x y e^{x y} = e^{x y} (1 + x y)
  • f_y = x^2 e^{x y}

Problem D: f(x, y) = sin(x + y). Compute fx and fy.

  • f_x = cos(x + y)
  • f_y = cos(x + y)

Why these? They cover three recurring patterns: coefficient juggling, log simplification, and exponential chain rules. I keep them on an index card and solve them cold once a week.

If you want an extra‑simple check: swap x and y. The algebra should mirror cleanly. If the original function is symmetric in x and y, the partial derivatives should reflect that symmetry.
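If you keep these on an index card, a short SymPy script makes a handy answer key. A sketch (assuming SymPy is available) that verifies Problems A through D:

```python
import sympy as sp

x, y = sp.symbols('x y')
# (function, expected f_x, expected f_y) for each warm-up problem
problems = {
    'A': (3*x**2*y + 4*y**3, 6*x*y, 3*x**2 + 12*y**2),
    'B': (sp.log(x*y),       1/x,   1/y),
    'C': (x*sp.exp(x*y),     sp.exp(x*y)*(1 + x*y), x**2*sp.exp(x*y)),
    'D': (sp.sin(x + y),     sp.cos(x + y), sp.cos(x + y)),
}
for name, (f, fx, fy) in problems.items():
    assert sp.simplify(sp.diff(f, x) - fx) == 0
    assert sp.simplify(sp.diff(f, y) - fy) == 0
    print(name, 'checks out')
```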

Practice Set 2: Second‑Order and Mixed Partials

Curvature matters whenever you care about minima, maxima, or saddle behavior. Let’s extend the earlier set.

Problem E: f(x, y) = x^2 e^y. Find f_{xx}, f_{yy}, f_{xy}.

  • f_x = 2 x e^y
  • f_y = x^2 e^y
  • f_{xx} = 2 e^y
  • f_{yy} = x^2 e^y
  • f_{xy} = ∂/∂y (2 x e^y) = 2 x e^y

Problem F: f(x, y) = x^3 + y^3 − 3 x y. Find all first and second partials.

  • f_x = 3 x^2 − 3 y
  • f_y = 3 y^2 − 3 x
  • f_{xx} = 6 x
  • f_{yy} = 6 y
  • f_{xy} = −3
  • f_{yx} = −3 (matches, as expected)

Problem G: f(x, y) = x^2 + x y + y^2. Find fx, fy, f_{xx}, f_{yy}, f_{xy}.

  • f_x = 2 x + y
  • f_y = x + 2 y
  • f_{xx} = 2
  • f_{yy} = 2
  • f_{xy} = 1

Practical tip: when the function is quadratic, second partials are constants; that’s a quick sanity check. If you get a variable in f_{xx} for a quadratic, something went wrong.
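Problem F is also a nice place to confirm Clairaut's theorem mechanically. A quick check with SymPy (my own script, assuming SymPy is installed):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 + y**3 - 3*x*y  # Problem F

fxx = sp.diff(f, x, x)
fyy = sp.diff(f, y, y)
fxy = sp.diff(f, x, y)
fyx = sp.diff(f, y, x)

print(fxx, fyy, fxy)     # 6*x 6*y -3
assert fxy == fyx == -3  # Clairaut: mixed partials agree
```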

Practice Set 3: Tangent Planes and Linearization

A partial derivative is the slope of the tangent plane in its axis direction. Turning partials into the plane equation cements geometric meaning.

Problem H: Surface z = x^2 + y^2 at point (1, 1, 2). Find the tangent plane.

  • f_x = 2 x → at (1,1): 2
  • f_y = 2 y → at (1,1): 2
  • Plane: z − 2 = 2 (x − 1) + 2 (y − 1) ⇒ z = 2 x + 2 y − 2

Problem I: Surface z = e^{x y} at point (0, 0, 1). Find the tangent plane.

  • f_x = y e^{x y} → 0
  • f_y = x e^{x y} → 0
  • Plane: z − 1 = 0·(x − 0) + 0·(y − 0) ⇒ z = 1 (a horizontal plane)

Problem J: Surface z = ln(x + y) at point (1, 0, 0). Find the tangent plane.

  • f_x = 1/(x + y) → 1
  • f_y = 1/(x + y) → 1
  • Plane: z − 0 = 1·(x − 1) + 1·(y − 0) ⇒ z = x + y − 1

When I teach this, I ask students to draw contour lines and the plane together. Visualizing how the plane kisses the surface removes the abstraction from the derivative symbol.

A more advanced linearization habit: I always rewrite the tangent plane as

z ≈ f(x0, y0) + fx(x0, y0)(x − x0) + fy(x0, y0)(y − y0).

This form doubles as a quick approximation formula, which is useful when you’re doing back‑of‑the‑envelope estimates.
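To see that approximation in action, here's a small sketch using the Problem H surface; the displacement values are arbitrary:

```python
# Linearize f(x, y) = x^2 + y^2 around (1, 1): f ≈ 2 + 2(x-1) + 2(y-1).
def f(x, y):
    return x**2 + y**2

def linearized(x, y, x0=1.0, y0=1.0):
    fx, fy = 2*x0, 2*y0          # partials of x^2 + y^2 at (x0, y0)
    return f(x0, y0) + fx*(x - x0) + fy*(y - y0)

exact = f(1.05, 0.98)            # 1.1025 + 0.9604 = 2.0629
approx = linearized(1.05, 0.98)  # 2 + 0.1 - 0.04 = 2.06
print(exact, approx)             # close for small displacements
```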

Practice Set 4: Hessians and Critical Points

First partials locate stationary points; second partials classify them. Here’s a small workflow.

1) Solve fx = 0 and fy = 0 for candidate points.

2) Form the Hessian matrix H = [[f_{xx}, f_{xy}], [f_{yx}, f_{yy}]].

3) Compute determinant D = f_{xx} f_{yy} − (f_{xy})^2.

4) If D > 0 and f_{xx} > 0, you have a local minimum; if D > 0 and f_{xx} < 0, a local maximum; if D < 0, a saddle; if D = 0, inconclusive.

Problem K: f(x, y) = x^3 − 3 x y^2.

  • fx = 3 x^2 − 3 y^2, fy = −6 x y
  • Setting f_y = −6 x y = 0 forces x = 0 or y = 0; substituting either into f_x = 3 x^2 − 3 y^2 = 0 forces the other to vanish, so (0, 0) is the only critical point.
  • Hessian: f_{xx} = 6 x, f_{yy} = −6 x, f_{xy} = −6 y.
  • At (0,0): D = 6·0 · (−6·0) − (−6·0)^2 = 0 → inconclusive. Testing nearby points reveals a saddle; graphing confirms.

Problem L: f(x, y) = x^2 + 4 y^2 − 4 x + 8.

  • f_x = 2 x − 4 = 0 → x = 2
  • f_y = 8 y = 0 → y = 0
  • Hessian: f_{xx} = 2, f_{yy} = 8, f_{xy} = 0
  • D = 2·8 − 0 = 16 > 0 and f_{xx} > 0 → local minimum at (2, 0) with value 4 + 0 − 8 + 8 = 4

With enough practice, I read the quadratic terms and spot convexity before computing anything. That intuition is priceless when optimizing models in code.
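The four-step workflow above collapses into a few lines of code. A minimal sketch (the `classify` helper is my own name, not a library function), applied to Problem L:

```python
# Second-derivative test: D = fxx*fyy - fxy^2 classifies a critical point.
def classify(fxx, fyy, fxy):
    D = fxx * fyy - fxy**2
    if D > 0:
        return 'local min' if fxx > 0 else 'local max'
    return 'saddle' if D < 0 else 'inconclusive'

# Problem L at the critical point (2, 0): fxx = 2, fyy = 8, fxy = 0.
print(classify(2, 8, 0))  # local min
```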

Practice Set 5: Applied Mini‑Scenarios

Tying partial derivatives to real work makes the exercises memorable.

Scenario 1: Gradient of a 2D Loss Function

Loss L(w1, w2) = (w1 − 3)^2 + 2 (w2 + 1)^2. I often ask new teammates to compute gradients by hand before trusting autodiff.

  • ∂L/∂w1 = 2 (w1 − 3)
  • ∂L/∂w2 = 4 (w2 + 1)
  • Setting both to zero gives optimum at (3, −1). The Hessian is diagonal with positive entries, confirming convexity.

Scenario 2: Surface Normal for Shading

Height field h(x, y) = 0.2 x^2 + 0.3 y^2. In real‑time rendering, the normal vector at (x, y) is proportional to (−hx, −hy, 1).

  • hx = 0.4 x, hy = 0.6 y → normal = (−0.4 x, −0.6 y, 1), then normalize.

Scenario 3: Economic Cost Function

Cost C(q1, q2) = 50 + 2 q1^2 + 3 q2^2 − 4 q1 q2. Marginal costs:

  • ∂C/∂q1 = 4 q1 − 4 q2
  • ∂C/∂q2 = 6 q2 − 4 q1
  • Setting ∂C/∂q1 = 0 gives q1 = q2, while ∂C/∂q2 = 0 gives q1 = (3/2) q2; together they force q1 = q2 = 0. A nontrivial operating point comes from adding revenue terms or capacity constraints to the model.

Scenario 4: Regularization Check

Regularized loss J(w1, w2) = data(w1, w2) + λ (w1^2 + w2^2). Gradient picks up 2 λ w_i terms. When λ changes during training schedules, this term shows up immediately in the gradient; partial derivatives make that obvious.

Scenario 5: Linearization for Sensor Fusion

Measurement model z = g(x, y) = x y + sin x. For an extended Kalman filter step, I need the Jacobian: [∂g/∂x, ∂g/∂y] = [y + cos x, x]. Plugging the latest state estimate gives the update matrix; no autodiff in embedded C, so manual partials rule.
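Before shipping a hand-derived Jacobian like this, I compare it against finite differences at a sample state. A sketch (the state values are arbitrary):

```python
import math

# Measurement model g(x, y) = x*y + sin(x); the hand-derived Jacobian
# row is [y + cos(x), x]. Verify at a sample state (x, y) = (0.5, 2.0).
def g(x, y):
    return x * y + math.sin(x)

x, y, h = 0.5, 2.0, 1e-6
J_analytic = [y + math.cos(x), x]
J_numeric = [(g(x + h, y) - g(x - h, y)) / (2*h),
             (g(x, y + h) - g(x, y - h)) / (2*h)]
print(J_analytic, J_numeric)  # the two rows should agree closely
```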

New Section: The “Freeze‑and‑Focus” Mental Model

Here’s the micro‑routine I teach when someone is stuck:

1) Circle the variable you’re differentiating with respect to.

2) Treat every other symbol as a constant—literally pretend it’s a number.

3) Differentiate as if you’re in single‑variable calculus.

4) Un‑freeze and simplify.

This sounds obvious, but it’s what keeps errors from multiplying. In practice, I say out loud: “Differentiate with respect to x; y is a constant today.” It keeps my brain from wandering into total derivative territory.

A quick demonstration helps:

  • f(x, y) = (x + y)^4.
  • f_x = 4(x + y)^3 · 1.
  • f_y = 4(x + y)^3 · 1.

Both look identical because the inner expression treats x and y symmetrically. If you see asymmetry in the answers, something went wrong.

New Section: Edge Cases That Break Patterns

Partial derivatives are mostly routine, but there are a few edge cases that I deliberately practice because they create brittle mistakes in production code.

1) Domain Restrictions

If you differentiate without checking the domain, you can end up with gradients that don’t exist. For f(x, y) = ln(x − y), the function is only defined when x − y > 0. I always annotate that beside my work.

2) Non‑Differentiability

For f(x, y) = |x| + y^2, f_x is not defined at x = 0. In optimization, this matters; the gradient is undefined, but subgradients may exist. If you implement a custom loss, make sure your optimizer can handle that corner.

3) Piecewise Definitions

Consider f(x, y) = x y if x ≥ 0, and f(x, y) = −x y if x < 0 (that is, f = |x| y). The x‑partial flips sign across x = 0: f_x = y for x > 0 but −y for x < 0, and at x = 0 it exists only when y = 0. For machine learning, this is a common pattern in hinge or ReLU‑like losses. You can still practice partial derivatives here, but you must respect the piecewise cases.

4) Implicit Functions

If a constraint is given, like x^2 + y^2 = 1, you can’t treat x and y as independent. You might need dy/dx instead, or use Lagrange multipliers. This is the most common “wrong method” I see when people try to differentiate constrained systems.

5) Mixed Partial Symmetry Failures

Clairaut’s theorem requires sufficient smoothness. If a function has a kink, f_{xy} and f_{yx} might not match. That’s rare in standard textbooks but common in real‑world losses (think absolute values or max functions).

New Section: Alternative Solution Paths and Sanity Checks

I almost never accept a gradient without at least one alternate check. Here are three low‑effort approaches:

1) Plug‑in numeric check

Pick a random point, evaluate a small finite difference, and compare with your derivative. If f_x(1,2) ≈ 10, then f(1.001, 2) − f(1, 2) should be close to 10 × 0.001 = 0.01. The sign and approximate magnitude should line up.
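That plug-in check generalizes into a tiny reusable helper. A sketch (`grad_check` is my own name, not a library API):

```python
def grad_check(f, grad, point, h=1e-5, tol=1e-4):
    # Compare an analytical gradient against central differences,
    # component by component, at one point.
    for i, g in enumerate(grad(*point)):
        p_plus = list(point);  p_plus[i] += h
        p_minus = list(point); p_minus[i] -= h
        num = (f(*p_plus) - f(*p_minus)) / (2 * h)
        assert abs(num - g) < tol, f'component {i}: {num} vs {g}'
    return True

# Problem A: f = 3x^2 y + 4y^3, gradient (6xy, 3x^2 + 12y^2).
f = lambda x, y: 3*x**2*y + 4*y**3
grad = lambda x, y: (6*x*y, 3*x**2 + 12*y**2)
print(grad_check(f, grad, (1.0, 2.0)))  # True
```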

2) Symmetry checks

If the original function is symmetric in x and y, the derivatives should look symmetric too. This is a free guardrail.

3) Dimension checks

In physical models, units can catch mistakes. If f is in meters and x is in seconds, f_x should be meters/second. If your expression has meters/second^2, you probably differentiated the wrong variable.

New Section: When to Use Partial Derivatives vs. Something Else

Partial derivatives are perfect for local sensitivity, but they are not always the right tool.

Use partial derivatives when:

  • You want a local rate of change in one direction.
  • You are doing gradient‑based optimization.
  • You are building tangent plane approximations.
  • You need a Jacobian or Hessian for a local model.

Don’t rely only on partial derivatives when:

  • You care about global behavior (use contour plots, level sets, or global optimization).
  • The function is noisy or non‑smooth (consider subgradients or smoothing).
  • You are in a constrained system (use Lagrange multipliers or implicit differentiation).
  • You need directional derivatives along a specific path (use the total derivative or gradient dot direction).

This distinction matters in engineering. For example, a loss might have flat regions where gradients vanish; in that case, partial derivatives say “no change,” but the global landscape still has useful structure.

New Section: Practice Set 6 — Chain Rule and Composition

These problems target the most error‑prone area: nested functions.

Problem M: f(x, y) = sin(x^2 + y^2).

  • f_x = cos(x^2 + y^2) · 2x
  • f_y = cos(x^2 + y^2) · 2y

Problem N: f(x, y) = e^{x^2 y}.

  • f_x = e^{x^2 y} · (2x y)
  • f_y = e^{x^2 y} · x^2

Problem O: f(x, y) = ln(1 + x^2 y).

  • f_x = (2x y)/(1 + x^2 y)
  • f_y = (x^2)/(1 + x^2 y)

Problem P: f(x, y) = (x y + 1)^3.

  • f_x = 3(x y + 1)^2 · y
  • f_y = 3(x y + 1)^2 · x

The key is to isolate the inner function and apply the chain rule systematically. If you find yourself expanding unnecessarily, pause and re‑wrap the function into a clean inner expression.
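As with the earlier sets, SymPy can serve as an answer key for Problems M through P (assuming SymPy is installed):

```python
import sympy as sp

x, y = sp.symbols('x y')
# (function, expected f_x, expected f_y) for each chain-rule problem
worked = {
    'M': (sp.sin(x**2 + y**2), sp.cos(x**2 + y**2)*2*x, sp.cos(x**2 + y**2)*2*y),
    'N': (sp.exp(x**2*y),      sp.exp(x**2*y)*2*x*y,    sp.exp(x**2*y)*x**2),
    'O': (sp.log(1 + x**2*y),  2*x*y/(1 + x**2*y),      x**2/(1 + x**2*y)),
    'P': ((x*y + 1)**3,        3*(x*y + 1)**2*y,        3*(x*y + 1)**2*x),
}
for name, (f, fx, fy) in worked.items():
    assert sp.simplify(sp.diff(f, x) - fx) == 0
    assert sp.simplify(sp.diff(f, y) - fy) == 0
print('all chain-rule answers verified')
```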

New Section: Practice Set 7 — Directional Derivatives and Gradients

Partial derivatives are slopes along the coordinate axes, but real‑world changes happen in arbitrary directions. The directional derivative connects the two.

Problem Q: f(x, y) = x^2 y. Find the directional derivative at (1, 2) in the direction of the vector v = (3, 4).

  • Gradient: ∇f = (2 x y, x^2)
  • At (1,2): ∇f = (4, 1)
  • Unit vector u = v/|v| = (3/5, 4/5)

  • Directional derivative: ∇f · u = 4·(3/5) + 1·(4/5) = (12 + 4)/5 = 16/5

This is the moment where I remind learners that partial derivatives are just one slice of the more general directional derivative.
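Problem Q reduces to a dot product once the gradient is in hand. A quick numeric sketch:

```python
import math

# Problem Q: f(x, y) = x^2 y, gradient (2xy, x^2); direction v = (3, 4).
grad = (2*1*2, 1**2)        # gradient at (1, 2) -> (4, 1)
v = (3, 4)
norm = math.hypot(*v)       # |v| = 5
u = (v[0]/norm, v[1]/norm)  # unit vector (3/5, 4/5)
D_u = grad[0]*u[0] + grad[1]*u[1]
print(D_u)                  # 16/5 = 3.2, up to float rounding
```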

New Section: Practice Set 8 — Lagrange Multipliers (Light Intro)

If you’re optimizing with a constraint, partial derivatives alone are not enough. Here’s a minimal practice task to show the workflow.

Problem R: Maximize f(x, y) = x y subject to x^2 + y^2 = 1.

  • Set up: ∇f = λ ∇g where g(x, y) = x^2 + y^2
  • ∇f = (y, x), ∇g = (2x, 2y)
  • Solve y = 2λx and x = 2λy
  • Substituting the first equation into the second gives x = 4λ^2 x
  • If x ≠ 0, then 1 = 4λ^2 → λ = ±1/2
  • Then y = 2λx gives y = ±x
  • With x^2 + y^2 = 1 → 2x^2 = 1 → x = ±1/√2
  • Max occurs at (1/√2, 1/√2) and (−1/√2, −1/√2)

This is a simple bridge into constrained optimization, which is everywhere in control and ML.
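SymPy can grind through the same Lagrange system symbolically (assuming SymPy is installed; `lam` stands in for λ):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x*y
g = x**2 + y**2 - 1

# Stationarity of the Lagrangian f - lam*g, plus the constraint itself
eqs = [sp.diff(f - lam*g, x), sp.diff(f - lam*g, y), g]
solutions = sp.solve(eqs, [x, y, lam], dict=True)
best = max(solutions, key=lambda s: f.subs(s))
print(best)  # a maximizer with x = y = ±1/sqrt(2), where f = 1/2
```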

New Section: Common Mistakes I See (and How I Fix Them)

  • Dropping a variable too early: In f(x, y) = x e^{x y}, some students treat y as constant then forget it reappears via chain rule. I write a tiny note “y is constant but lives inside exponent.”
  • Mixing total and partial derivatives: When time is involved, I remind myself that ∂ treats other variables as frozen, while d/dt walks along a path x(t), y(t).
  • Skipping domain checks for logs: f(x, y) = ln(x y) only makes sense when x y > 0. During practice, I annotate domain beside the function.
  • Sign errors in product rule: I rewrite the product rule each time for a week; muscle memory forms quickly.
  • Forgetting symmetry of mixed partials: I compute both f_{xy} and f_{yx} once; if they mismatch, that flags an earlier mistake.
  • Over‑expanding too early: Expanding (x + y)^4 into five terms invites arithmetic errors. Apply the chain rule first, simplify later.
  • Treating piecewise functions as smooth: If there’s an absolute value or max, I explicitly split cases.

A Compact Table for Quick Recall

Pattern | Example | fx | fy
--- | --- | --- | ---
Polynomial | x^2 y + y^3 | 2 x y | x^2 + 3 y^2
Log of product | ln(x y) | 1/x | 1/y
Exponential with product | e^{x y} | y e^{x y} | x e^{x y}
Sum inside trig | sin(x + y) | cos(x + y) | cos(x + y)
Quadratic form | x^2 + x y + y^2 | 2 x + y | x + 2 y

I keep this table near my tablet when whiteboarding; it’s faster than searching notes.

New Section: Comparison Table — Manual vs. Autodiff vs. Numerical

Sometimes the right approach depends on speed, trust, and context. Here’s how I compare them in my own workflow:

Approach | Strengths | Weaknesses | Best Use
--- | --- | --- | ---
Manual derivation | Maximum clarity and insight | Prone to algebra mistakes | Small models, interviews, teaching
Autodiff (JAX/PyTorch) | Accurate for complex graphs | Can hide conceptual errors | Research code, large models
Numerical finite differences | Easy sanity check | Sensitive to step size | Quick validation tests

I never pick just one. In practice I do manual first, then autodiff, and if I’m still uneasy, a quick finite‑difference check at a random point.

Modern Tooling Habits (2026 Perspective)

In my day‑to‑day, I still compute simple partials by hand, but I lean on a few tools responsibly:

  • CAS sanity checks: SymPy (Python) or a CAS extension in your editor can confirm gradients instantly. I treat them like tests, not crutches.
  • JAX/PyTorch gradients: For research code, I sometimes write the forward function and let autodiff confirm my manual result. When the two disagree, I trust the manual derivation until I find the bug.
  • Notebook templates: I keep a template that plots surfaces and tangent planes using Plotly; seeing the geometry reduces errors.
  • Unit tests for math: For reusable utilities (e.g., custom loss terms), I write pytest cases that compare analytical gradients to numerical ones with small finite differences (eps = 1e−5). If the relative error stays below 1e−6, I ship it.

A practical performance note: numerical checks get expensive when a function has many parameters. I only sample a few directions or a handful of parameter indices per test run. That gives a strong signal without inflating runtime.

New Section: Performance and Stability Considerations

Partial derivatives show up in real systems, so I think about performance and numerical stability too.

1) Avoid catastrophic cancellation

If your derivative subtracts nearly equal numbers, floating‑point error can spike. In those cases, algebraic simplification before computing can help.

2) Choose finite‑difference step size wisely

Too small, and rounding error dominates; too big, and approximation error dominates. I prefer ranges like 1e−4 to 1e−6 depending on the scale of the inputs.

3) Scale inputs

If x is around 1e6, your gradient might explode numerically even if the math is fine. Normalize inputs before differentiation if you can.

4) Batch gradients

In ML systems, computing partials in batches is vastly more efficient than looping element by element. Vectorization doesn’t change the calculus, but it changes the runtime cost.
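A small NumPy sketch of that point, reusing the Scenario 1 loss: the gradient formula is applied to a whole batch of parameter points in one vectorized expression (the batch values are arbitrary):

```python
import numpy as np

# Vectorized gradient of L(w1, w2) = (w1-3)^2 + 2(w2+1)^2 over a batch
# of parameter points at once -- same calculus, one array expression.
W = np.array([[0.0, 0.0], [3.0, -1.0], [5.0, 2.0]])  # batch of (w1, w2)
grads = np.stack([2*(W[:, 0] - 3), 4*(W[:, 1] + 1)], axis=1)
print(grads)
# the row for (3, -1) is [0, 0]: the optimum found in Scenario 1
```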

Full Practice Walkthrough: One Function, Many Angles

Take f(x, y) = x^2 y e^{y} + ln(x + y). Here’s how I explore it.

1) First‑order partials

  • f_x = 2 x y e^{y} + 1/(x + y)
  • f_y = x^2 e^{y} (y + 1) + 1/(x + y)

2) Second‑order partials

  • f_{xx} = 2 y e^{y} − 1/(x + y)^2
  • f_{yy} = x^2 e^{y} (y + 2) − 1/(x + y)^2
  • f_{xy} = 2 x e^{y} (y + 1) − 1/(x + y)^2

3) Domain: x + y > 0 to keep the log valid.

4) Tangent plane at (x0, y0) = (1, 1)

  • z0 = f(1, 1) = 1^2 ·1·e + ln 2 ≈ e + 0.693
  • f_x(1,1) = 2·1·1·e + 1/2 ≈ 2 e + 0.5
  • f_y(1,1) = 1^2·e·2 + 1/2 ≈ 2 e + 0.5
  • Plane: z − z0 = (2 e + 0.5)(x − 1) + (2 e + 0.5)(y − 1)

5) Hessian at (1,1)

  • f_{xx}(1,1) = 2·1·e − 1/4 ≈ 2 e − 0.25
  • f_{yy}(1,1) = 1^2·e·3 − 1/4 ≈ 3 e − 0.25
  • f_{xy}(1,1) = 2·1·e·2 − 1/4 ≈ 4 e − 0.25
  • Determinant D ≈ (2 e − 0.25)(3 e − 0.25) − (4 e − 0.25)^2. With e ≈ 2.718, D ≈ −71.8 < 0, so the Hessian is indefinite at (1,1). Note that (1,1) is not a critical point, so this signals saddle‑like curvature rather than a saddle point proper.

This single function exercises polynomial, exponential, and logarithmic rules plus curvature analysis—great one‑stop practice.
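As a final check on the walkthrough, a SymPy script can reproduce the sign of the Hessian determinant at (1, 1) (assuming SymPy is installed):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2*y*sp.exp(y) + sp.log(x + y)

fxx = sp.diff(f, x, x)
fyy = sp.diff(f, y, y)
fxy = sp.diff(f, x, y)

# Hessian determinant evaluated at (1, 1)
D = (fxx*fyy - fxy**2).subs({x: 1, y: 1})
print(float(D))  # negative, so the Hessian is indefinite at (1, 1)
```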

Self‑Test Worksheet (Answer Key Included)

Try these without peeking, then check yourself.

Q1: f(x, y) = 3 x^2 y − 4 y^2. Compute fx and fy.

  • f_x = 6 x y
  • f_y = 3 x^2 − 8 y

Q2: f(x, y) = ln(x y). Compute fx and fy; state the domain.

  • fx = 1/x, fy = 1/y, domain x y > 0

Q3: f(x, y) = x^2 e^{y}. Compute f_{xx}, f_{yy}, f_{xy}.

  • f_{xx} = 2 e^{y}
  • f_{yy} = x^2 e^{y}
  • f_{xy} = 2 x e^{y}

Q4: f(x, y) = x^3 + y^3 − 3 x y. List all first and second partials.

  • f_x = 3 x^2 − 3 y
  • f_y = 3 y^2 − 3 x
  • f_{xx} = 6 x
  • f_{yy} = 6 y
  • f_{xy} = f_{yx} = −3

Q5: z = x^2 + y^2 at (1, 1, 2). Find the tangent plane.

  • z = 2 x + 2 y − 2

Q6: f(x, y) = x e^{x y}. Compute fx and fy.

  • f_x = e^{x y} (1 + x y)
  • f_y = x^2 e^{x y}

If you miss any, redo the derivation slowly; speed comes after accuracy.

New Section: A Mini‑Project to Make It Stick

If you want a practical capstone without spending days on it, try this:

1) Choose a bivariate function from your work or hobbies (graphics, economics, physics).

2) Compute fx and fy by hand.

3) Plot the surface and the gradient field.

4) Pick a point and compute the tangent plane.

5) Verify your derivative numerically at three random points.

I like to use a simple height field and then test how the normal vector affects shading. The immediate visual feedback cements the calculus.

Habit Loop for Retaining This Skill

  • Daily: Solve two first‑order problems and one mixed‑partial problem in under five minutes.
  • Weekly: Derive one tangent plane and classify one critical point using the Hessian test.
  • Monthly: Code a quick gradient checker in your favorite language and compare analytical vs. numerical gradients for a custom function you wrote that month.
  • Visual: Plot one surface with contours and its gradient field; seeing arrows align with steepest ascent reinforces the meaning of the gradient vector.

If you ever feel rusty, reset to the warm‑ups. Consistency is more valuable than volume.

Key Takeaways and Next Steps

Partial derivatives are not an abstract requirement from a bygone math class—they are the workhorse behind gradients in neural networks, normals in rendering, and sensitivity analyses in economics. The fastest way I’ve found to stay sharp is to keep a rotating set of practice problems, verify with light tooling, and connect each derivative to a tangible scenario: a tangent plane you can picture, a loss surface you actually optimize, or a physical surface you might fabricate. You now have a 20‑minute routine (warm‑up problems, one Hessian classification, one applied scenario) that fits between meetings. Pick a tool—SymPy, JAX, or even a minimal numerical checker—to confirm your algebra, but keep the manual skill active. If you want to stretch further, swap x and y for real parameters in your current project and recompute the gradients; relevance keeps the practice sticky. I recommend scheduling your next run‑through right now; future you will thank you when the next tricky gradient pops up at work.
