In this article, we’ll break down the basics of profiling and optimizing Python code. It’s easy to write something that works, but it’s a whole different game when your code needs to be fast and efficient.
Python is great for getting things done quickly, but it’s not always the fastest language when you’re dealing with heavy loops, slow I/O operations, large files, or recursive functions. These things can quietly drag down performance, especially when you’re working with data or building something for production.
This tutorial will show you how to spot what’s slowing your code down and what you can do about it. You’ll learn how to measure performance using simple tools, analyze memory and CPU usage, and clean up those bottlenecks with smart (and practical) tweaks.
By the end, you’ll not only write code that works, but code that runs smoother and faster.
What Exactly Is Code Profiling?
Code profiling is all about figuring out where your code is spending most of its time or resources. Instead of guessing which part is slow, profiling gives you hard numbers: how long each function takes to run, how many times it’s called, and how much memory it uses.
Think of it like getting a report card for your script. You’ll see which lines or functions are dragging things down and which ones are running smoothly.
In Python, you can profile your code in a few different ways:
- CPU profiling: Measures how much time each part of your code takes
- Memory profiling: Tracks how much memory is being used and where
- Line-by-line profiling: Zooms in and shows performance stats for each line of a function
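To make the memory side concrete before we dive into CPU profiling, the standard library’s tracemalloc module can report how much memory a piece of code allocates. A minimal sketch (build_big_list is just a made-up stand-in for your own code):

```python
import tracemalloc

def build_big_list():
    # Stand-in workload: allocate a list of 100,000 ints
    return [i * i for i in range(100_000)]

tracemalloc.start()
data = build_big_list()
current, peak = tracemalloc.get_traced_memory()  # both in bytes
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```

Since tracemalloc is built in, it’s a handy first stop before reaching for third-party memory profilers.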
Profiling Code with cProfile
cProfile is a built-in Python module that gives you a detailed breakdown of how much time each part of your code takes to run. It’s fast, reliable, and doesn’t need any installation. Let’s walk through how to use it.
Before we profile anything, let’s create a simple script that does some work, like sorting, sleeping, and looping, to simulate something real.
import time

def slow_function():
    total = 0
    for i in range(10000):
        for j in range(100):
            total += i * j
    time.sleep(1)
    return total

def fast_function():
    return sum([i for i in range(1000)])

def main():
    slow_function()
    fast_function()

if __name__ == "__main__":
    main()
This script has two functions:
- slow_function() does some nested loops and sleeps for 1 second
- fast_function() just sums up numbers quickly
- main() runs them both
Running cProfile on the Script
Now let’s profile the entire script using cProfile from the command line:
python -m cProfile sample_script.py
You’ll get an output like this:
20027 function calls in 1.234 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002    1.234    1.234 sample_script.py:7(slow_function)
        1    0.000    0.000    1.234    1.234 sample_script.py:14(main)
        1    0.000    0.000    0.000    0.000 sample_script.py:12(fast_function)
Here’s a quick breakdown of the key columns:
- ncalls: Number of times a function was called
- tottime: Time spent only in that function (excluding sub-functions)
- cumtime: Time spent in that function and everything it calls
From the output, you can clearly see slow_function is eating up most of the time. That’s exactly what we wanted to find out!
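If you’d rather profile from inside your code and sort the output yourself, the pstats module pairs with cProfile. A sketch, with slow_function standing in for whatever you want to measure:

```python
import cProfile
import io
import pstats

def slow_function():
    # Stand-in workload
    return sum(i * j for i in range(1000) for j in range(100))

profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Sort by cumulative time and show only the top 5 entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Sorting by "cumulative" is usually the quickest way to spot the function whose call tree eats the most time.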
Line-by-Line Performance Check
While cProfile gives a nice overview of function-level performance, sometimes you need to know exactly which lines are slow. That’s where line_profiler comes in. It shows how much time each line of your function takes.
Installing line_profiler
Unlike cProfile, this one isn’t built-in, so you’ll need to install it first:
pip install line_profiler
You’ll need to:
- Decorate the functions you want to analyze with @profile
- Run your script with kernprof
Let’s modify our script slightly:
import time

@profile
def slow_function():
    total = 0
    for i in range(10000):
        for j in range(100):
            total += i * j
    time.sleep(1)
    return total

@profile
def fast_function():
    return sum([i for i in range(1000)])

def main():
    slow_function()
    fast_function()

if __name__ == "__main__":
    main()
Note: Make sure to save this in a file (e.g., profile_script.py) and not run it directly inside an IDE for kernprof to work properly.
Running the Line Profiler
Use kernprof to profile the script:
kernprof -l -v profile_script.py
- -l tells it to use the line profiler
- -v makes it show the results right after it runs
You’ll get something like this:
Timer unit: 1e-06 s

Total time: 1.56 s
File: profile_script.py
Function: slow_function at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5         1          5.0      5.0      0.0      total = 0
     6     10001       4800.0      0.5      0.3      for i in range(10000):
     7   1010000     248000.0      0.2     15.9          for j in range(100):
     8   1000000     306000.0      0.3     19.6              total += i * j
     9         1    1001200.0 1001200.0   64.2      time.sleep(1)
You can now see exactly where the time goes, line by line. In this run, the time.sleep(1) call and the nested loop lines account for nearly all of it.
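kernprof is the usual entry point, but line_profiler also exposes a small programmatic API if you’d rather keep profiling inside a script. A sketch that degrades gracefully when the package isn’t installed (hot_loop is a made-up example function):

```python
# line_profiler is third-party: pip install line_profiler
try:
    from line_profiler import LineProfiler
except ImportError:
    LineProfiler = None

def hot_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if LineProfiler is not None:
    lp = LineProfiler()
    lp.add_function(hot_loop)              # register the function to profile
    result = lp.runcall(hot_loop, 10_000)  # call it under the profiler
    lp.print_stats()                       # same per-line table as kernprof -v
else:
    result = hot_loop(10_000)
```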
Simple Optimization Techniques
Now that you’ve seen how to identify bottlenecks, the next step is fixing them. Optimization doesn’t always mean rewriting everything from scratch; sometimes, small tweaks can give big speed boosts.
1. Replace Slow Loops with Built-in Functions
Python’s built-in functions are usually written in C under the hood, meaning they’re much faster than writing your own loops.
Example: Summing Numbers
Before (slow):
def sum_manual():
    total = 0
    for i in range(1000000):
        total += i
    return total
After (faster):
def sum_builtin():
    return sum(range(1000000))
The built-in sum() runs its loop in C, so it avoids the overhead of executing each iteration as Python bytecode. This one change often makes the loop several times faster, depending on your machine and Python version.
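You don’t have to take that claim on faith. A quick sketch using timeit to compare the two versions above:

```python
import timeit

def sum_manual():
    total = 0
    for i in range(1_000_000):
        total += i
    return total

def sum_builtin():
    return sum(range(1_000_000))

# Run each version a few times and compare total wall time
manual_time = timeit.timeit(sum_manual, number=3)
builtin_time = timeit.timeit(sum_builtin, number=3)
print(f"manual:  {manual_time:.3f}s")
print(f"builtin: {builtin_time:.3f}s")
```

The exact speedup varies from machine to machine, which is exactly why measuring beats guessing.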
2. Use List Comprehensions Instead of Loops
Using list comprehensions is not just cleaner, it is also faster.
Before:
squares = []
for i in range(10000):
    squares.append(i * i)
After:
squares = [i * i for i in range(10000)]
List comprehensions run their loop in optimized interpreter code and skip the repeated squares.append method lookup, so they are faster than manually appending items inside a loop.
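A related tweak: if the list is only consumed once (as in fast_function earlier), a generator expression skips building the intermediate list entirely, which keeps peak memory low. A sketch comparing the two:

```python
import timeit

n = 1_000_000
# Builds a full list of n ints, then sums it
list_time = timeit.timeit(lambda: sum([i for i in range(n)]), number=3)
# Feeds values to sum() one at a time; no intermediate list
gen_time = timeit.timeit(lambda: sum(i for i in range(n)), number=3)

print(f"list comprehension: {list_time:.3f}s")
print(f"generator:          {gen_time:.3f}s")
```

Timings here vary (the generator isn’t always faster), but the memory savings grow with n.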
3. Avoid Repeated Calculations
Don’t calculate something twice if you can store it once.
Before:
def slow_calc(n):
    return [pow(2, i) for i in range(n) if pow(2, i) % 2 == 0]
After:
def fast_calc(n):
    results = []
    for i in range(n):
        val = pow(2, i)
        if val % 2 == 0:
            results.append(val)
    return results
We moved pow(2, i) into a variable so we’re not doing the same computation twice per element. This small change adds up quickly as n grows.
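When the same expensive call repeats across many invocations, not just within one expression, functools.lru_cache generalizes this idea by memoizing results. A sketch with a made-up expensive function and a counter to show the cache working:

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs

@lru_cache(maxsize=None)
def expensive(n):
    global call_count
    call_count += 1
    return pow(2, n)

# 1000 calls, but only 10 distinct arguments
values = [expensive(i % 10) for i in range(1000)]
print("actual computations:", call_count)
```

The caveat: lru_cache only helps for pure functions (same input, same output) with hashable arguments.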
Great! Now let’s bring everything together with a real-world demo.
Let’s Profile and Optimize a Real Example
We’ll walk through a small project: calculating prime numbers up to a certain limit. This is a common example, but perfect for showing how to profile and improve slow code.
Step 1: Start with a Basic (but Slow) Version
Let’s define a very basic function that finds prime numbers:
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def find_primes(limit):
    primes = []
    for num in range(2, limit):
        if is_prime(num):
            primes.append(num)
    return primes
Explanation:
- is_prime() checks whether a number is prime
- find_primes() runs that check on every number up to the limit
This works fine, but it’s slow for large values like 100,000.
Step 2: Profile the Code with cProfile
Now we’ll use Python’s built-in profiler to spot the slow parts.
import cProfile
cProfile.run('find_primes(10000)')
You’ll probably see that most of the time is spent in is_prime(), especially in the inner loop.
Step 3: Optimize the Code
Here are a few improvements we can make:
- Instead of checking all numbers up to n, check only up to the square root of n (any factor larger than the square root must pair with one smaller than it)
- Skip even numbers (except 2), since they can’t be prime
Here’s the optimized version:
import math

def is_prime_fast(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

def find_primes_fast(limit):
    primes = []
    for num in range(2, limit):
        if is_prime_fast(num):
            primes.append(num)
    return primes
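Before trusting the faster version, it’s worth checking that it agrees with the original on every input, since math tweaks like the square-root bound are easy to get subtly wrong. A self-contained check (both versions repeated so the snippet runs on its own):

```python
import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def is_prime_fast(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

# Any disagreement here means the optimization changed behavior
mismatches = [n for n in range(2000) if is_prime(n) != is_prime_fast(n)]
print("mismatches:", mismatches)
```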
Step 4: Compare with timeit
Let’s see how much faster the optimized version is:
import timeit
# Original version
print("Original:", timeit.timeit('find_primes(10000)', globals=globals(), number=1))
# Optimized version
print("Optimized:", timeit.timeit('find_primes_fast(10000)', globals=globals(), number=1))
Conclusion
In this tutorial, we walked through the basics of profiling and optimizing Python code. We talked about common performance issues like slow loops and expensive function calls, and we explored tools like cProfile, line_profiler, and timeit to help pinpoint what’s slowing things down.
We also worked through a real example, a prime number calculator, and showed how simple math-based improvements can significantly speed up your code.
At the end of the day, profiling isn’t about guessing what might be slow; it’s about measuring and fixing it with purpose.
If you enjoyed this, try applying profiling to one of your own projects. You’ll be surprised what a few small tweaks can do.
Want to go further? Check out:
- Py-Spy: A sampling profiler that doesn’t require changing your code
- Scalene: A high-precision CPU, GPU, and memory profiler for Python
- line_profiler docs
