In this article, we’ll break down the basics of profiling and optimizing Python code. It’s easy to write something that works, but it’s a whole different game when your code needs to be fast and efficient.
Python is great for getting things done quickly, but it’s not always the fastest language when you’re dealing with heavy loops, slow I/O operations, large files, or recursive functions. These things can quietly drag down performance, especially when you’re working with data or building something for production.
This tutorial will show you how to spot what’s slowing your code down and what you can do about it. You’ll learn how to measure performance using simple tools, analyze memory and CPU usage, and clean up those bottlenecks with smart (and practical) tweaks.
By the end, you’ll not only write code that works, but code that runs smoother and faster.
What Exactly Is Code Profiling?
Code profiling is all about figuring out where your code is spending most of its time or resources. Instead of guessing which part is slow, profiling gives you hard numbers: how long each function takes to run, how many times it’s called, and how much memory it uses.
Think of it like getting a report card for your script. You’ll see which lines or functions are dragging things down and which ones are running smoothly.
In Python, you can profile your code in a few different ways:
- CPU profiling: Measures how much time each part of your code takes
- Memory profiling: Tracks how much memory is being used and where
- Line-by-line profiling: Zooms in and shows performance stats for each line of a function
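To make the memory side concrete before we dive into CPU profiling, the standard library’s tracemalloc module can report how much memory a piece of code allocates. A minimal sketch (build_big_list is just a made-up stand-in for your own code):

```python
import tracemalloc

def build_big_list():
    # Stand-in workload: allocate a list of 100,000 ints
    return [i * i for i in range(100_000)]

tracemalloc.start()
data = build_big_list()
current, peak = tracemalloc.get_traced_memory()  # both in bytes
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```

Since tracemalloc is built in, it’s a handy first stop before reaching for third-party memory profilers.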
Profiling Code with cProfile
cProfile is a built-in Python module that gives you a detailed breakdown of how much time each part of your code takes to run. It’s fast, reliable, and doesn’t need any installation. Let’s walk through how to use it.
Before we profile anything, let’s create a simple script that does some work, like sorting, sleeping, and looping, to simulate something real.
import time

def slow_function():
    total = 0
    for i in range(10000):
        for j in range(100):
            total += i * j
    time.sleep(1)
    return total

def fast_function():
    return sum([i for i in range(1000)])

def main():
    slow_function()
    fast_function()

if __name__ == "__main__":
    main()
This script has two functions:
- slow_function() does some nested loops and sleeps for 1 second
- fast_function() just sums up numbers quickly
- main() runs them both
Running cProfile on the Script
Now let’s profile the entire script using cProfile from the command line:
python -m cProfile sample_script.py
You’ll get an output like this:
20027 function calls in 1.234 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002    1.234    1.234 sample_script.py:7(slow_function)
        1    0.000    0.000    1.234    1.234 sample_script.py:14(main)
        1    0.000    0.000    0.000    0.000 sample_script.py:12(fast_function)
Here’s a quick breakdown of the key columns:
- ncalls: Number of times a function was called
- tottime: Time spent only in that function (excluding sub-functions)
- cumtime: Time spent in that function and everything it calls
From the output, you can clearly see slow_function is eating up most of the time. That’s exactly what we wanted to find out!
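If you’d rather profile from inside your code and sort the output yourself, the pstats module pairs with cProfile. A sketch, with slow_function standing in for whatever you want to measure:

```python
import cProfile
import io
import pstats

def slow_function():
    # Stand-in workload
    return sum(i * j for i in range(1000) for j in range(100))

profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Sort by cumulative time and show only the top 5 entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Sorting by "cumulative" is usually the quickest way to spot the function whose call tree eats the most time.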
Line-by-Line Performance Check
While cProfile gives a nice overview of function-level performance, sometimes you need to know exactly which lines are slow. That’s where line_profiler comes in. It shows how much time each line of your function takes.
Installing line_profiler
Unlike cProfile, this one isn’t built-in, so you’ll need to install it first:
pip install line_profiler
You’ll need to:
- Decorate the functions you want to analyze with @profile
- Run your script with kernprof
Let’s modify our script slightly:
import time

@profile
def slow_function():
    total = 0
    for i in range(10000):
        for j in range(100):
            total += i * j
    time.sleep(1)
    return total

@profile
def fast_function():
    return sum([i for i in range(1000)])

def main():
    slow_function()
    fast_function()

if __name__ == "__main__":
    main()
Note: Make sure to save this in a file (e.g., profile_script.py) and not run it directly inside an IDE for kernprof to work properly.
Running the Line Profiler
Use kernprof to profile the script:
kernprof -l -v profile_script.py
- -l tells it to use the line profiler
- -v makes it show the results right after it runs
You’ll get something like this:
Timer unit: 1e-06 s

Total time: 1.56 s
File: profile_script.py
Function: slow_function at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5         1          5.0      5.0      0.0      total = 0
     6     10001       4800.0      0.5      0.3      for i in range(10000):
     7   1010000     248000.0      0.2     15.9          for j in range(100):
     8   1000000     306000.0      0.3     19.6              total += i * j
     9         1    1001200.0 1001200.0   64.2      time.sleep(1)
You can now see exactly where the time goes, line by line. In this run, the time.sleep(1) call and the nested loop lines account for nearly all of it.
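kernprof is the usual entry point, but line_profiler also exposes a small programmatic API if you’d rather keep profiling inside a script. A sketch that degrades gracefully when the package isn’t installed (hot_loop is a made-up example function):

```python
# line_profiler is third-party: pip install line_profiler
try:
    from line_profiler import LineProfiler
except ImportError:
    LineProfiler = None

def hot_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if LineProfiler is not None:
    lp = LineProfiler()
    lp.add_function(hot_loop)              # register the function to profile
    result = lp.runcall(hot_loop, 10_000)  # call it under the profiler
    lp.print_stats()                       # same per-line table as kernprof -v
else:
    result = hot_loop(10_000)
```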
Simple Optimization Techniques
Now that you’ve seen how to identify bottlenecks, the next step is fixing them. Optimization doesn’t always mean rewriting everything from scratch; sometimes, small tweaks can give big speed boosts.
1. Replace Slow Loops with Built-in Functions
Python’s built-in functions are usually written in C under the hood, meaning they’re much faster than writing your own loops.
Example: Summing Numbers
Before (slow):
def sum_manual():
    total = 0
    for i in range(1000000):
        total += i
    return total
After (faster):
def sum_builtin():
    return sum(range(1000000))
The built-in sum() runs its loop in C, so it avoids the overhead of executing each iteration as Python bytecode. This one change often makes the loop several times faster, depending on your machine and Python version.
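You don’t have to take that claim on faith. A quick sketch using timeit to compare the two versions above:

```python
import timeit

def sum_manual():
    total = 0
    for i in range(1_000_000):
        total += i
    return total

def sum_builtin():
    return sum(range(1_000_000))

# Run each version a few times and compare total wall time
manual_time = timeit.timeit(sum_manual, number=3)
builtin_time = timeit.timeit(sum_builtin, number=3)
print(f"manual:  {manual_time:.3f}s")
print(f"builtin: {builtin_time:.3f}s")
```

The exact speedup varies from machine to machine, which is exactly why measuring beats guessing.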
2. Use List Comprehensions Instead of Loops
Using list comprehensions is not just cleaner, it is also faster.
Before:
squares = []
for i in range(10000):
    squares.append(i * i)
After:
squares = [i * i for i in range(10000)]
List comprehensions run their loop in optimized interpreter code and skip the repeated squares.append method lookup, so they are faster than manually appending items inside a loop.
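A related tweak: if the list is only consumed once (as in fast_function earlier), a generator expression skips building the intermediate list entirely, which keeps peak memory low. A sketch comparing the two:

```python
import timeit

n = 1_000_000
# Builds a full list of n ints, then sums it
list_time = timeit.timeit(lambda: sum([i for i in range(n)]), number=3)
# Feeds values to sum() one at a time; no intermediate list
gen_time = timeit.timeit(lambda: sum(i for i in range(n)), number=3)

print(f"list comprehension: {list_time:.3f}s")
print(f"generator:          {gen_time:.3f}s")
```

Timings here vary (the generator isn’t always faster), but the memory savings grow with n.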
3. Avoid Repeated Calculations
Don’t calculate something twice if you can store it once.
Before:
def slow_calc(n):
    return [pow(2, i) for i in range(n) if pow(2, i) % 2 == 0]
After:
def fast_calc(n):
    results = []
    for i in range(n):
        val = pow(2, i)
        if val % 2 == 0:
            results.append(val)
    return results
We moved pow(2, i) into a variable so we’re not doing the same computation twice per element. This small change adds up quickly as n grows.
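When the same expensive call repeats across many invocations, not just within one expression, functools.lru_cache generalizes this idea by memoizing results. A sketch with a made-up expensive function and a counter to show the cache working:

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs

@lru_cache(maxsize=None)
def expensive(n):
    global call_count
    call_count += 1
    return pow(2, n)

# 1000 calls, but only 10 distinct arguments
values = [expensive(i % 10) for i in range(1000)]
print("actual computations:", call_count)
```

The caveat: lru_cache only helps for pure functions (same input, same output) with hashable arguments.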
Great! Now let’s bring everything together with a real-world demo.
Let’s Profile and Optimize a Real Example
We’ll walk through a small project: calculating prime numbers up to a certain limit. This is a common example, but perfect for showing how to profile and improve slow code.
Step 1: Start with a Basic (but Slow) Version
Let’s define a very basic function that finds prime numbers:
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def find_primes(limit):
    primes = []
    for num in range(2, limit):
        if is_prime(num):
            primes.append(num)
    return primes
Explanation:
- is_prime() checks whether a number is prime
- find_primes() runs that check on every number up to the limit
This works fine, but it’s slow for large values like 100,000.
Step 2: Profile the Code with cProfile
Now we’ll use Python’s built-in profiler to spot the slow parts.
import cProfile
cProfile.run('find_primes(10000)')
You’ll probably see that most of the time is spent in is_prime(), especially in the inner loop.
Step 3: Optimize the Code
Here are a few improvements we can make:
- Instead of checking all numbers up to n, check only up to the square root of n (any factor larger than the square root must pair with one smaller than it)
- Skip even numbers (except 2), since they can’t be prime
Here’s the optimized version:
import math

def is_prime_fast(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

def find_primes_fast(limit):
    primes = []
    for num in range(2, limit):
        if is_prime_fast(num):
            primes.append(num)
    return primes
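Before trusting the faster version, it’s worth checking that it agrees with the original on every input, since math tweaks like the square-root bound are easy to get subtly wrong. A self-contained check (both versions repeated so the snippet runs on its own):

```python
import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def is_prime_fast(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

# Any disagreement here means the optimization changed behavior
mismatches = [n for n in range(2000) if is_prime(n) != is_prime_fast(n)]
print("mismatches:", mismatches)
```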
Step 4: Compare with timeit
Let’s see how much faster the optimized version is:
import timeit
# Original version
print("Original:", timeit.timeit('find_primes(10000)', globals=globals(), number=1))
# Optimized version
print("Optimized:", timeit.timeit('find_primes_fast(10000)', globals=globals(), number=1))
Conclusion
In this tutorial, we walked through the basics of profiling and optimizing Python code. We talked about common performance issues like slow loops and expensive function calls, and we explored tools like cProfile, line_profiler, and timeit to help pinpoint what’s slowing things down.
We also worked through a real example, a prime number calculator, and showed how simple math-based improvements can significantly speed up your code.
At the end of the day, profiling isn’t about guessing what might be slow; it’s about measuring and fixing it with purpose.
If you enjoyed this, try applying profiling to one of your own projects. You’ll be surprised what a few small tweaks can do.
Want to go further? Check out:
- Py-Spy: A sampling profiler that doesn’t require changing your code
- Scalene: A high-precision CPU, GPU, and memory profiler for Python
- line_profiler docs
