Chained assignment in Python bytecode


Chained assignment has a footgun

In Python, chained assignment has a subtle (though well-known) footgun. The following function returns True:

def example():
    a = b = []  # <-- oops!
    a.append(1)  # b gets modified as well
    return b == [1]  # True

I've known of this behavior for a long time, but every once in a while it catches me unaware. The problem is that in the line a = b = [], a Python list object is constructed once and assigned to both variables. When a.append(1) is called, that single underlying list is modified, and since b refers to the same list, the change is visible through b as well. This is a bug if your intent was for a and b to refer to separate lists.
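You can see the sharing directly, without any bytecode, using Python's identity operator is and the built-in id(). Both check whether two names refer to the very same object rather than to equal values. This is a small sketch using only built-ins:

```python
def example():
    a = b = []
    # `is` compares object identity, not equality: it is True only if
    # both names refer to the same object. While both objects are alive,
    # equal id() values mean the same thing.
    return a is b, id(a) == id(b)

print(example())  # (True, True)
```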

The most recent time I shot myself in the foot with this bug, I got curious: I know what happens at a semantic level when I write a = b = [], but what happens at the bytecode level? A quick Google search led me to the dis module, which allows you to inspect disassembled CPython bytecode.

Inspecting the bytecode

Let's write a simple program in a new file chained-assignment-example.py using the dis module...

import dis

def example():
    a = b = []

dis.dis(example)

...and then run python chained-assignment-example.py on it. On my machine, running Python 3.12.6, the output is:

  3           0 RESUME                   0

  4           2 BUILD_LIST               0
              4 COPY                     1
              6 STORE_FAST               0 (a)
              8 STORE_FAST               1 (b)
             10 RETURN_CONST             0 (None)

Understanding the bytecode

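At this point, we're fairly close to understanding what chained assignment looks like at the bytecode level. Two concepts are worth knowing first. CPython's interpreter is a stack machine: most instructions pop their operands off an evaluation stack and push their results back onto it. And each function's code object keeps the names of its local variables in a tuple called co_varnames; instructions such as STORE_FAST refer to a local variable by its index into that tuple.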
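The code object is reachable from any Python function through its __code__ attribute, so co_varnames can be inspected directly. A minimal check for our example function:

```python
def example():
    a = b = []

# A function's compiled bytecode and metadata live on its code object.
code = example.__code__

# Local variable names, in the index order that STORE_FAST's argument uses.
print(code.co_varnames)  # ('a', 'b')
print(code.co_nlocals)   # 2
```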

Let's go over the instructions for line 4 (a = b = []) one by one.

BUILD_LIST N pops N items from the stack, turns them into a list, and then pushes a C pointer to the resulting list onto the evaluation stack (see code). To be precise, it creates a PyListObject, casts it to a PyObject *, and then returns it (see code). In this case, we have BUILD_LIST 0, which creates an empty list and pushes it onto the stack.
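The empty list in our example hides the "pops N items" part of BUILD_LIST. To see a non-zero argument, we can disassemble a function that builds a list from two local variables. Rather than printing text with dis.dis(), this sketch uses dis.get_instructions(), which yields dis.Instruction objects with opname and arg attributes. Exact bytecode is a CPython implementation detail and can change between versions.

```python
import dis

def pair(x, y):
    # x and y are pushed onto the stack; BUILD_LIST 2 then pops both
    # and pushes a single new list containing them.
    return [x, y]

# Collect the integer argument of every BUILD_LIST instruction in pair().
build_args = [ins.arg for ins in dis.get_instructions(pair) if ins.opname == "BUILD_LIST"]
print(build_args)  # [2] on CPython 3.12
```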

COPY N pushes a copy of the N-th item from the top of the stack (counting from 1) onto the stack, leaving the original where it is. In this example we have COPY 1, which copies the 1st item, i.e. the item at the top: the reference to the list. Only the reference is copied, not the list itself, so we now have two references to the same list on the stack.
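To make the "copy the reference, not the list" point concrete, here is a toy model in plain Python. A Python list stands in for the evaluation stack (its end is the top), and copy is a hypothetical helper written for illustration, not a CPython API:

```python
def copy(stack, n):
    """Toy COPY n: push the n-th item from the top (1-indexed) without removing it."""
    stack.append(stack[-n])

stack = []
stack.append([])   # toy BUILD_LIST 0: push one new, empty list
copy(stack, 1)     # toy COPY 1: push a second reference to the top item

print(len(stack))            # 2
print(stack[0] is stack[1])  # True: two references, one list object
```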

STORE_FAST N pops the top of the stack and stores the popped value in the local variable named by the N-th entry of co_varnames. In this example we have STORE_FAST 0. What's the 0th varname? For our function, example.__code__.co_varnames is ('a', 'b'), so the 0th varname is just a, which dis.dis() has already helpfully identified for us in parentheses!

So now a refers to the newly created list.

The next instruction is STORE_FAST 1. Again, we pop the stack, but this time we store the popped value (the other pointer to the same list) in the 1st varname, which is the variable b.

So now b refers to the same list object as a.
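Putting the three instruction types together, the sketch below is a toy interpreter, not CPython's real evaluation loop. It models only BUILD_LIST, COPY and STORE_FAST, takes hand-written (opname, arg) pairs, and uses a plain tuple in place of co_varnames; the function name run is hypothetical. Feeding it the four instructions from line 4 reproduces the sharing, while two separate BUILD_LIST instructions do not:

```python
def run(instructions, varnames):
    """Toy model of a few CPython instructions (not the real interpreter).

    `instructions` is a list of (opname, arg) pairs and `varnames` plays the
    role of co_varnames. Only BUILD_LIST, COPY and STORE_FAST are supported.
    """
    stack = []        # the evaluation stack; the top is the end of the list
    fast_locals = {}  # local variable name -> object
    for opname, arg in instructions:
        if opname == "BUILD_LIST":
            # Pop `arg` items and push ONE newly created list holding them.
            start = len(stack) - arg
            items = stack[start:]  # slicing always creates a new list object
            del stack[start:]
            stack.append(items)
        elif opname == "COPY":
            # Push another reference to the arg-th item from the top.
            stack.append(stack[-arg])
        elif opname == "STORE_FAST":
            # Pop the top of the stack and bind it to varnames[arg].
            fast_locals[varnames[arg]] = stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    return fast_locals

# Line 4 of the chained example: one BUILD_LIST, then COPY 1.
chained = run(
    [("BUILD_LIST", 0), ("COPY", 1), ("STORE_FAST", 0), ("STORE_FAST", 1)],
    ("a", "b"),
)
print(chained["a"] is chained["b"])  # True

# Two separate assignments: two BUILD_LISTs and no COPY.
separate = run(
    [("BUILD_LIST", 0), ("STORE_FAST", 0), ("BUILD_LIST", 0), ("STORE_FAST", 1)],
    ("a", "b"),
)
print(separate["a"] is separate["b"])  # False
```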

Doing it the right way

The crux of our analysis is that in the compiled bytecode for the example() function there was only a single BUILD_LIST instruction. That means we only allocated a single PyListObject on the heap, which the Python VM then dutifully assigned to two different variable names. A recipe for subtle and annoying behavior that might cause someone to waste 10 minutes of their time. Not that I'd know anything about that.
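The "only a single BUILD_LIST" observation can also be checked programmatically instead of by reading dis.dis() output. count_op below is a hypothetical helper built on dis.get_instructions(); since bytecode is a CPython implementation detail, treat the count as version-specific, even though it is 1 on 3.12:

```python
import dis

def example():
    a = b = []

def count_op(func, opname):
    """Count how many instructions named `opname` appear in func's bytecode."""
    return sum(1 for ins in dis.get_instructions(func) if ins.opname == opname)

print(count_op(example, "BUILD_LIST"))  # 1
```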

What if we didn't use chained assignment? As above, let's write some code and inspect its disassembled bytecode. In a new file regular-assignment.py:

import dis

def example():
    a = []
    b = []

dis.dis(example)

Running python regular-assignment.py, we get:

  3           0 RESUME                   0

  4           2 BUILD_LIST               0
              4 STORE_FAST               0 (a)

  5           6 BUILD_LIST               0
              8 STORE_FAST               1 (b)
             10 RETURN_CONST             0 (None)

There are two distinct BUILD_LIST instructions in the output bytecode, each followed by a STORE_FAST instruction! Two distinct list objects will be heap-allocated, one bound to a and the other to b.
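Rewriting the original footgun example with two separate assignments confirms the fix at runtime: appending to a no longer affects b, so the function that used to return True now returns False.

```python
def example():
    a = []           # first BUILD_LIST: one list object
    b = []           # second BUILD_LIST: a separate list object
    a.append(1)      # mutates only the list that a refers to
    return b == [1]  # b is still empty

print(example())  # False
```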

References