Archive: Python

The sample program from the previous post not only demonstrated some OpenGL calls and concepts, but also illustrated that code organization quickly becomes important as soon as the program grows beyond a couple of lines.

So I refactored some of the OpenGL interface code into classes. I noticed that for each draw call I need to set up the OpenGL context by binding some objects, like the shader program, the target framebuffer or the source texture.

I also learned that after you are done using any OpenGL object it is helpful to unbind it. The reason is not that bound objects would use too many resources, but decoupling: your program might just coincidentally have the right objects bound, but as soon as you move any of the code, something will break. Explicitly unbinding avoids these dependencies of one part of your program on whatever GL objects other parts happened to bind last.

So my drawing operations started to look like this:

# Bind needed objects 
framebuffer.bind()
render_program.use()
texture.bind()

gl.glDrawSomething(...)

# Unbind Objects
texture.unbind()
render_program.end_use()
framebuffer.unbind()

Just by renaming the matching method pairs .bind() and .unbind() to __enter__ and __exit__ (the latter must also accept the exception information arguments), the objects become context managers and the code above becomes much cleaner:

with framebuffer, render_program, texture:
    gl.glDrawSomething(...)

This style has the advantage that the objects used for a draw call are specified in a single with: line. On the other hand it removes flexibility in choosing the bind and unbind times of the objects. I think in most cases this is not a disadvantage, because the objects used in a draw call usually belong together logically (e.g. use this texture together with that shader).
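For illustration, here is a minimal sketch of what such a context-manager wrapper could look like, using a texture as an example (this class is an assumption for illustration, not the actual code from the repository):

from pyglet import gl

class Texture(object):
    """Minimal sketch of a GL object wrapper usable in a with-statement."""
    def __init__(self, name):
        self.name = name  # the GLuint obtained from glGenTextures

    def __enter__(self):
        # bind on entering the with-block
        gl.glBindTexture(gl.GL_TEXTURE_2D, self.name)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # unbind on leaving the with-block, even if an exception occurred
        gl.glBindTexture(gl.GL_TEXTURE_2D, 0)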

A refactored version of the last post’s example, which uses the context managers, is available on GitHub.

This post will show you how to set up a framebuffer in OpenGL using Python and Pyglet.

See my previous post for some tips on how to use OpenGL functions from Python.

Structure of the Program

As this is a demo program I did not structure it with Python classes. However, the complexity is already high enough to show that this flat structure is not suitable for any larger program.

The following flowchart illustrates the drawing steps. The OpenGL objects in the grey boxes are globals in my Python program.
Flowchart for the Draw Event

There are two steps: first a triangle is rendered to a texture using a framebuffer, and second the texture is copied to the screen. Each step uses:

  • A Shader Program containing the GLSL code used for rendering
  • A Vertex Buffer containing vertex data for the objects being drawn
  • A Vertex Array Object containing the information about how the data
    in the buffer is linked to the vertex attributes in the shader program

The full program is available on github.

Scene description

In this example the framebuffer is used to render the triangle from the previous post at a very low resolution (30×20 px). This image is then used for texturing two rectangles in the main window.

Note that I do not use any vertex transformation, either in the framebuffer or on the main screen; only the x and y coordinates are used. The lower left corner of the screen has coordinates (-1, -1) and the upper right corner (1, 1). Texture coordinates however run from 0 to 1.
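To make these conventions concrete, here is a sketch of vertex data for one of the textured rectangles. TEXTURE_VERTEX is a hypothetical struct analogous to the COLOR_VERTEX used below; the actual coordinates in the program may differ.

import ctypes
from pyglet import gl

class TEXTURE_VERTEX(ctypes.Structure):
    # hypothetical struct: a 2D position plus a texture coordinate
    _fields_ = [
        ('position', gl.GLfloat * 2),  # x, y in [-1, 1] (screen coordinates)
        ('texcoord', gl.GLfloat * 2),  # u, v in [0, 1] (texture coordinates)
    ]

# one rectangle covering the lower left quarter of the window,
# textured with the whole framebuffer image
quad = (TEXTURE_VERTEX * 4)(((-1.0, -1.0), (0.0, 0.0)),
                            (( 0.0, -1.0), (1.0, 0.0)),
                            (( 0.0,  0.0), (1.0, 1.0)),
                            ((-1.0,  0.0), (0.0, 1.0)))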


Vertex Array Objects

In the previous program there was only one GLSL program and only one vertex buffer, so it was convenient to store the vertex attribute bindings in the default OpenGL state. In this program however we switch back and forth between the triangle rendering program, which uses color attributes for the vertices, and the copy-to-screen program, which uses texture coordinates.

A vertex array object stores the information about how data in a buffer is linked to the vertex attributes of the shader program. So I can define this connection once (using glVertexAttribPointer) and then only need to select the correct vertex array object when drawing.

def setup_render_vertexbuffer():
    ...
    gl.glBindVertexArray(render_vao)

    gl.glEnableVertexAttribArray(loc_position)
    gl.glEnableVertexAttribArray(loc_color)

    # the following bindings will be remembered by the vao
    gl.glBindBuffer(gl.GL_ARRAY_BUFFER, render_vertexbuffer)

    gl.glVertexAttribPointer(loc_position, 2, gl.GL_FLOAT, False, 
         ctypes.sizeof(COLOR_VERTEX), 
         ctypes.c_void_p(COLOR_VERTEX.position.offset))
    gl.glVertexAttribPointer(loc_color, 4, gl.GL_FLOAT, False, 
         ctypes.sizeof(COLOR_VERTEX), 
         ctypes.c_void_p(COLOR_VERTEX.color.offset))
    gl.glBindVertexArray(0)


def render_to_texture():
    ...
    # draw using the vertex array for vertex information
    gl.glBindVertexArray(render_vao)
    gl.glDrawArrays(gl.GL_TRIANGLES, 0, 3)
    gl.glBindVertexArray(0)

Gotcha: The Viewport

OpenGL needs to know the size of the rendering target. In this program there are two targets, so you need to tell OpenGL the new size every time you switch the drawing target.

def render_to_texture():
    gl.glBindFramebuffer(gl.GL_FRAMEBUFFER, framebuffer)
    gl.glViewport(0, 0, FB_WIDTH, FB_HEIGHT)
    ...

def copy_texture_to_screen():
    gl.glBindFramebuffer(gl.GL_FRAMEBUFFER, 0)
    gl.glViewport(0, 0, window.width, window.height)
    ...

Putting it together

The program has 300 lines, and as I mentioned before, it could benefit from more structure and from deduplication of code. Setting up two GLSL shader programs, two vertex array objects, two vertex buffers and two drawing steps has led to some duplication, which could be avoided if these were two instances each of a shader class, a vertex array class and a vertex buffer class (a sketch of this idea follows the code below).

def draw():
    render_to_texture()
    copy_texture_to_screen()

def main():
    global window
    window = pyglet.window.Window()

    setup_framebuffer()
    setup_render_program()
    setup_render_vertexbuffer()
    setup_copy_program()
    setup_copy_vertexbuffer()

    window.on_draw = draw
    pyglet.clock.schedule_interval(lambda dt:None, 0.01)
    pyglet.app.run()
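As a sketch of the deduplication idea mentioned above, the two GLSL programs could become two instances of a single class along these lines (hypothetical code, not part of the sample program; compile_shader stands for the shader compilation code discussed in the previous post):

from pyglet import gl

class ShaderProgram(object):
    """Hypothetical sketch: one class instead of two setup functions."""
    def __init__(self, vertex_source, fragment_source):
        self.name = gl.glCreateProgram()
        # compile_shader is assumed to compile a shader and return its name
        gl.glAttachShader(self.name,
                          compile_shader(gl.GL_VERTEX_SHADER, vertex_source))
        gl.glAttachShader(self.name,
                          compile_shader(gl.GL_FRAGMENT_SHADER, fragment_source))
        gl.glLinkProgram(self.name)

    def use(self):
        gl.glUseProgram(self.name)

# both programs would then be instances of the same class, e.g.
# render_program = ShaderProgram(RENDER_VERTEX_SRC, RENDER_FRAGMENT_SRC)
# copy_program = ShaderProgram(COPY_VERTEX_SRC, COPY_FRAGMENT_SRC)

The same pattern would apply to the vertex buffers and the vertex array objects.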

Try out the complete program. You can get it on github.
Please comment below if something does not work or what I could explain better.

Calling some OpenGL functions from Python with pyglet can be a bit tricky due to the fact that the functions are only thin wrappers around the C API.

My sample program demonstrates some calls for using modern OpenGL with pyglet. The program is intentionally quite flat, without any classes. In a larger program I would manage the involved OpenGL objects with Python classes.

triangle in a program window

This is not a full tutorial, but some aspects of the sample program are explained below.

Pyglet

I used Pyglet because it includes an OpenGL binding and supports Python 3.

pyglet provides an object-oriented programming interface for developing games and other visually-rich applications for Windows, Mac OS X and Linux.

Call by Reference

Many OpenGL calls return values via a pointer argument (call by reference). Such arguments can be passed using ctypes.byref.

The example first creates a GLuint initialized to zero and then passes a pointer to glGenBuffers.

from pyglet import gl
import ctypes

vertexbuffer = gl.GLuint(0)
gl.glGenBuffers(1, ctypes.byref(vertexbuffer))

Structs for Vertex Attributes

Using a struct for passing vertex attributes can take advantage of the meta-information provided by ctypes. It is also easier to manage than a flat array of float values. In the example the vertex shader takes a vec2 and a vec4 as attributes.

Note that, as in the example above, I use the type aliases provided by pyglet.gl in order to ensure the types are compatible with OpenGL.

In ctypes you create an array by multiplying the type object.

class VERTEX(ctypes.Structure):
    _fields_ = [
        ('position', gl.GLfloat * 2),
        ('color', gl.GLfloat * 4),
    ]

The structure fields are given an offset attribute, which can be used when passing the offsets to glVertexAttribPointer:

gl.glVertexAttribPointer(loc_position, 2, gl.GL_FLOAT, False,
                         ctypes.sizeof(VERTEX),
                         ctypes.c_void_p(VERTEX.position.offset))
gl.glVertexAttribPointer(loc_color, 4, gl.GL_FLOAT, False,
                         ctypes.sizeof(VERTEX), 
                         ctypes.c_void_p(VERTEX.color.offset))

And the structure is initialized very easily; here I create an array of three vertices, each containing a 2D position and a 4D color.

data = (VERTEX * 3)(((-0.6, -0.5), (1.0, 0.0, 0.0, 1.0)),
                    (( 0.6, -0.5), (0.0, 1.0, 0.0, 1.0)),
                    (( 0.0,  0.5), (0.0, 0.0, 1.0, 1.0)))
gl.glBufferData(gl.GL_ARRAY_BUFFER, ctypes.sizeof(data), 
                data, gl.GL_DYNAMIC_DRAW)

String Arguments

OpenGL expects C-style strings, so it is easiest to use byte strings.
ctypes has string buffers which can translate between bytes in Python and char* in C.

loc_position = gl.glGetAttribLocation(program,
                      ctypes.create_string_buffer(b'position'))

They can also be used for retrieving strings from OpenGL such as the log of a shader compilation.

length = gl.GLint(0)
gl.glGetShaderiv(shader_name, gl.GL_INFO_LOG_LENGTH, ctypes.byref(length))
log_buffer = ctypes.create_string_buffer(length.value)
gl.glGetShaderInfoLog(shader_name, length, None, log_buffer)

Finding out about getting the compilation log really helped me when writing my own shaders.

The code for passing the shader source to OpenGL is still somewhat messy with a ctypes cast. I would be glad if you can suggest a better alternative in the comments.

src_buffer = ctypes.create_string_buffer(shader_source)
buf_pointer = ctypes.cast(ctypes.pointer(ctypes.pointer(src_buffer)),
                          ctypes.POINTER(ctypes.POINTER(ctypes.c_char)))
length = ctypes.c_int(len(shader_source) + 1)
gl.glShaderSource(shader_name, 1, buf_pointer, ctypes.byref(length))
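For reference, here is how the snippets above might be combined into a small helper that compiles a shader and raises an error including the log on failure. This is a sketch following the patterns shown in this post, not code taken verbatim from the sample program:

def compile_shader(shader_type, shader_source):
    '''Compile a shader (sketch); shader_source is a byte string.'''
    shader_name = gl.glCreateShader(shader_type)

    # pass the source to OpenGL (using the cast discussed above)
    src_buffer = ctypes.create_string_buffer(shader_source)
    buf_pointer = ctypes.cast(ctypes.pointer(ctypes.pointer(src_buffer)),
                              ctypes.POINTER(ctypes.POINTER(ctypes.c_char)))
    length = ctypes.c_int(len(shader_source) + 1)
    gl.glShaderSource(shader_name, 1, buf_pointer, ctypes.byref(length))

    gl.glCompileShader(shader_name)

    # check the compile status and report the info log on failure
    success = gl.GLint(0)
    gl.glGetShaderiv(shader_name, gl.GL_COMPILE_STATUS, ctypes.byref(success))
    if not success.value:
        length = gl.GLint(0)
        gl.glGetShaderiv(shader_name, gl.GL_INFO_LOG_LENGTH,
                         ctypes.byref(length))
        log_buffer = ctypes.create_string_buffer(length.value)
        gl.glGetShaderInfoLog(shader_name, length, None, log_buffer)
        raise RuntimeError(log_buffer.value)
    return shader_name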

In this post I will show how coverage.py can help you write complete tests and how to write a test for a case that should not ever occur.

Meanwhile I am adding more bytecodes to my interpreter, so if you are interested in my implementation of Python bytecodes take a look at https://github.com/leovt/interp/blob/master/interp.py.

In my last post I described why and how I switched to unittest testing. Running the tests after every code change makes development much more comfortable, and I can be reasonably confident that a change did not break what was working before.

But there is a nagging question: do I really test all possible paths in my code? Did I forget to test some exceptional code path? The answer to this question is given by Ned Batchelder’s coverage.py.

This tool records which lines of code are executed and then produces a nice report.

Installation

I am using Ubuntu, so I could install it simply with

sudo apt-get install python-coverage

Running the coverage tool

First clear the results of previous runs

leonhard@leovaio:~/interp$ python-coverage erase

Run the unittest with the coverage tool

leonhard@leovaio:~/interp$ python-coverage run interp_test.py -b
...s........
----------------------------------------------------------------------
Ran 12 tests in 0.019s

OK (skipped=1)

Then run coverage again to produce a report.

leonhard@leovaio:~/interp$ python-coverage report -m
Name                                     Stmts   Miss  Cover   Missing
----------------------------------------------------------------------
/usr/share/pyshared/coverage/collector     132    127     4%   3-229, 236-244, 248-292
/usr/share/pyshared/coverage/control       236    235     1%   3-355, 358-624
/usr/share/pyshared/coverage/execfile       35     14    60%   3-17, 42-43, 48, 58-65
interp                                     182      1    99%   269
interp_test                                103      8    92%   44-52, 122
----------------------------------------------------------------------
TOTAL                                      688    384    44%   

The line that interests me is the one for interp.py

interp                                     182      1    99%   269

Apparently my tests miss one line: line 269 in interp.py. This line raises the exception when an unknown bytecode is found. How can I write a test that executes this line?

I could simply run my interpreter with bytecodes that I have not yet implemented, but this would invalidate the test when I add those bytecodes later. Therefore I want to test with a bytecode that does not exist in Python, e.g. bytecode 255.

Obviously I cannot create a code object with this bytecode in it just by defining a function. Instead I use a technique called mocking, which takes advantage of duck typing in Python: I don’t really need to pass a code object to my interpreter, any object with the right attributes will do. There are libraries that help with creating mock objects, but for this simple use case I will just roll my own:

        class Mock(object):
            co_code = '\xff\x00\x00'
            co_nlocals = 0

I can then verify that this illegal bytecode actually causes line 269 to be executed and the exception to be raised by adding the following unit test:

    def test_unknown_opcode(self):
        class Mock(object):
            co_code = '\xff\x00\x00'
            co_nlocals = 0

        with self.assertRaises(interp.InterpError):
            interp.execute(Mock(), {})

By the way, the coverage tool can also produce a very nice HTML report, highlighting the covered and uncovered lines of a Python file with colors.
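To generate it, one more invocation should do; the -d option selects the output directory (the exact command name may vary with your installation):

python-coverage html -d coverage_html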

In this post I introduce the unittest module and set up a few basic tests for the code I have developed so far. While setting up the new tests, sure enough, I was surprised by an error and had to restructure one test. While the new test works, I do not think the last test case is very good, so suggestions are welcome …

So far I have added a little test function or two at the top of the interp module and changed it for every new feature I wanted to implement in the interpreter.

This has the advantage of being able to quickly try things out while writing interp.py, but it has important disadvantages:

  • Tests are in the same file as “production” code.
  • I lose all the previous tests.
  • No standard way to regularly run the test.

In order to improve the testing, I am going to use the unittest module of the Python standard library.

unittest provides the TestCase class which you subclass to add your own test methods. Further there is a TestSuite class which allows you to organize and run your test cases. For the moment I am not going to use TestSuite instances but will let unittest.main take care of providing a simple command line interface for my tests.

if __name__ == '__main__':
    unittest.main()

This will provide the command line interface:

~/interp$ python interp_test.py -b
...
----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK

The -b switch suppresses stdout for successful tests.
There is also a -v switch which will cause the test runner to name each test.

My test from the first post is implemented like this, and the one for the for loop is very similar. All tests I have so far just test that the returned value of the test function is the same in Python and in my own interpreter.

class TestRetrunValues(unittest.TestCase):
    """
    These tests are verifying that the return values
    of some functions are the same in native Python and
    in my interpreter.
    """
    def test_basic_operations(self):
        """
        Testing the first example
        """
        def test():
            a = 2
            b = a + 4
            return (a + 1) * (b - 2)

        expected = test()
        realized = interp.execute(test.func_code, {})

        self.assertEqual(expected, realized)

When I copied the test for calling a function into the test module, I surprisingly got an error. Surprisingly, because the same code had worked when it was placed directly in interp.py.

    def test_call(self):
        """
        test from third post: calling a function
        """
        def test():
            return square(4) + square(3)

        def square(n):
            return n * n

        expected = test()
        realized = interp.execute(test.func_code, test.func_globals)

        self.assertEqual(expected, realized)
======================================================================
ERROR: test_call (__main__.TestRetrunValues)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "interp_test.py", line 50, in test_call
    realized = interp.execute(test.func_code, test.func_globals)
  File "/home/leonhard/interp/interp.py", line 148, in execute
    raise Exception('Unknown Opcode %d (%s)' % (bc, dis.opname[bc]))
Exception: Unknown Opcode 136 (LOAD_DEREF)

----------------------------------------------------------------------

What happened, and why is it different from when the test was in the interp.py file?

The difference is that now the test and square functions are not defined in the global scope. Therefore the test function, when looking up square, will not search for it in the global namespace but in its closure. I will not go deeper into this topic right now; what counts is that for this test to succeed I would first need to extend my interpreter. So for the moment I am going to rewrite this test case.
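The effect can be observed in the host Python: when the functions are nested, as they are inside the test method, square is compiled as a free variable of test and is loaded with LOAD_DEREF instead of LOAD_GLOBAL. A small sketch for illustration:

import dis

def enclosing():
    def test():
        return square(4) + square(3)
    def square(n):
        return n * n
    return test

# the disassembly shows LOAD_DEREF (square) instead of LOAD_GLOBAL
dis.dis(enclosing())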

At the same time I do not want to lose this test case, but neither do I want my test run to fail because of it. unittest provides a decorator, @skip, which disables the test method with a message. There are also more sophisticated, conditional skips available.

    @unittest.skip('closures (e.g. LOAD_DEREF) not yet implemented')
    def test_call(self):

In verbose mode unittest will now display the skip message and the overall test result is “OK (skipped=1)”

For implementing the amended test case I define the executed functions at module level. Actually I am not really satisfied with this solution, as the test case is now split across different parts of the interp_test.py file. If anyone can propose a better way to define this test, I would much appreciate your comment!

def _call_global():
    '''
    A test function for the following unittest. 
    Must be defined at module level
    '''
    return _square(4) + _square(3)

def _square(n):
    return n * n

class TestRetrunValues(unittest.TestCase):
    def test_call_global(self):
        """
        test from third post: calling a function, at module level
        """
        test = _call_global
        expected = test()
        realized = interp.execute(test.func_code, test.func_globals)

        self.assertEqual(expected, realized)

Edit: Sometimes the answer is just too obvious: if I want square to be treated as a global, I can simply use the global keyword, of course:

    def test_call(self):
        global square
        def test():
            return square(4) + square(3)

        def square(n):
            return n * n
        [...]

You can have a look at interp_test.py developed in this post on github.

This post is again about the Python bytecode interpreter I have started in the three previous posts. This time I won’t develop the interpreter any further but rather start to set up an environment for this little project.

Until now I have just had a single test case in my main script, and as you could see if you followed the download links, the three versions of the script were just saved as three different files: interp_1.py, interp_2.py, and interp_3.py.
This manual version control may be feasible as long as I have only one file and as long as I am the only one developing, but even under these conditions a finer-grained versioning would be desirable.

I could use a local version control system, but as I want to publish the code anyway, a distributed, public version control system is the better suited tool. I am new to distributed version control systems and have quickly looked at git and mercurial. For both there are public hosting services available: GitHub and BitBucket.

I found this question on StackOverflow about the differences between git and mercurial, and I think for this project it does not matter much which system I use. I went for git / GitHub because I think it can be more low-level, and if I want to try out more complicated things later I will already have it set up as a real example. But for the moment I will probably use only basic features.

I have created leovt/interp on GitHub and have already checked in the files from the previous posts.
Each version posted previously is now a tag in the GitHub repository.

All I needed to do for setting up this repository is clearly explained on the GitHub website.

Let’s extend the interpreter with the capability to call into a function. Unlike the built-in function range in the last post, I would like the called function to be interpreted by my interpreter as well, not just executed in the host Python.

def test():
    return square(4) + square(3)

def square(n):
    return n * n

In order to simplify the testing I do not want to adapt the globals dictionary every time I change the test function. Fortunately every function keeps a reference to the global environment it is defined in, so I will just tell the execute function to use the appropriate globals of the test function.

print 'own execute function:', execute(test.func_code, test.func_globals)

The called function will need its own space to run in, e.g. to store its local variables. Also the program counter of the called function must not overwrite the program counter of the calling function. Now it comes in handy that all the execution state is stored in a frame object: we can simply create a clean new frame for the called function. Then we simply point our variable f, holding the executed frame, to the newly created frame to transfer control to the called function.

elif bc==CALL_FUNCTION:
    [...]
    elif type(callee) is types.FunctionType:
        subframe = Frame(callee.func_code, callee.func_globals)
        subframe.locals[:arg] = call_args
        f = subframe

Note that the function’s arguments are just copied into the locals of the called function. In fact (ignoring any * or ** magic) the function’s arguments are always the first local variables.
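This convention can be checked in the host Python: the argument names come first in co_varnames, and co_argcount tells how many of the locals are arguments. For example:

def example(a, b):
    c = a + b
    return c

print example.func_code.co_varnames  # ('a', 'b', 'c') - arguments first
print example.func_code.co_argcount  # 2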

Let’s try to execute the test:

normal Python call: 25
own execute function: 16

What happened? Apparently we see only the result of the first function call, square(4). The interpreter has just quit too early – upon seeing the first return statement.

Of course when we call into functions we also need to transfer control back to the calling function when the called function returns. At the moment we just lose the calling frame when we transfer control to the subframe. To be able to return control to the caller we first need to keep the calling frame. We will give the frame object a reference to the caller.

class Frame(object):
    def __init__(self, code, globals, caller):
        self.locals = [None] * code.co_nlocals
        self.PC = 0
        self.stack = []
        self.globals = globals
        self.code = code
        self.caller = caller

The first function executed by our interpreter is not called from inside the interpreter, so for this first frame we set the caller to None.

f = Frame(code, globals, None)

In the CALL_FUNCTION implementation we keep the reference to the calling frame, and we also update the calling frame’s program counter to point to the next instruction before transferring control, so everything will be ready when we decide to return.

elif bc==CALL_FUNCTION:
    [...]
    elif type(callee) is types.FunctionType:
        subframe = Frame(callee.func_code, callee.func_globals, f)
        subframe.locals[:arg] = call_args
        f.PC += 3
        f = subframe

Now everything is ready for the return statement to be implemented in the RETURN_VALUE bytecode. We check whether we have a calling frame, and if so we push the returned value onto the calling frame’s stack and transfer control back.

elif bc==RETURN_VALUE:
    if f.caller is None:
        return f.stack.pop()
    else:
        ret = f.stack.pop()
        f = f.caller
        f.stack.append(ret)

And finally we get the desired result.

normal Python call: 25
own execute function: 25

You can download the code of the updated interpreter.

Edit: GitHub
I have created a GitHub repository for this project. See the revision for this post on GitHub.

A note on the value stack:
In my implementation each frame has its own stack, so a called function has no chance of corrupting the caller’s stack. If the called function is guaranteed not to alter the bottom of the stack and to pop off all values it pushed, the same stack could be shared between the caller and the callee.
In this hobby / educational implementation I prefer the separate approach, as it is probably more robust and easier to debug.

I continue building my interpreter by making it capable of running a test function which uses some new features.

def test():
    for i in range(10):
        print i

This little test function makes it necessary to support four new concepts:

  • global variables
  • calling a built-in function
  • the for loop
  • printing values

Global Variables

So far the code only used local variables. They are only needed during the execution of the code object and are stored in the execution frame. In Python, global (or more exactly module-level) variables are stored in a dictionary (you can see this dictionary using the globals() built-in). So before we can load global variables onto our value stack, we need to keep a reference to the relevant globals dictionary. Therefore I extend the frame object to hold this reference.
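A quick illustration in the interactive interpreter:

>>> x = 42
>>> globals()['x']
42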

Also we need to somehow initialize the globals of our execution environment, therefore the execute function gets a new parameter.

class Frame(object):
    def __init__(self, code, globals):
        self.locals = [None] * code.co_nlocals
        self.PC = 0
        self.stack = []
        self.globals = globals
        self.code = code

def execute(code, globals):
    f = Frame(code, globals)
    ...

my_globals = {'range': range}
print 'own execute function:', execute(test.func_code, my_globals)

You can see that I initialized the globals with the minimum needed to execute our test function, the reference to the built-in range.

The access to the globals is made by the LOAD_GLOBAL bytecode, whose argument is an index into the co_names field which contains the names of the globals used in the function.

        elif bc==LOAD_GLOBAL:
            arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
            f.stack.append(f.globals[f.code.co_names[arg]])
            f.PC += 3

Calling a Built-in Function

The relevant bytecode CALL_FUNCTION is going to be quite complex to implement.
For the moment I implement only the simplest case: calling a built-in function without any
* or ** magic arguments.

For a function call the stack needs to look like this: the function arguments lie on top of the function object. The first (leftmost) argument is lowest, the rightmost argument is on the top of the stack. The byte at PC+1 contains the number of arguments used in the function call.

So first we pop the function arguments off the stack, pop off the function itself and
push the result of the function call (which happens outside of our execution environment) back on the stack.

elif bc==CALL_FUNCTION:
    arg = ord(f.code.co_code[f.PC+1])

    call_args = [None] * arg
    for i in range(arg-1, -1, -1):
        call_args[i] = f.stack.pop()

    callee = f.stack.pop()
    if type(callee) is types.BuiltinFunctionType:
        f.stack.append(callee(*call_args))
        f.PC += 3

The For Loop

The for loop in Python uses the iterator protocol. Essentially it works like this: on each iteration the loop tries to get the next value from the iterator. Once the iterator is exhausted it raises the StopIteration exception. The for loop swallows this exception and continues with the next statement after the loop. At the end of the loop body a jump is made back to the head, which will then either continue with the next iteration or end the loop.
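Disassembling the test function, as in the previous posts, makes this structure visible:

import dis

def test():
    for i in range(10):
        print i

# the disassembly shows SETUP_LOOP first, then GET_ITER and FOR_ITER at
# the loop head, JUMP_ABSOLUTE back to FOR_ITER at the end of the body,
# and POP_BLOCK right after the loop
dis.dis(test)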

When you look at the bytecode you will see the SETUP_LOOP and POP_BLOCK instructions. I have not yet figured out what exactly they do, and in this simple example they are not needed, so I implement them as no-operations.

elif bc==SETUP_LOOP:
    f.PC += 3
elif bc==POP_BLOCK:
    f.PC += 1

The jump at the end of the loop is the simplest jump possible, an unconditional absolute jump.
It will just set the next instruction to the address provided as an argument to the bytecode.

elif bc==JUMP_ABSOLUTE:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.PC = arg

The for loop in Python needs an iterator. Python gets an iterator for the object given to the for loop using the GET_ITER bytecode. For now I implement it with the built-in iter() function of the host Python; later I may want to make my implementation more independent and use __magic__ methods instead.

elif bc==GET_ITER:
    f.stack.append(iter(f.stack.pop()))
    f.PC += 1

The most interesting bytecode for this post is finally the FOR_ITER. It can either push the next value of the iterator or jump to the end of the loop. Note that the jump is relative to the bytecode after FOR_ITER.

elif bc==FOR_ITER:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    try:
        nx = next(f.stack[-1])
    except StopIteration:
        f.stack.pop()
        f.PC += arg + 3
    else:
        f.stack.append(nx)
        f.PC += 3

Printing

Python prints the top of stack using the PRINT_ITEM bytecode, and the final newline using the PRINT_NEWLINE code. Again I simply use the functionality of the host Python to implement these two bytecodes.

elif bc==PRINT_ITEM:
    print f.stack.pop(),
    f.PC += 1
elif bc==PRINT_NEWLINE:
    print
    f.PC += 1

Conclusion

As I wrote in the introduction post, I am taking a lot of advantage of the host Python in which I implement my own Python interpreter.

You can download the source code of the extended interpreter.

Edit: GitHub
I have created a GitHub repository for this project. See the revision for this post on GitHub.

In this post I am going to write a bytecode interpreter capable of executing the bytecode of this simple function.
I am using Python 2.7.1; the bytecode can be different for different versions of Python. The aim is not to exactly reproduce the way CPython works in every detail, but to understand the concepts.

def test():
    a = 2
    b = a + 4
    return (a + 1) * (b - 2)

CPython uses a stack-based bytecode. This means that inputs and intermediary results for evaluating an expression are pushed onto a stack. Most operations in the bytecode operate on the top one or two items on the stack. In my last post I demonstrated how you can look at the bytecode of test().

The bytecode for the first line looks like this: 0x64 0x01 0x00 0x7D 0x00 0x00 (each opcode byte is followed by its two-byte little-endian argument).
It is interpreted as

LOAD_CONST(1) # push(code.co_consts[1])
STORE_FAST(0) # locals[0] = pop()
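These bytes can be verified interactively, in the same way as in the last post:

>>> ['%02X' % ord(c) for c in test.func_code.co_code[:6]]
['64', '01', '00', '7D', '00', '00']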

The interpreter will need to process each bytecode (line in the disassembly) sequentially. It will need to keep track of the current instruction to be executed. This is done using a variable which keeps the index of the current instruction in the code. I call this index PC for program counter.

Then it will also need to keep track of the values of all local variables. The code object has the field co_nlocals which tells us how many local variables we need to reserve space for. So far the names of the local variables do not matter, the locals are accessed by a number in the bytecode.

And of course the interpreter needs the stack where all the calculations and shoveling around of values happen. For implementing the stack I will use an ordinary list, .append() for pushing an item, [-1] for accessing the top of stack and .pop() for popping the top of stack.
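For example:

>>> stack = []
>>> stack.append(2)   # push
>>> stack.append(4)   # push
>>> stack[-1]         # peek at the top of stack
4
>>> stack.pop()       # pop
4
>>> stack
[2]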

Even though this is not needed now, I define a frame object to hold all these objects defining the state of the interpreter, as well as a reference to the code object being executed. This will prove useful when we will be implementing function calls in the future. Right now all it does is initialize the PC, reserve space for the locals, and create an empty stack.

class Frame(object):
    def __init__(self, code):
        self.locals = [None] * code.co_nlocals
        self.PC = 0
        self.stack = []
        self.code = code

The core part of the interpreter is a loop executing one bytecode after another. At the beginning of each loop the variable bc is assigned the next opcode to be performed, i.e. the code at the position PC. Note that I use the frame object to hold all the state of the interpreter. In the complete source you can see that I obtain the numeric values for the various opcodes from the dis module.

def execute(code):
    f = Frame(code)

    while True:
        bc = ord(f.code.co_code[f.PC])
        if bc==LOAD_FAST:
            # [...]
        elif bc==LOAD_CONST:
            # [...]
        else:
            raise Exception('Unknown Opcode %d (%s)' % (bc, dis.opname[bc]))

An exception is raised in case an opcode is encountered that I did not implement. That way it is easy to see what is still missing when applying the interpreter to more and more complex test functions.

Let’s now look into the implementation of each opcode:

LOAD_FAST and LOAD_CONST are pretty similar. Both take a two-byte argument stored after the opcode at positions PC+1 and PC+2. The argument is an index into either the locals (stored in the frame) or the constants (stored in the code object). The value is pushed onto the stack and then the program counter is advanced to the next opcode. Both instructions are three bytes wide.

if bc==LOAD_FAST:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.stack.append(f.locals[arg])
    f.PC += 3

elif bc==LOAD_CONST:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.stack.append(f.code.co_consts[arg])
    f.PC += 3

Writing to a local variable is also pretty similar. The value written is popped off the stack.

elif bc==STORE_FAST:
    arg = ord(f.code.co_code[f.PC+1]) + 256*ord(f.code.co_code[f.PC+2])
    f.locals[arg] = f.stack.pop()
    f.PC += 3

The binary operations pop two items off the stack and push the result. Be careful about the order in which the operands are expected on the stack. Because these instructions take no argument, the PC only needs to be incremented by one. (I only show subtraction; addition and multiplication work analogously.)

elif bc==BINARY_SUBTRACT:
    b = f.stack.pop()
    a = f.stack.pop()
    f.stack.append(a-b)
    f.PC += 1

Finally the result of the function needs to be returned to the caller. The RETURN_VALUE instruction expects the value to be returned on the top of the stack. So far there is only one function, the top-level test(). Therefore we can halt the interpreter when reaching the return instruction. The value returned from the test function is returned to the caller of the interpreter in the “normal” Python world.

elif bc==RETURN_VALUE:
    return f.stack.pop()

In the future the interpreter is going to implement function calls and we will have to revise the implementation of RETURN_VALUE as it will no longer be the end of execution.

That’s it! Let’s test our implementation; both the CPython interpreter on the first line and the newly developed execute() need to give the same result: 12.

print 'normal Python call:', test()
print 'own execute function:', execute(test.func_code)

Download the source code developed in this post.

Edit: GitHub
I have created a GitHub repository for this project. See the revision for this post on GitHub.

This is part of my series about writing a Python virtual machine in python.

The Python source code is compiled into bytecode, which is contained in code objects. In addition to the bytecode, code objects contain information about the variables and constants used in the code.

Code objects can be accessed most easily from functions. Accessing the code of modules is harder, because it is executed when the module is imported and is not needed any more afterwards. Python functions have the attribute func_code, which exposes the code object implementing the function body.

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def test(a, b=5):
...     'docstring of test'
...     z = abs(a - 15)
...     return z - b
...
>>> test.func_code
<code object test at 0xb77b8e78, file "<stdin>", line 1>
>>> test.func_code.co_consts
('docstring of test', 15)
>>> test.func_code.co_varnames
('a', 'b', 'z')
>>> test.func_code.co_names
('abs',)
>>> test.func_code.co_code
't\x00\x00|\x00\x00d\x01\x00\x18\x83\x01\x00}\x02\x00|\x02\x00|\x01\x00\x18S'
>>> ' '.join('%02X' % ord(c) for c in test.func_code.co_code)
'74 00 00 7C 00 00 64 01 00 18 83 01 00 7D 02 00 7C 02 00 7C 01 00 18 53'

Note that most (or all?) of this is implementation detail, so you should not rely on it being the same in other implementations of Python or even in future versions of CPython. I do not intend to document code objects completely here; I encourage you to explore. I found some quick descriptions of the fields in the documentation of the inspect module.

We can see that the constants used in the code – including the docstring – are stored in the co_consts field, the names external to the code (e.g. the global abs) in co_names, and the local names in co_varnames.

The bytecode in the co_code attribute is a byte-string which is conveniently analyzed using the dis module. The documentation of the dis module is also valuable for getting a quick glance at the various bytecodes.

>>> import dis
>>> dis.dis(test.func_code)
  3           0 LOAD_GLOBAL              0 (abs)
              3 LOAD_FAST                0 (a)
              6 LOAD_CONST               1 (15)
              9 BINARY_SUBTRACT
             10 CALL_FUNCTION            1
             13 STORE_FAST               2 (z)

  4          16 LOAD_FAST                2 (z)
             19 LOAD_FAST                1 (b)
             22 BINARY_SUBTRACT
             23 RETURN_VALUE

For this simple test function it is pretty easy to make the connection between the bytecode and the source code. The numbers to the left (3 and 4) are the line numbers in the source code. The numbers next to the mnemonics are the offsets into the bytecode. We can see that some bytecodes use one byte (e.g. BINARY_SUBTRACT), while others use three bytes, taking a two-byte argument. The disassembler conveniently annotates the argument with the item it is referencing. For example the code at offset 6 (LOAD_CONST) takes an argument (1) which is an index into the co_consts field of the code object.

Note that the default argument (5) is not stored in the code object. Also for the global abs only the name is stored, but no reference to the abs function. These links to external objects are made in the function object and not in the code object.
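For example, continuing the session above, the default argument can be found on the function object in its func_defaults attribute:

>>> test.func_defaults
(5,)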
