Category Archives: programming

Robot control part 6: Handling singularities

We’re back! Another exciting post about robotic control theory, but don’t worry, it’s short and ends with simulation code. The subject of today’s post is handling singularities.

What is a singularity

This came up recently when I had build this beautiful controller for a simple two link arm that would occasionally go nuts. After looking at it for a while it became obvious this was happening whenever the elbow angle reached or got close to 0 or \pi. Here’s an animation:

singularity

What’s going on here? Here’s what. The Jacobian has dropped rank and become singular (i.e. non-invertible), and when we try to calculate our mass matrix for operational space

\textbf{M}_\textbf{x}(\textbf{q}) = (\textbf{J} (\textbf{q}) \; \textbf{M}^{-1} (\textbf{q}) \; \textbf{J}^T(\textbf{q}))^{-1},

the values explode in the inverse calculation. Dropping rank means that the rows of the Jacobian are no longer linearly independent, which means that the matrix can be rotated such that it gives a matrix with a row of zeros. This row of zeros is the degenerate direction, and the problems come from trying to send forces in that direction.

To determine when the Jacobian becomes singular its determinant can be examined; if the determinant of the matrix is zero, then it is singular. Looking the Jacobian for the end-effector:

\textbf{J}(\textbf{q}) = \left[ \begin{array}{cc} -L_0 sin(q_0) - L_1 sin(q_0 + q_1) & -L_1 sin(q_0 + q_1) \\ L_0 cos(q_0) + L_1 cos(q_0 + q_1) & L_1 cos(q_0 + q_1) \end{array} \right].

When q_1 = 0 it can be that sin(q_0 + 0) = sin(q_0), so the Jacobian becomes

\textbf{J}(\textbf{q}) = \left[ \begin{array}{cc} (L_0 + L_1)(-sin(q_0)) & -L_1 sin(q_0) \\ (L_0 + L_1) cos(q_0) & L_1 cos(q_0) \end{array} \right],

which gives a determinant of

(L_0 + L_1)(-sin(q_0))(L_1)(cos(q_0)) - (L_1)(-sin(q_0))(L_0 + L_1)(cos(q_0)) = 0.

Similarly, when q_1 = \pi, where sin(q_0 + \pi) = -sin(q_0) and cos(q_0 + \pi) = -cos(q_0), the Jacobian is

\textbf{J}(\textbf{q}) = \left[ \begin{array}{cc} -(L_0 - L_1) sin(q_0) & L_1 sin(q_0) \\ (L_0 + L_1) cos(q_0) & - L_1 cos(q_0) \end{array} \right].

Calculating the determinant of this we get

(L_0 + L_1)(-sin(q_0))(L_1)(-cos(q_0)) - (L_1)(sin(q_0))(L_0 + L_1)(-cos(q_0)) = 0.

Note that while in these cases the Jacobian is a square matrix in the event that it is not a square matrix, the determinant of \textbf{J}(\textbf{q})\;\textbf{J}^T(\textbf{q}) can be found instead.

Fixing the problem

When a singularity is occurring it can be detected, but now it must be handled such that the controller behaves appropriately. This can be done by identifying the degenerate dimensions and setting the force in those directions to zero.

First the SVD decomposition of \textbf{M}_\textbf{x}^{-1}(\textbf{q}) = \textbf{V}\textbf{S}\textbf{U}^T is found. To get the inverse of this matrix (i.e. to find \textbf{M}_\textbf{x}(\textbf{q})) from the returned \textbf{V}, \textbf{S} and \textbf{U} matrices is a matter of inverting the matrix \textbf{S}:

\textbf{M}_\textbf{x}(\textbf{q}) = \textbf{V} \textbf{S}^{-1} \textbf{U}^T,

where \textbf{S} is a diagonal matrix of singular values.

Because \textbf{S} is diagonal it is very easy to find its inverse, which is calculated by taking the reciprocal of each of the diagonal elements.

Whenever the system approaches a singularity some of the values of \textbf{S} will start to get very small, and when we take the reciprocal of them we start getting huge numbers, which is where the value explosion comes from. Instead of allowing this to happen, a check for approaching the singularity can be implemented, which then sets the singular values entries smaller than the threshold equal to zero, canceling out any forces that would be sent in that direction.

Here’s the code:

Mx_inv = np.dot(JEE, np.dot(np.linalg.inv(Mq), JEE.T))
if abs(np.linalg.det(np.dot(JEE,JEE.T))) > .005**2:
    # if we're not near a singularity
    Mx = np.linalg.inv(Mx_inv)
else:
    # in the case that the robot is entering near singularity
    u,s,v = np.linalg.svd(Mx_inv)
    for i in range(len(s)):
        if s[i] < .005: s[i] = 0
        else: s[i] = 1.0/float(s[i])
    Mx = np.dot(v, np.dot(np.diag(s), u.T))

And here’s an animation of the controlled arm now that we’ve accounted for movement when near singular configurations:

fixed

As always, the code for this can be found up on my Github. The default is to run using a two link arm simulator written in Python. To run, simply download everything and run the run_this.py file.

Everything is also included required to run the MapleSim arm simulator. To do this, go into the TwoLinkArm folder, and run python setup.py build_ext -i. This should compile the arm simulation to a shared object library that Python can now access on your system. To use it, edit the run_this.py file to import from TwoLinkArm/arm_python to TwoLinkArm/arm and you should be good to go!
More details on getting the MapleSim arm to run can be found in this post.

Tagged , , , , ,

Arm simulation visualization with Matplotlib

One of the downsides of switching to Python from Matlab is that it can be a pain to plot some kinds of things, and I’ve found animations to be one those things. In previous posts I’ve done the visualization of my arm simulations through Pyglet, but I recently started playing around with Matplotlib’s animation function, and the results are pretty smooth. The process is also relatively painless and quick to get up and running, so I thought I would throw up some Matplotlib code for visualizing my previously described 2 link arm MapleSim simulation.

So, let’s look at the code:

#Written by Travis DeWolf (Sept, 2013)
#Based on code by Jake Vanderplas - http://jakevdp.github.com

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import py2LinkArm

class TwoLinkArm:
    """
    :param list u: the torque applied to each joints
    """
    def __init__(self, u = [.1, 0]): 
        self.u = np.asarray(u, dtype='float') # control signal
        self.state = np.zeros(3) # vector for current state
        self.L1=0.37 # length of arm link 1 in m
        self.L2=0.27 # length of arm link 2 in m
        self.time_elapsed = 0

        self.sim = py2LinkArm.pySim()
        self.sim.reset(self.state)
    
    def position(self):
        """Compute x,y position of the hand"""

        x = np.cumsum([0,
                       self.L1 * np.cos(self.state[1]),
                       self.L2 * np.cos(self.state[2])])
        y = np.cumsum([0,
                       self.L1 * np.sin(self.state[1]),
                       self.L2 * np.sin(self.state[2])])
        return (x, y)

    def step(self, dt):
        """Simulate the system and update the state"""
        for i in range(1500):
            self.sim.step(self.state, self.u)
        self.time_elapsed = self.state[0]

#------------------------------------------------------------
# set up initial state and global variables
arm = TwoLinkArm()
dt = 1./30 # 30 fps

#------------------------------------------------------------
# set up figure and animation
fig = plt.figure(figsize=(4,4))
ax = fig.add_subplot(111, aspect='equal', autoscale_on=False,
                     xlim=(-1, 1), ylim=(-1, 1))
ax.grid()

line, = ax.plot([], [], 'o-', lw=4, mew=5)
time_text = ax.text(0.02, 0.95, '', transform=ax.transAxes)

def init():
    """initialize animation"""
    line.set_data([], [])
    time_text.set_text('')
    return line, time_text

def animate(i):
    """perform animation step"""
    global arm, dt
    arm.step(dt)
    
    line.set_data(*arm.position())
    time_text.set_text('time = %.2f' % arm.time_elapsed)
    return line, time_text

# frames=None for matplotlib 1.3
ani = animation.FuncAnimation(fig, animate, frames=None,
                              interval=25, blit=True, 
                              init_func=init)

# uncomment the following line to save the video in mp4 format.
# requires either mencoder or ffmpeg to be installed
#ani.save('2linkarm.mp4', fps=15, 
#         extra_args=['-vcodec', 'libx264'])

plt.show()

There’s not all too much to it, which is nice. I’ve created a class TwoLinkArm that wraps the actual arm simulator, stores the current state of the arm, and has a step function that gets called to move the system forward in time. Then, I created a line and stored a reference to it; this is what is going to be updated in the simulation. I then need functions that specify how the line will be initialized and updated. The init function doesn’t do anything except set the line and text data to nothing, and then the animate function calls the arm simulation’s step function and updates the line and text data.

For more details about it there’s this blog post which steps through the process a bit more. For simple arm simulations the above is all I need though, so I’ll leave it there for now!

Here’s an animation of the resulting sim visualization, I’ve removed a lot of the frames to keep the size down. It’s smoother when running the actual code, which can be found up on my github.

Hurray better visualization!

Tagged , , ,

Reinforcement Learning part 2: SARSA vs Q-learning

In my previous post about reinforcement learning I talked about Q-learning, and how that works in the context of a cat vs mouse game. I mentioned in this post that there are a number of other methods of reinforcement learning aside from Q-learning, and today I’ll talk about another one of them: SARSA. All the code used is from Terry Stewart’s RL code repository, and can be found both there and in a minimalist version on my own github: SARSA vs Qlearn cliff. To run the code, simply execute the cliff_Q or the cliff_S files.

SARSA stands for State-Action-Reward-State-Action. In SARSA, the agent starts in state 1, performs action 1, and gets a reward (reward 1). Now, it’s in state 2 and performs another action (action 2) and gets the reward from this state (reward 2) before it goes back and updates the value of action 1 performed in state 1. In contrast, in Q-learning the agent starts in state 1, performs action 1 and gets a reward (reward 1), and then looks and sees what the maximum possible reward for an action is in state 2, and uses that to update the action value of performing action 1 in state 1. So the difference is in the way the future reward is found. In Q-learning it’s simply the highest possible action that can be taken from state 2, and in SARSA it’s the value of the actual action that was taken.

This means that SARSA takes into account the control policy by which the agent is moving, and incorporates that into its update of action values, where Q-learning simply assumes that an optimal policy is being followed. This difference can be a little difficult conceptually to tease out at first but with an example will hopefully become clear.

Mouse vs cliff

Let’s look at a simple scenario, a mouse is trying to get to a piece of cheese. Additionally, there is a cliff in the map that must be avoided, or the mouse falls, gets a negative reward, and has to start back at the beginning. The simulation looks something like exactly like this:

mouse vs cliff
where the black is the edge of the map (walls), the red is the cliff area, the blue is the mouse and the green is the cheese. As mentioned and linked to above, the code for all of these examples can be found on my github (as a side note: when using the github code remember that you can press the page-up and page-down buttons to speed up and slow down the rate of simulation!)

Now, as we all remember, in the basic Q-learning control policy the action to take is chosen by having the highest action value. However, there is also a chance that some random action will be chosen; this is the built-in exploration mechanism of the agent. This means that even if we see this scenario:

mouse and cliff
There is a chance that that mouse is going to say ‘yes I see the best move, but…the hell with it’ and jump over the edge! All in the name of exploration. This becomes a problem, because if the mouse was following an optimal control strategy, it would simply run right along the edge of the cliff all the way over to the cheese and grab it. Q-learning assumes that the mouse is following the optimal control strategy, so the action values will converge such that the best path is along the cliff. Here’s an animation of the result of running the Q-learning code for a long time:


The solution that the mouse ends up with is running along the edge of the cliff and occasionally jumping off and plummeting to its death.

However, if the agent’s actual control strategy is taken into account when learning, something very different happens. Here is the result of the mouse learning to find its way to the cheese using SARSA:


Why, that’s much better! The mouse has learned that from time to time it does really foolish things, so the best path is not to run along the edge of the cliff straight to the cheese but to get far away from the cliff and then work its way over safely. As you can see, even if a random action is chosen there is little chance of it resulting in death.

Learning action values with SARSA

So now we know how SARSA determines it’s updates to the action values. It’s a very minor difference between the SARSA and Q-learning implementations, but it causes a profound effect.

Here is the Q-learning learn method:

def learn(self, state1, action1, reward, state2):
    maxqnew = max([self.getQ(state2, a) for a in self.actions])
    self.learnQ(state1, action1,
                reward, reward + self.gamma*maxqnew)

And here is the SARSA learn method

def learn(self, state1, action1, reward, state2, action2):
    qnext = self.getQ(state2, action2)
    self.learnQ(state1, action1,
                reward, reward + self.gamma * qnext)

As we can see, the SARSA method takes another parameter, action2, which is the action that was taken by the agent from the second state. This allows the agent to explicitly find the future reward value, qnext, that followed, rather than assuming that the optimal action will be taken and that the largest reward, maxqnew, resulted.

Written out, the Q-learning update policy is Q(s, a) = reward(s) + alpha * max(Q(s')), and the SARSA update policy is Q(s, a) = reward(s) + alpha * Q(s', a'). This is how SARSA is able to take into account the control policy of the agent during learning. It means that information needs to be stored longer before the action values can be updated, but also means that our mouse is going to jump off a cliff much less frequently, which we can probably all agree is a good thing.

Tagged , ,

Wrapping MapleSim C code for Python

Previously, I worked through a simple inverse kinematics example with at three link arm. The arm, however, was uninteresting, and in fact had no dynamics of its own. I’ve also previously (but not in a blog post) built up a number of arm models using MapleSim that were interesting, and ported them into C. From there I used them in Matlab / Simulink, but I’m on a Python kick now and tired of Matlab; so I’ve been looking at how to wrap this C code generated by MapleSim using Cython such that I can interact with it from Python.

So the code that simulates my models is exported to C, which is great in that it’s super fast, but it’s bad because it’s a pain in the ass to work with. I don’t want to have to write my controllers and learning algorithms all in C, it would be much nicer to do it all in Python. So I was looking at interfacing Python with the C library, and did a little (as seen in my last post), but then I already know how to make Cython play nice with C++ so I figured let’s shoot for that instead. What follows now is a hacky guide to getting your MapleSim model simulations to work with Python. First we’ll get the C code to just run, then we’ll port it to Python, then we’ll get it going graphically.

Get the MapleSim code to run in C++ with classes and input

Step 1: Rename your file from ‘whatever.c’ to ‘whatever.cpp’, in the code here I’m renaming ‘c2LinkArm.c’ to ‘2LinkArm.cpp’. I know, I know, it’s painful but hopefully that’s the worst of it.

Step 2: Add #include "mplshlib.h" to the beginning of the file. The code is meant to be run in a MapleSim environment (or something), so it requires one of the Maple library files. We’ll just add this in here and everything should be happy.

Step 3: For whatever reason, there is a function in here that sets all of the system input to 0. Why, you ask? Good question. The function in question is static void inpfn(double T, double *U), and you can see that all it does is set every element of U = 0. Now, you can either comment out the body of this function, or, several lines down, the first line of the SolverUpdate function (after a long i variable declaration) is the call to this function, so you can comment that out and then everything’s fine. Hurray! We can give input to the system now!

And that’s all the inline code editing we have to do. We’re not done with this file though, we still need to append some things so that we can use it easily.

Step 4: The reason that I want to use C++ is because with C++ we can make a class, which we’ll call the Sim class, that can store all of our simulation details, and hold some wrapper functions to hide the full blown interface to the simulator functions. Let’s make that class now. Go down to the bottom of the file and paste in the following code:
c2LinkArm.cpp

/*  A class to contain all the information that needs to
    be passed around between these functions, and can
    encapsulate it and hide it from the Python interface.

    Written by Travis DeWolf (May, 2013)
*/
class Sim {
    /* Very simple class, just stores the variables we
    need for simulation, and has 2 functions. Reset
    resets the state of the simulation, and step steps it
    forward. Tautology ftw!*/

    double* params;
    double dt, t0;
	double u0[NINP], other_out[NOUT+1], y[NOUT];
    double w[1 + 2 * NEQ + NPAR + NDFA + NEVT];

    SolverStruct S;

    public:
        Sim(double dt_val, double* params_pointer);
        void reset(double* out, double* ic);
        void step(double* out, double* u);
};

Sim::Sim(double dt_val, double* params_pointer)
{
    t0 = 0.0; // set up start time
    dt = dt_val; // set time step
    for (int i = 0; i < NINP; i++)
        u0[i] = 0.0; // initial control signal

    params = params_pointer; // set up parameters reference

	/* Setup */
	S.w = w;
	S.err = 0;
}

void Sim::reset(double* out, double* ic)
{
	SolverSetup(t0, ic, u0, params, y, dt, &S);

	/* Output */
	out[0] = t0;
        for(int j = 0; j < NOUT; j++)
            out[j + 1] = y[j];
}

void Sim::step(double* out, double* u)
/* u: control signal */
{
    for (int k = 0; k < NOUT; k++)
        out[k] = *dsn_undef; // clear values to nan

	/* Integration loop */
    /* Take a step with states */
    EulerStep(u, &S);

    if (S.err <= 0)
    {
        /* Output */
        SolverOutputs(y, &S);

        out[0] = S.w[0];
        for(long j = 0; j < NOUT; j++)
            out[j + 1] = y[j];
    }
}

int main (void)
{
    FILE *fd;

    double *ic, *p, *out;
    char* errbuf;
    long i, j, outd;
    long internal = 0;

    double dt = 0.00001;

    int time_steps = 1000000;
    double u[NINP];
    for (int k = 0; k < NINP; k++) u[k] = .1;

    fd = fopen("output.dat", "w");

    Sim sim = Sim(dt, NULL);
    sim.reset(out, NULL); // ic = NULL, use default start state

    for(i=0;i<time_steps;i++)
    {
        sim.step(out, u);
        fprintf(fd,"%lf ",out[0]);
        for(j=0;j<NOUT;j++)
        {
            fprintf(fd,"%lf ",out[j+1]);
        }
        fprintf(fd, "\n");
    }

    fclose(fd);

    return 0;
}

So, this is based off of the ParamDriverC() function from the MapleSim generated code. I just separated out the initialization and the simulation into two functions (reset() and step(), respectively), so that it’s possible to explicitly control the simulation stepping through time, and change the input control signal at every time step.

NOTE: The dt here must be <= 1e-5, or the constraint solver in the code throws an error. Don’t fear though, this simulator is still crazy fast.

OK. This code then can be compiled (once you have the library files in the same folder) and run with:

g++ c2LinkArm.cpp -out sim
./sim

You may have noticed a bunch of warning messages scrolling by, I’m not about to touch those and they don’t affect the simulation, so as long as one saying ‘error’ doesn’t pop up let’s just leave well enough alone.

Once you’ve run the executable you should now have a file called ‘output.dat’ filled with simulation data. The first column is time, and the rest are the output variables from your MapleSim model. Alright, we’ve got our code up and running! Let’s go an and make a Cython wrapper for it.

Making a Cython wrapper for the MapleSim code
The first thing that we need to do is take the main function we just added in out of the simulation code above. We’re done with running the C code on its own so we don’t need it anymore. Get rid of it already.

Now, this is going to be very similar to the code from the previous Cython posts, but in this one there’s no getting around it: We need to pass back and forth arrays. It turns out this isn’t very complicated, but there’s a template you have to follow and it can be difficult if you don’t know what that is.

I’ll post the code and then we can go through the tricky parts. As always, we create a ‘pyx’ file for our Cython wrapper code, I called this one ‘py2LinkArm.pyx’.
py2LinkArm.pyx

import numpy as np
cimport numpy as np

cdef extern from "c2LinkArm.cpp": 
    cdef cppclass Sim:
        Sim(double dt, double* params)
        void reset(double* out, double* ic)
        void step(double* out, double* u)

cdef class pySim:
    cdef Sim* thisptr
    def __cinit__(self, double dt = .00001, 
                        np.ndarray[double, mode="c"] params=None):
        """
        param float dt: simulation timestep, must be < 1e-5
        param array params: MapleSim model internal parameters
        """
        if params: self.thisptr = new Sim(dt, &params[0])
        else: self.thisptr = new Sim(dt, NULL)

    def __dealloc__(self):
        del self.thisptr

    def reset(self, np.ndarray[double, mode="c"] out, 
                    np.ndarray[double, mode="c"] ic=None):
        """
        Reset the state of the simulation.
        param np.ndarray out: where to store the system output
            NOTE: output is of form [time, output]
        param np.ndarray ic: the initial conditions of the system
        """
        if ic: self.thisptr.reset(&out[0], &ic[0])
        else: self.thisptr.reset(&out[0], NULL)

    def step(self, np.ndarray[double, mode="c"] out, 
                   np.ndarray[double, mode="c"] u):
        """
        Step the simulation forward one timestep.
        param np.ndarray out: where to store the system output
            NOTE: output is of form [time, output]
        param np.ndarray u: the control signal
        """
        self.thisptr.step(&out[0], &u[0])

This is pretty familiar: We make a reference to a class defined in C++, and then we wrap it in a python class and define functions that we can actually call from Python.

The main difference here is that we have to handle passing in numpy arrays, and having them be in the appropriate double* form that our C++ code is expecting to see. There are three parts involved in this. First, note at the top that we have an import numpy as np followed by a cimport numpy as np, this gives us access to the functions that we’ll need. Second, for each of the arrays accepted as parameter we have a special type declaration: np.ndarray[double, mode="c"] var. And finally, we pass the arrays on with a dereferencing &var[0]. If we were passing along 2D arrays we would use &var[0][0].

In the __init__() and reset() methods, I’m allowing the model parameters, params, and initial conditions, ic, arrays to be undefined, in which case NULL is passed to our C code and the default values we defined inside MapleSim are used. The reason for this is that these require some pretty intimate knowledge of the MapleSim model, and if a casual user wants to just use this business they shouldn’t have to know anything about the internal states.

With our wrapper code done now the only thing left is our setup file! This one is just straight from the previous posts, with the exception that only the ‘pyx’ file is included in the sources declaration because our ‘py2LinkArm.pyx’ file references the ‘c2LinkArm.cpp’ file directly, instead of referencing a header file.
setup.py

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
 
setup(
  cmdclass = {'build_ext': build_ext},
  ext_modules=[Extension("py2LinkArm", 
               sources=["py2LinkArm.pyx"],
               language="c++"),],
)

With our setup file we should now be good to go. Let’s compile! To do this I pull open IPython and punch in

run setup.py build_ext -i

And it will complain and throw some warnings, but it should compile just fine. And now we have access to the pySim class!
Note: that should read ‘build_ext’ with an underscore in there, I’m not sure why it’s not displaying but that underscore needs to be there or it won’t run.
To see it do some things, type in

import numpy as np
import py2LinkArm

sim = py2LinkArm.pySim()

u = np.ones(2) # input array
out = np.zeros(3) # output array

sim.reset(out)
print out # initial values

for i in range(10):
    sim.step(out, u)
    print out

And we can see it moving forward! If you set u = np.ones(2) * .1, which is what it was when we generated the ‘output.dat’ file using the C code only and run it forward you’ll see that the we get the same results for both. Excellent. OK now let’s display it!

Running and plotting the MapleSim arm model
In a previous post I used Pyglet to plot out a 3 link arm (a very poor, lines only plot, but sufficient), and I’m going to use it again here (for a very poor, lines only plot). The file is called ‘run.py’, and is less than 80 lines. I’ll post it and then walk through it below:
run.py

import numpy as np
import pyglet
import time

import py2LinkArm

def run(): 
    """Runs and plots a 2 link arm simulation generated by MapleSim."""

    # set up input control signal to simulation
    u = np.ones(2) * .1 
    # set up array to receive output from simulation
    out = np.zeros(3) 

    # create MapleSim sim class 
    sim = py2LinkArm.pySim() 
    # set up initial conditions of the system
    sim.reset(out) 

    ###### Plotting things

    # segment lengths = (.31, .27)m (defined in MapleSim)
    L = np.array([31, 27]) * 3

    # make our window for drawin'
    window = pyglet.window.Window()

    # set up some labels to display time 
    label_time = pyglet.text.Label('time', font_name='Times New Roman', 
        font_size=36, x=window.width//2, y=window.height - 50,
        anchor_x='center', anchor_y='center')
    # and joint angles
    label_values = pyglet.text.Label('values', font_name='Times New Roman', 
        font_size=36, x=window.width//2, y=window.height - 100,
        anchor_x='center', anchor_y='center')

    def update_arm(dt):
        """ Simulate the arm ahead a chunk of time, then find the 
        (x,y) coordinates of each joint, and update labels."""
   
        # simulate arm forward 15ms
        for i in range(1500):
            sim.step(out, u)

        # find new joint (x,y) coordinates, offset to center of screen-ish
        x = np.array([ 0, 
            L[0]*np.cos(out[1]),
            L[0]*np.cos(out[1]) + L[1]*np.cos(out[1]+out[2])]) + window.width/2
        y = np.array([ 0, 
            L[0]*np.sin(out[1]),
            L[0]*np.sin(out[1]) + L[1]*np.sin(out[1]+out[2])]) + 100

        # update line endpoint positions for drawing
        window.jps = np.array([x, y]).astype('int')
        label_time.text = 'time: %.3f'%out[0]
        label_values.text = '(01, 02) = (%.2f, %.2f)'%(out[1], out[2])
 
    update_arm(0) # call once to set up

    @window.event
    def on_draw():
        window.clear()
        label_time.draw()
        label_values.draw()
        for i in range(2): 
            pyglet.graphics.draw(2, pyglet.gl.GL_LINES, ('v2i', 
                (window.jps[0][i], window.jps[1][i], 
                 window.jps[0][i+1], window.jps[1][i+1])))

    # set up update_arm function to be called super often
    pyglet.clock.schedule_interval(update_arm, 0.00001)
    pyglet.app.run()

run()

Alright. In the first part, we’re just doing the same as above when we imported our brand new module and set it up for simulation.
What’s different is that we’ve now pushed the actual simulation into a method called update_arm. The idea is that the drawing is significantly slower than the simulation, so we’ll simulate a bunch, then update the plots once, then repeat, and hopefully get something that simulates at a decent speed. Once we have the joint angles of the arm stored safely in our out variable, we calculate the (x,y) locations of the elbow and wrist and update the system. Then the last thing this method does is update the text of the labels in the window.

In the on_draw method we’ve overwritten, we clear the window, then draw our labels and the lines corresponding to upper and lower arm segments.

Then the last thing is to schedule our update_arm method to be called periodically, in this case in extremely often.

When you run this file, you should see something that looks like:
pic
You’ll have to trust me it looks better when it’s moving, and when it’s moving it will look better when the input isn’t constant.

But there you go! Now you are able to take your very own MapleSim model’s optimized C code, append some C++ code to make a Sim class, wrap it in Cython, and access it as a Python module! Why would anyone ever want to go through this trouble? Well, finding good simulations with appropriate dynamics can be a huge pain in the ass, and sometimes you just want something stand-alone. Also, now that this is done, this exact same code should be good for any MapleSim model, so hopefully the process is relatively straightforward. At least that’s my dream.

The code that I’ve used in this post can be found at my github: 2LinkArm.

Addendum

This was all written for compiling on Ubuntu, my labmate and pal Xuan Choo also went through getting this all to work on Windows too, here are his notes!

Step 1: Get and install cython

Step 2: Follow the instructions here. Essentially:

  • 2a: Download and install the windows SDK for your python version
  • 2b: Start the windows SDK command shell (start menu->programs->windows sdk->CMD shell)
  • 2c: Type these things into the cmd shell:
    set DISTUTILS_USE_SDK=1
    setenv /x64 /release
    
  • 2d: in the same cmd shell, navigate to the folder with the pyx file you want to compile, and run:
    python setup.py build_ext -i
    
  • 2d*: If it complains about undefined numpy references, add “include_dirs=[numpy.get_include()]” to the Extension class call (also import numpy in setup.py).
    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Distutils import build_ext
    import numpy
    
    setup(
      cmdclass = {'build_ext': build_ext},
      ext_modules=[Extension("py3LinkArm",
                   sources=["py3LinkArm.pyx"],
                   language="c++",
                   include_dirs=[numpy.get_include()]),],
    )
    

Done!

Tagged , , , , ,

Cython Journey Part 3: Wrapping C code and passing arrays

Basic C interfacing example

OK, so this code is taken straight from an example here, I’m just going to explicitly work through all the steps so that when I try to do this again later I don’t have to look everything up again.

Let’s say we have some c code that we want to work with from Python, in a file called ‘square.c’:

void square(double* array, int n) {
    int i;
    for (i=0; i<n; ++i) {
        array[i] = array[i] * array[i];
    }
}

The first thing we’re going to need to do is compile this to a shared object file, so we punch the following into our terminal:

gcc -c square.c # compile the code to an object file
gcc square.o -shared -o square.so # compile to shared object file

And so now we have our square.so file. Super.

Here is the code that we can then use to interact with this shared object file from Python:

import numpy as np
import ctypes

square = ctypes.cdll.LoadLibrary("./square.so")

n = 5
a = np.arange(n, dtype=np.double)

aptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
square.square(aptr, n)

print a

OK, so there are a couple of lines with a lot of jargon but aside from that this looks pretty straightforward. The first noticeable thing is that we’re importing the ctypes library. So, make sure that you have that installed before you try to run this code. Right after importing ctypes then we create a handle into our shared object file, using the ctypes.cdll.LoadLibrary() command.

Then it’s some standard python code to create an array, and then there’s this funky command that lets us get a handle to the array that we just created as a c pointer that we can pass in to our square() function. You can see a list of all the different kinds of data types here.

Once we have that handle to our array, we simply pass it through to the square.square() function, and let it go and do whatever it’s going to do with it. Then we print out our array a, and to our great joy, all of the values have been squared.

So that’s that. Instead of returning anything like double* from the C library I would recommend just passing in the array as a parameter and modifying it instead. This is all nice and straightforward enough, and it seems like the returning business can lead to a lot of memory allocation and ownership headaches.

The code for this example can be found at my github, here: Square.

Tagged , , , ,

Inverse kinematics of 3-link arm with constrained minimization in Python

Inverse kinematics is a common topic in robotics control; one that most anyone working with a robotic arm needs to address at some point. The use case I’ll be looking at here is the situation where we want to specify an (x,y) coordinate for the end-effector, and solve for the appropriate joint angle configuration that best achieves this. It’s surprisingly straightforward to handle with the right approach, so it’s a nice little exercise in Python optimization (and we’ll even work in some graphical display!) And per usual, the code is all up on my github.

Most of the time the servos that anyone is working with are position controlled, meaning that you provide them with a set of desired angles for them to move to, rather than force controlled, where you provide them with a torque signal that directly moves them. In this case, inverse kinematics is all you need. You simply work out the set of joint angles you want the servos to move to and send those out directly. In biological systems this clearly isn’t the end of the work, because even if you know where you want to go you still have to provide a force (muscle activation) signal that will move your system there. But! as mentioned in the motor babbling paper review post I wrote a while back, having the target specified in the most basic space (i.e. joint angles instead of (x,y) coordinates for a servo based arm, or muscle lengths instead of joint angles for a muscle based arm) can make calculating, or learning, the appropriate force signal much easier; so something that can do the inverse kinematics calculation is worth having.

Inverse kinematics

Now then, how do we go about finding the joint angle configuration that best achieves a desired (x,y) coordinate? Constrained minimization! Our constraint is that the tip of the arm must be at the specified (x,y) position, and we need to specify some cost function for the system to minimize to choose among all the possible configurations that accomplish this. The cost function that we’re going to use here is a very intuitive one, that I first encountered in the paper ‘Learning operational space control’ by Jan Peters. What we’re going to do is specify a default, or resting, joint state configuration, and minimize our distance from that. Which makes a lot of sense.

I’m going to post the code for calculating the inverse kinematics of a 3-link arm, and then we’ll work through it.

import math
import numpy as np
import scipy.optimize  

class Arm3Link:
    
    def __init__(self, q=None, q0=None, L=None):
        """Set up the basic parameters of the arm.
        All lists are in order [shoulder, elbow, wrist].
        
        :param list q: the initial joint angles of the arm
        :param list q0: the default (resting state) joint configuration
        :param list L: the arm segment lengths
        """
        # initial joint angles
        if q is None: q = [.3, .3, 0]
        self.q = q
        # some default arm positions
        if q0 is None: q0 = np.array([math.pi/4, math.pi/4, math.pi/4]) 
        self.q0 = q0
        # arm segment lengths
        if L is None: L = np.array([1, 1, 1]) 
        self.L = L
        
        self.max_angles = [math.pi, math.pi, math.pi/4]
        self.min_angles = [0, 0, -math.pi/4]

    def get_xy(self, q=None):
        if q is None: q = self.q

        x = self.L[0]*np.cos(q[0]) + \
            self.L[1]*np.cos(q[0]+q[1]) + \
            self.L[2]*np.cos(np.sum(q))

        y = self.L[0]*np.sin(q[0]) + \
            self.L[1]*np.sin(q[0]+q[1]) + \
            self.L[2]*np.sin(np.sum(q))

        return [x, y]

    def inv_kin(self, xy):

        def distance_to_default(q, *args): 
            # weights found with trial and error, get some wrist bend, but not much
            weight = [1, 1, 1.3] 
            return np.sqrt(np.sum([(qi - q0i)**2 * wi
                for qi,q0i,wi in zip(q, self.q0, weight)]))

        def x_constraint(q, xy):
            x = ( self.L[0]*np.cos(q[0]) + self.L[1]*np.cos(q[0]+q[1]) + 
                self.L[2]*np.cos(np.sum(q)) ) - xy[0]
            return x

        def y_constraint(q, xy): 
            y = ( self.L[0]*np.sin(q[0]) + self.L[1]*np.sin(q[0]+q[1]) + 
                self.L[2]*np.sin(np.sum(q)) ) - xy[1]
            return y
        
        return scipy.optimize.fmin_slsqp( func=distance_to_default, 
            x0=self.q, eqcons=[x_constraint, y_constraint], 
            args=[xy], iprint=0) # iprint=0 suppresses output


I’ve taken out most of the comments for compactness, but there are plenty in the code on github, don’t you worry. Alright, let’s take a look!

First, we go through and set up the parameters of our arm. If nothing was passed in during construction, then we’re going to make an arm that is initially at it’s resting position, with shoulder and elbow joint angles = \frac{\pi}{4}, and wrist angle = 0. We also set up the arm segment lengths to all be 1, and the minimum and maximum joint angles.

Next, there’s the get_xy function, which just uses some simple trig to calculate the current (x,y) position of the end-effector. This is going to be a handy thing to have when we want to see how well the joint configurations we calculate achieve our desired hand position.

And then after this, we have the actual inverse kinematics function. inv_kin takes in a desired (x,y) position, and returns to us a list of the joint angles that will best achieve this. We define three methods inside inv_kin that define our constrained optimization problem, distance_to_default, which is the function we are minimizing, and x_constraint and y_constraint, which are the constraints that must be met for a solution to be valid. These are all pretty straight forward functions, the distance_to_default function calculates the Euclidean distance in joint space to the resting state, and the constraint functions calculate the difference between the actual and desired end-effector positions for a given set of joint angles.

There is one point of interest in the distance_to_default method, and that is the use of the weights therein. What is the point of applying a gain to these values, you ask? Why, simply that it lets us weight the relative importance of each of these joints remaining close to their default configuration! If you think about moving your arm on a horizontal plane, the wrist actually bends very little. To mimic this we can set the weight applied to the distance of the wrist from its resting state higher than those of the other joints, so that it becomes a higher priority to keep the wrist near its resting state. I suggest playing around with once the graphical display (below) is built up, it’s fun to see the effects different weightings can give on joint configurations.

With these three functions defined, we can now call up the scipy.optimize.fmin_slsqp function, which performs the sequential least squares method to arrive at a solution.

Really, that’s all there is to it!

To test this now, we have the following method:

def test():
    ############Test it!##################

    arm = Arm3Link()

    # set of desired (x,y) hand positions
    x = np.arange(-.75, .75, .05)
    y = np.arange(0, .75, .05)

    # threshold for printing out information, to find trouble spots
    thresh = .025

    count = 0
    total_error = 0
    # test it across the range of specified x and y values
    for xi in range(len(x)):
        for yi in range(len(y)):
            # test the inv_kin function on a range of different targets
            xy = [x[xi], y[yi]]
            # run the inv_kin function, get the optimal joint angles
            q = arm.inv_kin(xy=xy)
            # find the (x,y) position of the hand given these angles
            actual_xy = arm.get_xy(q)
            # calculate the root squared error
            error = np.sqrt((np.array(xy) - np.array(actual_xy))**2)
            # total the error 
            total_error += error
            
            # if the error was high, print out more information
            if np.sum(error) > thresh:
                print '-------------------------'
                print 'Initial joint angles', arm.q 
                print 'Final joint angles: ', q
                print 'Desired hand position: ', xy
                print 'Actual hand position: ', actual_xy
                print 'Error: ', error
                print '-------------------------'
            
            count += 1

    print '\n---------Results---------'
    print 'Total number of trials: ', count
    print 'Total error: ', total_error
    print '-------------------------'

Which is simply working through a set of target (x,y) points, and calculating the total error across all of them. If any particular point gives an error above the threshold level, print out the information for that point so we can take a closer look if so desired. Fortunately, the threshold error isn’t even approached in this test, and for output we get:

---------Results---------
Total number of trials:  450
Total error:  [  3.33831421e-05   2.89667496e-05]
-------------------------

Not bad!

Visualization

So that’s all great and dandy, but it’s always nice to be able to really visualize these things, so I wrote another method that uses Pyglet to help visualize. This is a really easy to use graphics program that I tested out as a means here, and I was very happy with it. I ended up writing a method that pops up a window with a 3-link arm drawn on it, and the arm uses the above inverse kinematics function in the Arm class written above to calculate the appropriate joint angles for the arm to be at to follow the mouse. Once the joint angles are calculated, then their (x,y) locations are also calculated, and the arm is drawn. Pleasingly, the inv_kin above is fast enough to work in real time, so it’s a kind of fun toy example. Again, I’ll show the code and then we’ll work through it below.

import numpy as np
import pyglet
import time

import Arm

def plot(): 
    """A function for plotting an arm, and having it calculate the 
    inverse kinematics such that given the mouse (x, y) position it 
    finds the appropriate joint angles to reach that point."""

    # create an instance of the arm
    arm = Arm.Arm3Link(L = np.array([300,200,100]))

    # make our window for drawin'
    window = pyglet.window.Window()

    label = pyglet.text.Label('Mouse (x,y)', font_name='Times New Roman', 
        font_size=36, x=window.width//2, y=window.height//2,
        anchor_x='center', anchor_y='center')

    def get_joint_positions():
        """This method finds the (x,y) coordinates of each joint"""

        x = np.array([ 0, 
            arm.L[0]*np.cos(arm.q[0]),
            arm.L[0]*np.cos(arm.q[0]) + arm.L[1]*np.cos(arm.q[0]+arm.q[1]),
            arm.L[0]*np.cos(arm.q[0]) + arm.L[1]*np.cos(arm.q[0]+arm.q[1]) + 
                arm.L[2]*np.cos(np.sum(arm.q)) ]) + window.width/2

        y = np.array([ 0, 
            arm.L[0]*np.sin(arm.q[0]),
            arm.L[0]*np.sin(arm.q[0]) + arm.L[1]*np.sin(arm.q[0]+arm.q[1]),
            arm.L[0]*np.sin(arm.q[0]) + arm.L[1]*np.sin(arm.q[0]+arm.q[1]) + 
                arm.L[2]*np.sin(np.sum(arm.q)) ])

        return np.array([x, y]).astype('int')
    
    window.jps = get_joint_positions()

    @window.event
    def on_draw():
        window.clear()
        label.draw()
        for i in range(3): 
            pyglet.graphics.draw(2, pyglet.gl.GL_LINES, ('v2i', 
                (window.jps[0][i], window.jps[1][i], 
                 window.jps[0][i+1], window.jps[1][i+1])))

    @window.event
    def on_mouse_motion(x, y, dx, dy):
        # call the inverse kinematics function of the arm
        # to find the joint angles optimal for pointing at 
        # this position of the mouse 
        label.text = '(x,y) = (%.3f, %.3f)'%(x,y)
        arm.q = arm.inv_kin([x - window.width/2, y]) # get new arm angles
        window.jps = get_joint_positions() # get new joint (x,y) positions

    pyglet.app.run()

plot()

There are really only a few things that we’re doing in this method: We create an arm with somewhat normal segment lengths and a window for plotting; we have a function for calculating the (x,y) coordinates of each of the joints for plotting; we have a function that tracks the position of the mouse and updates our arm joint angles by calling the arm.inv_kin function we wrote above; and lastly we plot it! Easy. The functions aren’t quite in that order, and that’s because I chose alphabetical ordering over conceptual. Here’s a picture! armplot

I think the only kind of weird thing going on in here is that my variables that I use in both this method and in the overwritten window methods are defined as belonging to window. Aside from that, it’s straightforward trig to calculate the joint angle Cartesian coordinates, and then there’s also a label that displays the mouse (x,y) position for the hell of it. Honestly, the label was in a Pyglet tutorial example and I just left it in.

It’s not going to win any awards for graphics by any means, but it’s simple and it works! And, assuming you have Pyglet installed (which is very easy to do), you should straight up be able to run it and be amazed. Or a little impressed. Or not, whatever. But in any case now you have your very own inverse kinematics example on a 3-link arm! Moving to any other number of joints is exactly the same, you just have to add the corresponding number of default joint configurations and everything should work out as is.

From here, the plan is to look at bringing in an actual force driven arm model to the mix, and then building up a system that can track the mouse, but with a control signal that specifies forces to apply to the arm, rather than positions. Which I think will be pretty neat. And in case you missed it at the top, here’s a link to all the code on my github.

Tagged , , ,

Nengo model – Low pass derivative filter

To just get the code you can copy / paste from below or get the code from my github: low_pass_derivative_filter.py.

In the course of building models in Nengo, I recently came in to need for a neural implementation of a low pass derivative filter. I scripted up a sub-network in Nengo (www.nengo.ca) that does this, and until we get the model database / repository up and running for Nengo scripts I’ll keep working through building these things here, because it goes over some basic methods that can be useful when you’re building up your models.

Basically there are three parts to this model: Derivative calculation, absolute value calculation, and an inhibitory projection with a threshold activation that projects to the output population. Here’s a picture:
Here’s the idea: The population input projects both directly to the output population and to a population that calculates the derivative of the sum across dimensions of the input signal. The derivative population passes on the derivative of the input signal to an absolute value calculating population, which passes the absolute value of the derivative on to a population that isn’t activated for values under a threshold level. This threshold population then projects very strong inhibition to the output population, so that when the absolute value of the derivative is above the threshold level, no output is projected, and otherwise the system just relays the input signal straight through. Here’s the code:

def make_dlowpass(name, neurons, dimensions, radius=10, tau_inhib=0.005, inhib_scale=10):

    dlowpass = nef.Network(name)

    dlowpass.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    output = dlowpass.make('output', neurons=dimensions*neurons, dimensions=dimensions) # create output relay
    
    # now we track the derivative of sum, and only let output relay the input
    # if the derivative is below a given threshold
    dlowpass.make('derivative', neurons=radius*neurons, dimensions=2, radius=radius) # create population to calculate the derivative
    dlowpass.connect('derivative', 'derivative', index_pre=0, index_post=1, pstc=0.1) # set up recurrent connection
    dlowpass.add(make_abs_val(name='abs_val', neurons=neurons, dimensions=1, intercept=(.2,1))) # create a subnetwork to calculate the absolute value  

    # connect it up!
    dlowpass.connect('input', 'output') # set up communication channel
    dlowpass.connect('input', 'derivative', index_post=0)
    
    def sub(x):
        return [x[0] - x[1]]
    dlowpass.connect('derivative', 'abs_val.input', func=sub)
        
    # set up inhibitory matrix
    inhib_matrix = [[-inhib_scale]] * neurons * dimensions
    output.addTermination('inhibition', inhib_matrix, tau_inhib, False)
    dlowpass.connect('abs_val.output', output.getTermination('inhibition'))

    return dlowpass.network

First off, this code is taking advantage of the absolute value function that was written a couple blog posts ago. You can either go check out that post or I’ll also have that function included in the code at the end for completeness. Aside from that, there are a lot of things going on that you won’t come across if you’re just coding up simple examples, so let’s look at them.

At the top, we’re assigning our output network a handle, which I try to avoid in general for neatness, since most of the times you can reference it by simply referring to it’s name, 'output'. The reason that I assign it a handle here is because we’re going to be calling upon some Java API features that (to my knowledge) aren’t handled in the Python API yet, and although we could call up the output node as a Java object with dlowpass.get('output') using only its assigned name, it will just be cleaner in the end to have a handle for it. We’ll come back to this.

The next interesting thing that happens, is that when we’re creating our derivative population, we set the number of neurons to 10*neurons, and radius=10. This is to because in the derivative we’re representing the sum of all of the input dimensions. How are all the input dimensions being summed up in derivative, you ask? Just below, on line 17. When we connect up the input relay to derivative we set index_post=0, which means that all of the input dimensions are going to to project to the same dimension of the derivative population. The default weight for this connection is 1, so then dimension 0 of derivative is equal to 1 * value for value in input_dimensions . Super.

But why do we set radius=10? This is because the radius parameter specifies the range of values represented by this population. The default is (-1,1), but when we specify radius, the new range of represented values becomes (-radius, radius). We’re making a bit of an assumption that this value won’t go outside of the range (-10,10) here, but that should be OK for most of the situations we’re going to come across. In the specific model I’m using this for it’s definitely the case, so that’s why I’ve set the default value to 10. And because we don’t want the accuracy in representation to decrease, we also scale up the number of neurons in this population by radius*neurons.

And there’s still more happening in this derivative population! On line 11 I specify a recurrent connection that projects into a second dimension represented in derivative. So now what’s going to happen is that the sum of the input signals is projected into the first dimension of derivative, and through a recurrent connection the value of the sum of the input signals from time t-pstc will be represented in the second dimension of the derivative population.

To calculate the derivative then, it’s a simple matter of subtracting the previous signal from the current signal, which is what happens in the function sub that I define on line 19. To implement this function, when connecting up derivative to abs_val, just set the parameter func=sub. Easy.

Now, when the abs_val population is made, we set and intercept=(.2,1). This is the same trick we used in the previous absolute value function model, but it’s acting as a threshold here. Basically, this population won’t respond if the value being projected into it is between (-.2, .2).

So, up to this point, what we have is a summation of the input dimensions, the derivative being calculated and passed to an absolute value function, and this population only responds if the derivative is greater than .2.

The last part is hooking up this abs_val population to the output relay, to suppress output whenever it’s activated (i.e. when the derivative is greater than .2). This is where we need to pull into the Java API, and this is why we specified a handle for our output population. In lines 25-26, what’s going on is that instead of using the NEF neural compiler functionality to set up our connection weights to compute some function, we’re specifying them ourselves. And we’re specifying them to prevent the neurons in output from firing. Now, the activation of the neurons in abs_val reduces the voltage values being sent into the output population, inhibiting their activity.

And that’s it! Here’s the complete code to run an example of this network (which can also be found on my github low_pass_derivative_filter.py):

import nef

# constants / parameter setup etc
N = 50 # number of neurons
D = 3 # number of dimensions

def make_abs_val(name, neurons, dimensions, intercept=[0]):
    def mult_neg_one(x):
        return x[0] * -1 

    abs_val = nef.Network(name)

    abs_val.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    abs_val.make('output', neurons=1, dimensions=dimensions, mode='direct') # create output relay
    
    for d in range(dimensions): # create a positive and negative population for each dimension in the input signal
        abs_val.make('abs_pos%d'%d, neurons=neurons, dimensions=1, encoders=[[1]], intercept=intercept)
        abs_val.make('abs_neg%d'%d, neurons=neurons, dimensions=1, encoders=[[-1]], intercept=intercept)

        abs_val.connect('input', 'abs_pos%d'%d, index_pre=d)
        abs_val.connect('input', 'abs_neg%d'%d, index_pre=d)
    
        abs_val.connect('abs_pos%d'%d, 'output', index_post=d)
        abs_val.connect('abs_neg%d'%d, 'output', index_post=d, func=mult_neg_one)

    return abs_val.network

def make_dlowpass(name, neurons, dimensions, radius=10, tau_inhib=0.005, inhib_scale=10):

    dlowpass = nef.Network(name)

    dlowpass.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    output = dlowpass.make('output', neurons=dimensions*neurons, dimensions=dimensions) # create output relay
    
    # now we track the derivative of sum, and only let output relay the input
    # if the derivative is below a given threshold
    dlowpass.make('derivative', neurons=radius*neurons, dimensions=2, radius=radius) # create population to calculate the derivative
    dlowpass.connect('derivative', 'derivative', index_pre=0, index_post=1, pstc=0.1) # set up recurrent connection
    dlowpass.add(make_abs_val(name='abs_val', neurons=neurons, dimensions=1, intercept=(.2,1))) # create a subnetwork to calculate the absolute value  

    # connect it up!
    dlowpass.connect('input', 'output') # set up communication channel
    dlowpass.connect('input', 'derivative', index_post=0)
    
    def sub(x):
        return [x[0] - x[1]]
    dlowpass.connect('derivative', 'abs_val.input', func=sub)
        
    # set up inhibitory matrix
    inhib_matrix = [[-inhib_scale]] * neurons * dimensions
    output.addTermination('inhibition', inhib_matrix, tau_inhib, False)
    dlowpass.connect('abs_val.output', output.getTermination('inhibition'))

    return dlowpass.network

# Create network
net = nef.Network('net')

# Create / add low pass derivative filter
net.add(make_dlowpass(name='dlowpass', neurons=N, dimensions=D))

# Make function input
net.make_input('input_function', values=[0]*D)

# Connect up function input to filter
net.connect('input_function', 'dlowpass.input')

# Add it all to Nengo
net.add_to_nengo()

And here’s a picture of it running. What you can see is that anytime the input changes quickly, the system input drops to zero, but when the input is holding constant or is changing slowly the output is allowed to pass through. Great! Just what we wanted. net

Tagged , ,

Reinforcement learning part 1: Q-learning and exploration

We’ve been running a reading group on Reinforcement Learning (RL) in my lab the last couple of months, and recently we’ve been looking at a very entertaining simulation for testing RL strategies, ye’ old cat vs mouse paradigm. There are a number of different RL methods you can use / play with in that tutorial, but for this I’m only going to talk about Q-learning. The code implementation I’ll be using is all in Python, and the original code comes from one of our resident post-docs, Terry Stewart, and can be garnered from his online RL tutorial. The code I modify here is based off of Terry’s code and modified by Eric Hunsberger, another PhD student in my lab. I’ll talk more about how it’s been modified in another post.

Q-learning review
For those unfamiliar, the basic gist of Q-learning is that you have a representation of the environmental states s, and possible actions in those states a, and you learn the value of each of those actions in each of those states. Intuitively, this value, Q, is referred to as the state-action value. So in Q-learning you start by setting all your state-action values to 0 (this isn’t always the case, but in this simple implementation it will be), and you go around and explore the state-action space. After you try an action in a state, you evaluate the state that it has lead to. If it has lead to an undesirable outcome you reduce the Q value (or weight) of that action from that state so that other actions will have a greater value and be chosen instead the next time you’re in that state. Similarly, if you’re rewarded for taking a particular action, the weight of that action for that state is increased, so you’re more likely to choose it again the next time you’re in that state. Importantly, when you update Q, you’re updating it for the previous state-action combination. You can only update Q after you’ve seen what results.

Let’s look at an example in the cat vs mouse case, where you are the mouse. You were in the state where the cat was in front of you, and you chose to go forward into the cat. This caused you to get eaten, so now reduce the weight of that action for that state, so that the next time the cat is in front of you you won’t choose to go forward you might choose to go to the side or away from the cat instead (you are a mouse with respawning powers). Note that this doesn’t reduce the value of moving forward when there is no cat in front of you, the value for ‘move forward’ is only reduced in the situation that there’s a cat in front of you. In the opposite case, if there’s cheese in front of you and you choose ‘move forward’ you get rewarded. So now the next time you’re in the situation (state) that there’s cheese in front of you, the action ‘move forward’ is more likely to be chosen, because last time you chose it you were rewarded.

Now this system, as is, gives you no foresight further than one time step. Not incredibly useful and clearly too limited to be a real strategy employed in biological systems. Something that we can do to make this more useful is include a look-ahead value. The look-ahead works as follows. When we’re updating a given Q value for the state-action combination we just experienced, we do a search over all the Q values for the state the we ended up in. We find the maximum state-action value in this state, and incorporate that into our update of the Q value representing the state-action combination we just experienced.

For example, we’re a mouse again. Our state is that cheese is one step ahead, and we haven’t learned anything yet (blank value in the action box represents 0). So we randomly choose an action and we happen to choose ‘move forward’.
Now, in this state (we are on top of cheese) we are rewarded, and so we go back and update the Q value for the state ‘cheese is one step ahead’ and the action ‘move forward’ and increase the value of ‘move forward’ for that state.
Now let’s say the cheese is moved and we’re moving around again, now we’re in the state ‘cheese is two steps ahead’, and we make a move and end up in the state ‘cheese is one step ahead’.
Now when we are updating the Q value for the previous state-action combination we look at all the Q values for the state ‘cheese is one step ahead’. We see that one of these values is high (this being for the action ‘move forward’) and this value is incorporated in the update of the previous state-action combination.
Specifically we’re updating the previous state-action value using the equation: Q(s, a) += alpha * (reward(s,a) + max(Q(s') - Q(s,a)) where s is the previous state, a is the previous action, s' is the current state, and alpha is the discount factor (set to .5 here).

Intuitively, the change in the Q-value for performing action a in state s is the difference between the actual reward (reward(s,a) + max(Q(s'))) and the expected reward (Q(s,a)) multiplied by a learning rate, alpha. You can think of this as a kind of PD control, driving your system to the target, which is in this case the correct Q-value.

Here, we evaluate the reward of moving ahead when the cheese is two steps ahead as the reward for moving into that state (0), plus the reward of the best action from that state (moving into the cheese +50), minus the expected value of that state (0), multiplied by our learning rate (.5) = +25.

Exploration
In the most straightforward implementation of Q-learning, state-action values are stored in a look-up table. So we have a giant table, which is size N x M, where N is the number of different possible states, and M is the number of different possible actions. So then at decision time we simply go to that table, look up the corresponding action values for that state, and choose the maximum, in equation form:

def choose_action(self, state):
    q = [self.getQ(state, a) for a in self.actions]
    maxQ = max(q)
    action = self.actions[maxQ]
    return action

Almost. There are a couple of additional things we need to add. First, we need to cover the case where there are several actions that all have the same value. So to do that, if there are several with the same value, randomly choose one of them.

def choose_action(self, state):
    q = [self.getQ(state, a) for a in self.actions]
    maxQ = max(q)
    count = q.count(maxQ)
    if count > 1:
        best = [i for i in range(len(self.actions)) if q[i] == maxQ]
        i = random.choice(best)
    else:
        i = q.index(maxQ)

    action = self.actions[i]
    return action

This helps us out of that situation, but now if we ever happen upon a decent option, we’ll always choose that one in the future, even if there is a way better option available. To overcome this, we’re going to need to introduce exploration. The standard way to get exploration is to introduce an additional term, epsilon. We then randomly generate a value, and if that value is less than epsilon, a random action is chosen, instead of following our normal tactic of choosing the max Q value.

def choose_action(self, state):
    if random.random() < self.epsilon: # exploration action = random.choice(self.actions) else: q = [self.getQ(state, a) for a in self.actions] maxQ = max(q) count = q.count(maxQ) if count > 1:
            best = [i for i in range(len(self.actions)) if q[i] == maxQ]
            i = random.choice(best)
        else:
            i = q.index(maxQ)

       action = self.actions[i]
    return action

The problem now is that we even after we’ve explored all the options and we know for sure the best option, we still sometimes choose a random action; the exploration doesn’t turn off. There are a number of ways to overcome this, most involving manually adjusting epsilon as the mouse learns, so that it explores less and less as time passes and it has presumably learned the best actions for the majority of situations. I’m not a big fan of this, so instead I’ve implemented exploration the following way: If the randomly generated value is less than epsilon, then randomly add values to the Q values for this state, scaled by the maximum Q value of this state. In this way, exploration is added, but you’re still using your learned Q values as a basis for choosing your action, rather than just randomly choosing an action completely at random.

    def choose_action(self, state):
        q = [self.getQ(state, a) for a in self.actions]
        maxQ = max(q)

        if random.random()  1:
            best = [i for i in range(len(self.actions)) if q[i] == maxQ]
            i = random.choice(best)
        else:
            i = q.index(maxQ)

        action = self.actions[i]

        return action

Very pleasingly, you get results comparable to the case where you perform lots of learning and then set epsilon = 0 to turn off random movements. Let’s look at them!

Simulation results
So here, the simulation is set up such that there is a mouse, cheese, and a cat all inside a room. The mouse then learns over a number of trials to avoid the cat and get the cheese. Printing out the results after every 1,000 time steps, for the standard learning case where you reduce epsilon as you learn the results are:

10000, e: 0.09, W: 44, L: 778
20000, e: 0.08, W: 39, L: 617
30000, e: 0.07, W: 47, L: 437
40000, e: 0.06, W: 33, L: 376
50000, e: 0.05, W: 35, L: 316
60000, e: 0.04, W: 36, L: 285
70000, e: 0.03, W: 33, L: 255
80000, e: 0.02, W: 31, L: 179
90000, e: 0.01, W: 35, L: 152
100000, e: 0.00, W: 44, L: 130
110000, e: 0.00, W: 28, L: 90
120000, e: 0.00, W: 17, L: 65
130000, e: 0.00, W: 40, L: 117
140000, e: 0.00, W: 56, L: 86
150000, e: 0.00, W: 37, L: 77

For comparison now, here are the results when epsilon is not reduced in the standard exploration case:

10000, e: 0.10, W: 53, L: 836
20000, e: 0.10, W: 69, L: 623
30000, e: 0.10, W: 33, L: 452
40000, e: 0.10, W: 32, L: 408
50000, e: 0.10, W: 57, L: 397
60000, e: 0.10, W: 41, L: 326
70000, e: 0.10, W: 40, L: 317
80000, e: 0.10, W: 47, L: 341
90000, e: 0.10, W: 47, L: 309
100000, e: 0.10, W: 54, L: 251
110000, e: 0.10, W: 35, L: 278
120000, e: 0.10, W: 61, L: 278
130000, e: 0.10, W: 45, L: 295
140000, e: 0.10, W: 67, L: 283
150000, e: 0.10, W: 50, L: 304

As you can see, the performance now converges around 300, instead of 100. Not great.
Now here are the results from the alternative implementation of exploration, where epsilon is held constant:

10000, e: 0.10, W: 65, L: 805
20000, e: 0.10, W: 36, L: 516
30000, e: 0.10, W: 50, L: 417
40000, e: 0.10, W: 38, L: 310
50000, e: 0.10, W: 39, L: 247
60000, e: 0.10, W: 31, L: 225
70000, e: 0.10, W: 23, L: 181
80000, e: 0.10, W: 34, L: 159
90000, e: 0.10, W: 36, L: 137
100000, e: 0.10, W: 35, L: 138
110000, e: 0.10, W: 42, L: 136
120000, e: 0.10, W: 24, L: 99
130000, e: 0.10, W: 52, L: 106
140000, e: 0.10, W: 22, L: 88
150000, e: 0.10, W: 29, L: 101

And again we get that nice convergence to 100, but this time without having to manually modify epsilon. Which is pretty baller.

Code
And that’s it! There is of course lots and lots of other facets of exploration to discuss, but this is just a brief discussion. If you’d like the code for the cat vs mouse and the alternative exploration you can access them at my github: Cat vs mouse – exploration.
To alternate between the different types of exploration, change the import statement at the top of the egoMouseLook.py to be either import qlearn for the standard exploration method, or import qlearn_mod_random as qlearn to test the alternative method. To have epsilon reduce in value as time goes, you can uncomment the lines 142-145.

Tagged , ,

Nengo scripting: absolute value

To just get the code you can copy / paste from below or get the code from my github: absolute_value.py

This is a simple script for performing the absolute value function in Nengo. The most efficient way I’ve found to implement this is to use two separate populations for each dimension of the input signal, one to represent the signal when it’s greater than zero and simply relay it to the output node, and one to represent the signal when it’s less than zero, and project x * -1 to the output node. Here’s the code, and I’ll step through it below.

def make_abs_val(name, neurons, dimensions, intercept=[0]):
    def mult_neg_one(x):
        return x[0] * -1 

    abs_val = nef.Network(name)

    abs_val.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    abs_val.make('output', neurons=1, dimensions=dimensions, mode='direct') # create output relay
    
    for d in range(dimensions): # create a positive and negative population for each dimension in the input signal
        abs_val.make('abs_pos%d'%d, neurons=neurons, dimensions=1, encoders=[[1]], intercept=intercept)
        abs_val.make('abs_neg%d'%d, neurons=neurons, dimensions=1, encoders=[[-1]], intercept=intercept)

        abs_val.connect('input', 'abs_pos%d'%d, index_pre=d)
        abs_val.connect('input', 'abs_neg%d'%d, index_pre=d)
    
        abs_val.connect('abs_pos%d'%d, 'output', index_post=d)
        abs_val.connect('abs_neg%d'%d, 'output', index_post=d, func=mult_neg_one)

    return abs_val.network

First off, the function takes in parameters specifying the number of dimensions, the number of neurons for each population generated, a name, and optionally an intercept value. I’ll come back to why the intercept value is an option in a bit.

Inside the make_abs_val function another function that multiplies the first dimension of its input by -1 is specified. This mult_neg_one function is going to be used by our population representing negative values of the input signal.

Next, we create the network and call it abs_val. Input and output relay nodes are then created, with one neuron, of the specified dimension number, and the populations are set to direct mode. These are the populations that will be connected to from populations outside of the abs_val network.

Now there is a loop for each dimension of the input signal. Inside, two populations are created, where the only difference is their encoder values. Their intercepts specify the start of the range of values they represent. The default is 0, so when it’s not specified these populations will represent values from 0 to 1 (1 is the default end value of the range). For abs_neg, the encoders=[[-1]] line changes the range of values represented from (0,1) to (-1,0). Now we have two populations for dimension d, one that represents only positive values (between 0 and 1), and one that represents only negative values (between -1 and 0). And we’re almost done!

The only thing left to do is to hook up the populations to the input and output appropriately and incorporate the mult_neg_one function into the connection between each of the abs_neg populations and the output relay node. We want each set of populations representing a single dimension to receive and project back into the appropriate dimension of the output relay function, so we employ the index_pre and index_post parameters. Because we want each set to receive only dimension d from the input, on that connection specification we set index_pre=d. When setting up the projections to the output relay node, we similarly only want each population to project to the appropriate output dimension d, so we set index_post=d.

By default, the connect call sets up a communication channel, that is to say no computation is performed on the signal passed from the pre to the post population. This is what we want for abs_pos population, but for the abs_neg population we want the mult_neg_one function to be applied, so that any negative values are multiplied by -1, and give us positive values. This can be done by using the func parameter, and so we call it and set it func=mult_neg_one. Now the connection from abs_neg to the output node will be transformed by the mult_neg_one function.

And that’s it! Here is a script that gets it running (which can also be found on my github: absolute_value.py):

import nef
import random

# constants / parameter setup etc
N = 50 # number of neurons
D = 3 # number of dimensions

def make_abs_val(name, neurons, dimensions, intercept=[0]):
    def mult_neg_one(x):
        return x[0] * -1 

    abs_val = nef.Network(name)

    abs_val.make('input', neurons=1, dimensions=dimensions, mode='direct') # create input relay
    abs_val.make('output', neurons=1, dimensions=dimensions, mode='direct') # create output relay
    
    for d in range(dimensions): # create a positive and negative population for each dimension in the input signal
        abs_val.make('abs_pos%d'%d, neurons=neurons, dimensions=1, encoders=[[1]], intercept=intercept)
        abs_val.make('abs_neg%d'%d, neurons=neurons, dimensions=1, encoders=[[-1]], intercept=intercept)

        abs_val.connect('input', 'abs_pos%d'%d, index_pre=d)
        abs_val.connect('input', 'abs_neg%d'%d, index_pre=d)
    
        abs_val.connect('abs_pos%d'%d, 'output', index_post=d)
        abs_val.connect('abs_neg%d'%d, 'output', index_post=d, func=mult_neg_one)

    return abs_val.network

net = nef.Network('network')

# Create absolute value subnetwork and add it to net
net.add(make_abs_val(name='abs_val', dimensions=D, neurons=N))

# Create function input
net.make_input('input', values=[random.random() for d in range(D)])

# Connect things up
net.connect('input', 'abs_val.input')

# Add it all to the Nengo world
net.add_to_nengo()

And here’s a picture of it running.

Tagged , , ,

Using the same random number generator in C++ and Python

Anyone who has converted code from one language to another, where there is a random number generator involved, knows the pain of rigorously checking that both versions of code do the exact same thing. In the past I’ve done a write out to file from one of the language’s random number generators (RNGs), and then read in from file, but it’s still be a pain in the ass as there’s some overhead involved and you have to change everything back and forth if you need to do debugging / comparison in the future, etc etc. After some searching, it doesn’t seem that there are any common RNGs, and the closest I found was a suggestion saying to write the same algorithm out in each of the languages and use that.

Well. I happen to know how to use Cython now, and rather than rewrite the same code in C++ and Python, I thought it was a perfect opportunity to put to use this knowledge. There’s a program called SimpleRNG for C++ (http://www.codeproject.com/Articles/25172/Simple-Random-Number-Generation), with a whole slew of basic random number generation methods, so the idea was just to take that and throw some Cython at it to get access to the same RNG in both C++ and Python. It turned out to be almost a trivial exercise, but with very useful results!

Since we already have the SimpleRNG.h and SimpleRNG.cpp code all written, taken from link above, we can start by making a .pyx file that will 1) provide a Cython handle to the C++ code, and 2) define a Python class that can call these functions. Remembering not to name the .pyx file the same thing as the .cpp files, for fear of the code generated by Cython overwriting my .cpp files, I added a “py” prefix. The first part of the .pyx file is just redefining all the functionality we want from the header file:
pySimpleRNG.pyx

cdef extern from &quot;SimpleRNG.h&quot;:
    cdef cppclass SimpleRNG:
        SimpleRNG()

        # Seed the random number generator 
        void SetState(unsigned int u, unsigned int v)

        # A uniform random sample from the open interval (0, 1) 
        double GetUniform()

        # A uniform random sample from the set of unsigned integers 
        unsigned int GetUint()

        # Normal (Gaussian) random sample 
        double GetNormal(double mean, double standardDeviation)

        # Exponential random sample 
        double GetExponential(double mean)

        # Gamma random sample
        double GetGamma(double shape, double scale)

        # Chi-square sample
        double GetChiSquare(double degreesOfFreedom)

        # Inverse-gamma sample
        double GetInverseGamma(double shape, double scale)

        # Weibull sample
        double GetWeibull(double shape, double scale)

        # Cauchy sample
        double GetCauchy(double median, double scale)

        # Student-t sample
        double GetStudentT(double degreesOfFreedom)

        # The Laplace distribution is also known as the double exponential distribution.
        double GetLaplace(double mean, double scale)

        # Log-normal sample
        double GetLogNormal(double mu, double sigma)

        # Beta sample
        double GetBeta(double a, double b)

        # Poisson sample
        int GetPoisson(double lam)

Look at all those functions! I left out two functions from the redefinition, namely double GetUniform(unsigned int& u, unsigned int& v) and unsigned int GetUint(unsigned int& u, unsigned int& v), for the simple reason that I don’t want to deal with reference operators in Cython, and I don’t have any need for the functionality provided with the overloaded GetUniform() and GetUint() functions.

Alright, the first part is done. Second part! Straightforward again, define a Python class, create a pointer to cppclass we just defined, and then make a bunch of handle functions to call them up. That looks like:
pySimpleRNG.pyx

cdef class pySimpleRNG:
    cdef SimpleRNG* thisptr # hold a C++ instance
    def __cinit__(self):
        self.thisptr = new SimpleRNG()
    def __dealloc__(self):
        del self.thisptr
    
    # Seed the random number generator 
    def SetState(self, unsigned int u, unsigned int v):
        self.thisptr.SetState(u, v)

    # A uniform random sample from the open interval (0, 1) 
    def GetUniform(self): 
        return self.thisptr.GetUniform()

    # A uniform random sample from the set of unsigned integers 
    def GetUint(self):
        return self.thisptr.GetUint()

    # Normal (Gaussian) random sample 
    def GetNormal(self, double mean=0, double std_dev=1):
        return self.thisptr.GetNormal(mean, std_dev)

    # Exponential random sample 
    def GetExponential(self, double mean):
        return self.thisptr.GetExponential(mean)

    # Gamma random sample
    def GetGamma(self, double shape, double scale):
        return self.thisptr.GetGamma(shape, scale)

    # Chi-square sample
    def GetChiSquare(self, double degreesOfFreedom):
        return self.thisptr.GetChiSquare(degreesOfFreedom)

    # Inverse-gamma sample
    def GetInverseGamma(self, double shape, double scale):
        return self.thisptr.GetInverseGamma(shape, scale)

    # Weibull sample
    def GetWeibull(self, double shape, double scale):
        return self.thisptr.GetWeibull(shape, scale)

    # Cauchy sample
    def GetCauchy(self, double median, double scale):
        return self.thisptr.GetCauchy(median, scale)

    # Student-t sample
    def GetStudentT(self, double degreesOfFreedom):
        return self.thisptr.GetStudentT(degreesOfFreedom)

    # The Laplace distribution is also known as the double exponential distribution.
    def GetLaplace(self, double mean, double scale):
        return self.thisptr.GetLaplace(mean, scale)

    # Log-normal sample
    def GetLogNormal(self, double mu, double sigma):
        return self.thisptr.GetLogNormal(mu, sigma)

    # Beta sample
    def GetBeta(self, double a, double b):
        return self.thisptr.GetBeta(a, b)

Again, very simple. The only thing I’ve added in this code is default values for the GetNormal() method, specifying a mean of 0 and standard deviation of 1, since I’ll be using it a lot and it’s nice to have default values.

Now the only thing left is the setup file:
setup.py

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
  name = 'SimpleRNG',
  ext_modules=[ 
    Extension(&quot;SimpleRNG&quot;, 
              sources=[&quot;pySimpleRNG.pyx&quot;, &quot;SimpleRNG.cpp&quot;], # Note, you can link against a c++ library instead of including the source
              language=&quot;c++&quot;),
    ],
  cmdclass = {'build_ext': build_ext},

)

And now calling it from IPython

run setup.py build_ext -i

And pleasantly, now, you can call the SetState(int,int) function and generate the same set of random numbers in both C++ and Python. Which is absolutely wonderful for comparison / debugging. It’s been super useful for me, I hope someone else also finds it helpful!

All the code for this post can be found at my github repository: pySimpleRNG. If you’re simply looking for what you need to get SimpleRNG in Python, then all you need is the SimpleRNG.o file! Drop it into your Python project folder and import away.

Update: It was pointed out that having a Python script example would be useful, which is very true! Here’s a quick script using this shared library.

from SimpleRNG import pySimpleRNG

a = pySimpleRNG()
a.SetState(1,1) # set the seed
a.GetUniform() # remember this number
a.GetUniform()
a.SetState(1,1) # reset seed 
a.GetUniform() # returns the same number as above
Tagged , , , ,
Design a site like this with WordPress.com
Get started