If you’re interested, I posted the slides from last night’s talk on my webpage.
Tag: Python
Segmenting Images In Parallel With Python & Jug
On Friday, I posted an introduction to Jug. The usage was very basic, however. This is a slightly more advanced usage.
Let us imagine you are trying to compare two image segmentation algorithms based on human-segmented images. This is a completely real-world example as it was one of the projects where I first used jug [1].
We are going to build this up piece by piece.
First a few imports:
import mahotas as mh
from jug import TaskGenerator
from glob import glob
Here, we test two thresholding-based segmentation methods, called method1 and method2. They both (i) read the image, (ii) blur it with a Gaussian, and (iii) threshold it [2]:
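@TaskGenerator
def method1(image):
    # Read the image and keep only the first channel
    image = mh.imread(image)[:,:,0]
    # Blur with a Gaussian (sigma = 2)
    image = mh.gaussian_filter(image, 2)
    # Threshold at the mean intensity
    binimage = (image > image.mean())
    # Label the connected regions
    labeled, _ = mh.label(binimage)
    return labeled

@TaskGenerator
def method2(image):
    image = mh.imread(image)[:,:,0]
    # Blur with a wider Gaussian (sigma = 4)
    image = mh.gaussian_filter(image, 4)
    # Stretch to the 8-bit range before Otsu thresholding
    image = mh.stretch(image)
    binimage = (image > mh.otsu(image))
    labeled, _ = mh.label(binimage)
    return labeled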
Just to make sure you see what we are talking about, here is one possible input image:

What you see is cell nuclei. The very bright areas are noise or unusually bright cells. The results of method 1 can be seen as follows:
Each color represents a different region. You can see this is not very good, as many cells are merged. The reference (human-segmented) image looks like this:
Running over all the images looks exactly like Python:
results = []
for im in glob('images/*.jpg'):
    m1 = method1(im)
    m2 = method2(im)
    ref = im.replace('images','references').replace('jpg','png')
    v1 = compare(m1, ref)
    v2 = compare(m2, ref)
    results.append( (v1,v2) )
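The loop above calls a compare function whose definition does not appear in this post; footnote [3] says it scores agreement with Adjusted Rand by calling a function in milk. As a rough idea of what such a score computes, here is a minimal pure-Python sketch of the Adjusted Rand index. The name adjusted_rand and its signature are assumptions made for illustration, not the milk API and not the code from the original script.

```python
from collections import Counter


def _pairs(count):
    # Number of unordered pairs that can be drawn from `count` items
    return count * (count - 1) // 2


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index between two flat label sequences (illustrative sketch)."""
    labels_a = list(labels_a)
    labels_b = list(labels_b)
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")

    total_pairs = _pairs(len(labels_a))
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree about

    # Pairs of items that share a label in both, in A only, and in B only
    both = sum(_pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(_pairs(c) for c in Counter(labels_a).values())
    in_b = sum(_pairs(c) for c in Counter(labels_b).values())

    expected = in_a * in_b / total_pairs   # agreement expected by chance
    max_index = (in_a + in_b) / 2          # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one single region
    return (both - expected) / (max_index - expected)


# The score ignores which number each region gets, only the grouping matters
same_grouping = adjusted_rand([0, 0, 1, 1], [1, 1, 0, 0])
crossed_grouping = adjusted_rand([0, 0, 1, 1], [0, 1, 0, 1])
```

To use something like this in the jugfile, you would presumably flatten the 2-D label arrays first (for example with .ravel()), load the reference image inside the function, and decorate it with TaskGenerator like the two methods above.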
But how do we get the results out?
A simple solution is to write a function which writes to an output file:
@TaskGenerator
def print_results(results):
    import numpy as np
    r1, r2 = np.mean(results, 0)
    with open('output.txt', 'w') as out:
        out.write('Result method1: {}\nResult method2: {}\n'.format(r1,r2))

print_results(results)
§
Except for the TaskGenerator decorator, this would be a pure Python file!
With TaskGenerator, we get jugginess!
We can call:
jug execute & jug execute & jug execute & jug execute &
to get 4 processes going at once.
§
Note also the line:
print_results(results)
results is a list of tuples of Task objects. This is how you define a dependency: Jug picks up that to call print_results, it needs all the values in results, and behaves accordingly.
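To make the idea of "passing tasks as arguments defines a dependency" concrete, here is a toy model in plain Python. It is not how jug is implemented; ToyTask, column_means and the helper functions are hypothetical names invented for this sketch. It only shows the principle: a task inspects its arguments, finds other tasks nested inside them, and cannot run until those have finished.

```python
class ToyTask:
    """Toy stand-in for a lazily evaluated task (NOT jug's implementation)."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args
        self.done = False
        self.result = None

    def dependencies(self):
        # Every ToyTask found inside the arguments, however deeply nested
        return list(_find_tasks(self.args))

    def can_run(self):
        return all(dep.done for dep in self.dependencies())

    def run(self):
        if not self.can_run():
            raise RuntimeError("dependencies not finished")
        # Replace each finished task in the arguments by its value, then call
        self.result = self.fn(*_resolve(self.args))
        self.done = True
        return self.result


def _find_tasks(obj):
    if isinstance(obj, ToyTask):
        yield obj
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from _find_tasks(item)


def _resolve(obj):
    if isinstance(obj, ToyTask):
        return obj.result
    if isinstance(obj, list):
        return [_resolve(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(_resolve(item) for item in obj)
    return obj


def column_means(rows):
    # Mean of each column of a list of equal-length tuples
    return tuple(sum(col) / len(rows) for col in zip(*rows))


# Same shape as `results` in the post: a list of (score1, score2) pairs
scores = [(ToyTask(float, 0.5), ToyTask(float, 0.75)),
          (ToyTask(float, 0.25), ToyTask(float, 1.0))]
summary = ToyTask(column_means, scores)

ready_before = summary.can_run()   # the four score tasks have not run yet
for dep in summary.dependencies():
    dep.run()
ready_after = summary.can_run()
means = summary.run()
```

The summary task never had to be told what it depends on; that information was already in its arguments.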
Easy as Py.
§
You can get the full script above, including data, from GitHub.
§
Reminder
Tomorrow, I’m giving a short talk on Jug for the Heidelberg Python Meetup.
If you miss it, you can hear it in Berlin at the BOSC2013 (Bioinformatics Open Source Conference) in July (19 or 20).
[1] The code in that repository still uses a pretty old version of jug; this was 2009, after all. TaskGenerator had not been invented yet.
[2] This is for demonstration purposes; the paper had better methods, of course.
[3] Again, you can do better than Adjusted Rand, as we show in the paper; but this is a demo. This way, we can just call a function in milk.
Introduction to Jug: Parallel Tasks in Python
Next Tuesday, I’m giving a short talk on Jug for the Heidelberg Python Meetup.
If you miss it, you can hear it in Berlin at the BOSC2013 (Bioinformatics Open Source Conference) in July. I will take this opportunity to write a couple of posts about jug.
Jug is a cross between the venerable make and Python. In the Make tradition, you write a jugfile.py. This is perhaps best illustrated by an example.
We are going to implement the dumb algorithm for finding all primes under 100. We write a function to check whether a number is prime:
def is_prime(n):
    from time import sleep
    # Sleep a little bit so that this does not run ridiculously fast
    sleep(1.)
    # Trial division (range replaces the Python 2 xrange)
    for j in range(2,n-1):
        if (n % j) == 0:
            return False
    return True
Then we build tasks out of this function:
from jug import Task
primes100 = [Task(is_prime, n) for n in range(2,101)]
Each of these tasks is of the form "call is_prime with argument n". So far, we have only built the tasks; nothing has been executed. One important point to note is that the tasks are all independent.
You can run jug execute on the command line and jug will start executing tasks:
jug execute &
The nice thing is that it is fine to run multiple of these at the same time:
jug execute & jug execute & jug execute & jug execute &
They will all execute in parallel. We can use jug status to check what is happening:
jug status
Which prints out:
Task name                                    Waiting       Ready    Finished     Running
----------------------------------------------------------------------------------------
primes.is_prime                                    0          74          20           5
........................................................................................
Total:                                             0          74          20           5
74 is_prime tasks are still in the Ready state, 5 are currently running (which is what we expected, right?) and 20 are done.
Wait a little bit and check again:
Task name                                    Waiting       Ready    Finished     Running
----------------------------------------------------------------------------------------
primes.is_prime                                    0           0          99           0
........................................................................................
Total:                                             0           0          99           0
Now every task is finished. If we now run jug execute, it will do nothing, because there is nothing for it to do!
§
The introduction above has a severe flaw: this is not how you should compute all primes smaller than 100. Also, I have not shown how to get the prime values. On Monday, I will post a more realistic example.
It will also include a processing pipeline where later tasks depend on the results of earlier tasks.
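In the meantime, for this toy example, recovering the primes is just a matter of pairing each input with its boolean result. Here is a minimal sketch in plain Python, with no jug involved; is_prime_fast is a hypothetical copy of is_prime without the sleep, and the flags list stands in for the values of the finished tasks.

```python
def is_prime_fast(n):
    """Same trial-division test as is_prime above, minus the sleep."""
    for j in range(2, n - 1):
        if n % j == 0:
            return False
    return True


# Same inputs as primes100, in the same order
candidates = list(range(2, 101))

# Stand-in for the results of the finished tasks, one boolean per candidate
flags = [is_prime_fast(n) for n in candidates]

# Keep the candidates whose flag is True
primes = [n for n, flag in zip(candidates, flags) if flag]
```

This works because the results come back in the same order as the inputs, so position is enough to match them up.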
§
(Really weird thing: as I am typing this, WordPress suggests I link to posts on feminism and Australia. Probably some Australian reference that I am missing here.)
