Python for the Busy Java Developer

I use Python once in a while to get some simple work done, such as reading, writing, processing or plotting some data. I inevitably end up Googling or looking up analogues to C++ constructs on StackOverflow to get the job done. On one such recent stint, I found it useful to refer to the book Python for the Busy Java Developer as I worked on object-orientifying a piece of Python code. When I tweeted about this, my friend Deepak Sarda, who is also the author of that book, offered me a copy of the book for review.

Python for the Busy Java Developer is a short book that enables folks familiar with Java to get started on reading and writing Python with ease. If you already have experience working with other object-oriented languages with a C-like syntax (like C# or C++), this might also be the right book for you. Following a short introduction, the meat of the book is essentially Chapter 2, which is pretty long and takes the reader on a trip through the syntax, lists, functions and classes. Chapter 3 lists the tools and libraries that any seasoned developer would be looking out for while writing code.

There is a big difference between Googling a problem and learning from an experienced person. It is just that kind of insight that I found in many places in this book. By just looking for analogues to C++, I had never realized the alternate or extra possibilities of certain language constructs. As an example, it never occurred to me that attributes or methods of a class could be deleted outside the class. This is of course in the very nature of a dynamic language like Python, but the thought never occurs to a person coming from a statically typed language.
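
As a small illustration of that point, here is a minimal sketch of my own (not taken from the book) that deletes an attribute and a method from outside the class. The class name Greeter and its members are made up for this example.

```python
# Hypothetical class used only to illustrate deletion from outside the class.
class Greeter:
    def hello(self):
        return "hello"

g = Greeter()
g.name = "Joe"       # attribute added from outside the class

del g.name           # remove the instance attribute again
del Greeter.hello    # remove the method from the class itself

# Record what is left so the effect can be inspected.
has_name = hasattr(g, "name")
has_hello = hasattr(g, "hello")
```

After the two del statements, neither the attribute nor the method can be reached through the instance any more.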

Though I’ve been writing short Python scripts for a few years, from this book I found myself learning many tiny details that are sure to help me write code that is more Pythonic. My familiarity is with C++, but I found the analogies to Java in the book quite straightforward to relate to. The concepts are introduced in a natural order and the book can be easily finished in a couple of hours. Places where Python might behave differently, or in ways that are better than you expect, are pointed out. The book is beautifully typeset, which is a quality that seems to be sorely missing in tech books today. A book of this length has to leave out a lot and that is possibly the biggest downside of the book. I was left wanting to learn more of the Python analogues to the other C++ constructs I’m aware of.

I can easily recommend Python for the Busy Java Developer as a quick guide to learn Python if you are coming from C++. You can buy it here for the price of a cup of coffee. But wait! Deepak is offering the readers of this blog a 10% discount: just use this link to buy the book. 🙂

How to read string from terminal in Python

Reading a character or string from the terminal is a basic operation in most languages. It is most commonly used for a quick debug or analysis of the program.

To do this in Python 2.x:

print "Enter something:"
s = raw_input()
print "You entered:", s

To do this in Python 3.x:

print("Enter something:")
s = input()
print("You entered:", s)

Note that in Python 2.x, the built-in function is raw_input, while it has been renamed to input in Python 3.x.
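
If a script has to run under both versions, one possible approach is to pick whichever function exists at startup. This is only a sketch: the names read_line and ask are my own and not part of Python.

```python
# Pick the line-reading built-in that exists in the running interpreter.
try:
    read_line = raw_input   # Python 2.x
except NameError:
    read_line = input       # Python 3.x

def ask(prompt):
    # Both raw_input and input accept an optional prompt string.
    return read_line(prompt)

# Example usage (left commented out so the file does not block on input):
# s = ask("Enter something: ")
```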

Tried with: Ubuntu 14.04

The bitarray Python module

It is fairly common in many applications to read or write a series of bits packed into a byte or a longer array of bytes, for example to store the results of various binary tests. The space saved by storing these results as bits might be crucial when memory or disk resources are the bottleneck.

If you deal with an array of bits, the Python module you need is bitarray.

Installing it for Python 2.x is simple:

$ sudo apt install python-bitarray

If you are using Python 3.x:

$ sudo apt install python3-bitarray

Creating an empty bitarray is easy:

import bitarray
ba = bitarray.bitarray()

Note that the endianness of the data is very important in these applications. bitarray uses big-endian interpretation by default. If your data is stored in little-endian, then specify this explicitly:

ba = bitarray.bitarray(endian="little")

You can check the endianness of any bitarray:

print(ba.endian())
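
To see what this bit ordering means in practice, here is a plain Python sketch that does not use bitarray at all. It assumes that big-endian means the first bit is the most significant bit of each byte and little-endian means the first bit is the least significant one. The function name byte_to_bits is made up for this illustration.

```python
def byte_to_bits(value, endian="big"):
    """Return the 8 bits of a byte value as a list, in the given bit order."""
    # Most significant bit first.
    bits = [(value >> shift) & 1 for shift in range(7, -1, -1)]
    if endian == "little":
        # Least significant bit first.
        bits.reverse()
    return bits

big_bits = byte_to_bits(0x01, "big")
little_bits = byte_to_bits(0x01, "little")
```

The same byte 0x01 gives a list ending in 1 under the big-endian reading and a list starting with 1 under the little-endian reading, which is why picking the wrong one silently scrambles the data.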

To initialize a bitarray from a byte string:

ba.frombytes(byte_str)

To get or set a specific bit in the bitarray:

ba[9] = 0
print(ba[9])

Tried with: Bitarray 0.8.0, Python 3.4 and Ubuntu 14.04

How to execute function in multiple threads in Python

One of the most common sources of performance optimization is to find sections of code where a function executes on multiple data items independently. This type of parallelization can be easily achieved by using multithreading in Python. Alex Martelli provides a MultiThread class in his book Python Cookbook (2 Ed) that does this:


# Class to execute a function in parallel across multiple data
# Adapted from code in Sec 9.5 of book Python Cookbook (2 Ed)
import threading
import time
import Queue

class MultiThread(object):
    def __init__(self, function, argsVector, commonArgs, maxThreads=5, queue_results=False):
        self._function = function
        self._lock = threading.Lock( )
        self._nextArgs = iter(argsVector).next
        self._commonArgs = commonArgs
        self._threadPool = [ threading.Thread(target=self._doSome) for i in range(maxThreads) ]
        if queue_results:
            self._queue = Queue.Queue()
        else:
            self._queue = None

    def _doSome(self):
        while True:
            self._lock.acquire( )
            try:
                try:
                    args = self._nextArgs( )
                except StopIteration:
                    break
            finally:
                self._lock.release( )
            result = self._function(args, self._commonArgs)
            if self._queue is not None:
                self._queue.put((args, result))

    def get(self, *a, **kw):
        if self._queue is not None:
            return self._queue.get(*a, **kw)
        else:
            raise ValueError, 'Not queueing results'

    def start(self):
        for thread in self._threadPool:
            time.sleep(0) # necessary to give other threads a chance to run
            thread.start( )

    def join(self, timeout=None):
        for thread in self._threadPool:
            thread.join(timeout)

if __name__=="__main__":
    import random
    def recite_n_times_table(n, _):
        for i in range(2, 11):
            print "%d * %d = %d" % (n, i, n * i)
            time.sleep(0.3 + 0.3*random.random( ))
        return
    argVector = range(2, 4)
    mt = MultiThread(recite_n_times_table, argVector, None)
    mt.start( )
    mt.join( )

I was disappointed after using this on my data. Not only was there no speedup, even with 8 threads, but sometimes there was even a regression from the single-threaded speed! The reason for this is that the standard Python interpreter (CPython) cannot actually run Python code in several threads at the same time. It uses an internal Global Interpreter Lock (GIL) to prevent corruption of its data structures, so only one thread executes Python bytecode at any moment. So, the only time you might want to use multithreading is when it makes programming easier or when the threads spend most of their time waiting: for GUI programs or for networking programs.
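
For reference, in Python 3 the same pattern of running one function over many arguments in a pool of threads can be sketched with the standard library module concurrent.futures. This is not the Cookbook recipe, only a rough equivalent. The names run_in_threads and times_table are made up for this example, and the sketch is subject to the GIL in exactly the same way.

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_threads(function, args_vector, common_args, max_threads=5):
    """Call function(args, common_args) for each item of args_vector in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() returns the results in the same order as the inputs.
        return list(pool.map(lambda args: function(args, common_args), args_vector))

def times_table(n, _):
    # Stand-in for real work: the n times table from 2 to 10.
    return [n * i for i in range(2, 11)]

results = run_in_threads(times_table, range(2, 4), None)
```

Here results holds one list per input value, so there is no need for a separate result queue.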

Tried with: Python 2.7.6 and Ubuntu 14.04

How to convert to and from bytes in Python

A typical scenario in systems programming is to read and write structures and data in a certain binary format. This could be using buffers in memory or binary files on disk. This is pretty straightforward to do in C or C++: all you need is a pointer to the start of the structure or data and the number of bytes to copy or write from that pointer. The struct module can be used to achieve the same in Python.

The pack function can be used to convert data to a byte string. For details of the parameters and format characters passed to this function, see the documentation of the struct module. This byte string can then be sent over the network or written to a binary file. For example:


# Sample code to write data of certain format to binary file
# For details of parameters passed to pack method see:
# https://docs.python.org/2/library/struct.html
import struct
# Assume these are values of fields of a class/struct in C/C++
a = 1 # unsigned int
b = 2 # signed char
c = -1 # signed int
# To write a, b, c with padding to binary file
# This will write 12 bytes => (4, 4, 4)
f = open("foo1.bin", "wb")
s = struct.pack("Ibi", a, b, c)
f.write(s)
f.close()
# To write a, b, c with no padding to binary file
# This will write 9 bytes => (4, 1, 4)
f = open("foo2.bin", "wb")
s = struct.pack("=Ibi", a, b, c)
f.write(s)
f.close()

To read back binary data or data stored in bytes, use the unpack function. Its usage is similar to the above: it takes the same format string and returns the values as a tuple.
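
A minimal sketch of the round trip, using the unpadded format string from the example above. The variable names are my own, and the data is packed in memory instead of being read from a file.

```python
import struct

# Pack three values without padding: unsigned int, signed char, signed int.
packed = struct.pack("=Ibi", 1, 2, -1)

# calcsize reports how many bytes a format string occupies.
size = struct.calcsize("=Ibi")

# unpack needs the same format string and returns a tuple of values.
a, b, c = struct.unpack("=Ibi", packed)
```

To read from a binary file instead, open it with mode "rb", read size bytes and pass them to unpack.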

Tried with: Python 2.7.6 and Ubuntu 14.04

How to name a Python file

Every Python source file can essentially be considered as a module. The following rule applies to the name of a module, if it needs to be imported without problems:

(letter|"_") (letter | digit | "_")*

That is, the rule for naming a Python file is the same as that for naming a variable in it. The first character has to be a letter or an underscore, followed by any number of letters, digits or underscores. No other characters, like the dash, are allowed.
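
The rule above can be checked with a short regular expression. This is only a sketch of the rule as stated here, restricted to ASCII letters. The function name is_importable_module_name is made up, and the check ignores other pitfalls such as names that clash with Python keywords.

```python
import re

# First character: letter or underscore. Rest: letters, digits or underscores.
MODULE_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_importable_module_name(filename):
    """Check a file name (with or without .py) against the naming rule."""
    name = filename[:-3] if filename.endswith(".py") else filename
    return MODULE_NAME_RE.fullmatch(name) is not None
```

For example, my_module.py passes this check, while my-module.py and 2fast.py do not.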

If the file is not named correctly, the following error or warning might be generated on importing it:

Invalid name for Python module

Tried with: Python 2.7.6 and Ubuntu 14.04

How to define constant in Python

Due to the design and dynamic typing of Python, there is no inherent type qualifier that is equivalent to the const of C or C++. I like the solution provided in Section 6.2 (Defining Constants) of the book Python Cookbook (2 Ed) by Alex Martelli et al.


# Helps define a constant in Python
# From Section 6.2: Defining Constants
# Python Cookbook (2 Ed) by Alex Martelli et al.
#
# Usage:
# import const
# const.magic = 23 # First binding is fine
# const.magic = 88 # Second binding raises const.ConstError
class _const:
    class ConstError(TypeError):
        pass
    def __setattr__(self,name,value):
        if self.__dict__.has_key(name):
            raise self.ConstError, "Can't rebind const(%s)" % name
        self.__dict__[name] = value
import sys
sys.modules[__name__] = _const()
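
The recipe above uses Python 2 syntax. A rough Python 3 adaptation of the same idea might look like the sketch below. It is my own rewrite and not from the book: it uses an ordinary object instead of replacing the module in sys.modules, and the name ConstNamespace is made up.

```python
class ConstNamespace:
    """Object whose attributes can be bound once but never rebound."""

    class ConstError(TypeError):
        pass

    def __setattr__(self, name, value):
        # Refuse to rebind a name that already exists on this object.
        if name in self.__dict__:
            raise self.ConstError("Can't rebind const(%s)" % name)
        self.__dict__[name] = value

const = ConstNamespace()
const.magic = 23            # First binding is fine

try:
    const.magic = 88        # Second binding raises ConstError
except ConstNamespace.ConstError as err:
    rebind_error = str(err)
```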

Tried with: Python 2.7.6 and Ubuntu 14.04

How to execute command at shell from Python

There are many operations at the shell that need a bit of looping or automation. You can learn programming in the language of the shell you use to achieve this. Since I know a bit of Python, I prefer to use it for running quick commands at the shell. The call to execute a command at the shell is os.system.

I typically use this call to automate repeated commands that I want to run at the shell. For example, I open a Python interpreter from my shell and type:

import os

for i in range(100, 200):
    s = "montage foo-" + str(i) + ".png bar-" + str(i) + ".png -tile 2x1 foobar-" + str(i) + ".png"
    os.system(s)

This quickly makes pairs from two sets of 100 images I have, puts them together and creates a new set of 100 images. Pretty sweet to automate operating on 100 images with just a few lines of code! 🙂
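
An alternative to os.system is the subprocess module, which takes the command as a list of arguments and so avoids building one long string by hand. The sketch below swaps in subprocess.call for the same montage loop. The function names montage_command and run_montages are made up for this example.

```python
import subprocess

def montage_command(i):
    """Build the montage command for image pair i as a list of arguments."""
    return ["montage",
            "foo-%d.png" % i,
            "bar-%d.png" % i,
            "-tile", "2x1",
            "foobar-%d.png" % i]

def run_montages(start=100, stop=200):
    # Run the command once per image pair. This needs the montage
    # program and the input images to be present.
    for i in range(start, stop):
        subprocess.call(montage_command(i))

# Example usage (left commented out because it needs the image files):
# run_montages()
```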

Tried with: Python 2.7.6 and Ubuntu 14.04

Cannot import name GConf error

Problem

$ hamster 
ERROR:root:Could not find any typelib for GConf
Traceback (most recent call last):
  File "/usr/bin/hamster", line 30, in <module>
    from hamster import client, reports
  File "/usr/lib/python2.7/dist-packages/hamster/reports.py", line 32, in <module>
    from hamster.lib.configuration import runtime
  File "/usr/lib/python2.7/dist-packages/hamster/lib/configuration.py", line 33, in <module>
    from gi.repository import GConf as gconf
ImportError: cannot import name GConf

Solution

To be able to import GConf from a Python script, install this package:

sudo apt install gir1.2-gconf-2.0

Tried with: Ubuntu 14.04

How to install and uninstall Python package from source

A lot of Python packages are only available as source code.

Install

To install such a Python package, use its setup.py file:

$ sudo python setup.py install

This will install the Python files to a central location such as /usr/local/lib. If you do not have such permissions or want to install to a user-local location then try this:

  • Create a directory, say /home/joe/python_libs/lib/python in your home directory to host local Python packages.
  • Set the above path in the PYTHONPATH environment variable.
  • Install the package by passing the above path to the --home parameter:
$ python setup.py install --home /home/joe/python_libs

Uninstall

Uninstalling a package is tricky.

One solution is to install the package again while recording the list of files that get installed, and then use that list to remove those files:

$ sudo python setup.py install --record install-files.txt
$ cat install-files.txt | sudo xargs rm -rf

Another solution is to just locate the installation directory, for example /usr/local/lib/python2.7/dist-packages, and delete the directory of the package from it.

Tried with: Python 2.7.6 and Ubuntu 14.04