
Django IA: Auth Password Reset

Django comes with a lot of great built-in functionality. One of the most useful contrib apps is authentication, which (among other things) provides views for login, logout, and password reset. Login & logout are self-explanatory, but resetting a password is, by nature, somewhat complicated. Because it’s a really bad idea to store passwords as plaintext, you can’t just send a user their password when they forget it. Instead, you have to provide a secure mechanism for users to change their password themselves, even if they can’t remember their original password.

Lucky for us, Django auth provides this functionality out of the box. All you need to do is create the templates and hook up the views. The code you need to write to make this happen is pretty simple, but it can be a bit tricky to understand how it all works together. There are actually 4 separate view functions that together provide a complete password reset mechanism. These view functions are:

  1. password_reset
  2. password_reset_done
  3. password_reset_confirm
  4. password_reset_complete

Here’s an Information Architecture diagram showing how these views fit together, using Jesse James Garrett’s Visual Vocabulary. The 2 black dots are starting points, and the circled black dot is an end point.

Django Auth Password Reset IA

Here’s a more in-depth walk-thru of what’s going on, with a fictional user named Bob:

  1. Bob tries to log in and fails, probably a couple of times. Bob clicks a “Forgot your password?” link, which takes him to the password_reset view.
  2. Bob enters his email address, which is then used to find his User account.
  3. If Bob’s User account is found, a password reset email is sent, and Bob is redirected to the password_reset_done view, which should tell him to check his email.
  4. Bob leaves the site to check his email. He finds the password reset email, and clicks the password reset link.
  5. Bob is taken to the password_reset_confirm view, which first validates that he can reset his password (this is handled with a hashed link token). If the token is valid, Bob is allowed to enter a new password. Once a new password is submitted, Bob is redirected to the password_reset_complete view.
  6. Bob can now log in to your site with his new password.

This final step is the one minor issue I have with Django’s auth password reset. The user just changed their password, so why do they have to enter it again to log in? Why can’t we eliminate step 6 altogether, and automatically log the user in after they reset their password? In fact, you can eliminate step 6 with a bit of hacking on your own authentication backend, but that’s a topic for another post.
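To make the “hashed link token” from step 5 a bit more concrete, here’s a minimal sketch of the general idea using only the standard library. This is not Django’s actual token generator: SECRET_KEY, make_token and check_token are hypothetical names, and the 3 day expiry is an assumption. The point is that the token is a signature over the user’s id, their current password hash, and a timestamp, so it expires on its own and stops working as soon as the password changes.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; a real site would load this from settings.
SECRET_KEY = b"change-me"


def make_token(user_id, password_hash, timestamp=None):
    """Return 'timestamp-signature', signed over the user's current state."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    msg = "{}:{}:{}".format(user_id, password_hash, ts).encode("utf-8")
    sig = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return "{}-{}".format(ts, sig)


def check_token(token, user_id, password_hash, max_age=3 * 24 * 3600, now=None):
    """Return True if the token matches this user and has not expired."""
    try:
        ts_text, _sig = token.split("-", 1)
        ts = int(ts_text)
    except ValueError:
        return False
    # Recompute the token from the embedded timestamp and compare in
    # constant time, so a forged or stale signature is rejected.
    expected = make_token(user_id, password_hash, ts)
    if not hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8")):
        return False
    now = time.time() if now is None else now
    return (now - ts) <= max_age


# Bob requests a reset while his stored hash is "oldhash"
token = make_token(1, "oldhash", timestamp=1000)
still_valid = check_token(token, 1, "oldhash", now=2000)
valid_after_change = check_token(token, 1, "newhash", now=2000)
```

Because the password hash is part of the signed message, the link in Bob’s email can only be used once: after he sets a new password, the same token no longer verifies.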

Django Forms, Utilities, OAuth, and OpenID Links

Form Customization
Utility Apps
OAuth and OpenID

Various Django App Links

Django version control, design patterns, tree models, and pluggable applications:

Erlang and Python RPC Links

Erlang <–> Python RPC can be done with Thrift or by creating an Erlang node using Twisted OTP:

Python Parsing Links

BNF and Python parsers and parser generators:

Building a NLTK FreqDist on Redis

Say you want to build a frequency distribution of many thousands of samples with the following characteristics:

  • fast to build
  • persistent data
  • network accessible (with no locking requirements)
  • can store large sliceable index lists

The only solution I know that meets those requirements is Redis. NLTK’s FreqDist is not persistent, shelve is far too slow, BerkeleyDB is not network accessible (and is generally a PITA to manage), and AFAIK there’s no other key-value store that makes sliceable lists really easy to create & access. So far I’ve been quite pleased with Redis, especially given how new it is. It’s quite fast, is network accessible, atomic operations make locking unnecessary, supports sortable and sliceable list structures, and is very easy to configure.

Why build a NLTK FreqDist on Redis

Building a NLTK FreqDist on top of Redis allows you to create a ProbDist, which in turn can be used for classification. Having it be persistent lets you examine the data later. And the ability to create sliceable lists allows you to make sorted indexes for paging thru your samples.

Here’s some more concrete use cases for persistent frequency distributions:

RedisFreqDist

I put the code I’ve been using to build frequency distributions over large sets of words up at BitBucket. probablity.py contains RedisFreqDist, which works just like the NLTK FreqDist, except it stores samples and frequencies as keys and values in Redis. That means samples must be strings. Internally, RedisFreqDist also stores a set of all the samples under the key __samples__ for efficient lookup and sorting. Here’s some example code for using it. For more info, check out the wiki, or read the code.

[sourcecode language="python"]
# RedisFreqDist comes from probablity.py
def make_freq_dist(samples, host='localhost', port=6379, db=0):
    freqs = RedisFreqDist(host=host, port=port, db=db)

    for sample in samples:
        freqs.inc(sample)

    return freqs
[/sourcecode]

Unfortunately, I had to muck about with some of FreqDist’s internal implementation to remain compatible, so I can’t promise the code will work beyond NLTK version 0.9.9. probablity.py also includes ConditionalRedisFreqDist for creating ConditionalProbDists.
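If you’re curious what the storage scheme looks like, here’s a rough sketch of the idea, not the actual RedisFreqDist code. SketchFreqDist and FakeRedis are hypothetical names. FakeRedis is an in-memory stand-in that only mimics the handful of redis-py methods used here (incr, get, sadd, smembers) so the example runs without a server; with a real client the values come back as strings or bytes, which is why the lookups go through int().

```python
class FakeRedis(object):
    """In-memory stand-in for the few redis-py methods used below."""

    def __init__(self):
        self._values = {}
        self._sets = {}

    def incr(self, name, amount=1):
        self._values[name] = int(self._values.get(name, 0)) + amount
        return self._values[name]

    def get(self, name):
        return self._values.get(name)

    def sadd(self, name, *values):
        self._sets.setdefault(name, set()).update(values)

    def smembers(self, name):
        return set(self._sets.get(name, set()))


class SketchFreqDist(object):
    """Each sample is a key whose value is its count."""

    SAMPLES_KEY = "__samples__"

    def __init__(self, client):
        self._r = client

    def inc(self, sample, count=1):
        # remember the sample in a set, then bump its counter
        self._r.sadd(self.SAMPLES_KEY, sample)
        self._r.incr(sample, count)

    def __getitem__(self, sample):
        return int(self._r.get(sample) or 0)

    def samples(self):
        return self._r.smembers(self.SAMPLES_KEY)

    def N(self):
        # total number of sample outcomes recorded
        return sum(self[s] for s in self.samples())

    def freq(self, sample):
        # relative frequency, the basis for a maximum likelihood estimate
        n = self.N()
        return float(self[sample]) / n if n else 0.0


fd = SketchFreqDist(FakeRedis())
for word in ["spam", "eggs", "spam", "spam"]:
    fd.inc(word)
```

Since every increment is a single atomic operation on the server, several processes can feed the same distribution without any locking on the client side.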

Lists

For creating lists of samples, that very much depends on your use case, but here’s some example code for doing so. r is a redis object, key is the index key for storing the list, and samples is assumed to be a sorted list. The get_samples function demonstrates how to get a slice of samples from the list.

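[sourcecode language="python"]
def index_samples(r, key, samples):
    # start from an empty list so old entries don't linger
    r.delete(key)

    # append each sample to the tail, preserving the sorted order
    for sample in samples:
        r.push(key, sample, tail=True)

def get_samples(r, key, start, end):
    # lrange treats both start and end as inclusive indexes
    return r.lrange(key, start, end)
[/sourcecode]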
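One thing to watch when paging is that Redis list ranges include the end index, unlike Python slices. Here’s a small sketch of a helper for turning a page number into a start and end pair; page_bounds is a hypothetical name, and pages are assumed to be numbered from 1.

```python
def page_bounds(page, per_page):
    """Return inclusive (start, end) list indexes for a 1-based page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    start = (page - 1) * per_page
    end = start + per_page - 1
    return start, end


# the same bounds applied to a plain Python list need end + 1
samples = ["sample%02d" % i for i in range(30)]
start, end = page_bounds(2, 10)
second_page = samples[start:end + 1]
```

The resulting start and end can be passed straight to get_samples(r, key, start, end).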

Yes, Redis is still fairly alpha, so I wouldn’t use it for critical systems. But I’ve had very few issues so far, especially compared to dealing with BerkeleyDB. I highly recommend it for your non-critical computational needs 🙂

Update: Redis has been quite stable for a while now, and many sites are using it successfully in production.

Deploying Django with Mercurial, Fab and Nginx

Writing web apps with Django can be a lot of fun, but deploying them can be a chore, even if you’re using Apache. Here’s a setup I’ve been using that makes deployment fast and easy. This all assumes you’ve got sudo access on a remote server running Ubuntu or something similar.

Mercurial

This setup assumes you’ve got 2 mercurial repositories: 1 on your local machine, and 1 on the remote server you’re deploying to. In the remote repository, add the following to .hg/hgrc

[hooks]
changegroup = hg up

This makes mercurial run hg up whenever you push new code. Then in your local repo’s .hg/hgrc, make sure the default path is to your remote repo. Here’s an example

[paths]
default = ssh://user@domain.com/repo

Now when you run hg push, you don’t need to include the path to the repo, and your code will be updated immediately.

Django FastCGI Deployment

Since I’m using nginx instead of Apache, we’ll be deploying Django with FastCGI. Here’s an example script you can use to start and restart your Django FastCGI server. Add this script to your mercurial repo as run_fcgi.sh.

#!/bin/bash
PIDFILE="/tmp/django.pid"
SOCKET="/tmp/django.sock"
# kill current fcgi process if it exists
if [ -f $PIDFILE ]; then
    kill `cat -- $PIDFILE`
    rm -f -- $PIDFILE
fi
python manage.py runfcgi socket=$SOCKET pidfile=$PIDFILE method=prefork

Important note: the FastCGI socket file will need to be readable & writable by nginx worker processes, which run as the www-data user in Ubuntu. This will be handled by the fab restart command below, or you could add chmod a+w $SOCKET to the end of the above script.

Nginx FastCGI Proxy

Nginx is a great high performance web server with simple configuration. Here’s a simple example server config for proxying to your Django FastCGI process. Add this config to your mercurial repo as django.nginx.

server {
    listen 80;
    # change to your FQDN
    server_name YOUR.DOMAIN.COM;
    location / {
        # must be the same socket file as in the above fcgi script
        fastcgi_pass unix:/tmp/django.sock;
    }
}

On the remote server, make sure the following lines are in the http section of /etc/nginx/nginx.conf

include /etc/nginx/sites-enabled/*;
# fastcgi_params should contain a lot of fastcgi_param variables
include /etc/nginx/fastcgi_params;

You must also make sure there is a link in /etc/nginx/sites-enabled to your django.nginx config. Don’t worry if django.nginx doesn’t exist yet, it will once you run fab nginx the first time.

you@remote.ubuntu$ cd /etc/nginx/sites-enabled
you@remote.ubuntu$ sudo ln -s ../sites-available/django.nginx django.nginx

Python Fabric

Fab, or properly Fabric, is my favorite new tool. It’s designed specifically for making remote deployment simple and easy. You create a fabfile where each function is a fab command that can run remote and sudo commands on one or more remote hosts. So let’s deploy Django using fab. Here’s an example fabfile with 2 commands: restart and nginx. These commands should only be run after you’ve done a hg push.

[sourcecode language="python"]
config.fab_hosts = ['YOUR.DOMAIN.COM']
config.projdir = '/PATH/TO/YOUR/REMOTE/HG/REPO'

def restart():
    sudo('cd %(projdir)s; run_fcgi.sh', user='www-data', fail='abort')

def nginx():
    sudo('cp %(projdir)s/django.nginx /etc/nginx/sites-available/', fail='abort')
    sudo('killall -HUP nginx', fail='abort')
[/sourcecode]

fab restart

You only need to run fab restart if you’ve changed the actual Django python code. Changes to templates or static files don’t require a restart and will be used automatically (because of the hg up changegroup hook). Executing run_fcgi.sh as the www-data user ensures that nginx can read & write the socket.

fab nginx

If you’ve changed your nginx server config, you can run fab nginx to install and reload the new server config without restarting the nginx server.

Wrap Up

Now that everything is set up, the next time you want to deploy some new code, it’s as simple as hg push && fab restart. And if you’ve only changed templates, all you need to do is hg push. I hope this helps make your Django development life easier. It has certainly done so for me 🙂

Django Datetime Snippets

I’ve started posting over at Django snippets, which is a great resource for finding useful bits of functionality. My first set of snippets is focused on datetime conversions.

The Snippets

FuzzyDateTimeField is a drop in replacement for the standard DateTimeField that uses dateutil.parser with fuzzy=True to clean the value, allowing the parser to be more liberal with the input formats it accepts.

The isoutc template filter produces an ISO format UTC datetime string from a timezone aware datetime object.

The timeto template filter is a more compact version of django’s timeuntil filter that only shows hours & minutes, such as “1hr 30min”.

JSON encode ISO UTC datetime is a way to encode datetime objects as ISO strings just like the isoutc template filter.

JSON decode datetime is a simplejson object hook for converting the datetime attribute of a JSON object to a python datetime object. This is especially useful if you have a list of objects that all have datetime attributes that need to be decoded.
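To give a feel for how the isoutc and JSON pieces fit together, here’s a rough standard library sketch. It is not the code from the snippets themselves: it uses Python 3’s built-in json module instead of simplejson, the function names are hypothetical, and it assumes the attribute to decode is literally called datetime.

```python
import json
from datetime import datetime, timedelta, timezone

ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"


def isoutc(dt):
    """Format a timezone aware datetime as an ISO UTC string."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone aware datetime")
    return dt.astimezone(timezone.utc).strftime(ISO_UTC)


def encode_datetime(obj):
    """json.dumps default hook: datetimes become ISO UTC strings."""
    if isinstance(obj, datetime):
        return isoutc(obj)
    raise TypeError("%r is not JSON serializable" % (obj,))


def decode_datetime(d):
    """json.loads object hook: parse the 'datetime' attribute, if present."""
    value = d.get("datetime")
    if isinstance(value, str):
        parsed = datetime.strptime(value, ISO_UTC)
        d["datetime"] = parsed.replace(tzinfo=timezone.utc)
    return d


# 9:00 PM at a fixed UTC-7 offset
local = timezone(timedelta(hours=-7))
showing = {"title": "Some Movie", "datetime": datetime(2009, 4, 2, 21, 0, tzinfo=local)}

text = json.dumps(showing, default=encode_datetime)
decoded = json.loads(text, object_hook=decode_datetime)
```

Because the object hook runs on every JSON object, the same function handles a single object or a whole list of them.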

Use Case

Imagine you’re making a time based search engine for movies and/or events. Because your data will span many timezones, you decide that all dates & times should be stored on the server as UTC. This pushes local timezone conversion to the client side, where it belongs, simplifying the server side data structures and search operations. You want your search engine to be AJAX enabled, but you don’t like XML because it’s so verbose, so you go with JSON for serialization. You also want users to be able to input their own range based queries without being forced to use specific datetime formats. Leaving out all the hard stuff, the above snippets can be used for communication between a django webapp and a time based search engine.

Dates and Times in Python and Javascript

If you are dealing with dates & times in python and/or javascript, there are two must have libraries.

  1. Datejs
  2. python-dateutil

Datejs

Datejs, being javascript, is designed for parsing and creating human readable dates & times. Its powerful parse() function can handle all the dates & times you’d expect, plus fuzzier human readable date words. Here are some examples from their site.

[sourcecode language="javascript"]
Date.parse("February 20th 1973");
Date.parse("Thu, 1 July 2004 22:30:00");
Date.parse("today");
Date.parse("next thursday");
[/sourcecode]

And if you are programmatically creating Date objects, here’s a few functions I find myself using frequently.

[sourcecode language="javascript"]
// get a new Date object set to local date
var dt = Date.today();
// get that same Date object set to current time
var dt = Date.today().setTimeToNow();

// set the local time to 10:30 AM
var dt = Date.today().set({hour: 10, minute: 30});
// produce an ISO formatted datetime string converted to UTC
dt.toISOString();
[/sourcecode]

There’s plenty more in the documentation; pretty much everything you need for manipulation, comparison, and string conversion. Datejs cleanly extends the default Date object, has been integrated into a couple date pickers, and supports culture specific parsing for i18n.

python-dateutil

Like Datejs, dateutil also has a powerful parse() function. While it can’t handle words like “today” or “tomorrow”, it can handle nearly every (American) date format that exists. Here’s a few examples.

[sourcecode language="python"]
>>> from dateutil import parser
>>> parser.parse("Thu, 4/2/09 09:00 PM")
datetime.datetime(2009, 4, 2, 21, 0)
>>> parser.parse("04/02/09 9:00PM")
datetime.datetime(2009, 4, 2, 21, 0)
>>> parser.parse("04-02-08 9pm")
datetime.datetime(2008, 4, 2, 21, 0)
[/sourcecode]

An option that comes especially in handy is to pass in fuzzy=True. This tells parse() to ignore unknown tokens while parsing. This next example would raise a ValueError without fuzzy=True.

[sourcecode language="python"]
>>> parser.parse("Thurs, 4/2/09 09:00 PM", fuzzy=True)
[/sourcecode]

I don’t know how well it works for international date formats, but parse() does have options for reading days first and years first, so I’m guessing it can be made to work.
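As a quick illustration of those two options, here’s a small sketch using the dayfirst and yearfirst keyword arguments of parse(). The input strings are made up for the example, and it assumes python-dateutil is installed.

```python
from datetime import datetime
from dateutil import parser

s = "04/02/09 9:00PM"

# default: month first, so April 2nd
month_first = parser.parse(s)
# dayfirst=True reads the same string as 4 February
day_first = parser.parse(s, dayfirst=True)
# yearfirst=True treats the first number of an ambiguous date as the year
year_first = parser.parse("09/04/02 9:00PM", yearfirst=True)
```

The same ambiguous string gives two different dates depending on the flag, so it has to come from what you know about the source of the input rather than from the string itself.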

dateutil also provides some great timezone support. I’ve always been surprised at python’s lack of concrete tzinfo classes, but dateutil.tz more than makes up for it (there’s also pytz, but I haven’t figured out why I need it instead of or in addition to dateutil.tz). Here’s a function for parsing a string and returning a UTC datetime object.

[sourcecode language="python"]
from dateutil import parser, tz

def parse_to_utc(s):
    # parse liberally, ignoring unknown tokens
    dt = parser.parse(s, fuzzy=True)
    # treat the parsed time as local time
    dt = dt.replace(tzinfo=tz.tzlocal())
    # convert to UTC
    return dt.astimezone(tz.tzutc())
[/sourcecode]

dateutil does a lot more than provide tzinfo objects and parse datetimes; it can also calculate relative deltas and handle iCal recurrence rules. I’m sure a whole calendar application could be built based on dateutil, but my interest is in parsing and converting datetimes to and from UTC, and in that respect dateutil excels.
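For a taste of the relative delta and recurrence rule support, here’s a brief sketch using dateutil.relativedelta and dateutil.rrule. The dates are arbitrary examples, and it assumes python-dateutil is installed.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, WEEKLY

# adding a month to Jan 31 clamps to the last day of February
start = datetime(2009, 1, 31, 21, 0)
next_month = start + relativedelta(months=+1)

# three weekly occurrences starting Thursday, April 2nd 2009
weekly = list(rrule(WEEKLY, count=3, dtstart=datetime(2009, 4, 2, 21, 0)))
```

A plain timedelta can’t express “one month later”, which is the kind of calendar arithmetic relativedelta is for.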

Chunk Extraction with NLTK

Chunk extraction is a useful preliminary step to information extraction that creates parse trees from unstructured text with a chunker. Once you have a parse tree of a sentence, you can do more specific information extraction, such as named entity recognition and relation extraction.

Chunking is basically a 3 step process:

  1. Tag a sentence
  2. Chunk the tagged sentence
  3. Analyze the parse tree to extract information

I’ve already written about how to train a NLTK part of speech tagger and a chunker, so I’ll assume you’ve already done the training, and now you want to use your pos tagger and iob chunker to do something useful.

IOB Tag Chunker

The previously trained chunker is actually a chunk tagger. It’s a Tagger that assigns IOB chunk tags to part-of-speech tags. In order to use it for proper chunking, we need some extra code to convert the IOB chunk tags into a parse tree. I’ve created a wrapper class that complies with the nltk ChunkParserI interface and uses the trained chunk tagger to get IOB tags and convert them to a proper parse tree.

[sourcecode language="python"]
import nltk.chunk
import itertools

class TagChunker(nltk.chunk.ChunkParserI):
    def __init__(self, chunk_tagger):
        self._chunk_tagger = chunk_tagger

    def parse(self, tokens):
        # split words and part of speech tags
        (words, tags) = zip(*tokens)
        # get IOB chunk tags
        chunks = self._chunk_tagger.tag(tags)
        # join words with chunk tags
        wtc = itertools.izip(words, chunks)
        # w = word, t = part-of-speech tag, c = chunk tag
        lines = [' '.join([w, t, c]) for (w, (t, c)) in wtc if c]
        # create tree from conll formatted chunk lines
        return nltk.chunk.conllstr2tree('\n'.join(lines))
[/sourcecode]
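If the IOB format is new to you, here’s a plain Python sketch of what converting IOB tags into chunks means, with no NLTK involved. iob_to_chunks is a hypothetical helper and the tagged sentence is made up; in the class above, conllstr2tree does the real work and returns a proper tree. B- starts a chunk, I- continues it, and O marks a token outside any chunk.

```python
def iob_to_chunks(tagged):
    """Group (word, pos, iob) triples into (label, [(word, pos), ...]) chunks."""
    chunks = []
    current_label, current = None, []
    for word, pos, iob in tagged:
        prefix, _, label = iob.partition("-")
        # an I- tag whose label differs from the open chunk also starts a new one
        starts_new = prefix == "B" or (prefix == "I" and label != current_label)
        if prefix == "O" or starts_new:
            # close whatever chunk was open
            if current:
                chunks.append((current_label, current))
            current_label, current = (None, []) if prefix == "O" else (label, [])
        if prefix != "O":
            current.append((word, pos))
    if current:
        chunks.append((current_label, current))
    return chunks


sentence = [
    ("the", "DT", "B-NP"), ("quick", "JJ", "I-NP"), ("fox", "NN", "I-NP"),
    ("jumped", "VBD", "B-VP"), ("over", "IN", "B-PP"),
    ("the", "DT", "B-NP"), ("dog", "NN", "I-NP"), (".", ".", "O"),
]
chunks = iob_to_chunks(sentence)
noun_phrases = [words for (label, words) in chunks if label == "NP"]
```

Each (label, words) pair corresponds to one sub tree of the parse tree, with the tagged words as its leaves.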

Chunk Extraction

Now that we have a proper NLTK chunker, we can use it to extract chunks. Here’s a simple example that tags a sentence, chunks the tagged sentence, then prints out each noun phrase.

[sourcecode language="python"]
# sentence should be a list of words
tagged = tagger.tag(sentence)
tree = chunker.parse(tagged)
# for each noun phrase sub tree in the parse tree
for subtree in tree.subtrees(filter=lambda t: t.node == 'NP'):
    # print the noun phrase as a list of part-of-speech tagged words
    print subtree.leaves()
[/sourcecode]

Each sub tree has a phrase tag, and the leaves of a sub tree are the tagged words that make up that chunk. Since we’re training the chunker on IOB tags, NP stands for Noun Phrase. As noted before, the results of this natural language processing are heavily dependent on the training data. If your input text isn’t similar to your training data, then you probably won’t be getting many chunks.