Published September 9, 2008 | Version v2008
Dataset Open

ImUnipen image data set for writer identification (N=208) - vectorial handwriting converted to usable images

Authors/Creators

  • 1. University of Groningen

Description


==============
Terms of Usage
==============

The ImUnipen data set is intended for non-commercial, scientific use,
and is distributed under the auspices of the Unipen Foundation.

Please always refer to the following paper in IEEE PAMI when using
the ImUnipen data set:

 Bulacu, M.; Schomaker, L.
 Text-Independent Writer Identification and Verification
 Using Textural and Allographic Features
 Pattern Analysis and Machine Intelligence, IEEE Transactions on
 Volume 29, Issue 4, April 2007 Page(s):701 - 717

The ImUnipen data set is derived from the Unipen (unipen.org)
data set of on-line (i.e., vectorial, xy) handwriting.
The xy-coordinates and a line-generator algorithm are used
to generate a raster image, as if the data were optically scanned.
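
As an illustration of the conversion described above, the following is
a minimal sketch of rasterizing pen strokes with a line generator. It
assumes Bresenham's algorithm, integer pixel coordinates that already
fit the page, and a one-pixel pen width; the line generator actually
used for ImUnipen is not specified here. The names draw_line and
rasterize are hypothetical.

```python
def draw_line(img, x0, y0, x1, y1, ink=0):
    """Mark the pixels on the segment (x0, y0)-(x1, y1) with Bresenham's algorithm."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        img[y0][x0] = ink  # rows are indexed by y, columns by x
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def rasterize(strokes, width, height, paper=255, ink=0):
    """Draw each pen-down stroke (a list of (x, y) points) onto a white page."""
    img = [[paper] * width for _ in range(height)]
    for stroke in strokes:
        if len(stroke) == 1:  # a single dot has no segment to connect
            x, y = stroke[0]
            img[y][x] = ink
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            draw_line(img, x0, y0, x1, y1, ink)
    return img


# Toy input (assumption): two strokes on a 5x5 page.
strokes = [[(0, 0), (3, 3)], [(0, 4), (4, 4)]]
page = rasterize(strokes, width=5, height=5)
```

Consecutive points within a stroke are connected; no line is drawn
between separate strokes, which corresponds to the pen being lifted.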

Contents: for each of the 208 writers, there are two PNG images, each
showing an artificially constructed table of naturally written words
(49 MByte). The words are pasted onto a white page. To keep the file
naming systematic, we call such a page a Paragraph; see below.

The file names are organized as (example):

   Writ990221.Doc01.Par00.png
   Writ990221.Doc01.Par01.png

Meaning: writer number 990221, document 01 (only Doc01 exists), and the
two images "Par00" and "Par01", each an artificial "paragraph" of
isolated words.
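
The naming scheme above can be parsed with a regular expression. This
is a minimal sketch; the pattern is inferred from the two example names
only, and NAME_RE and parse_name are hypothetical names.

```python
import re

# Pattern inferred from the example names (assumption):
# Writ<digits>.Doc<digits>.Par<digits>.png
NAME_RE = re.compile(r"^Writ(?P<writer>\d+)\.Doc(?P<doc>\d+)\.Par(?P<par>\d+)\.png$")


def parse_name(filename):
    """Split an ImUnipen file name into (writer, document, paragraph) strings."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    # The fields stay strings so that leading zeros ("01", "00") are preserved.
    return match.group("writer"), match.group("doc"), match.group("par")


writer, doc, par = parse_name("Writ990221.Doc01.Par00.png")
```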

The Par00 and Par01 images are typically used as the query and its
intended best match in a leave-one-out setting for writer identification.
For instance, Par00 is the query, and Par01 is added to the set of all
other images as the attractor (the sample that should be retrieved
first) for an identification search.
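
The leave-one-out protocol can be sketched as follows. This is an
illustration only: the feature vectors are toy values, the writer
numbers in the example are invented, and Euclidean nearest-neighbour
search stands in for whatever features and distance an experiment
actually uses. The names writer_of, identify and top1_accuracy are
hypothetical.

```python
import math


def writer_of(filename):
    """Return the writer field, e.g. 'Writ990221', from a file name."""
    return filename.split(".")[0]


def identify(query, features):
    """Return the nearest sample to the query among all other samples."""
    q = features[query]
    candidates = (name for name in features if name != query)
    return min(candidates, key=lambda name: math.dist(q, features[name]))


def top1_accuracy(features):
    """Fraction of queries whose nearest neighbour is by the same writer."""
    hits = sum(writer_of(identify(q, features)) == writer_of(q) for q in features)
    return hits / len(features)


# Toy feature vectors (assumption): two invented writers, two pages each.
features = {
    "Writ000001.Doc01.Par00.png": (0.0, 0.1),
    "Writ000001.Doc01.Par01.png": (0.1, 0.0),
    "Writ000002.Doc01.Par00.png": (1.0, 1.0),
    "Writ000002.Doc01.Par01.png": (0.9, 1.1),
}
accuracy = top1_accuracy(features)
```

Each image takes a turn as the query while every other image, including
the second page by the same writer, stays in the search set.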

Word labels are deliberately omitted from this data set, as the goal is
to test recognition-free writer identification methods.

For a description of the regular Unipen data set, please visit
http://unipen.org

Lambert Schomaker constructed this set in 2005.

Notes

http://www.ai.rug.nl/~lambert/unipen/ImUnipen/

Files

Files (44.2 MB)

md5:e89e976ee72e49f9842da53f28aa4998
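
The MD5 checksum listed above can be used to check a downloaded copy of
the archive. This is a minimal sketch using Python's hashlib; the
archive's file name is not given here, so the path is left to the
caller, and md5_of and verify are hypothetical names.

```python
import hashlib

# Checksum as listed on this page.
EXPECTED_MD5 = "e89e976ee72e49f9842da53f28aa4998"


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file at path has the expected MD5 digest."""
    return md5_of(path) == expected
```

Calling verify() with the path of the downloaded file returns False if
the download is incomplete or corrupted.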

Additional details

References

  • Bulacu, M.; Schomaker, L. Text-Independent Writer Identification and Verification Using Textural and Allographic Features. IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 29, Issue 4, April 2007, pp. 701-717.