littletable API Design and the Adapter Factory Pattern

About 14 years ago, while trying to understand the rationale and philosophy behind ORMs and how they are used, I wrote a small in-memory data store package so that I could learn by building one. Around that time, Google released its NoSQL data store BigTable, so I decided to name my package littletable.

I wanted to be able to quickly get to the data search/sort/pivot parts, so I chose to skip the schema definition step and build my littletable Tables by just throwing in a bunch of Python objects, and letting those objects’ attributes define the “columns” of the table. In implementation, a Table is really just a wrapper around a Python list, with features like indexing and sorting, and methods like select() (to choose which columns are of interest), where() (to specify rows to filter for), orderby() (to sort the table in place), and present() (to display the table in a nice tabular format). Anything requiring information about the Table’s columns gets handled using Python’s vars() and getattr() builtins.
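To make the "wrapper around a list" idea concrete, here is a minimal sketch. TinyTable and its columns() helper are hypothetical names invented for illustration, not littletable's actual implementation; the sketch only shows how vars() and getattr() can stand in for a schema.

```python
from types import SimpleNamespace


class TinyTable:
    """Hypothetical list-backed table; not littletable's real code."""

    def __init__(self, rows=()):
        self.rows = list(rows)

    def where(self, **criteria):
        # Keep rows whose attributes equal every keyword argument given.
        matches = [
            row for row in self.rows
            if all(getattr(row, name, None) == value
                   for name, value in criteria.items())
        ]
        return TinyTable(matches)

    def select(self, names):
        # Build new objects carrying only the requested attributes.
        wanted = names.split()
        return TinyTable(
            SimpleNamespace(**{name: getattr(row, name, None) for name in wanted})
            for row in self.rows
        )

    def columns(self):
        # The "columns" are whatever attributes the stored objects have.
        seen = {}
        for row in self.rows:
            for name in vars(row):
                seen.setdefault(name, None)
        return list(seen)


catalog = TinyTable([
    SimpleNamespace(sku="ROPE-001", descr="Rope", unitofmeas="FT"),
    SimpleNamespace(sku="MAGNT-001", descr="Magnet", unitofmeas="EA"),
])
by_the_foot = catalog.where(unitofmeas="FT").select("sku descr")
print(by_the_foot.columns())    # ['sku', 'descr']
print(by_the_foot.rows[0].sku)  # ROPE-001
```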

Using these methods, I could write code to get the SKU and description for all products from a product catalog Table where the unit of measure is “FT” as:

items_sold_by_the_foot = catalog.where(unitofmeas="FT").select("sku descr")
items_sold_by_the_foot.present()

  Sku         Descr
 ───────────────────
  ROPE-001    Rope

I also wanted Tables to support dict-like access for attributes that were unique identifiers, like selecting a specific item from that product catalog by its identifying SKU number. But I wanted something more elegant than a straight-up method call like:

item = catalog.get_index("sku")["001"]

I felt that I could use Python’s flexibility in dynamically getting values based on attribute names, using the getattr() builtin.

However, if I were to implement an API that allowed a user to write: 

item = catalog.sku["001"]

to get the item with SKU="001", this would have implications for my API and for users as well. User-defined attributes could collide with defined Table properties and methods, or worse, could raise backward-compatibility issues when I might want to add new Table properties or methods in the future.
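The collision risk is easy to reproduce. The sketch below uses a hypothetical NaiveTable (not part of littletable) that exposes its indexes directly as attributes. Because Python only calls __getattr__ when normal lookup fails, any index that shares a name with a method becomes unreachable.

```python
class NaiveTable:
    """Hypothetical table that exposes indexes directly as attributes."""

    def __init__(self):
        self._indexes = {}

    def create_index(self, name, mapping):
        self._indexes[name] = mapping

    def info(self):
        # A regular method that happens to share its name with a data column.
        return "table metadata"

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        indexes = self.__dict__.get("_indexes", {})
        try:
            return indexes[name]
        except KeyError:
            raise AttributeError(name) from None


table = NaiveTable()
table.create_index("sku", {"001": "Rope"})
table.create_index("info", {"x": "some record"})

print(table.sku["001"])      # Rope
print(callable(table.info))  # True: the method shadows the "info" index
```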

To handle this, I defined an object that I termed an IndexAccessor, which users get through a new by attribute on Tables. Using by, a user would write the above statement as:

item = catalog.by.sku["001"]

Now, as long as the attribute names conform to Python’s rules for valid identifiers, users can create indexes on their various data properties, and still use a consistent API for retrieving items by key fields, as in these examples:

product = catalog.by.sku["001"]
station = radio_station.by.call_letters["WMMS"]
hawaii = us_states.by.state_abbrev["HI"]
central_us_time = timezones.by.name["America/Chicago"]

Here is how the Adapter Factory works. The Table by property returns an IndexAccessor object that holds a reference back to the original Table. The IndexAccessor implements __getattr__, which takes an attribute name that should match an index defined on the original Table. It then returns an IndexWrapper that implements __getitem__ to support the dict-style key lookup. The IndexWrapper uses the index on the original Table to resolve the key and return the matching object from the Table.
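A stripped-down sketch of that chain of calls might look like the following. IndexAccessor and IndexWrapper are the names used in this post, but the class bodies, the MiniTable stand-in, and its _indexes attribute are assumptions made for illustration rather than littletable's real source.

```python
from types import SimpleNamespace


class IndexWrapper:
    """Wraps one index dict and supports lookup by key."""

    def __init__(self, index):
        self._index = index

    def __getitem__(self, key):
        # Delegate to the underlying dict; a missing key raises KeyError.
        return self._index[key]


class IndexAccessor:
    """Returned by MiniTable.by; turns an attribute name into an IndexWrapper."""

    def __init__(self, table):
        self._table = table

    def __getattr__(self, name):
        # Only called for names not found normally, such as index names.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return IndexWrapper(self._table._indexes[name])
        except KeyError:
            raise AttributeError(f"no index defined for {name!r}") from None


class MiniTable:
    """Hypothetical stand-in for a Table that has unique indexes only."""

    def __init__(self):
        self._rows = []
        self._indexes = {}

    def create_index(self, attr):
        self._indexes[attr] = {getattr(row, attr): row for row in self._rows}

    def insert(self, row):
        self._rows.append(row)
        for attr, index in self._indexes.items():
            index[getattr(row, attr)] = row

    @property
    def by(self):
        return IndexAccessor(self)


catalog = MiniTable()
catalog.create_index("sku")
catalog.insert(SimpleNamespace(sku="001", descr="Rope"))
print(catalog.by.sku["001"].descr)  # Rope
```

Since index names only ever appear after by, they cannot clash with methods or properties defined on the table class itself.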

Here is how that looks as a UML sequence diagram:

catalog.by.sku also supports other Mapping operations, such as keys(), values(), and len().
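One way to get those Mapping operations cheaply is to subclass collections.abc.Mapping, which supplies keys(), values(), items(), get(), and the in operator once __getitem__, __iter__, and __len__ are defined. Whether littletable does it this way is not stated here; ReadOnlyIndex below is a hypothetical class showing the general technique.

```python
from collections.abc import Mapping


class ReadOnlyIndex(Mapping):
    """Hypothetical read-only view over an index dict."""

    def __init__(self, index):
        self._index = index

    def __getitem__(self, key):
        return self._index[key]

    def __iter__(self):
        return iter(self._index)

    def __len__(self):
        return len(self._index)


skus = ReadOnlyIndex({"ANVIL-001": "500lb anvil", "ROPE-001": "Rope"})
print(list(skus.keys()))    # ['ANVIL-001', 'ROPE-001']
print(list(skus.values()))  # ['500lb anvil', 'Rope']
print(len(skus))            # 2
print("ROPE-001" in skus)   # True
```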

A class diagram for an abstract Adapter Factory might look like:

The ObjectAdapter can implement custom behavior on the wrapped attribute in the original Object. In the case of IndexAccessor, the access method on Table (the Object) is by, which creates the wrapper pointing to the SKU dict of the Table. The action in IndexWrapper is __getitem__, which delegates to that dict to get the matching item from the Table.
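The same shape works outside of littletable. The sketch below is a generic, hypothetical rendering of the pattern: AdapterFactory, UpperAdapter, Product, and the loud property are all invented for this example. The factory is reached through a property on the object, and attribute access on the factory produces an adapter bound to that attribute's value.

```python
class AdapterFactory:
    """Returned by the access property; builds one adapter per attribute name."""

    def __init__(self, obj, adapter_cls):
        self._obj = obj
        self._adapter_cls = adapter_cls

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        # Raises AttributeError if the wrapped object lacks the attribute.
        return self._adapter_cls(getattr(self._obj, name))


class UpperAdapter:
    """Example adapter whose action upper-cases the wrapped value."""

    def __init__(self, value):
        self._value = value

    def __call__(self):
        return str(self._value).upper()


class Product:
    def __init__(self, sku, descr):
        self.sku = sku
        self.descr = descr

    @property
    def loud(self):
        # The access property plays the same role as by on a Table.
        return AdapterFactory(self, UpperAdapter)


item = Product("ROPE-001", "Rope")
print(item.loud.descr())  # ROPE
```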

Here is a full example of Python code to create the catalog Table and the sku index, load some products, and select a product by SKU:

from littletable import Table

catalog = Table('catalog')
catalog.create_index("sku", unique=True)

catalog.csv_import("""\
sku,descr,unitofmeas,unitprice
ANVIL-001,500lb anvil,EA,100
BRDSD-001,Bird seed,LB,3
MAGNT-001,Magnet,EA,8
MAGLS-001,Magnifying glass,EA,5
ROPE-001,Rope,FT,0.1
""")

Since we created a unique index on sku, accessing by this index will return the unique object from the Table with that SKU value, or raise KeyError if one does not exist:

print(catalog.by.sku["MAGLS-001"].descr)
# prints 'Magnifying glass'

print(catalog.by.sku["ROCKT-001"].descr)
# raises KeyError

When by is used against a non-unique index, it behaves like a defaultdict[Table], returning a new Table containing just the matching items, or an empty Table if there are none.

catalog.create_index("unitofmeas")
# display a table of all items sold by the piece
catalog.by.unitofmeas["EA"].present()

This displays:

  Sku         Descr              Unitofmeas   Unitprice 
 ───────────────────────────────────────────────────────
  ANVIL-001   500lb anvil        EA                 100
  MAGNT-001   Magnet             EA                   8
  MAGLS-001   Magnifying glass   EA                   5
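The non-unique lookup can be sketched in a few lines. This is an illustration under stated assumptions, not littletable's code: NonUniqueIndexWrapper is a hypothetical name, and plain lists stand in for the Tables that littletable returns.

```python
from collections import defaultdict
from types import SimpleNamespace


class NonUniqueIndexWrapper:
    """Groups rows by an attribute; plain lists stand in for Tables here."""

    def __init__(self, rows, attr):
        self._groups = defaultdict(list)
        for row in rows:
            self._groups[getattr(row, attr)].append(row)

    def __getitem__(self, key):
        # Use get() so a miss returns an empty result without storing a new key.
        return list(self._groups.get(key, ()))


rows = [
    SimpleNamespace(sku="ANVIL-001", unitofmeas="EA"),
    SimpleNamespace(sku="MAGNT-001", unitofmeas="EA"),
    SimpleNamespace(sku="ROPE-001", unitofmeas="FT"),
]
by_uom = NonUniqueIndexWrapper(rows, "unitofmeas")
print([row.sku for row in by_uom["EA"]])  # ['ANVIL-001', 'MAGNT-001']
print(by_uom["GAL"])                      # []
```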

I recently added full text search on littletable Tables, by adding a search Accessor following a similar pattern. 

import littletable

peps = littletable.csv_import("peps_with_abstracts.csv")
peps.create_search_index("abstract")

# find PEPs containing the word "operator"
operator_related_peps = peps.search.abstract("operator")

You can see this in action at https://ptmcg.pythonanywhere.com/pep_search.

Try littletable for yourself, using pip install littletable.
