Using Python's Dynamic Features to Encapsulate Relational Database Queries Richard Taylor r.taylor@bcs.org.uk October 2002 Introduction Connecting to a relational database is one of the most common things that programmers expect any programming language to support. In this article I hope to show how some of Python's unusual dynamic features can be used to provide a flexible interface to a relational database. Python is well equipped with libraries which give access to many RDBMS, including Oracle, DB/2, Informix, PostgreSQL and MySQL as well as interfaces to JDBC[1] and ODBC[2]. To ease the task of the developer when moving an application between different databases, a group of Python users and database module developers have collaborated on the specification of a standard database access API known as the DB-API, currently at version 2.0. Using the DB-API it is relatively simple to port a Python program from one RDBMS to another, assuming that any database features that the application uses are available on the target database, of course. The DB-API v2.0 has been around since 1999 and an introduction to it can be found on the DB-API Special Interest Group at the Python web site [3]. Rather than repeat a detailed guide to the DB-API, which can also be found on the web site, this article will concentrate on the way in which a simple object wrapper can be layered on top of the DB-API. This wrapper provides an abstraction from the simple SQL procedural interface provided by the DB-API. My own team uses this approach in our database code because it provides some isolation from changes to the database schema and places all of the SQL syntax in the one place. It also makes the main application code more "Pythonic" by wrapping the database records with objects. This approach demonstrates how Python's dynamic interpretation features can be used to make an abstract interface that is highly flexible. Using the DB-API Listing 1 shows a simple example of using a DB-API database adapter, in this case it is the PostgreSQL adapter psycopg[4]. psycopg is tuned for highly multi-threaded applications and has proved very reliable in the applications in which we have used it. The example code here should be portable to any of the DB-API compliant modules, with the possible exception of the connection string which is likely to be specific to each database adapter. import psycopg cnx = psycopg.connect("user=postgres dbname=db_test") cr = cnx.cursor() cr.execute('create table telephone ("name" text, "tel" text)') cr.execute("insert into telephone values ('Richard', '12345678')") cr.execute("insert into telephone values ('Steve', '01002030')") cr.execute("select name, tel from telephone") cnx.commit() result = cr.fetchall() print result [('Richard', '12345678'), ('Steve', '01002030')] Listing 1 - Simple DB API example This example creates a single table, inserts a couple of rows and then pulls them back out. The DB API adaptor returns the result set as a list of tuples. The final line is the output from running the script. You can see from this example that the Python print statement understands how to turn Python data structures into human readable form. This is very useful when debugging an application, particularly because Python's dynamic type system means that you can be unsure of the type of the variable that is being printed. Wrapping the records The low-level API used in Listing 1 provides all the functions required to store and retrieve data in a database. However, littering an application with literal SQL strings and duplicate code to unpack the records from the return list can be tedious. In large applications it can also lead to poorly maintainable code that is highly susceptible to changes in the database schema. The Record class in Listing 2 provides a object that can represent one of the tuples in the return list from the DB-API "fetchall" function. The class constructor uses the description provided by the DB-API to dynamically construct an object that appears to have an attribute for each of the fields in the database record. This is achieved through the use of the special __getattr__ method. Python classes have a small number of special methods which, if overloaded can change the way an object behaves. __getattr__ is called by the interpreter when an attempt is made to access an undefined attribute of an object. This is possible because Python objects hold references to their attributes and methods in a special attribute called __dict__. The interpreter interrogates this dictionary when an attribute or method is accessed and if no match is found the __getattr__ method is called. The __dict__ dictionary can be manipulated directly at runtime to change the behavior of the object. Careful examination of the code in Listing 2 will reveal how the __dict__ attribute and the __getattr__ method can be combined to dynamically give an object attributes that correspond to the columns returned by a database query. Listing 3 shows how this new Record class can be used to encapsulate the records returned from the database. class Record: """A simple class used to wrap the database records returned from a DB-API compliant database module""" def __init__(self, description, values): """The record constructor. The description parameter should be the contents of the cursor.description attribute after the call to cursor.execute(). The values parameter should be a list of field values in the same order as they are returned from the cursor.fetchXXX() call.""" # The cursor.description attribute is a list of # tuples, where the first element of each tuple is the # field name. To make the name lookups a little easier # a list of field names is extracted from the description. field_names = [ d[0] for d in description ] # A dictionary of (field name, field value) pairs is # constructed, keyed on the field name. This is then # used by the __getattr__ and __setattr__ methods # to perform the field lookups. This must be explicitly # added to the __dict__ dictionary in order to shield it # from the __setattr__ method defined below. self._values = {} for field_num in range (0,len(field_names)) : self._values[field_names[field_num]] = values[field_num] def __getattr__(self, attr): """This overrides the attribute access method to look up attributes in the database field names if they are not found in the objects dictionary first.""" if self._values.has_key(attr): return self._values[attr] raise AttributeError def __setattr__(self, attr, value): """This overrides the attributes assignment method to ensure that any assignments to database fields are applied to the _values dictionary.""" if self.__dict__.has_key('_values') and \ self.__dict__['_values'].has_key(attr): self.__dict__['_values'][attr] = value else: self.__dict__[attr] = value Listing 2 - A simple Record class cr.execute("select name, tel from telephone") cnx.commit() records = [] for row in cr.fetchall(): records.append(Record(cr.description, row)) for record_object in records: print "Name = %s Tel = %s" % (record_object.name, record_object.tel) Listing 3 - Using the Record class The Record class in Listing 2 also defines a __setattr__ method. This is another of Python's special methods: it enables assignment to attributes to be as dynamic as attribute access. However there is an important subtle difference between __getattr__ and __setattr__: whereas __getattr__ is called only when no matching attribute can be found, __setattr__ is always called first and only if __setattr__ raises an exception will the interpreter search the rest of an object's attributes. The alert reader will spot that this could easily lead to an infinite recursion if the body of the __setattr__ method accesses an attribute of "self". To alleviate this problem access to the special __dict__ attribute does not cause a call to the __setattr__ method. This one special case enables __setattr__ to be used effectively, if with some care. Having provided the ability to assign new values to the database fields of the Record class the obvious next step is to be able to write the updated values back to the database. Listing 4 introduces a View class to manage the information needed to construct update queries and provide select query methods for a table. class View: """A class to manage access to a database view.""" def __init__(self, tablename, record_class): """The view constructor. The tablename parameter should be a string holding the name of a table that has already been created in the database. The record_class parameter should be a class object that provides the same constructor interface as the Record class below. """ self.tablename = tablename self.record_class = record_class def fetchall(self, cr): """Accessor method. Returns all of the records in a table. The return value is a list of record_class objects populated with the table rows returned from a select * query.""" cr.execute("select * from %s" % self.tablename) return [ self.record_class(self,cr.description, row_values) \ for row_values in cr.fetchall() ] Listing 4 - A simple View class The View class is fairly straightforward but there are a couple of points worth noting. First, the record_class parameter to the __init__ method is a class object not a class instance. In Python class definitions are first class objects themselves and can be passed as parameters just as their instances can. This enables a type of generic programming. In the View class the record_class object is used to instantiate the record objects in the fetchall function. Second, the construct used in the return statement of the fetchall function, denoted by the square brackets, is known as a list comprehension. List comprehensions are a convenient way of constructing anonymous lists and can be used as an alternative syntax to the functional programming functions map and filter[5]. A full explanation of list comprehensions can be found at [6]. Listing 5 shows a new version of the Record class that makes use of the View class to implement an update method. The __getattr__ and __setattr__ methods have been omitted because they are identical to those in Listing 2. The update method writes the record fields back to to the database. It is worth noting the use of copy.deepcopy in the update method. As you would expect, simply assigning a dictionary to a new name only creates a new reference to the dictionary. copy.deepcopy creates a new dictionary with a copy of each element of the original. class Record: """A simple class used to wrap the database records returned from a DB-API compliant database module""" def __init__(self, view, description, values): """The record constructor. The description parameter should be the contents of the cursor.description attribute after the call to cursor.execute(). The values parameter should be a list of field values in the same order as they are returned from the cursor.fetchXXX() call.""" # Remember the view object reference. self.view = view # The description attribute is a list of tuples, where # the first element of each tuple is the # field name. To make the name lookups a little easier # a list of field names is extracted from the description. self.field_names = [ d[0] for d in description ] # A dictionary of field name, field value pairs is # constructed, keyed on the field name. This is then # used by the __getattr__ and __setattr__ methods # to perform the field lookups. This must be explicitly # added to the __dict__ dictionary in order to shield it # from the __setattr__ method defined below. A second # copy is created to be used in the where clauses of # update queries. self._values = {} self._original_values = {} for field_num in range (0,len(self.field_names)) : self._values[self.field_names[field_num]] \ = values[field_num] self._original_values[self.field_names[field_num]] \ = values[field_num] def update(self, cr): """Write back changes made to a record to the database. The cr parameter should be an open cursor object and it is the responsibility of the caller to ensure that commit is called on the cursor if autocommit mode is not used.""" s = "update %s set " % (self.view.tablename,) new_field_values = [] for field in self.field_names: new_field_values.append("%s = '%s'" \ % (field, self._values[field])) s = s + string.join(new_field_values,' , ') + " where " old_field_values = [] for field in self.field_names: old_field_values.append("%s = '%s'" \ % (field,self._original_values[field])) s = s + string.join(old_field_values,' and ') cr.execute(s) self._original_values = \ copy.deepcopy(self._values) Listing 5 - Record class with with update method. Listing 6 demonstrates how the View and Record classes can be put together to access and manipulate the database records of a table without ever explicitly stating what the fields of the record are. Additional fields can be added to the tables without requiring any alteration to the classes and no table specific SQL sytax is required in the main application code. # Create a new cursor (assumes the lead in code from Listing 1). cr = cnx.cursor() view = View("telephone", Record) # Create a view object for # the telephone table. records = view.fetchall(cr) # Fetch a list of all records # in the telephone table. records[0].name = "new name" # alter a field in the first record records[0].update(cr) # write altered record back to db. Listing 6 - Putting it all together. Taking it further By sub-classing View and Record it is very simple to add more complex capabilities. For instance calculated fields could be added to records or complex query methods added to views. By encapsulating business logic in subclasses of View and Record it is possible to ensure that such logic is as independent of the details of the database interface and schema as possible. One technique that I have used in my applications is to insert name mapping into the __getattr__ and __setattr__ methods when a database field name changes. This can enable such changes to happen without altering any of the business logic that would otherwise have to be kept in step. The implementations of View and Record presented here are obviously missing many useful features such as dealing with field types other than strings, views that are made from joins across multiple tables, inserting new records into tables etc. There is also no attempt to catch any errors. However, hopefully I have shown how Python's highly dynamic nature might be used to add all of these features in a very flexible manner. Final thoughts This article has demonstrated how features such as dynamic attribute lookup and class definitions as first class objects can be used to build flexible abstractions in Python. I have used such techniques extensively in the applications on which I have worked. However, I have also learned through bitter experience that such techniques can cause faults that, because Python has little compile time checking, only become apparent at run time and can prove very hard to find. These problems can be alleviated with judicious use of pre- and post-conditions on methods along with careful use of exception handlers to recover from runtime errors. Reading back through this article it also occurs to me that I may give the impression that Python is full of special case rules and littered with built-in method names. This is not the case. There are in practice only a small number of special cases to learn and it is extreamly rare to be tripped up by a built-in method that you have overloaded by accident. Most of the time Python can be used in blissful ignorance of the machinery that enables the dynamic features that I have used in this article. Richard Taylor r.taylor@bcs.org.uk References [1] JDBC: http://www.ziclix.com/zxjdbc/ [2] ODBC Module: http://www.python.org/windows/win32/odbc.html [3] Python Database API Specification v2.0: http://www.python.org/peps/pep-0249.html [4] Psycopg Home Page: http://initd.org/software/psycopg [5] Functional programming in Python: http://www-106.ibm.com/developerworks/library/l-prog.html [6] What's New in Python 2.0: http://www.amk.ca/python/2.0/