Category Archives: turbogears

Porting chrss from Turbogears 1.0 to Django 1.3

For the a period of 18 months or so I slowly ported my chess by rss site (chrss) from Turbogears 1.0 to Django. When I started the port Django 1.1 was the latest version. I took so long to finish the port, that Django got to version 1.3 in the meantime! The slowness was mostly due to my then pending and now actual fatherhood. However by slowly toiling away, with the aid of quite a few automated tests I managed to deploy a Django version of the chrss the other week a couple few of months ago.

Python Framework to Python Framework

The good thing about moving from Turbogears to Python was that it’s all Python code. This meant that things like the core chess code didn’t need touching at all. It also meant that a lot of fairly standard code could be almost directly ported. Methods on controller objects in Turbogears became view functions in Django. Methods on model objects stayed methods and so on. A lot of the porting was fairly mechanistic. I moved all of the Turbogears code into a folder for easy referral and then built the Django version from nothing back up. Initially most of the work was done at the model/db level where I could copy over the code, convert it to Django style and then copy over and update the automated tests. I used Django Coverage to make sure the code was still all actually getting tested.

I could have opted to exactly match the database from the Turbogears version, but opted instead to make it a more Django like. This meant using the regular Django user models and so on. As Turbogears 1.0 involved a custom user model I had to create a separate user profile model in the Django port. There were a few other changes along these lines, but most of the porting was not so structural.

A lot of the hard work during porting came from missing bits of functionality that had far reaching effects. Testing was very awkward until a lot of the code had been ported already.

Cheetah templates to Django templates

Chrss used Cheetah for templates. Cheetah is not as restrictive with it’s templates as Django. It’s very easy to add lots of logic and code in there. Some pages in chrss had quite a bit of logic – in particular the main chess board page. This made porting rather tricky with Django. I had to put a lot of code into template tags and filters and carefully re-organise things. Porting the templates was probably the hardest part. Automated tests helped a bit with this, but a lot of the issues needed visual verification to ensure things really were working as they should.

One advantage of going from Cheetah to Django’s tempate language was the incompatible syntax. This meant I could simply leave bit’s of Cheetah code in a template and it would serve as quite a good visual reminder of something that was yet to be ported.

The second 90%

A good portion of the site was ported before my son’s birth. It seemed like maybe it wouldn’t take much longer, as it felt like 90% of the work was done. Of course it turned out there was another 90% yet to finish.

Beyond the usual tweaking and finishing of various odds and ends, the remaining work consisted of:

  • Completing the openid integration
  • Migrating the database

For the open id integration I opted to use Simon Willison’s Django OpenID app – hoping to be able to have a nice drop-in app. Possibly due to the slightly unfinished nature of the app and mostly due to my desire to customise the urls and general flow of the login/register process this took a fair while. It might have been quicker directly porting the original OpenID code I used with Turbogears, but it did work out ok in the end.

Of course it turns out that after all that hard work, that hardly anyone seems to use OpenID to login to sites anymore. I might have been better off integrating Django Social Auth instead, for Twitter and/or Facebook logings. However I decided that this would have been too much like feature creep and stuck with the original plan.

The chrss database isn’t very large. The table recording the moves for each game is currently over 70,000 rows, but each row is very small. So the database dump when tar gzipped is less than 3Mb in size. This gave me plenty of options for database migration. I’d roughly matched the schema used for the Turbogears site, but did take the opportunity to break from the past slightly. I created a Python script that used talked to the MySQL database and generated an sql dump in the new format. By using Python I was free to do any slightly more complex database manipulation I needed. The only real tricky part was converting to using Django’s user models.

One wrinkle with the database migrating was a bit disturbing. When I setup the app on webfaction I found some very odd behaviour. Logging in seemed to work, but after a couple of page requests you’d be logged out. The guys at webfaction were very helpful, but we were unable to get to the bottom of the problem. During testing I found that this problem did not happen with SQLite or Postgres, so was something to do with MySQL. This was one of those times when using an ORM paid off massively. Apart from the database migration script no code needed to be touched. If I’d had more time I might have persevered with MySQL, but Postgres has been working very well for me.

Conclusion

Chrss has been running using Django and Postgres for nearly eleven months now and has been working very well. I’ve had to fix a few things now and then, but doing this in Django has been very easy. I was also able to automate deployment using Fabric, so new code could be put live with the minimum of fuss. When you’ve only got a limited time to work with, automated deployment makes a huge difference.

Hopefully now that sleep routines are better established and my own sleep is less interrupted I’ll have a chance to add new features to chrss soon.

Turbogears, remember me

So a while back I implemented a remember me feature for chrss. I said I’d release the code for it and am finally now getting round to it.

Please note that this kind of “remember me” functionality can represent a potentially security hole. It makes sense for some sites where the convenience out weighs any problems that would occur if someone fraudulently gains access to the site. As I wrote this for a site that is concerned with playing chess online it seemed worth it.

So to get started this is meant to work with:

Also note that I’ve left some of the imports as they appear for my app (chrss), so you’ll need to change them as appropriate.

The idea

Conceptually a regular request with a remember me feature works thus:

  • If the user is not logged in, we check for a “remember me” cookie
  • If the cookie is present then we check to see if it matches a token (which maps to a user) in the database
  • If there’s a match to a user we can login the user and on future requests we can ignore the remember me cookie (everything works as before)

The token in the database is randomly generated when the user logs in (with the “remember me” option ticked on the login form) in a similar way to any kind of session tracking cookie. The different is that the token/cookie is meant to hang around for much longer than a regular session. It’s used in addition to Turbogears tg-visit cookie and is just a handy shortcut for logging in a user automatically. This means that it’s fairly non-invasive in so far as it interacts with the Turbogears identity framework.

The code

First of all we need a table in the database to connect the remember me token to a user. So in my models I defined the following entity:


class RememberMe(SQLObject):
    user_token = StringCol(length=40, alternateID=True,
            alternateMethodName="by_user_token")
    user_id = IntCol()
    expiry = DateTimeCol()
    
    expiry_index=DatabaseIndex(expiry)

The rest of the code then lives in remember_me.py.

First there’s the code to “remember” a user. This creates a RememberMe entity and sets a cookie on the user’s machine:


def generate_token():
    key_string= '%s%s%s%s' % (random.random(), datetime.now(),
                              cherrypy.request.remote_host,
                              cherrypy.request.remote_port)
    return sha.new(key_string).hexdigest()

def remember_user(user):
    from chrss.model import RememberMe
    
    user_token=generate_token()
    expiry=datetime.now() + timedelta(days=remember_me_age_days)
    remember=RememberMe(user_token=user_token, user_id=user.id,expiry=expiry)
    
    cookies= cherrypy.response.simple_cookie
    max_age = remember_me_age_days*24*60*60
    cookies[remember_cookie_name] = remember.user_token
    cookies[remember_cookie_name]['path'] = '/'
    cookies[remember_cookie_name]['expires'] = formatdate(time() + max_age)
    cookies[remember_cookie_name]['max-age'] = max_age

Here’s the reverse function to “un-remember” a user (which you would call from your logout method):


def unremember_user(user):
    cookies = cherrypy.request.simple_cookie
    if remember_cookie_name in cookies:
        user_token=cookies[remember_cookie_name].value
        
        if user_token:
            from chrss.model import RememberMe
            try:
                remember=RememberMe.by_user_token(user_token)
                remember.destroySelf()
            except SQLObjectNotFound:
                pass
            
            # now clear cookie
            cookies= cherrypy.response.simple_cookie
            cookies[remember_cookie_name] = ''
            cookies[remember_cookie_name]['path'] = '/'
            cookies[remember_cookie_name]['expires'] = 0
            cookies[remember_cookie_name]['max-age'] = 0

Before I get onto the two monkey patches, we need to make one more function, that we use to login the user given a user entity (bypassing the need for their username and password) and is based on code from here:


def login_user(user):
    ''' from http://docs.turbogears.org/1.0/IdentityRecipes'''
    visit_key = turbogears.visit.current().key
    IdentityObject = turbogears.identity.soprovider.SqlObjectIdentity
    
    from chrss.model import VisitIdentity
    try:
        link = VisitIdentity.by_visit_key(visit_key)
    except SQLObjectNotFound:
        link = None
    if not link:
        link = VisitIdentity(visit_key=visit_key, user_id=user.id)
    else:
        link.user_id = user.id
    user_identity = IdentityObject(visit_key);
    return user_identity

The monkey patches

Now we get to the meat of the code – the bit which does the actual “magic”. In both cases we are monkey-patching methods that belong to the IdentityVisitPlugin class in Turbogears (defined in turbogears.identity.visitor).

First up is identity_from_visit which normally just looks for the tg-visit cookie and then sees if that’s associated with a user login or not. We shall effectively override it, so that if no association is found then we will perform a further check to see if there is a remember me cookie that will let us log the user in:


# keep a reference to the original function
old_identity_from_visit=turbogears.identity.visitor.IdentityVisitPlugin.identity_from_visit

def identity_from_remember_me( self, visit_key ):
    identity=old_identity_from_visit( self, visit_key )
    if identity.anonymous:
        # not logged in so check for remember me cookie
        cookies = cherrypy.request.simple_cookie
        if remember_cookie_name in cookies:
            log.info("checking remember me cookie")
            user_token=cookies[remember_cookie_name].value
            
            from chrss.model import RememberMe, User
            try:
                remember=RememberMe.by_user_token(user_token)
                user=User.get(remember.user_id)
                return login_user(user)
            except SQLObjectNotFound:
                pass
            
    return identity

# monkey-patch the method
turbogears.identity.visitor.IdentityVisitPlugin.identity_from_visit=identity_from_remember_me

The next method we patch is identity_from_form. For this we just check whether there is a “remember_me” parameter in the request after a successful login (from calling the original method) and if so call the remember_user() function.


old_identity_from_form=turbogears.identity.visitor.IdentityVisitPlugin.identity_from_form

def identity_from_form(self, visit_key):
    identity=old_identity_from_form(self, visit_key)
    if identity is not None and not identity.anonymous:
        # login worked, so now see if 'remember me' set
        params=cherrypy.request.params
        remember_me=params.pop('remember_me', None)
        if remember_me:
            remember_user(identity.user)
    return identity

turbogears.identity.visitor.IdentityVisitPlugin.identity_from_form=identity_from_form

You’ll just import the remember_me module early on in starting up your Turbogears app and it will apply these monkey patches. Then if you modify your login template to include a “remember_me” checkbox you should have everything working.

As I said before it’s fairly non-invasive (as far as monkey patches go), so there shouldn’t really be a need to modify much beyond your login form and to add a call to unremember_user to your logout code. The only other thing is perhaps to setup a cron-script or other background task to delete expired entries in the database (which is why the RememberMe entity has an expiry column).

Source code

The remember_me module is available for download here.

Splitting your Turbogears SQLObject models

Just a quick note about splitting your model.py file in Turbogears 1.0, when using SQLObject. The Turbogears docs have some notes on this, but there was an extra trick to it in the end.

The model.py file for chrss, was starting to get a bit big, so it seemed like a good time to do this.

First I moved model.py into model/__init__.py. Then I moved all of the model code itself into separate files (three as it happens) and imported them into model/__init__.py as indicated in the Turbogears docs:


from chrss.model.cms import *
from chrss.model.chess import *
from chrss.model.base import *

However that wasn’t enough, as the __connection__ module level variable for SQLObject wasn’t set and Turbogears couldn’t connect to the DB. So I added this to model/__init__.py (before the other imports):


from turbogears.database import PackageHub

hub = PackageHub("chrss")

and then in each file containing models added the following:


from chrss.model import hub
__connection__ = hub

The main trick was to get the import order correct. model/__init__.py must declare the hub variable, before importing the other files, so that they can access it when they are imported. It’s a bit of a cyclical dependency, which is maybe not ideal, but it’s only used in a limited way.

UPDATE. It turns out that you also need to update the sqlobject.txt file in the .egg-info directory of your project. Otherwise the various tg-admin sql * commands don’t work (as it can’t find the SQLObject classes). Basically you have to list every sub-package of the newly split model package. e.g. change:


db_module=chrss.model

to:


db_module=chrss.model,chrss.model.base,chrss.model.chess,chrss.model.cms

A turbogears caching decorator

A while back I wrote a caching decorator for chrss. It’s mostly used for the rss feeds, to help avoid having over-zealous rss readers slowing the site down. However I’m also now running it on a few other pages that were a bit slow (notably generating PGN files for games).

After letting it sit for a while Ian and Kyran also started using it on ShowMeDo. That was a couple of months ago. So now that I can be fairly certain it works it seemed like time to share it with the world.

So first off here’s a few features/comments:

  • It’s shamelessly based on code from Django (the caching backends at least)
  • It features an “anti-dogpiling” mechanism to try and make sure only one thread triggers a cache refresh
  • Multiple backends supported:
    • dummy – does no caching (for testing/development use)
    • simple – just uses a dictionary (for testing/development use)
    • localmem – thread-safe cache for use in process
    • file – uses file-system to cache data (this is what’s used with chrss and ShowMeDo)
    • memcache – uses memcached for caching (it should work, but not massively tested at the moment)

Now for some example usage:


from turbogears import expose, controllers
from cache import cache_result

class MyController(controllers.Controller):

    @expose(content_type="text/plain")
    @cache_result()
    def cache_some_text(self):
        ''' no template so it's pretty straightforward - expose just has to come first '''
        return 'this will be cached'

    @expose(template="my_template)
    @cache_result()
    def cache_data_only(self):
        ''' with a template we can just cache the data and not rendered html '''
        return dict(value='this dictionary will be cached')

    @expose()
    @cache_result()
    @expose(template="my_template)
    def cache_html(self):
        ''' 
        or we can cache the rendered html, but we have to use an outer expose()
        to make the method public
        '''
        return dict(value='will be cached with the html')

To see how you can use the @cache_result() you are probably best looking
at the source (there’s a fairly detailed comment explaining it). The following parameters can
be passed in:

  • key_fn – function used to derive a key to store the data in (default uses current url and user identity)
  • version_fn – can be used to control how a cached value expires (defaults to a function that returns the same value everytime)
  • timeout_seconds – how many seconds until the value start to expire

The default key function can be controlled via the config parameter cache.decorator.anon_only. If set to True (the default) it will only look in the cache for data when users are not logged in. Otherwise when users are logged in it will use a key just for them. The default is handy if you just want to avoid problems with a flood of anonymous users (e.g. from Slashdot/Digg etc).

The version function can be used to force expire cached values. The value of the version function is compared to the value stored in the cache and if different this triggers a cache refresh. For example if the version function was based on the number of comments in a blog post, then whenever a comment was added to the blog post the cache would get refreshed. This avoids having to wait for the cache to expire.

timeout_seconds specifies how many seconds before a value expires. It defaults to the value set in cache.decorator.timeout_seconds in your config file (or 1800 seconds if not set there).

anti-dogpiling

So first I’ll explain what I mean by “dogpiling” with respect to cache expiry.

The standard way to use a cache is to do something like:


value = cache.get('key', None)
if value is None:
    value = recompute_cached_value()
    cache['key'] = value
return value

Now this is fine normally. When the cached value expires the next request will simply call recompute_cached_value() and the cache will be updated for future requests.

The trouble arises when recompute_cached_value() takes a long time to run and you have have a lot of other requests running at the same time. If a request is still recalculating the value and another request comes along, then that will also attempt to recalculate the value. This will in turn probably slow down the calculation going on, making it more likely that the next request to arrive will also trigger a recalculation and so on. Very quickly you can end up with tens/hundreds/thousands of request all attempting to recalculate the cached value and you have lost most of the advantage of caching in the first place.

So to handle this situation more gracefully this caching decorator employs a two stage expiry.

First there is a hard cut off expiry that works like normal. This is set to occur later than the other expiry time and is the value that would be fed to memcache or equivalent.

The second expiry time set is the one normally used. Basically when we store/retrieve the cached data we also have access to this expiry time (and the version). If we see that we need to recalculate the value (due to the expiry time being in the past or the version being different), then we attempt to grab a lock to recalculate the value. If we don’t grab the lock, we assume another thread is doing the recalculation and rather than wait around we simply serve up the old (stale) data. This should mean that one thread (potentially per-process) will end up doing the recalculation rather than several.

This also means that we don’t have to remove a value from the cache to force a refresh (which might cause dogpiling). Instead we can update whatever value we use in our version function, to trigger a graceful refresh.

conclusion

So that’s a basic intro to this caching decorator. It’s quite a handy quick way of adding some caching to your turbogears app. You’ll need to see how well it works for you. I’m providing it “as is” and making no claims about anything. Feel free to incorporate it into your code and modify as you see fit. Just let me know if you have any issues or feedback.

bonus decorator

The cache code also includes a simple decorator to control the Expires header sent out with a response:


def expires(seconds=0):
    '''set expire headers for client-size caching'''
    def decorator(fn):
        def decorated(*arg,**kw):
            cherrypy.response.headers['Expires']=formatdate(_current_time()+seconds)
            return fn(*arg,**kw)
        return decorated
    return decorator

It’s handy for getting the client to cache some data for us too. I use it on some of the PIL generated images served up via my app.

source code

Download turbogears caching decorator

The source for the decorator(s) includes a simple test suite (to be run using nose).

Using raw SQL with SQLObject and keeping the object-y goodness

This is sort of a continuation of my little SQLObject performance guide. So it might be worth reading that too, if you are after hints about speeding up SQLObject. Anyway, on with the show…

It’s possible to create raw (database agnostic) sql queries with SQLObject. This can be really handy for those spots where you really need to speed things up. It’s a bit like switching from Python to C for some performance intensive part of an application.

However when using raw SQL, we lose some of the nice-ness of SQLObject. Results arrive as tuples and we may then have to do more work to make use of them. So I’m going to discuss an example of using raw SQL in SQLObject, but still keeping the objects around.

The Model Code

In my example there are two model objects:


class Entry(SQLObject):
    title=StringCol(length=255)
    body=StringCol()
    views=SQLMultipleJoin('EntryView')

class EntryView(SQLObject):
    entry=ForeignKey('Entry')

Entry being a blog entry and EntryView being an object to keep track of the Entry being viewed. I’ve kept both objects free of details for this example, but obviously they could have all sorts of extra fields.

N+1 Queries

Now I want to get a list of all of the entries and how many views each entry has (sorted by number of views). So using regular SQLObject this looks like:


    # class method on the Entry class
    @classmethod
    def get_entry_views(cls):
        entries=cls.select()
        
        # get the count for each entry
        entry_counts=[]
        for entry in entries:
            entry_counts.append((entry, entry.views.count()))
        
        # now sort the list into descending order
        entry_counts.sort(key=lambda item:item[1])
        entry_counts.reverse()
        return entry_counts

Which is pretty straight forward really and gives the follow results (for some sample data):


[(<Entry 3 title='entry 3' body='body text 3'>, 5),
 (<Entry 1 title='hfdskhfks' body='fsdfsd'>, 2),
 (<Entry 2 title='hel' body='jjj'>, 0)]

(tuple of Entry objects followed by view count).

However this causes the following SQL to be executed:


SELECT entry.id, entry.title, entry.body FROM entry WHERE 1 = 1
SELECT COUNT(*) FROM entry_view WHERE ((entry_view.entry_id) = (1))
SELECT COUNT(*) FROM entry_view WHERE ((entry_view.entry_id) = (2))
SELECT COUNT(*) FROM entry_view WHERE ((entry_view.entry_id) = (3))

Which seems a bit bad. In fact this is a classic example of the N+1 problem, where we run one initial query and then one query for each row in that result.

2 queries

So now let’s try making that a bit better, with this alternative method:


    # need to import everything from sqlobject.sqlbuilder
    @classmethod
    def get_entry_views2(cls):
        conn=cls._connection
        fields = [Entry.q.id,SQLConstant('COUNT(*)')]
        select = Select(
                        fields,
                        join=INNERJOINOn(Entry,EntryView,Entry.q.id==EntryView.q.entryID),
                        groupBy=Entry.q.id)
        sql=conn.sqlrepr(select)

        # get the counts via the raw
        # sql query
        counts={}
        for entry_id,count in conn.queryAll(sql):
            counts[entry_id]=count

        # now read in all of the entries
        # and match them with the counts
        entries=cls.select()
        entry_counts=[]
        for entry in entries:
            entry_counts.append((entry,counts.get(entry.id,0)))
        
        # now sort the list into descending order
        entry_counts.sort(key=lambda item:item[1])
        entry_counts.reverse()
        return entry_counts

This time I’m using a raw sql query to get all of the (non-zero) view counts in one query and then using another query to get all of the Entry objects. Then using a bit of Python I stitch the results back together and sort it.

This generates the following SQL:


SELECT entry.id, COUNT(*) FROM  entry INNER JOIN entry_view ON ((entry.id) = (entry_view.entry_id)) GROUP BY entry.id
SELECT entry.id, entry.title, entry.body FROM entry WHERE 1 = 1

That’s not as bad as before, but if we were using regular SQL we’d be doing this in a single query that also sorted the results by the count at the same time!

1 query

At the moment we basically need the 2nd query to get the actual objects. If we could use one raw sql query to do the work for us and somehow use the results of the query to populate the relevant objects for us we’d be golden. After a bit of digging around in the SQLObject source code I looked at the get class method definition:


# in main.py
class SQLObject(object):
    ...
    def get(cls, id, connection=None, selectResults=None):

Further examination showed that if I passed in selectResults (a list of field values) in the right order I could get an object instance either based on the results I passed in, or else the version of the object with the matching id in the cache. Excellent. So now we can have a method that works thus:


    @classmethod
    def get_entry_views3(cls):
        return select_with_count(cls,EntryView,Entry.q.id==EntryView.q.entryID,orderByDesc=True)

Where the juicy bit is here (to make it more reusable elsewhere):


def select_with_count(selectClass,joinClass,join_on,orderByDesc=False):
    conn=selectClass._connection
    fields = [selectClass.q.id]
    for col in selectClass.sqlmeta.columnList:
        fields.append(getattr(selectClass.q, col.name))
    
    # name we'll assign to the count
    # so we can sort on it
    count_field=("%s_count"%joinClass.__name__).lower()
    fields.append(SQLConstant('COUNT(%s) %s'%(joinClass.q.id, count_field)))
    
    orderBy=SQLConstant(count_field)
    if orderByDesc:
        orderBy=DESC(orderBy)
    
    select=Select(
            fields, 
            join=LEFTJOINOn(selectClass,joinClass,join_on),
            groupBy=selectClass.q.id, 
            orderBy=orderBy)
    sql=conn.sqlrepr(select)
    return read_from_results(conn.queryAll(sql),selectClass)

def read_from_results(results,selectClass):
    num_columns=len(selectClass.sqlmeta.columnList)
    items=[]
    for result in results:
        id,selectResults,extra=result[0],result[1:num_columns],result[num_columns:]
        entry=selectClass.get(id,selectResults=selectResults)
        items.append((entry,)+extra)
    return items

Which returns results in the same format as the original method and only generate one SQL query:


SELECT entry.id, entry.title, entry.body, COUNT(entry_view.id) entryview_count FROM  entry LEFT JOIN entry_view ON ((entry.id) = (entry_view.entry_id)) GROUP BY entry.id ORDER BY entryview_count DESC

There are a few of fiddly bits going on here that I’ll explain.

Firstly I perform a LEFT JOIN and use COUNT(entry_view.id) so we can results for entries that have no views.

Next, the order of the object fields has to match what SQLObject is expecting. That order being defined by the class’s sqlmeta.columnList.

Finally to be able to sort by the view count I have to provide a name for the count ( entryview_count), which I create based on the EntryView class name.

In conclusion

The example I gave was quite specific, but does show it’s possible to slightly better integrate raw SQL queries with SQLObject. This means that it’s possible to retain more of the easy to use nature of SQLObject when needing to speed up a few critical queries.

I suspect that with a bit of work it would be possible to create a quite nice library for performing generalised queries with SQLObject and getting nice objects back. For example it may be possible to use such techniques to eagerly load objects in joins (much as you can do in SQLAlchemy or the Java Persitence API).

A little SQLObject performance guide

For those that aren’t aware, SQLObject is an Object-Relational Mapping (ORM) library for Python. I use it in chrss (my chess by rss web app) as part of Turbogears. Ian and Kyran also use it as part of the ShowMeDo site.

Chrss and ShowMeDo have quite different levels of traffic. ShowMeDo has a lot more traffic than chrss, so performance might seem like more of an issue for ShowMeDo. However as chrss is a game that requires more interaction from the user this is not necessarily the case. If moving a piece takes even a second the site would seem sluggish. Whereas for a content rich site such as ShowMeDo user expectation can be a bit more forgiving.

Until recently Ian and Kyran have not needed to worry about performance and (rightly so) got on with the things that mattered (e.g. creating more screen-casts and building their community).

However the other day Ian asked me to help him out speed the site up. They were having some issues with a page taking too long to render. When creating chrss I’d spent a bit of extra time worrying about the performance of SQLObject, so I already knew what to look out for in their code. Luckily it mostly only required a few small tweaks and things ran a good deal quicker.

So what can you do to speed up SQLObject?

Enable Query Logging

Obviously don’t do this for your production server (it’ll only slow things down), but by adding ?debug=1 to your database connection URI, you can enable debug query logging. This will simply make SQLObject print out the details of every SQL statement that is ran against the database.

When developing this can give you a good idea of when you aren’t using SQLObject in an appropriate fashion. If you see pages of SQL statements flying past in your console window you should probably have a look to see why!

Enabling query logging is only going to help if you actually understand the SQL that you are looking at. Make sure you do some research if you aren’t familiar with SQL. SQLObject makes dealing with a relational database easier, but you still need to understand what it is actually doing to make the most of it.

SQLRelatedJoin/SQLMultipleJoin vs. RelatedJoin/MultipleJoin

Your mileage may vary, but generally speaking I’d recommend not using RelatedJoin (or MultipleJoin) to define many-to-many (or one-to-many) relationships with SQLObject. Instead use the SQL* related versions (SQLRelatedJoin and SQLMultipleJoin).

Why though?

Well RelatedJoin (and MultipleJoing) loads data lazily. Meaning that it first loads the id’s for each object, then uses a new query to load each object on demand. SQLRelatedJoin on the other hand works like select() and loads up all the data in one query. I’m simplifying a bit, but you can probably see that they behave differently.

Now sometimes lazy loading is what you want. Each object may contain a lot of data and you know you don’t need all of it.

However for the “normal” case you probably just want to get your object loaded into memory, with as few queries as possible. SQLRelatedJoin is what you want.

An example

I quick-started a project with tg-admin and created two model classes using RelatedJoin to link them:


class Entry(SQLObject):
    title=StringCol(length=255)
    body=StringCol()
    tags=RelatedJoin('Tag')

class Tag(SQLObject):
    name=StringCol(length=255,
                   alternateID=True,
                   alternateMethodName="by_tag_name")
    entries=RelatedJoin('Entry')

Pretty simple stuff. We can define an Entry and add Tag objects to it.

Then I ran tg-admin sql create to populate the (SQLite) database.

Next I ran tg-admin shell so I could create some objects in the database:


entry=Entry(title='a title',body='entry body')
test_tag=Tag(name='test_tag')
tag2=Tag(name='tag2')
entry.addTag(test_tag)
entry.addTag(tag2)

I then added ?debug=1 to the database URI:

sqlobject.dburi="sqlite://%(current_dir_uri)s/devdata.sqlite?debug=1"

Then I restarted tg-admin shell (with the IPython shell) and ran the following:


In [1]: entry=Entry.get(1)
 1/QueryOne:  SELECT title, body FROM entry WHERE id = (1)
 1/QueryR  :  SELECT title, body FROM entry WHERE id = (1)

In [2]: for tag in entry.tags:
   ...:     print "tag.name=%s" % tag.name
   ...:     
 1/QueryAll:  SELECT tag_id FROM entry_tag WHERE entry_id = (1)
 1/QueryR  :  SELECT tag_id FROM entry_tag WHERE entry_id = (1)
 1/QueryOne:  SELECT name FROM tag WHERE id = (1)
 1/QueryR  :  SELECT name FROM tag WHERE id = (1)
 1/QueryOne:  SELECT name FROM tag WHERE id = (2)
 1/QueryR  :  SELECT name FROM tag WHERE id = (2)
tag.name=test_tag
tag.name=tag2

As you can see with a RelatedJoin printing the two tags on the Entry requires the following three queries:


SELECT tag_id FROM entry_tag WHERE entry_id = (1)
SELECT name FROM tag WHERE id = (1)
SELECT name FROM tag WHERE id = (2)

(note how only the name field is queried for as this is all we use)
The RelatedJoin performs lazy-loading and ends up having to perform one query per tag! For two tags this might not be a problem, but it soon adds up if you aren’t careful.

A minor change

Simply changing RelatedJoin to SQLRelatedJoin in the models and running that same code yields:


In [1]: entry=Entry.get(1)
 1/QueryOne:  SELECT title, body FROM entry WHERE id = (1)
 1/QueryR  :  SELECT title, body FROM entry WHERE id = (1)

In [2]: for tag in entry.tags:
   ...:     print "tag.name=%s" % tag.name
   ...:     
 1/Select  :  SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))
 1/QueryR  :  SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))
tag.name=test_tag
tag.name=tag2

Printing out the tag names for the entry now only requires one query:

SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))

This is a big improvement - the number of queries we will run now no longer depends on the number of objects being returned.

Some caveats and notes

It's not always this simple, so here are some issues you may encounter:

  • RelatedJoin returns a list, whereas SQLRelatedJoin returns a SelectResults object (the same kind of object returned when calling select())
  • Large columns (text/binary blobs) won't get lazily loaded with SQLRelatedJoin
  • Fewer database queries doesn't always mean your code will run faster - understand what each query is doing
  • Make sure you properly index your database
  • You need to understand the SQL that SQLObject generates to get the most out of SQLObject
  • SQLObject may not be as slow as you think - you might not be using it right

Turbogears and mobile mini-site logins

I’m in the process of to writing a simple mobile version of chrss (chess by rss). This is spurred by the arrival of my new Nokia 6300 (which comes with Opera Mini). I’m aiming to end up with a mini-site running at http://chrss.co.uk/mob/ or similar.

The plan is to keep it _very_ simple. Main page lets you login, then lists your active games (showing which ones it’s your turn to move etc.). From there you’ll be able to get to a game, view the board and make a move. That’ll be it. Dead simple.

Turbogears works quite well for this in some respects – I can encapsulate the mini-site as a CherryPy controller and keep it fairly self contained. Though there is one minor drawback. That being that the identity framework is set up for the main site and will attempt to use the regular login page.

So how do we re-use the whole identity framework, but force access to the mobile mini-site to go via a different login page?

After a little poking around the TG source…

The skeleton controller for logging in:


class Mobile(controllers.Controller):
    @expose(template="cheetah:chrss.templates.mobile.index")
    @mob_login
    def index(self):
        return dict()
    
    @expose(template="cheetah:chrss.templates.mobile.login")
    def login(self,user=None,pwd=None):
        if identity.current.anonymous: 
            msg=None
            visit_key = turbogears.visit.current().key
            ident=identity.current_provider.validate_identity(user,pwd,visit_key)
            if ident is None:
                msg="login failed"
                return dict(user=user,msg=msg)
        raise redirect('/mob/')

The login method looks up the current visit_key (effectively the value of the cookie TG uses to track visits) then uses identity.current_provider. validate_identity to try and log the user in.

NB: I’m using user and pwd as variables, instead of user_name and password. The identity framework intercepts parameters with those names and uses them to authenticate behind the scenes. Obviously I need those values to get through to my controller method – hence why I’m using different names.

After the call to identity.current_provider. validate_identity I check to see if I have an object returned and either report that the login failed or redirect back to /mob/. Easy.

The next bit is the decorator @mob_login shown on the index controller method. This works a bit like the @identity.require decorator, but isn’t as flexible. It simply forces the user to a mobile login page if they aren’t logged in:


def mob_login(fn):
    def decorated(*arg,**kw):
        if identity.current.anonymous:
            return dict(tg_template="cheetah:chrss.templates.mobile.login",user='',msg=None)
        return fn(*arg,**kw)
    return decorated

Obviously you’ll need a template for the login page. I’m using cheetah and my login page looks like this (there’s a master template not shown here):


#extends chrss.templates.mobile.master
#def body
#if $msg
    <div id="msg">${msg}</div>
#end if
#filter WebSafe
    <h1>chrss: chess by rss</h1>
    <form action="$tg.url('/mob/login')" method="POST">
         <label for="user">user name:</label>
         <input type="text" id="user" name="user" value="$user"/>
         <label for="pwd">password:</label>
         <input type="password" id="pwd" name="pwd"/>
         <input type="submit" value="login"/>
    </form>
#end filter
#end def

And that is more or less all I had to do to get a basic mobile/alternative login page working. Hope you found it informative. I’ll report back later on how well actually running it on a mobile device goes…

Turbogears, Breadcrumbs and get_object_trail()

Whilst reading this thread on the Turbogear’s Google Group someone (asm on the list) posted a link to adding breadcrumbs to Turbogears apps.

After a little more reading I found a page on the Turbogears site itself about this subject too: http://docs.turbogears.org/1.0/BreadCrumb

The really interesting morsel though was the discovery of the cherrypy._cputil.get_object_trail() function that’s part of CherryPy (itself a part of Turbogears). This basically gives you access to the path of objects CherryPy followed to invoke the current handler. Which basically means you can easily figure out the hierarchy of the url and generate a nice location based bread crumb trail.

So I’ve now added simple bread crumb navigation to chrss (my chess by rss site). That should go live in the next few days. I’ll need to tweak it a bit though, as I’m actually subverting CherryPy in a few places, but it’s a nice addition as it stands.