Django Batch Select

November 23rd, 2009

So quite a while ago I wrote about avoiding the n+1 query problem with SQLObject. I’ve since been using Django a lot more. In particular at my day job I’ve been improving a Django site we’ve inherited. This particular site suffered from a few too many SQL queries and needed speeding up. Some of the issues related to the classic n+1 problem. i.e. one initial query triggering a further n queries (where n is the number of results in the first query).

A liberal dose of select_related helped. However that was only useful in the cases where a ForeignKey needed to be pre-fetched.

In the case however there was a page that was selecting a set of objects that had tags. The tags for each object were being displayed along side a link to the main object. Given that the initial query returned over three hundred objects, this meant the page was performing another three hundred (plus) queries to fetch the individual tags for each object! Now we could cache the page and that’s was indeed what was being done. The trouble however was when the cache expired. It also made things painful when developing – as I’d typically want to disable caching whilst I’m making changes to pages frequently.

I came up with a specific solution for this project – to perform the initial query, then a second query to fetch *all* of the other tags in one go. The results of the second query could then be stitched into the original results, to make them available in a handy manor within the page’s template.

I took the original idea and made it re-usable and am now releasing that code as Django Batch Select. There are examples of how to use it over at github, but it works a bit like this:


>>> entries = Entry.objects.batch_select('tags').all()
>>> entry = entries[0]
>>> print entry.tags_all
[<Tag: Tag object>]

It’s a very early release – after the result of only a few hours coding, so use with care. It does have 100% test code coverage at the moment and I’m reasonable confident about it working. Please try it out and let me know whether it works for you.

An iPhone friendly, local storage backed, offline TODO list webapp

July 10th, 2009

A while back I had a go at using the local storage features being added to javascript to create a simple TODO list app. The main focus of the app was getting it all to fit under 5K (5120 bytes), but it was a good test of using a client-side sql database in javascript.

As the app was written for size it didn’t have many frills. It did however work on the iPhone, as it’s version of Safari had support for the openDatabase call needed. However it didn’t look so good and although the TODO list items were stored locally the phone still had to be online to access the host webpage – which kind of undermined some of it’s utility.

With my recent acquisition of an iPhone I thought I’d revisit this TODO app and make it play nicely on the iPhone. In addition to using a client-side SQL database this new version features:

These features mean that if you have add a bookmark to your homescreen for this app, you might as well be running a native app. It looks pretty native, stores data locally, doesn’t require a net connection and even features standard app UI mechanisms. The main giveaway is of course the chrome associated with the Safari browser. Still not bad for some html, css and javascript. Not a total replacement for native Cocoa apps, but it does put the creation of client-side apps for the iPhone into the hands of even more people.

If you are on an iPhone or are running Safari 4.0 you can try out jTODO yourself or watch the video of it in action below:



I won’t rehash the sql-side of things in this post – instead I’d refer you to my original post on the matter.

The use of iui was also fairly straightforward – I’m only using the style-sheets and images. This means that there are no animations involved, but at this stage it seemed excessive to add them in. Perhaps I’ll add some in a future version.

This leaves the offline application cache and getting drag and drop to work.

Offline application cache

The offline application cache is really simple to implement. It’s currently supported by Safari on the iPhone, Safari 4.0 and Firefox 3+.

In my case I want all files to be cached, so I simple specify them in a “manifest” file:


CACHE MANIFEST

# version 1.0.2.37

iui/backButton.png
iui/blueButton.png
iui/cancel.png
iui/grayButton.png
iui/iui-logo-touch-icon.png
iui/iui.css
iui/iui.js
iui/iuix.css
iui/iuix.js
iui/listArrow.png
iui/listArrowSel.png
iui/listGroup.png
iui/loading.gif
iui/pinstripes.png
iui/selection.png
iui/thumb.png
iui/toggle.png
iui/toggleOn.png
iui/toolButton.png
iui/toolbar.png
iui/whiteButton.png

css/todo.css

js/jquery-1.3.2.min.js
js/jquery-ui-1.7.2.custom.min.js
js/todo.js

img/delete.png
img/deleting.png
img/redButton.png
img/handle.png
img/todo-touch-icon.png

This manifest file is then linked to the main html file thus:


<html manifest="todo.manifest">

That’s it. We now have an offline capable web-app. The main issue I had when doing this, was that Safari was very strict about the manifest – any file not mentioned in the manifest would not be loaded (even if it was normally accessible). The other issue of course is that we’ve now introduced another level of caching, so developing can be a bit annoying – as you think you’ve made a change, but then nothing shows up. I often ended up disabling the manifest for a while when debugging and then re-enabled it after things were working again. There’s also a version number in the manifest file, so that it well register as changed – to help refresh the caches after we’ve made a change.

With all this in place the app can be used when no net connection is available (e.g. in Airplane mode).

Drag and drop

The last job when developing this app was to enable drag and drop for re-ordering items. My first go at this worked pretty easily using jQuery UI’s sortable plugin – when running on my mac in Safari:


            page.sortable({
                axis: 'y',
                handle: '.handle',
                update: function(){
                    db.transaction(function(tx) {
                        $('.todo_item').each(function(i,item) {
                            var id = $(item).find(':input[type=checkbox]').attr('id');
                            id = Number(id.substring('todo_checkbox_'.length));
                            tx.executeSql('UPDATE todo SET todo_order=? WHERE id=?', [i, id]);
                        });
                    });
                }
            });

Essentially this is just setup to enabling drag and drop sorting only along the y axis (up and down), using the element with class handle to start the drag. Once the dragging has finished update gets called and I inspect the DOM to work out the current order of the todo_items and update the database accordingly.

That was pretty easy and working really well on the mac. Then of course I thought I’d test it on the iPhone. At that point I realised I’d forgotten that drag and drop doesn’t normally work in Safari in the iPhone. Holding and dragging normally scrolls the entire page – so drag and drop was a bit useless here.

However after a little digging I found someone had figured out how to get drag and drop working on the iPhone, by hijacking the various “touch” events that the phone generates. These work a little differently to the normal mouse events, but with some work can made to do our bidding:


    function handleTouchEvent(event) {
        /* from http://jasonkuhn.net/mobile/jqui/js/jquery.iphoneui.js
         but changed a bit*/
        
        var touches = event.changedTouches;
        var first = touches[0];
        var type = '';
        
        
        // only want to do this for the drag handles
        if ( !first.target || !first.target.className || first.target.className.indexOf("handle") == -1 ) {
            return;
        }
        
        switch(event.type) {
            case 'touchstart':
                type = 'mousedown';
                break;
                
            case 'touchmove':
                type = 'mousemove';
                break;        
                
            case 'touchend':
                type = 'mouseup';
                break;
            
            default:
                return;
        }
        
        var simulatedEvent = document.createEvent('MouseEvent');
        simulatedEvent.initMouseEvent(type, true, true, window, 1, first.screenX, first.screenY, first.clientX, first.clientY, false, false, false, false, 0/*left*/, null);
        
        first.target.dispatchEvent(simulatedEvent);
        
        if ( event.type == 'touchmove' ) {
            event.preventDefault();
        }
    }
    document.addEventListener("touchstart", handleTouchEvent, false);
    document.addEventListener("touchmove", handleTouchEvent, false);
    document.addEventListener("touchend", handleTouchEvent, false);

This registers listeners for the touch events, but only does any extra work if the target of the event has the class “handle”. If that’s the case a simulated mouse event is sent for the first touched item. To stop the page from scrolling when we want to drag instead event.preventDefault() is called just for the touchmove event. This is sufficient to let jquery UI do it’s job and enable drag and drop sorting of the TODO items.

Client-side sql databases in Javascript

January 13th, 2009

It occurred to me that I didn’t really go into much detail about the local storage part of my 5K TODO app. It’s quite a neat piece of tech and because it works in Safari it also works on the iPhone. That obviously means you can use javascript+local storage to create apps for your iPhone without needing to touch the iPhone SDK or learn objective-c. As everything is client-side you also don’t need server-side code to store your app state. The only downside is that local storage is at the mercy of the user. If they decide to reset Safari or generally clear out their user data the local storage goes too! Then again this would be true of files on a Mac’s file-system anyway…

So for any “serious” stuff you might want to look elsewhere, but in the meantime you can still build interesting client-side apps that use local-storage and run on your iPhone.

At the time of writing database storage in html5 is still a work in progress and is only available in Safari/Webkit. In the future it should appear in Firefox etc. Though there’s a HTML5 Wrapper for Google Gears that uses Gears to provide the html5 local storage functions (e.g. openDatabase).

Opening the database

The main function we’re concerned with is openDatabase which let’s us open/create a database that can only be accessed by the current domain (in a similar manner to cookies). In the TODO app I use this code to see if openDatabase is available and output a message if not and stop any further processing:


    if ( !window.openDatabase ) {
        $('add_note').innerHTML = "Browser does not support local storage."
        return;
    }

(note $ is a shortcut to document.getElementById in the TODO app code)

Assuming we find the openDatabase function we can then get a database object thus:


        var dbName = '5KTodo',
             db = openDatabase(dbName, '1.0' /*version*/, dbName, 65536 /*max size*/);

In this case we create/open a database called 5KTodo. We provide a version number (not really used at this stage, but is intended for allowing upgrading database schemas) and a max size. The max size is provided as a hint to the browser as to how big the database will grow. The upper limit for this is defined by the browser (and is about 5M in Safari).

Now once we’ve got our database object we can start running some SQL. We do this using the database object’s transaction method and the transaction’s (tx) executeSql function:


    db.transaction(
        function(tx) {
            tx.executeSql(sql, params, callback);
        }
    );

That will run the sql using the given params (replacing ? in the sql string with the param values to avoid sql-injection attacks) and then calls the callback with the results (if it succeeded).

Creating the tables

The first time we run the app the database will not exist and will contains no database tables, so we need to run some SQL to create the tables:


    function dbTransaction(fn) {
        db.transaction(fn);
    }
    
    /*create todo table*/
    dbTransaction(
        function (tx) {
            tx.executeSql('CREATE TABLE todo (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, description TEXT NOT NULL DEFAULT "", todo_order INTEGER NOT NULL DEFAULT 0, done INTEGER NOT NULL DEFAULT 0 );');
        }
    );

On the next run of the application that will fail silently, which is fine as it only needs to run successfully once (to create the tables). It’s key that the create table statement runs in it’s own transaction though, so when it fails it doesn’t stop anything else running. Note that dbTransaction used here is only so that YUICompressor can better compress the code, as this code is from an example for the 5K App.

Reading from the database

After creating the tables (or doing nothing if they are already there) the next step is to see what’s already in the database. Unlike the create table statement we expect to get some data back, so we provide a callback to get the results:


    dbTransaction(
        function (tx) {
            listTODOs(tx, 'SELECT * FROM todo ORDER BY todo_order ASC', []);
        }
    );
    
    /*run the sql to read the notes and insert them into the DOM*/
    function listTODOs(tx, sql, params) {
        tx.executeSql(
            sql, params,
            function(tx, results) {
                var rows = results.rows;
                for ( var i = 0; i < rows.length; i++ ) {
                    var row = rows.item(i),
                        li = createElement('li'),
                        checkbox = createElement('input'),
                        label = createElement('label');
                    setAttribute(checkbox, 'type','checkbox');
                    setAttribute(checkbox, 'id', 'checkbox_'+row['id']);
                    setAttribute(label, 'for', 'checkbox_'+row['id']);
                    appendChild(label, createTextNode(row['description']));
                    appendChild(li,checkbox);
                    appendChild(li,label);
                    setAttribute(li,'id', 'note_'+row['id']);
                    
                    if ( !row['done'] ) {
                        // not done, so insert above text field
                        notesList.insertBefore(li,$('add_note'));
                    }
                    else {
                        // done so put below text field
                        setAttribute(checkbox,'checked','checked');
                        li.className = 'done';
                        appendChild(notesList, li);
                    }
                    
                    attachTODOHandlers(row['id'],li,checkbox);
                }
            });
    }

(note there are a lot of shortcut functions in there, but hopefully you should get the idea)

What that basically does is execute 'SELECT * FROM todo ORDER BY todo_order ASC' on the database in a transaction (tx) and then takes the results and inserts them into the DOM.

Inserting into the database

Inserting into the database is pretty straightforward - we just use an SQL insert statement. Though in the case of the 5K TODO app we need to do a couple of other things. The first is figuring out the next highest todo_order, then we insert the TODO item and next we re-select the TODO items from the database (with the id of the newly inserted item - results.insertId). The 2nd part is a bit overly obtuse in some respects, but was done so as to help cut down on code size (to fit under the 5K limit):


    function insertTODO(description) {
        dbTransaction(
            function(tx) {
                // set order to be one after items that aren't done
                tx.executeSql('SELECT MAX(todo_order) AS next_order FROM todo WHERE done=0', [],
                    function(tx,results) {
                        var order = results.rows.item(0)['next_order'] + 1;
                        tx.executeSql('INSERT INTO todo (description,todo_order) VALUES(?,?)', [description,order], 
                            function(tx,results) { 
                                listTODOs(tx, 'SELECT * FROM todo WHERE id=?', [results.insertId]);
                                updateTODOOrder(tx);
                            }
                        );
                    }
                );
            }
        );
    }

As the later SQL statements depend on results from the earlier statement(s) they actually get run in the statement callback functions.

It's fairly straightforward to access a database this way. For those familiar with databases it should all be pretty familiar territory. There's plenty of more info on Apple's site about using the Javascript database. You may also want to look at the source code the the 5K TODO app, though be warned a little bit of readability has been sacrificed for code size.

Using raw SQL with SQLObject and keeping the object-y goodness

December 16th, 2007

This is sort of a continuation of my little SQLObject performance guide. So it might be worth reading that too, if you are after hints about speeding up SQLObject. Anyway, on with the show…

It’s possible to create raw (database agnostic) sql queries with SQLObject. This can be really handy for those spots where you really need to speed things up. It’s a bit like switching from Python to C for some performance intensive part of an application.

However when using raw SQL, we lose some of the nice-ness of SQLObject. Results arrive as tuples and we may then have to do more work to make use of them. So I’m going to discuss an example of using raw SQL in SQLObject, but still keeping the objects around.

The Model Code

In my example there are two model objects:


class Entry(SQLObject):
    title=StringCol(length=255)
    body=StringCol()
    views=SQLMultipleJoin('EntryView')

class EntryView(SQLObject):
    entry=ForeignKey('Entry')

Entry being a blog entry and EntryView being an object to keep track of the Entry being viewed. I’ve kept both objects free of details for this example, but obviously they could have all sorts of extra fields.

N+1 Queries

Now I want to get a list of all of the entries and how many views each entry has (sorted by number of views). So using regular SQLObject this looks like:


    # class method on the Entry class
    @classmethod
    def get_entry_views(cls):
        entries=cls.select()
        
        # get the count for each entry
        entry_counts=[]
        for entry in entries:
            entry_counts.append((entry, entry.views.count()))
        
        # now sort the list into descending order
        entry_counts.sort(key=lambda item:item[1])
        entry_counts.reverse()
        return entry_counts

Which is pretty straight forward really and gives the follow results (for some sample data):


[(<Entry 3 title='entry 3' body='body text 3'>, 5),
 (<Entry 1 title='hfdskhfks' body='fsdfsd'>, 2),
 (<Entry 2 title='hel' body='jjj'>, 0)]

(tuple of Entry objects followed by view count).

However this causes the following SQL to be executed:


SELECT entry.id, entry.title, entry.body FROM entry WHERE 1 = 1
SELECT COUNT(*) FROM entry_view WHERE ((entry_view.entry_id) = (1))
SELECT COUNT(*) FROM entry_view WHERE ((entry_view.entry_id) = (2))
SELECT COUNT(*) FROM entry_view WHERE ((entry_view.entry_id) = (3))

Which seems a bit bad. In fact this is a classic example of the N+1 problem, where we run one initial query and then one query for each row in that result.

2 queries

So now let’s try making that a bit better, with this alternative method:


    # need to import everything from sqlobject.sqlbuilder
    @classmethod
    def get_entry_views2(cls):
        conn=cls._connection
        fields = [Entry.q.id,SQLConstant('COUNT(*)')]
        select = Select(
                        fields,
                        join=INNERJOINOn(Entry,EntryView,Entry.q.id==EntryView.q.entryID),
                        groupBy=Entry.q.id)
        sql=conn.sqlrepr(select)

        # get the counts via the raw
        # sql query
        counts={}
        for entry_id,count in conn.queryAll(sql):
            counts[entry_id]=count

        # now read in all of the entries
        # and match them with the counts
        entries=cls.select()
        entry_counts=[]
        for entry in entries:
            entry_counts.append((entry,counts.get(entry.id,0)))
        
        # now sort the list into descending order
        entry_counts.sort(key=lambda item:item[1])
        entry_counts.reverse()
        return entry_counts

This time I’m using a raw sql query to get all of the (non-zero) view counts in one query and then using another query to get all of the Entry objects. Then using a bit of Python I stitch the results back together and sort it.

This generates the following SQL:


SELECT entry.id, COUNT(*) FROM  entry INNER JOIN entry_view ON ((entry.id) = (entry_view.entry_id)) GROUP BY entry.id
SELECT entry.id, entry.title, entry.body FROM entry WHERE 1 = 1

That’s not as bad as before, but if we were using regular SQL we’d be doing this in a single query that also sorted the results by the count at the same time!

1 query

At the moment we basically need the 2nd query to get the actual objects. If we could use one raw sql query to do the work for us and somehow use the results of the query to populate the relevant objects for us we’d be golden. After a bit of digging around in the SQLObject source code I looked at the get class method definition:


# in main.py
class SQLObject(object):
    ...
    def get(cls, id, connection=None, selectResults=None):

Further examination showed that if I passed in selectResults (a list of field values) in the right order I could get an object instance either based on the results I passed in, or else the version of the object with the matching id in the cache. Excellent. So now we can have a method that works thus:


    @classmethod
    def get_entry_views3(cls):
        return select_with_count(cls,EntryView,Entry.q.id==EntryView.q.entryID,orderByDesc=True)

Where the juicy bit is here (to make it more reusable elsewhere):


def select_with_count(selectClass,joinClass,join_on,orderByDesc=False):
    conn=selectClass._connection
    fields = [selectClass.q.id]
    for col in selectClass.sqlmeta.columnList:
        fields.append(getattr(selectClass.q, col.name))
    
    # name we'll assign to the count
    # so we can sort on it
    count_field=("%s_count"%joinClass.__name__).lower()
    fields.append(SQLConstant('COUNT(%s) %s'%(joinClass.q.id, count_field)))
    
    orderBy=SQLConstant(count_field)
    if orderByDesc:
        orderBy=DESC(orderBy)
    
    select=Select(
            fields, 
            join=LEFTJOINOn(selectClass,joinClass,join_on),
            groupBy=selectClass.q.id, 
            orderBy=orderBy)
    sql=conn.sqlrepr(select)
    return read_from_results(conn.queryAll(sql),selectClass)

def read_from_results(results,selectClass):
    num_columns=len(selectClass.sqlmeta.columnList)
    items=[]
    for result in results:
        id,selectResults,extra=result[0],result[1:num_columns],result[num_columns:]
        entry=selectClass.get(id,selectResults=selectResults)
        items.append((entry,)+extra)
    return items

Which returns results in the same format as the original method and only generate one SQL query:


SELECT entry.id, entry.title, entry.body, COUNT(entry_view.id) entryview_count FROM  entry LEFT JOIN entry_view ON ((entry.id) = (entry_view.entry_id)) GROUP BY entry.id ORDER BY entryview_count DESC

There are a few of fiddly bits going on here that I’ll explain.

Firstly I perform a LEFT JOIN and use COUNT(entry_view.id) so we can results for entries that have no views.

Next, the order of the object fields has to match what SQLObject is expecting. That order being defined by the class’s sqlmeta.columnList.

Finally to be able to sort by the view count I have to provide a name for the count ( entryview_count), which I create based on the EntryView class name.

In conclusion

The example I gave was quite specific, but does show it’s possible to slightly better integrate raw SQL queries with SQLObject. This means that it’s possible to retain more of the easy to use nature of SQLObject when needing to speed up a few critical queries.

I suspect that with a bit of work it would be possible to create a quite nice library for performing generalised queries with SQLObject and getting nice objects back. For example it may be possible to use such techniques to eagerly load objects in joins (much as you can do in SQLAlchemy or the Java Persitence API).

A little SQLObject performance guide

October 27th, 2007

For those that aren’t aware, SQLObject is an Object-Relational Mapping (ORM) library for Python. I use it in chrss (my chess by rss web app) as part of Turbogears. Ian and Kyran also use it as part of the ShowMeDo site.

Chrss and ShowMeDo have quite different levels of traffic. ShowMeDo has a lot more traffic than chrss, so performance might seem like more of an issue for ShowMeDo. However as chrss is a game that requires more interaction from the user this is not necessarily the case. If moving a piece takes even a second the site would seem sluggish. Whereas for a content rich site such as ShowMeDo user expectation can be a bit more forgiving.

Until recently Ian and Kyran have not needed to worry about performance and (rightly so) got on with the things that mattered (e.g. creating more screen-casts and building their community).

However the other day Ian asked me to help him out speed the site up. They were having some issues with a page taking too long to render. When creating chrss I’d spent a bit of extra time worrying about the performance of SQLObject, so I already knew what to look out for in their code. Luckily it mostly only required a few small tweaks and things ran a good deal quicker.

So what can you do to speed up SQLObject?

Enable Query Logging

Obviously don’t do this for your production server (it’ll only slow things down), but by adding ?debug=1 to your database connection URI, you can enable debug query logging. This will simply make SQLObject print out the details of every SQL statement that is ran against the database.

When developing this can give you a good idea of when you aren’t using SQLObject in an appropriate fashion. If you see pages of SQL statements flying past in your console window you should probably have a look to see why!

Enabling query logging is only going to help if you actually understand the SQL that you are looking at. Make sure you do some research if you aren’t familiar with SQL. SQLObject makes dealing with a relational database easier, but you still need to understand what it is actually doing to make the most of it.

SQLRelatedJoin/SQLMultipleJoin vs. RelatedJoin/MultipleJoin

Your mileage may vary, but generally speaking I’d recommend not using RelatedJoin (or MultipleJoin) to define many-to-many (or one-to-many) relationships with SQLObject. Instead use the SQL* related versions (SQLRelatedJoin and SQLMultipleJoin).

Why though?

Well RelatedJoin (and MultipleJoing) loads data lazily. Meaning that it first loads the id’s for each object, then uses a new query to load each object on demand. SQLRelatedJoin on the other hand works like select() and loads up all the data in one query. I’m simplifying a bit, but you can probably see that they behave differently.

Now sometimes lazy loading is what you want. Each object may contain a lot of data and you know you don’t need all of it.

However for the “normal” case you probably just want to get your object loaded into memory, with as few queries as possible. SQLRelatedJoin is what you want.

An example

I quick-started a project with tg-admin and created two model classes using RelatedJoin to link them:


class Entry(SQLObject):
    title=StringCol(length=255)
    body=StringCol()
    tags=RelatedJoin('Tag')

class Tag(SQLObject):
    name=StringCol(length=255,
                   alternateID=True,
                   alternateMethodName="by_tag_name")
    entries=RelatedJoin('Entry')

Pretty simple stuff. We can define an Entry and add Tag objects to it.

Then I ran tg-admin sql create to populate the (SQLite) database.

Next I ran tg-admin shell so I could create some objects in the database:


entry=Entry(title='a title',body='entry body')
test_tag=Tag(name='test_tag')
tag2=Tag(name='tag2')
entry.addTag(test_tag)
entry.addTag(tag2)

I then added ?debug=1 to the database URI:

sqlobject.dburi="sqlite://%(current_dir_uri)s/devdata.sqlite?debug=1"

Then I restarted tg-admin shell (with the IPython shell) and ran the following:


In [1]: entry=Entry.get(1)
 1/QueryOne:  SELECT title, body FROM entry WHERE id = (1)
 1/QueryR  :  SELECT title, body FROM entry WHERE id = (1)

In [2]: for tag in entry.tags:
   ...:     print "tag.name=%s" % tag.name
   ...:     
 1/QueryAll:  SELECT tag_id FROM entry_tag WHERE entry_id = (1)
 1/QueryR  :  SELECT tag_id FROM entry_tag WHERE entry_id = (1)
 1/QueryOne:  SELECT name FROM tag WHERE id = (1)
 1/QueryR  :  SELECT name FROM tag WHERE id = (1)
 1/QueryOne:  SELECT name FROM tag WHERE id = (2)
 1/QueryR  :  SELECT name FROM tag WHERE id = (2)
tag.name=test_tag
tag.name=tag2

As you can see with a RelatedJoin printing the two tags on the Entry requires the following three queries:


SELECT tag_id FROM entry_tag WHERE entry_id = (1)
SELECT name FROM tag WHERE id = (1)
SELECT name FROM tag WHERE id = (2)

(note how only the name field is queried for as this is all we use)
The RelatedJoin performs lazy-loading and ends up having to perform one query per tag! For two tags this might not be a problem, but it soon adds up if you aren’t careful.

A minor change

Simply changing RelatedJoin to SQLRelatedJoin in the models and running that same code yields:


In [1]: entry=Entry.get(1)
 1/QueryOne:  SELECT title, body FROM entry WHERE id = (1)
 1/QueryR  :  SELECT title, body FROM entry WHERE id = (1)

In [2]: for tag in entry.tags:
   ...:     print "tag.name=%s" % tag.name
   ...:     
 1/Select  :  SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))
 1/QueryR  :  SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))
tag.name=test_tag
tag.name=tag2

Printing out the tag names for the entry now only requires one query:

SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))

This is a big improvement - the number of queries we will run now no longer depends on the number of objects being returned.

Some caveats and notes

It's not always this simple, so here are some issues you may encounter:

  • RelatedJoin returns a list, whereas SQLRelatedJoin returns a SelectResults object (the same kind of object returned when calling select())
  • Large columns (text/binary blobs) won't get lazily loaded with SQLRelatedJoin
  • Fewer database queries doesn't always mean your code will run faster - understand what each query is doing
  • Make sure you properly index your database
  • You need to understand the SQL that SQLObject generates to get the most out of SQLObject
  • SQLObject may not be as slow as you think - you might not be using it right

chrss update 6

March 30th, 2007

So I’ve now uploaded the results of the saturday morning code-a-thon. This means that you can now browse all of the previous moves and they also have a convenient URL. So if you want to show someone a move that you thought was particularly good/bad you can just send them the link. e.g. in this pretty amazing (fun at least) game against Kennon, I was particularly proud of this move:

http://psychicorigami.com/chrss/game/9/browse/29

Pretty happy with what I’ve got implemented for chrss now. This features take it slightly past the core ability to “just play chess”. Hopefully I can start leveraging the fact that everything is being recorded in the database and is now “just data” that can be manipulated and viewed in different ways. Though I think I might spend a bit of time on some of the more mundane aspects first (like being able to reset your password).

On a more technical note, starting to feel a lot more like I really know my SQL now. Been proficient at it for a while, but I was quite proud of this:


alter table move add column move_num int;

-- temp table for calculating the move numbers
create table tmp_move (moveid int,movenum int);
-- insert move numbers into tmp table
insert into tmp_move (moveid,movenum) select id as moveid, (select count(*) from move m2 where m2.id <= m.id and m2.game_id = m.game_id) as movenum from move m;
-- then update moves from tmp table
update move set move.move_num = (select movenum from tmp_move where tmp_move.moveid = move.id);
-- remove tmp table
drop table tmp_move;

-- increase constraints on the move_num values
alter table move modify column move_num int not null;

alter table move add unique game_move_num_index (game_id, move_num);
alter table move add index move_num_index (move_num);

Which basically adds a “move_num” column to the move table and the calculates the correct value for each move (i.e. whether a move is move 9, 10 etc. in a particular game). Previously I was calculating this on the fly, but that was forcing me to always read all of the moves for a game. This should make things a bit simpler for me down the line. Just a shame I couldn’t do it without a temporary table.