<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Psychic Origami &#187; sqlobject</title>
	<atom:link href="http://www.psychicorigami.com/category/sqlobject/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.psychicorigami.com</link>
	<description>folding with my brain</description>
	<lastBuildDate>Wed, 03 Aug 2011 19:13:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Turbogears, remember me</title>
		<link>http://www.psychicorigami.com/2008/09/22/turbogears-remember-me/</link>
		<comments>http://www.psychicorigami.com/2008/09/22/turbogears-remember-me/#comments</comments>
		<pubDate>Mon, 22 Sep 2008 19:09:03 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Chrss]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[sqlobject]]></category>
		<category><![CDATA[turbogears]]></category>

		<guid isPermaLink="false">http://psychicorigami.com/?p=150</guid>
		<description><![CDATA[So a while back I implemented a remember me feature for chrss. I said I&#8217;d release the code for it and am finally now getting round to it. Please note that this kind of &#8220;remember me&#8221; functionality can represent a potential security hole. It makes sense for some sites where the convenience outweighs any [...]]]></description>
			<content:encoded><![CDATA[<p>So a while back I implemented a <a href='http://chrss.co.uk/blog/remember-me'>remember me</a> feature for <a href='http://chrss.co.uk/'>chrss</a>.  I said I&#8217;d release the code for it and am finally now getting round to it.</p>
<p>Please note that this kind of &#8220;remember me&#8221; functionality can represent a potential security hole.  It makes sense for some sites where the convenience outweighs any problems that would occur if someone fraudulently gains access to the site.  As I wrote this for a site that is concerned with playing chess online it seemed worth it.</p>
<p>So to get started this is meant to work with:</p>
<ul>
<li><a href='http://docs.turbogears.org/1.0/'>Turbogears 1.0</a></li>
<li><a href='http://www.sqlobject.org/'>SQLObject</a> (though it should be easy to adapt to <a href='http://www.sqlalchemy.org/'>SQLAlchemy</a>)</li>
</ul>
<p>Also note that I&#8217;ve left some of the imports as they appear for my app (chrss), so you&#8217;ll need to change them as appropriate.</p>
<h3>The idea</h3>
<p>Conceptually a regular request with a remember me feature works thus:</p>
<ul>
<li>If the user is not logged in, we check for a &#8220;remember me&#8221; cookie</li>
<li>If the cookie is present then we check to see if it matches a token (which maps to a user) in the database</li>
<li>If there&#8217;s a match to a user we can log the user in, and on future requests we can ignore the remember me cookie (everything works as before)</li>
</ul>
<p>The token in the database is randomly generated when the user logs in (with the &#8220;remember me&#8221; option ticked on the login form) in a similar way to any kind of session tracking cookie.  The difference is that the token/cookie is meant to hang around for much longer than a regular session. It&#8217;s used in addition to the Turbogears <code>tg-visit</code> cookie and is just a handy shortcut for logging in a user automatically.  This means that it&#8217;s fairly non-invasive in so far as it interacts with the <a href='http://docs.turbogears.org/1.0/Identity'>Turbogears identity framework</a>.</p>
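<p>As a rough illustration of that flow (not the chrss code), here is a framework-free sketch.  The in-memory <code>TOKENS</code> dict, the <code>remember_me</code> cookie name and the expiry check on lookup are all assumptions made for the example:</p>

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for the token table: token -> (user_id, expiry).
TOKENS = {}


def remember(token, user_id, now, days=30):
    """Store a token that stays valid for `days` days."""
    TOKENS[token] = (user_id, now + timedelta(days=days))


def user_for_request(session_user_id, cookies, now):
    """Return the user id for a request, or None for an anonymous visitor."""
    # 1. A normal session wins; the remember me cookie is ignored.
    if session_user_id is not None:
        return session_user_id
    # 2. Otherwise look for the remember me cookie.
    token = cookies.get("remember_me")
    if token is None:
        return None
    # 3. Match the token against the stored ones, rejecting expired entries.
    entry = TOKENS.get(token)
    if entry is None:
        return None
    user_id, expiry = entry
    if expiry < now:
        return None
    return user_id


now = datetime(2008, 9, 22, 12, 0, 0)
remember("abc123", 42, now)
print(user_for_request(None, {"remember_me": "abc123"}, now))  # 42
print(user_for_request(None, {}, now))  # None
```

<p>The real implementation stores the token in the database and hands the user over to the identity framework, but the order of the checks is the same.</p>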
<h3>The code</h3>
<p>First of all we need a table in the database to connect the remember me token to a user.  So in my models I defined the following entity:</p>
<pre><code>
class RememberMe(SQLObject):
    user_token = StringCol(length=40, alternateID=True,
            alternateMethodName="by_user_token")
    user_id = IntCol()
    expiry = DateTimeCol()

    expiry_index=DatabaseIndex(expiry)
</code></pre>
<p>The rest of the code then lives in <code>remember_me.py</code>.</p>
<p>First there&#8217;s the code to &#8220;remember&#8221; a user.  This creates a <code>RememberMe</code> entity and  sets a cookie on the user&#8217;s machine:</p>
<pre><code>
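# Imports needed by the functions in remember_me.py (Python 2 era modules).
# remember_cookie_name, remember_me_age_days and log must also be defined
# at module level; their values are not shown in this post.
import random
import sha
from datetime import datetime, timedelta
from email.Utils import formatdate
from time import time

import cherrypy
import turbogears
from sqlobject import SQLObjectNotFound

def generate_token():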
    key_string= '%s%s%s%s' % (random.random(), datetime.now(),
                              cherrypy.request.remote_host,
                              cherrypy.request.remote_port)
    return sha.new(key_string).hexdigest()

def remember_user(user):
    from chrss.model import RememberMe

    user_token=generate_token()
    expiry=datetime.now() + timedelta(days=remember_me_age_days)
    remember=RememberMe(user_token=user_token, user_id=user.id,expiry=expiry)

    cookies= cherrypy.response.simple_cookie
    max_age = remember_me_age_days*24*60*60
    cookies[remember_cookie_name] = remember.user_token
    cookies[remember_cookie_name]['path'] = '/'
    cookies[remember_cookie_name]['expires'] = formatdate(time() + max_age)
    cookies[remember_cookie_name]['max-age'] = max_age
</code></pre>
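<p>The code above targets the Python of the time.  On a current Python 3 the same two steps might look something like the sketch below, which swaps in the standard library <code>secrets</code> module for the token and <code>http.cookies</code> for the cookie.  The constant names and values, and the extra <code>httponly</code> flag, are assumptions for the example rather than part of the original module:</p>

```python
import secrets
from email.utils import formatdate
from http.cookies import SimpleCookie
from time import time

REMEMBER_COOKIE_NAME = "remember_me"  # hypothetical cookie name
REMEMBER_ME_AGE_DAYS = 30             # hypothetical lifetime in days


def generate_token():
    """Return 40 hex characters drawn from the OS random source."""
    return secrets.token_hex(20)


def build_remember_cookie(token):
    """Build a long-lived cookie carrying the token."""
    max_age = REMEMBER_ME_AGE_DAYS * 24 * 60 * 60
    cookies = SimpleCookie()
    cookies[REMEMBER_COOKIE_NAME] = token
    cookies[REMEMBER_COOKIE_NAME]["path"] = "/"
    cookies[REMEMBER_COOKIE_NAME]["max-age"] = max_age
    cookies[REMEMBER_COOKIE_NAME]["expires"] = formatdate(time() + max_age, usegmt=True)
    # Extra hardening that the original code does not set.
    cookies[REMEMBER_COOKIE_NAME]["httponly"] = True
    return cookies


token = generate_token()
cookie = build_remember_cookie(token)
print(cookie.output())
```

<p>A 20 byte token renders as 40 hex characters, so it still fits the <code>StringCol(length=40)</code> column defined earlier.</p>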
<p>Here&#8217;s the reverse function  to &#8220;un-remember&#8221; a user (which you would call from your logout method):</p>
<pre><code>
def unremember_user(user):
    cookies = cherrypy.request.simple_cookie
    if remember_cookie_name in cookies:
        user_token=cookies[remember_cookie_name].value

        if user_token:
            from chrss.model import RememberMe
            try:
                remember=RememberMe.by_user_token(user_token)
                remember.destroySelf()
            except SQLObjectNotFound:
                pass

            # now clear cookie
            cookies= cherrypy.response.simple_cookie
            cookies[remember_cookie_name] = ''
            cookies[remember_cookie_name]['path'] = '/'
            cookies[remember_cookie_name]['expires'] = 0
            cookies[remember_cookie_name]['max-age'] = 0
</code></pre>
<p>Before I get onto the two monkey patches, we need one more function, which logs in the user given a user entity (bypassing the need for their username and password).  It is based on code from <a href='http://docs.turbogears.org/1.0/IdentityRecipes'>here</a>:</p>
<pre><code>
def login_user(user):
    ''' from http://docs.turbogears.org/1.0/IdentityRecipes'''
    visit_key = turbogears.visit.current().key
    IdentityObject = turbogears.identity.soprovider.SqlObjectIdentity

    from chrss.model import VisitIdentity
    try:
        link = VisitIdentity.by_visit_key(visit_key)
    except SQLObjectNotFound:
        link = None
    if not link:
        link = VisitIdentity(visit_key=visit_key, user_id=user.id)
    else:
        link.user_id = user.id
    user_identity = IdentityObject(visit_key)
    return user_identity
</code></pre>
<h3>The monkey patches</h3>
<p>Now we get to the meat of the code &#8211; the bit which does the actual &#8220;magic&#8221;.  In both cases we are monkey-patching methods that belong to the <code>IdentityVisitPlugin</code> class in Turbogears (defined in <code>turbogears.identity.visitor</code>).</p>
<p>First up is <code>identity_from_visit</code> which normally just looks for the <code>tg-visit</code> cookie and then sees if that&#8217;s associated with a user login or not.  We shall effectively override it, so that if no association is found then we will perform a further check to see if there is a remember me cookie that will let us log the user in:</p>
<pre><code>
# keep a reference to the original function
old_identity_from_visit=turbogears.identity.visitor.IdentityVisitPlugin.identity_from_visit

def identity_from_remember_me( self, visit_key ):
    identity=old_identity_from_visit( self, visit_key )
    if identity.anonymous:
        # not logged in so check for remember me cookie
        cookies = cherrypy.request.simple_cookie
        if remember_cookie_name in cookies:
            log.info("checking remember me cookie")
            user_token=cookies[remember_cookie_name].value

            from chrss.model import RememberMe, User
            try:
                remember=RememberMe.by_user_token(user_token)
                user=User.get(remember.user_id)
                return login_user(user)
            except SQLObjectNotFound:
                pass

    return identity

# monkey-patch the method
turbogears.identity.visitor.IdentityVisitPlugin.identity_from_visit=identity_from_remember_me
</code></pre>
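<p>Stripped of the Turbogears specifics, the wrap-and-replace pattern used here can be sketched as follows.  The <code>Plugin</code> class, the <code>REMEMBERED</code> dict and the <code>cookie_token</code> argument are made up for the illustration:</p>

```python
class Plugin(object):
    """Hypothetical stand-in for a framework class we do not control."""

    def identity_from_visit(self, visit_key):
        # Pretend nobody is ever logged in via the session alone.
        return None


# Hypothetical token store: cookie token -> user name.
REMEMBERED = {"token-1": "alice"}

# Keep a reference to the original function before replacing it.
original_identity_from_visit = Plugin.identity_from_visit


def identity_with_fallback(self, visit_key, cookie_token=None):
    identity = original_identity_from_visit(self, visit_key)
    if identity is None:
        # The original found nothing, so try the fallback lookup.
        identity = REMEMBERED.get(cookie_token)
    return identity


# Monkey-patch: every existing and future Plugin instance now uses the wrapper.
Plugin.identity_from_visit = identity_with_fallback

plugin = Plugin()
print(plugin.identity_from_visit("visit-key", cookie_token="token-1"))  # alice
print(plugin.identity_from_visit("visit-key"))  # None
```

<p>Because the wrapper always calls the original first, the existing behaviour is untouched whenever the original already finds an identity.</p>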
<p>The next method we patch is <code>identity_from_form</code>.  For this we just check whether there is a &#8220;remember_me&#8221; parameter in the request after a successful login (from calling the original method) and if so call the <code>remember_user()</code> function.</p>
<pre><code>
old_identity_from_form=turbogears.identity.visitor.IdentityVisitPlugin.identity_from_form

def identity_from_form(self, visit_key):
    identity=old_identity_from_form(self, visit_key)
    if identity is not None and not identity.anonymous:
        # login worked, so now see if 'remember me' set
        params=cherrypy.request.params
        remember_me=params.pop('remember_me', None)
        if remember_me:
            remember_user(identity.user)
    return identity

turbogears.identity.visitor.IdentityVisitPlugin.identity_from_form=identity_from_form
</code></pre>
<p>You just need to import the <code>remember_me</code> module early on when starting up your Turbogears app and it will apply these monkey patches.  Then if you modify your login template to include a &#8220;remember_me&#8221; checkbox you should have everything working.</p>
<p>As I said before it&#8217;s fairly non-invasive (as far as monkey patches go), so there shouldn&#8217;t really be a need to modify much beyond your login form and to add a call to <code>unremember_user</code> to your logout code.  The only other thing is perhaps to set up a cron script or other background task to delete expired entries in the database (which is why the <code>RememberMe</code> entity has an <code>expiry</code> column).</p>
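<p>Such a clean-up task is not part of the module, but a minimal sketch of one could look like this.  It uses the standard library <code>sqlite3</code> module directly rather than SQLObject, and the <code>remember_me</code> table layout is an assumption that loosely mirrors the <code>RememberMe</code> entity:</p>

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
# Assumed schema, loosely mirroring the RememberMe entity.
conn.execute(
    "CREATE TABLE remember_me ("
    "id INTEGER PRIMARY KEY, user_token TEXT UNIQUE, user_id INTEGER, expiry TEXT)"
)
conn.execute("CREATE INDEX remember_me_expiry ON remember_me (expiry)")

now = datetime(2008, 9, 22, 12, 0, 0)
rows = [
    ("old-token", 1, (now - timedelta(days=1)).isoformat(" ")),
    ("live-token", 2, (now + timedelta(days=29)).isoformat(" ")),
]
conn.executemany(
    "INSERT INTO remember_me (user_token, user_id, expiry) VALUES (?, ?, ?)", rows
)


def delete_expired(connection, cutoff):
    """Delete tokens whose expiry is before `cutoff`; return how many went."""
    cursor = connection.execute(
        "DELETE FROM remember_me WHERE expiry < ?", (cutoff.isoformat(" "),)
    )
    connection.commit()
    return cursor.rowcount


print(delete_expired(conn, now))  # 1
remaining = [row[0] for row in conn.execute("SELECT user_token FROM remember_me")]
print(remaining)  # ['live-token']
```

<p>The index on <code>expiry</code> is what keeps this delete cheap as the table grows.</p>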
<h3>Source code</h3>
<p>The <code>remember_me</code> module is available for download <a href='http://psychicorigami.com/source/remember_me.tar.gz'>here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychicorigami.com/2008/09/22/turbogears-remember-me/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Splitting your Turbogears SQLObject models</title>
		<link>http://www.psychicorigami.com/2008/07/21/splitting-your-turbogears-sqlobject-models/</link>
		<comments>http://www.psychicorigami.com/2008/07/21/splitting-your-turbogears-sqlobject-models/#comments</comments>
		<pubDate>Mon, 21 Jul 2008 18:38:43 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[sqlobject]]></category>
		<category><![CDATA[turbogears]]></category>

		<guid isPermaLink="false">http://psychicorigami.com/?p=147</guid>
		<description><![CDATA[Just a quick note about splitting your model.py file in Turbogears 1.0, when using SQLObject. The Turbogears docs have some notes on this, but there was an extra trick to it in the end. The model.py file for chrss was starting to get a bit big, so it seemed like a good time to do [...]]]></description>
			<content:encoded><![CDATA[<p>Just a quick note about splitting your <code>model.py</code> file in <a href='http://turbogears.org/'>Turbogears</a> 1.0, when using <a href='http://sqlobject.org/'>SQLObject</a>.  The Turbogears docs have <a href='http://docs.turbogears.org/1.0/CreatingBigApplications#splitting-the-model'>some notes</a> on this, but there was an extra trick to it in the end.</p>
<p>The <code>model.py</code> file for <a href='http://chrss.co.uk/'>chrss</a> was starting to get a bit big, so it seemed like a good time to do this.</p>
<p>First I moved <code>model.py</code> into <code>model/__init__.py</code>.  Then I moved all of the model code itself into separate files (three as it happens) and imported them into <code>model/__init__.py</code> as indicated in the Turbogears docs:</p>
<pre><code>
from chrss.model.cms import *
from chrss.model.chess import *
from chrss.model.base import *
</code></pre>
<p>However that wasn&#8217;t enough, as the <code>__connection__</code> module level variable for SQLObject wasn&#8217;t set and Turbogears couldn&#8217;t connect to the DB.  So I added this to <code>model/__init__.py</code> (<em>before the other imports</em>):</p>
<pre><code>
from turbogears.database import PackageHub

hub = PackageHub("chrss")
</code></pre>
<p>and then in each file containing models added the following:</p>
<pre><code>
from chrss.model import hub
__connection__ = hub
</code></pre>
<p>The main trick was to get the import order correct.  <code>model/__init__.py</code> must declare the <code>hub</code> variable, before importing the other files, so that they can access it when they are imported.  It&#8217;s a bit of a cyclical dependency, which is maybe not ideal, but it&#8217;s only used in a limited way.</p>
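<p>To see why the order matters, here is a small self-contained sketch that writes a throwaway package to a temporary directory and imports it.  The names <code>demo_model</code>, <code>base</code> and <code>describe</code> are invented for the example, and a plain string stands in for the real <code>PackageHub</code>:</p>

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk; "demo_model" and "hub" are made-up names.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_model")
os.mkdir(package_dir)

with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    # hub is defined BEFORE the submodule import, so the submodule can see it.
    handle.write("hub = 'the shared hub'\n")
    handle.write("from demo_model.base import *\n")

with open(os.path.join(package_dir, "base.py"), "w") as handle:
    # The submodule imports hub back from the half-initialised package.
    handle.write("from demo_model import hub\n")
    handle.write("__connection__ = hub\n")
    handle.write("__all__ = ['describe']\n")
    handle.write("def describe():\n")
    handle.write("    return 'connected via ' + __connection__\n")

sys.path.insert(0, root)
demo_model = importlib.import_module("demo_model")
print(demo_model.describe())  # connected via the shared hub
```

<p>If the two lines written to <code>__init__.py</code> were swapped, <code>base.py</code> would run before <code>hub</code> existed and the import would fail with an <code>ImportError</code>.</p>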
<p><strong>UPDATE</strong>.  It turns out that you also need to update the <code>sqlobject.txt</code> file in the <code>.egg-info</code> directory of your project.  Otherwise the various <code>tg-admin sql *</code> commands don&#8217;t work (as they can&#8217;t find the SQLObject classes).  Basically you have to list every sub-package of the newly split model package. e.g. change:</p>
<pre><code>
db_module=chrss.model
</code></pre>
<p>to:</p>
<pre><code>
db_module=chrss.model,chrss.model.base,chrss.model.chess,chrss.model.cms
</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://www.psychicorigami.com/2008/07/21/splitting-your-turbogears-sqlobject-models/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using raw SQL with SQLObject and keeping the object-y goodness</title>
		<link>http://www.psychicorigami.com/2007/12/16/using-raw-sql-with-sqlobject-and-keeping-the-object-y-goodness/</link>
		<comments>http://www.psychicorigami.com/2007/12/16/using-raw-sql-with-sqlobject-and-keeping-the-object-y-goodness/#comments</comments>
		<pubDate>Sun, 16 Dec 2007 15:13:37 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[sqlobject]]></category>
		<category><![CDATA[turbogears]]></category>

		<guid isPermaLink="false">http://psychicorigami.com/2007/12/16/using-raw-sql-with-sqlobject-and-keeping-the-object-y-goodness/</guid>
		<description><![CDATA[This is sort of a continuation of my little SQLObject performance guide. So it might be worth reading that too, if you are after hints about speeding up SQLObject. Anyway, on with the show&#8230; It&#8217;s possible to create raw (database agnostic) sql queries with SQLObject. This can be really handy for those spots where you [...]]]></description>
			<content:encoded><![CDATA[<p>This is sort of a continuation of my <a href="http://psychicorigami.com/2007/10/27/a-little-sqlobject-performance-guide/">little SQLObject performance guide</a>.  So it might be worth reading that too, if you are after hints about speeding up <a href="http://sqlobject.org/">SQLObject</a>.  Anyway, on with the show&#8230;</p>
<p>It&#8217;s possible to create <a href="http://groovie.org/articles/2005/11/01/how-to-use-database-agnostic-sql-in-sqlobject">raw (database agnostic) sql queries with SQLObject</a>.  This can be really handy for those spots where you really need to speed things up.  It&#8217;s a bit like switching from Python to C for some performance intensive part of an application.</p>
<p>However when using raw SQL, we lose some of the nice-ness of SQLObject.  Results arrive as tuples and we may then have to do more work to make use of them.  So I&#8217;m going to discuss an example of using raw SQL in SQLObject, but still keeping the objects around.</p>
<h2>The Model Code</h2>
<p>In my example there are two model objects:</p>
<pre><code>
class Entry(SQLObject):
    title=StringCol(length=255)
    body=StringCol()
    views=SQLMultipleJoin('EntryView')

class EntryView(SQLObject):
    entry=ForeignKey('Entry')
</code></pre>
<p><code>Entry</code> being a blog entry and <code>EntryView</code> being an object to keep track of the <code>Entry</code> being viewed.  I&#8217;ve kept both objects free of details for this example, but obviously they could have all sorts of extra fields.</p>
<h2>N+1 Queries</h2>
<p>Now I want to get a list of all of the entries and how many views each entry has (sorted by number of views).  So using regular SQLObject this looks like:</p>
<pre><code>
    # class method on the Entry class
    @classmethod
    def get_entry_views(cls):
        entries=cls.select()

        # get the count for each entry
        entry_counts=[]
        for entry in entries:
            entry_counts.append((entry, entry.views.count()))

        # now sort the list into descending order
        entry_counts.sort(key=lambda item:item[1])
        entry_counts.reverse()
        return entry_counts
</code></pre>
<p>Which is pretty straightforward really and gives the following results (for some sample data):</p>
<pre><code>
[(&lt;Entry 3 title='entry 3' body='body text 3'&gt;, 5),
 (&lt;Entry 1 title='hfdskhfks' body='fsdfsd'&gt;, 2),
 (&lt;Entry 2 title='hel' body='jjj'&gt;, 0)]
</code></pre>
<p>(each tuple is an <code>Entry</code> object followed by its view count).</p>
<p>However this causes the following SQL to be executed:</p>
<pre><code>
SELECT entry.id, entry.title, entry.body FROM entry WHERE 1 = 1
SELECT COUNT(*) FROM entry_view WHERE ((entry_view.entry_id) = (1))
SELECT COUNT(*) FROM entry_view WHERE ((entry_view.entry_id) = (2))
SELECT COUNT(*) FROM entry_view WHERE ((entry_view.entry_id) = (3))
</code></pre>
<p>Which seems a bit bad.  In fact this is a classic example of the <a href="http://www.pbell.com/index.cfm/2006/9/17/Understanding-the-n1-query-problem">N+1 problem</a>, where we run one initial query and then one query for each row in that result.</p>
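<p>The effect is easy to reproduce outside SQLObject.  The sketch below uses the standard library <code>sqlite3</code> module with a cut-down version of the two tables and records every statement that runs.  The table layout and sample rows are assumptions chosen to match the results above:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE entry_view (id INTEGER PRIMARY KEY, entry_id INTEGER);
    INSERT INTO entry (id, title, body) VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
    INSERT INTO entry_view (entry_id) VALUES (1), (1), (3), (3), (3), (3), (3);
    """
)

# Record every statement SQLite runs from here on.
statements = []
conn.set_trace_callback(statements.append)

entries = conn.execute("SELECT id, title FROM entry").fetchall()
entry_counts = []
for entry_id, title in entries:
    # One extra query per entry: this is the "N" in N+1.
    count = conn.execute(
        "SELECT COUNT(*) FROM entry_view WHERE entry_id = ?", (entry_id,)
    ).fetchone()[0]
    entry_counts.append((title, count))

entry_counts.sort(key=lambda item: item[1], reverse=True)
print(entry_counts)     # [('c', 5), ('a', 2), ('b', 0)]
print(len(statements))  # 4
```

<p>Three entries cost four statements; a thousand entries would cost a thousand and one.</p>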
<h2>2 queries</h2>
<p>So now let&#8217;s try making that a bit better, with this alternative method:</p>
<pre><code>
    # need to import everything from sqlobject.sqlbuilder
    @classmethod
    def get_entry_views2(cls):
        conn=cls._connection
        fields = [Entry.q.id,SQLConstant('COUNT(*)')]
        select = Select(
                        fields,
                        join=INNERJOINOn(Entry,EntryView,Entry.q.id==EntryView.q.entryID),
                        groupBy=Entry.q.id)
        sql=conn.sqlrepr(select)

        # get the counts via the raw
        # sql query
        counts={}
        for entry_id,count in conn.queryAll(sql):
            counts[entry_id]=count

        # now read in all of the entries
        # and match them with the counts
        entries=cls.select()
        entry_counts=[]
        for entry in entries:
            entry_counts.append((entry,counts.get(entry.id,0)))

        # now sort the list into descending order
        entry_counts.sort(key=lambda item:item[1])
        entry_counts.reverse()
        return entry_counts
</code></pre>
<p>This time I&#8217;m using a raw sql query to get all of the (non-zero) view counts in one query and then using another query to get all of the <code>Entry</code> objects.  Then using a bit of Python I stitch the results back together and sort it.</p>
<p>This generates the following SQL:</p>
<pre><code>
SELECT entry.id, COUNT(*) FROM  entry INNER JOIN entry_view ON ((entry.id) = (entry_view.entry_id)) GROUP BY entry.id
SELECT entry.id, entry.title, entry.body FROM entry WHERE 1 = 1
</code></pre>
<p>That&#8217;s not as bad as before, but if we were using regular SQL we&#8217;d be doing this in a single query that also sorted the results by the count at the same time!</p>
<h2>1 query</h2>
<p>At the moment we basically need the 2nd query to get the actual objects.  If we could use one raw sql query to do the work for us and somehow use the results of the query to populate the relevant objects for us we&#8217;d be golden.  After a bit of digging around in the SQLObject source code I looked at the <code>get</code> class method definition:</p>
<pre><code>
# in main.py
class SQLObject(object):
    ...
    def get(cls, id, connection=None, selectResults=None):
</code></pre>
<p>Further examination showed that if I passed in <code>selectResults</code> (a list of field values) in the right order I could get an object instance either based on the results I passed in, or else the version of the object with the matching id in the cache.  Excellent.  So now we can have a method that works thus:</p>
<pre><code>
    @classmethod
    def get_entry_views3(cls):
        return select_with_count(cls,EntryView,Entry.q.id==EntryView.q.entryID,orderByDesc=True)
</code></pre>
<p>Where the juicy bit is here (to make it more reusable elsewhere):</p>
<pre><code>
def select_with_count(selectClass,joinClass,join_on,orderByDesc=False):
    conn=selectClass._connection
    fields = [selectClass.q.id]
    for col in selectClass.sqlmeta.columnList:
        fields.append(getattr(selectClass.q, col.name))

    # name we'll assign to the count
    # so we can sort on it
    count_field=("%s_count"%joinClass.__name__).lower()
    fields.append(SQLConstant('COUNT(%s) %s'%(joinClass.q.id, count_field)))

    orderBy=SQLConstant(count_field)
    if orderByDesc:
        orderBy=DESC(orderBy)

    select=Select(
            fields,
            join=LEFTJOINOn(selectClass,joinClass,join_on),
            groupBy=selectClass.q.id,
            orderBy=orderBy)
    sql=conn.sqlrepr(select)
    return read_from_results(conn.queryAll(sql),selectClass)

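def read_from_results(results,selectClass):
    # each row is laid out as (id, column values..., extra values...)
    num_columns=len(selectClass.sqlmeta.columnList)
    items=[]
    for result in results:
        id=result[0]
        # columnList does not include the id, so the column values
        # occupy positions 1..num_columns inclusive
        selectResults=result[1:num_columns+1]
        extra=tuple(result[num_columns+1:])
        entry=selectClass.get(id,selectResults=selectResults)
        items.append((entry,)+extra)
    return items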
</code></pre>
<p>Which returns results in the same format as the original method and generates only one SQL query:</p>
<pre><code>
SELECT entry.id, entry.title, entry.body, COUNT(entry_view.id) entryview_count FROM  entry LEFT JOIN entry_view ON ((entry.id) = (entry_view.entry_id)) GROUP BY entry.id ORDER BY entryview_count DESC
</code></pre>
<p>There are a few fiddly bits going on here that I&#8217;ll explain.</p>
<p>Firstly I perform a <code>LEFT JOIN</code> and use <code>COUNT(entry_view.id)</code> so we can get results for entries that have no views.</p>
<p>Next, the order of the object fields has to match what SQLObject is expecting.  That order being defined by the class&#8217;s <code>sqlmeta.columnList</code>.</p>
<p>Finally, to be able to sort by the view count I have to provide a name for the count (<code>entryview_count</code>), which I create based on the <code>EntryView</code> class name.</p>
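<p>The difference between the join types and the two forms of <code>COUNT</code> is easiest to see side by side.  This is a sketch using the standard library <code>sqlite3</code> module, with an assumed cut-down schema and sample rows rather than the SQLObject generated tables:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE entry_view (id INTEGER PRIMARY KEY, entry_id INTEGER);
    INSERT INTO entry (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO entry_view (entry_id) VALUES (1), (1), (3), (3), (3), (3), (3);
    """
)

QUERY = """
    SELECT entry.title, {count} AS view_count
    FROM entry {join} entry_view ON entry.id = entry_view.entry_id
    GROUP BY entry.id
    ORDER BY view_count DESC, entry.id
"""


def run(join, count):
    """Run the query with the given join type and count expression."""
    return conn.execute(QUERY.format(join=join, count=count)).fetchall()


# An inner join drops entries that have no views at all.
print(run("INNER JOIN", "COUNT(*)"))              # [('c', 5), ('a', 2)]
# A left join keeps them, but COUNT(*) counts the all-NULL row as one view.
print(run("LEFT JOIN", "COUNT(*)"))               # [('c', 5), ('a', 2), ('b', 1)]
# Counting a column from the joined table skips NULLs and gives zero.
print(run("LEFT JOIN", "COUNT(entry_view.id)"))   # [('c', 5), ('a', 2), ('b', 0)]
```

<p>Only the last combination gives the same answer as the original N+1 version.</p>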
<h2>In conclusion</h2>
<p>The example I gave was quite specific, but does show it&#8217;s possible to slightly better integrate raw SQL queries with SQLObject.  This means that it&#8217;s possible to retain more of the easy to use nature of SQLObject when needing to speed up a few critical queries.</p>
<p>I suspect that with a bit of work it would be possible to create quite a nice library for performing generalised queries with SQLObject and getting nice objects back.  For example it may be possible to use such techniques to eagerly load objects in joins (much as you can do in <a href="http://www.sqlalchemy.org/">SQLAlchemy</a> or the <a href="http://java.sun.com/developer/technicalArticles/J2EE/jpa/">Java Persistence API</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychicorigami.com/2007/12/16/using-raw-sql-with-sqlobject-and-keeping-the-object-y-goodness/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A little SQLObject performance guide</title>
		<link>http://www.psychicorigami.com/2007/10/27/a-little-sqlobject-performance-guide/</link>
		<comments>http://www.psychicorigami.com/2007/10/27/a-little-sqlobject-performance-guide/#comments</comments>
		<pubDate>Sat, 27 Oct 2007 15:39:49 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[sqlobject]]></category>
		<category><![CDATA[turbogears]]></category>

		<guid isPermaLink="false">http://psychicorigami.com/2007/10/27/a-little-sqlobject-performance-guide/</guid>
		<description><![CDATA[For those that aren&#8217;t aware, SQLObject is an Object-Relational Mapping (ORM) library for Python. I use it in chrss (my chess by rss web app) as part of Turbogears. Ian and Kyran also use it as part of the ShowMeDo site. Chrss and ShowMeDo have quite different levels of traffic. ShowMeDo has a lot more [...]]]></description>
			<content:encoded><![CDATA[<p>For those that aren&#8217;t aware, <a href="http://www.sqlobject.org/">SQLObject</a> is an <a href="http://en.wikipedia.org/wiki/Object-relational_mapping">Object-Relational Mapping</a> (ORM) library for Python.  I use it in <a href="http://chrss.co.uk/">chrss</a> (my chess by rss web app) as part of <a href="http://turbogears.org/">Turbogears</a>.  <a href="http://ianaozsvald.com/">Ian</a> and Kyran also use it as part of the <a href="http://showmedo.com/">ShowMeDo</a> site.</p>
<p>Chrss and ShowMeDo have quite different levels of traffic.  ShowMeDo has a lot more traffic than chrss, so performance might seem like more of an issue for ShowMeDo.  However as chrss is a game that requires more interaction from the user this is not necessarily the case.  If moving a piece takes even a second the site would seem sluggish.  Whereas for a content rich site such as ShowMeDo user expectation can be a bit more forgiving.</p>
<p>Until recently Ian and Kyran have not needed to worry about  performance and (rightly so) got on with the things that mattered (e.g. creating more screen-casts and building their community).</p>
<p>However the other day Ian asked me to help him speed the site up.  They were having some issues with a page taking too long to render.  When creating chrss I&#8217;d spent a bit of extra time worrying about the performance of SQLObject, so I already knew what to look out for in their code.  Luckily it mostly only required a few small tweaks and things ran a good deal quicker.</p>
<p>So what can you do to speed up SQLObject?</p>
<h3>Enable Query Logging</h3>
<p>Obviously don&#8217;t do this for your production server (it&#8217;ll only slow things down), but by adding <code>?debug=1</code> to your database connection URI, you can enable <a href='http://www.sqlobject.org/SQLObject.html#declaring-the-class'>debug query logging</a>.  This will simply make SQLObject print out the details of every SQL statement that is run against the database.</p>
<p>When developing this can give you a good idea of when you aren&#8217;t using SQLObject in an appropriate fashion.  If you see pages of SQL statements flying past in your console window you should probably have a look to see why!</p>
<p>Enabling query logging is only going to help if you actually understand the SQL that you are looking at.  <strong>Make sure you do some research if you aren&#8217;t familiar with SQL.</strong>  SQLObject makes dealing with a relational database easier, but you still need to understand what it is actually doing to make the most of it.</p>
<h3>SQLRelatedJoin/SQLMultipleJoin vs. RelatedJoin/MultipleJoin</h3>
<p>Your mileage may vary, but <strong>generally speaking I&#8217;d recommend not using <code>RelatedJoin</code></strong> (or <code>MultipleJoin</code>) to define many-to-many (or one-to-many) relationships with SQLObject.  Instead use the SQL* related versions (<code>SQLRelatedJoin</code> and <code>SQLMultipleJoin</code>).</p>
<p>Why though?</p>
<p>Well <code>RelatedJoin</code> (and <code>MultipleJoin</code>) loads data lazily.  Meaning that it first loads the ids for each object, then uses a new query to load each object on demand.  <code>SQLRelatedJoin</code> on the other hand works like <code>select()</code> and loads up all the data in one query.  I&#8217;m simplifying a bit, but you can probably see that they behave differently.</p>
<p>Now sometimes lazy loading is what you want.  Each object may contain a lot of data and you know you don&#8217;t need all of it.</p>
<p>However for the &#8220;normal&#8221; case you probably just want to get your object loaded into memory, with as few queries as possible.  <code>SQLRelatedJoin</code> is what you want.</p>
<h3>An example</h3>
<p>I quick-started a project with <code>tg-admin</code> and created two model classes using <code>RelatedJoin</code> to link them:</p>
<pre><code>
class Entry(SQLObject):
    title=StringCol(length=255)
    body=StringCol()
    tags=RelatedJoin('Tag')

class Tag(SQLObject):
    name=StringCol(length=255,
                   alternateID=True,
                   alternateMethodName="by_tag_name")
    entries=RelatedJoin('Entry')
</code></pre>
<p>Pretty simple stuff.  We can define an <code>Entry</code> and add <code>Tag</code> objects to it.</p>
<p>Then I ran <code>tg-admin sql create</code> to populate the (<a href='http://www.sqlite.org/'>SQLite</a>) database.</p>
<p>Next I ran <code>tg-admin shell</code> so I could create some objects in the database:</p>
<pre><code>
entry=Entry(title='a title',body='entry body')
test_tag=Tag(name='test_tag')
tag2=Tag(name='tag2')
entry.addTag(test_tag)
entry.addTag(tag2)
</code></pre>
<p>I then added <code>?debug=1</code> to the database URI:</p>
<pre><code>sqlobject.dburi="sqlite://%(current_dir_uri)s/devdata.sqlite?debug=1"
</code></pre>
<p>Then I  restarted <code>tg-admin shell</code> (with the <a href="http://ipython.scipy.org/">IPython</a> shell) and ran the following:</p>
<pre><code>
In [1]: entry=Entry.get(1)
 1/QueryOne:  SELECT title, body FROM entry WHERE id = (1)
 1/QueryR  :  SELECT title, body FROM entry WHERE id = (1)

In [2]: for tag in entry.tags:
   ...:     print "tag.name=%s" % tag.name
   ...:
 1/QueryAll:  SELECT tag_id FROM entry_tag WHERE entry_id = (1)
 1/QueryR  :  SELECT tag_id FROM entry_tag WHERE entry_id = (1)
 1/QueryOne:  SELECT name FROM tag WHERE id = (1)
 1/QueryR  :  SELECT name FROM tag WHERE id = (1)
 1/QueryOne:  SELECT name FROM tag WHERE id = (2)
 1/QueryR  :  SELECT name FROM tag WHERE id = (2)
tag.name=test_tag
tag.name=tag2
</code></pre>
<p>As you can see with a <code>RelatedJoin</code> printing the two tags on the <code>Entry</code> requires the following three queries:</p>
<pre><code>
SELECT tag_id FROM entry_tag WHERE entry_id = (1)
SELECT name FROM tag WHERE id = (1)
SELECT name FROM tag WHERE id = (2)
</code></pre>
<p><small>(note how only the name field is queried for as this is all we use)</small><br />
The <code>RelatedJoin</code> performs lazy-loading and ends up having to perform one query per tag!  For two tags this might not be a problem, but it soon adds up if you aren&#8217;t careful.</p>
<h3>A minor change</h3>
<p>Simply changing <code>RelatedJoin</code> to <code>SQLRelatedJoin</code> in the models and running that same code yields:</p>
<pre><code>
In [1]: entry=Entry.get(1)
 1/QueryOne:  SELECT title, body FROM entry WHERE id = (1)
 1/QueryR  :  SELECT title, body FROM entry WHERE id = (1)

In [2]: for tag in entry.tags:
   ...:     print "tag.name=%s" % tag.name
   ...:
 1/Select  :  SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))
 1/QueryR  :  SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))
tag.name=test_tag
tag.name=tag2
</code></pre>
<p>Printing out the tag names for the entry now only requires one query:</p>
<pre><code>SELECT tag.id, tag.name FROM entry, tag, entry_tag WHERE ((tag.id = entry_tag.tag_id) AND ((entry_tag.entry_id = entry.id) AND (entry.id = 1)))</code></pre>
<p>This is a big improvement - the number of queries we will run now no longer depends on the number of objects being returned.</p>
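<p>The two loading strategies can be mimicked with plain SQL to make the query counts concrete.  This is a sketch using the standard library <code>sqlite3</code> module.  The function names and the simplified three table schema are assumptions, not what SQLObject runs internally:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry_tag (entry_id INTEGER, tag_id INTEGER);
    INSERT INTO entry (id, title) VALUES (1, 'a title');
    INSERT INTO tag (id, name) VALUES (1, 'test_tag'), (2, 'tag2');
    INSERT INTO entry_tag (entry_id, tag_id) VALUES (1, 1), (1, 2);
    """
)

# Record every statement SQLite runs from here on.
statements = []
conn.set_trace_callback(statements.append)


def tag_names_lazy(entry_id):
    """Fetch the tag ids first, then run one query per tag."""
    ids = conn.execute(
        "SELECT tag_id FROM entry_tag WHERE entry_id = ? ORDER BY tag_id", (entry_id,)
    ).fetchall()
    return [
        conn.execute("SELECT name FROM tag WHERE id = ?", (tag_id,)).fetchone()[0]
        for (tag_id,) in ids
    ]


def tag_names_joined(entry_id):
    """Fetch every tag name with a single joined query."""
    rows = conn.execute(
        "SELECT tag.name FROM tag, entry_tag "
        "WHERE tag.id = entry_tag.tag_id AND entry_tag.entry_id = ? "
        "ORDER BY tag.id",
        (entry_id,),
    ).fetchall()
    return [name for (name,) in rows]


lazy = tag_names_lazy(1)
lazy_queries = len(statements)
joined = tag_names_joined(1)
joined_queries = len(statements) - lazy_queries
print(lazy, lazy_queries)      # ['test_tag', 'tag2'] 3
print(joined, joined_queries)  # ['test_tag', 'tag2'] 1
```

<p>Adding more tags raises the first count by one per tag, while the second stays at one.</p>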
<h3>Some caveats and notes</h3>
<p>It's not always this simple, so here are some issues you may encounter:</p>
<ul>
<li><code>RelatedJoin</code> returns a list, whereas <code>SQLRelatedJoin</code> returns a <code>SelectResults</code> object (the same kind of object returned when calling <code>select()</code>)</li>
<li>Large columns (text/binary blobs) won't get lazily loaded with <code>SQLRelatedJoin</code></li>
<li>Fewer database queries doesn't always mean your code will run faster - understand what each query is doing</li>
<li>Make sure you properly index your database</li>
<li>You need to <strong>understand the SQL that SQLObject generates</strong> to get the most out of SQLObject</li>
<li>SQLObject may not be as slow as you think - you might not be using it right</li>
</ul>
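<p>On the indexing point, most databases will tell you how they plan to run a query.  The sketch below does this for SQLite using the standard library <code>sqlite3</code> module.  The table contents and the index name are made up, and the exact wording of the plan varies between SQLite versions:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry_tag (entry_id INTEGER, tag_id INTEGER)")
conn.executemany(
    "INSERT INTO entry_tag (entry_id, tag_id) VALUES (?, ?)",
    [(n % 50, n) for n in range(500)],
)

QUERY = "SELECT tag_id FROM entry_tag WHERE entry_id = 1"


def plan(sql):
    """Return SQLite's human-readable plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the description text.
    return " | ".join(row[-1] for row in rows)


before = plan(QUERY)
conn.execute("CREATE INDEX entry_tag_entry_id ON entry_tag (entry_id)")
after = plan(QUERY)
print(before)  # a full table scan
print(after)   # a search that names the new index
```

<p>If the plan for a frequently run query still shows a scan of a large table, that is usually the first thing worth fixing.</p>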
]]></content:encoded>
			<wfw:commentRss>http://www.psychicorigami.com/2007/10/27/a-little-sqlobject-performance-guide/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

