Packing Python scripts for the 5K app

December 29th, 2008

So far for the 5K App I’ve written two example apps – one in Java and one in Javascript. This is partly so that I can figure out how the rules for the competition will need to work. In a lot of 4K competitions a single language is chosen, so things are a bit more straightforward. However Brighton seems like quite a varied place and allowing multiple languages for the competition seems like a nice friendly inclusive thing to do.

So the next example app I’ll be writing will be in Python. As Python source is the executable code there’s no compilation step (unlike Java) and un-like Javascript there aren’t really many tools for reducing code size. Python (like Ruby, Perl and PHP) tends to result in fairly concise code, so normally this isn’t too much of a problem. However 5K is quite a tough constraint and we really want to squeeze the most out of our code by compressing and packing it if possible.

So here’s a little script for compressing a Python script to make it much smaller:


import sys
import bz2
import base64

write=sys.stdout.write

for name in sys.argv[1:]:
    contents=bz2.compress(open(name).read())
    write('import bz2, base64\n')
    write('exec bz2.decompress(base64.b64decode(\'')
    write(base64.b64encode((contents)))
    write('\'))\n')

To use it save that code into a file (e.g. pack.py). It will write the compressed code to standard output (i.e. the screen), so you can redirect it to whatever file you want:


python pack.py my_script.py > my_script_packed.py

The compressed code looks like this:


import bz2, base64
exec bz2.decompress(base64.b64decode('<base64 encoded compressed data>'))

Which hopefully is fairly clear as to what it’s doing, but to summarize:

  • The script data is base 64 decoded into bytes
  • The bytes are then decompressed (bz2) into the text of the original script
  • exec is then called to run the original script

One nice benefit to this way of compressing the script is that the final module namespace (after de-compression) will look essentially the same as it did if it was not compressed. The only difference is the bz2 and base64 modules will also be present. This should mean that you can actually compress multiple Python files and importing from them should still work. Though of course as is the case when adding an extra layer of complexity your mileage may vary…

For an idea of how effective this compression can be I took the Python script for calculating pi on wikipedia and ran it through the script. After confirming that it ran the same, a quick comparison revealed the compressed version had gone from 12658 bytes to 3421 bytes – less than a third of the original size.

It should be possible to create similar scripts for Ruby (using zlib and base64), Perl (using Compress::Zlib and MIME::Base64) and PHP (using bzdecompress and base64_decode).

As I work on the example Python 5K app I should hopefully get a good feel for how the competition rules might need changing to allow “scripting” languages like Python, Ruby, Perl and PHP. I think the plan will be that the app must be contained in a single file (less than 5120 bytes in size), with any resources embedded in the file. This is the norm for Java and Flash, but will probably require an extra packaging step for most other languages/runtimes. That single file can then be run either via a GUI (double-clicking) or via a standard invocation from the command line (e.g. python my_script_packed.py) using only a “standard” version of the language runtime. Note that the standard installed version of Python on MacOS X 10.5 (Leopard) includes quite a few extra libraries (e.g. wxPython) so these libraries would be eligible for use in the 5K app. The same will be true for Ruby and Perl, so that should hopefully help open things out. Otherwise Java’s large standard library might give it too much of an advantage…

A 5K javascript, local storage backed, TODO list app

December 24th, 2008

More 5K App preparations. After using Java to write a 5K Twitter Client it’s time to change languages and dive into some Javascript. So the next 5K app is a TODO list app (a bit like 37Signals Ta-da List). The app runs locally in a browser that support the local storage APIs (which I think is only Safari at the moment, but Firefox and others should support it soon).

Check out the video below, or try out 5K TODO for yourself.



The TODO list app has the following features:

  • Add, edit, delete and re-order TODO items
  • All TODO items stored in client-side database

So, pretty simple really, but all less than 5K in size. In fact that’s actually the un-compressed side. If that was served up with a gzip filter it would shrink down even further!

Here are the vital statistics:

  • 13414 bytes of Javascript, 401 bytes of CSS and 425 bytes of HTML before packing
  • 5101 bytes of combined Javascript, CSS and HTML (single file) after packing and running through YUICompressor
  • 361 lines of Javascript

So, most of the credit for the final file size should probably be given to YUICompressor. Without much effort it easily knocked off half the size of the Javascript – without much work. However there were a few tricks used to get even more out of the compression.

Apart from removing unnecessary whitespace YUICompressor also renames all local variables and local functions to be as short as possible (typically a single letter). This only happens with local symbols, so the following code:


function myFunc() {
    alert("here");
}
window.onload = function() {
    myFunc();
}

Run through YUICompressor yields:


function myFunc(){alert("here")}window.onload=function(){myFunc()};

Whereas if myFunc() is instead declared inside of the onload handler:


window.onload = function() {
    function myFunc() {
        alert("here");
    }
    myFunc();
}

...it becomes a local function and YUICompressor can do it's work:


window.onload=function(){function a(){alert("here")}a()};

See how myFunc() has been renamed a() when the function is local.

So most of the strategies for reducing the javascript file size revolve around making more variables and functions local and hiding often used global functions (e.g. document.getElementById()) behind local functions.

Apart from that the only other thing I did was to write a script to take the separate html, css and javascript files and generate one html file containing everything. This didn't really save any space (maybe a few bytes), but it meant that the app was only in one file and hence easier to distribute. Plus it also means that on YSlow 5KTODO gets a perfect score - which is always a bonus.

A 5K Twitter Client

December 22nd, 2008

So as part of preparations for the 5K App I’m writing a few example apps. The first one so far is a 5K Twitter Client, which makes it smaller than the Twitter logo itself (at 5.4K).

Check out the demo video to see it in action or download 5KTwit (requires Java):



The features are:

  • Update twitter status
  • View friends timeline (new tweets checked for every 3 minutes)
  • Easily change logged-in user
  • Count of characters remaining when typing status message

It doesn’t handle viewing direct messages, but I don’t tend to use them. Plus I wanted to make sure the features I had were done fairly well.

Anyway, I thought it would be a good idea to walk through making a 5K app, as there’s a fair bit of stuff just needs to be done to make it feasible. It’s the sort of stuff you don’t normally encounter daily when working on “real” software – unless you’ve been working on mobile or embedded apps. Nowadays most programmers don’t really need to worry about the size of the code they write, so how to make something that’s less that 5K in size isn’t immediately obvious.

The Twitter app (which I’ve lovingly christened “5KTwit”) is written in Java and runs as a standard client-side application. For some good pointers on how to make a Java app that’s really tiny, but packs a punch you should probably start with these Java 4K tips.

Now for a few numbers:

  • Final (obfuscated) app size: 5119 bytes
  • Final (un-obfuscated) app size: 6400 bytes
  • Final Source-code size: 10398 bytes
  • Lines of code: 310

As you can see the difference between the obfuscated and un-obfuscated versions is about 1K, so obviously obfuscation is pretty key. So it’s worth starting of with a good obfuscator. For those going “what’s an obfuscator?” it’s basically a tool to optimise, shrink and scramble your code. Originally they were intended to make code harder to decompile (hence “obfuscate”), but also have the added benefit of shrinking the compiled code to. I used Proguard and that’s probably your best bet too.

Proguard does a pretty good job of shrinking your code, but it’s worth reading up on how it works, so you can get an idea of what code changes you can make to get the most effect. For example Proguard will inline (and remove) small private functions for you (so they effectively have no overhead), but only if they are only called in one place. In general you probably want to use a source-control system/backup changes so you can rollback when a change doesn’t result in smaller obfuscated code.

Here’s the config file I used for Proguard (running on OS X):


-injars      dist/5KTwit.jar
-outjars     obfuscated/5KTwit.jar
-libraryjars <java.home>/../Classes/classes.jar
-optimizationpasses 2
-allowaccessmodification
-keep public class T  {
    public static void main(java.lang.String[]);
}

Proguard does a great job, but it won’t stop you writing code that’s bigger than it needs to be. In order to get the app under 5K I only used two classes (one for the main window and one for the username/password dialog) and one resource file. Each additional file has some overhead associated with it. Not only will each file tend to have some standard header info etc, but even if it was empty it would increase the size of the packaged jar file. So really cutting down on the number of files used makes a big difference. During testing I made a simple HelloWorld application and that was 700 bytes or so in size with a single class file and no real code. Most of that size would have been taken up with this sort of overhead.

As this app used swing for the GUI it also meant I had to be a bit creative about event handling. The norm is to have small event handling classes, but that would have incurred too much overhead. Instead I resorted to tricks like this:


    public void actionPerformed(ActionEvent e) {
        if ( e.getSource() == prefs ) {
            start();
        }
        else {
            String status = this.status.getText();
            if ( status.length() <= MAX_CHAR ) {
                tweetsDoc = loadDocument("http://twitter.com/statuses/update.xml", status);
                loadTweets();
                this.status.setText("");
            }
        }
    }

Where I checked the source of the event to see which button had been pressed, rather than having a separate event listener for each button.

I also had to deal with listening for window resize events, as there was some manual intervention required to make tweets wrap inside the scroll pane. So for that I circumvented the listener approach entirely and overrode processComponentEvent. In this code I had to try to estimate how big the panel containing the tweets would be, when I knew the width I wanted. So I resorted to a bit of nasty hackery and repeatedly set/unset the preferred size a few times so it would settle on the right height…

    protected void processComponentEvent(ComponentEvent e) {
        if ( e != null ) { super.processComponentEvent(e); }
        
        // force width of tweets to match width of window
        for ( int i = 0; i < 3; i++ ) {
            tweets.invalidate();
            tweets.setPreferredSize(null);
            int height = tweets.getPreferredSize().height;
            tweets.setPreferredSize(new Dimension(scrollPane.getViewportBorderBounds().width,height));
            scrollPane.validate();
        }
    }

Whereas in "real" code this would probably be done with a bit more finesse (and a few more classes).

Another tricky spot involved the background thread used to check for new tweets. This ran on another thread (rather than the main GUI thread), so as not to lock up the GUI when it runs (once every three minutes). However updating the GUI had to occur on the GUI thread. Normally this would simply be done by creating a few Runnable tasks, but that would have meant more classes. Instead I end up checking to see whether we're running on the GUI thread and change behaviour if that's the case:

    public void run() {
        if ( SwingUtilities.isEventDispatchThread() ) {
            // update UI as running on the right thread
            loadTweets();
        }
        else {
            while( currentThread == Thread.currentThread() ) {
                String friendsTimelineURL = "http://twitter.com/statuses/friends_timeline.xml?count=200";
                if ( lastTweetID != null ) {
                    friendsTimelineURL += "&since_id=" + lastTweetID;
                }
                tweetsDoc = loadDocument(friendsTimelineURL, null);
                try {
                    // invoke and wait, to run loadTweets()
                    // on awt thread
                    SwingUtilities.invokeAndWait(this);
                    Thread.sleep(1000*60*3); // 3 mins
                }
                catch( Exception e ) {
                    // break loop
                    break;
                }
            }
        }
    }

There are probably quite a few other places where I had to make this sort of "design" choice. It's feasible to do for this little code, but I wouldn't want to do this kind of thing in a big project. Though I guess if it was really needed you could write something that would take nice well written code and convert it to use these sorts of tricks...

Apart from that, most of the rest of the work came down to carefully weighing up which features I wanted and carefully tweaking and re-organising the code to make it smaller. I also found that the single image (for the preference button) was a lot smaller when saved as a gif rather than a png (90 bytes vs ~200 bytes).

Anyway, for 5K it's not a bad little client. I've actually been using it for the past couple of days (as I've been writing it). I doubt it would really survive under the hands of a more active user, but it does the job.

Drummer Girl

December 7th, 2008