<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Psychic Origami &#187; Optimisation</title>
	<atom:link href="http://www.psychicorigami.com/category/optimisation/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.psychicorigami.com</link>
	<description>folding with my brain</description>
	<lastBuildDate>Wed, 03 Aug 2011 19:13:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Tackling the travelling salesman problem: simulated annealing</title>
		<link>http://www.psychicorigami.com/2007/06/28/tackling-the-travelling-salesman-problem-simmulated-annealing/</link>
		<comments>http://www.psychicorigami.com/2007/06/28/tackling-the-travelling-salesman-problem-simmulated-annealing/#comments</comments>
		<pubDate>Thu, 28 Jun 2007 13:16:18 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Optimisation]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[TSP]]></category>

		<guid isPermaLink="false">http://psychicorigami.com/2007/06/28/tackling-the-travelling-salesman-problem-simmulated-annealing/</guid>
		<description><![CDATA[This is the third part in my series on the &#8220;travelling salesman problem&#8221; (TSP). Part one covered defining the TSP and utility code that will be used for the various optimisation algorithms I shall discuss. Part two covered &#8220;hill-climbing&#8221; (the simplest stochastic optimisation method). getting stuck, because you&#8217;re greedy As I discussed in the article [...]]]></description>
			<content:encoded><![CDATA[<p>This is the third part in <a href='http://psychicorigami.com/category/tsp/'>my series</a> on the &#8220;travelling salesman problem&#8221; (TSP). <a href='http://psychicorigami.com/2007/04/17/tackling-the-travelling-salesman-problem-part-one/'>Part one</a> covered defining the TSP and utility code that will be used for the various optimisation algorithms I shall discuss.  <a href='http://psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/'>Part two</a> covered &#8220;hill-climbing&#8221; (the simplest stochastic optimisation method).</p>
<h3>getting stuck, because you&#8217;re greedy</h3>
<p>As I discussed in the <a href='http://psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/'>article</a> about hill-climbing it is possible for an algorithm to find a solution that is &#8220;locally optimal&#8221;, but not necessarily &#8220;globally optimal&#8221;.  That is to say we may find ourselves with a solution that is at the top of a local maximum &#8211; it&#8217;s the best thing nearby, but it might not be <em>the</em> best thing.  This happens with hill-climbing, because when we are offered the choice between two solutions we <em>always</em> take the better solution.  The algorithm is &#8220;greedy&#8221;.  It&#8217;s also short-sighted.<br />
<center><img title='greediness gets us stuck' src='http://psychicorigami.com/wp-content/uploads/2007/05/bumpy.gif' /></center><br />
So instead we could try occasionally choosing something that&#8217;s worse.  By doing that the algorithm can go &#8220;downhill&#8221; sometimes and hopefully reach new areas of the solution landscape.</p>
<h3>simulated annealing</h3>
<p>Taking its name from a <a href='http://en.wikipedia.org/wiki/Annealing_(metallurgy)'>metallurgic process</a>, simulated annealing is essentially hill-climbing, but with the ability to go downhill (sometimes).</p>
<p>It introduces a &#8220;temperature&#8221; variable.  When the &#8220;temperature&#8221; is high a worse solution will have a higher chance of being chosen.  It works like this:</p>
<ol>
<li>pick an initial solution</li>
<li>set an initial temperature</li>
<li>choose the next neighbour of the current solution:
<ul>
<li>if the neighbour is &#8220;better&#8221; make that neighbour the current solution</li>
<li>if the neighbour is &#8220;worse&#8221;, probabilistically make this neighbour the current solution, based on the current temperature and how much &#8220;worse&#8221; the neighbour is</li>
</ul>
</li>
<li>decrease the temperature slightly</li>
<li>go to 3.</li>
</ol>
<p>By slowly cooling the temperature we become less likely to choose worse solutions over time.  Initially we are able to make some pretty big jumps around the solution landscape.  By the end of a run we&#8217;ll be jumping around less and less.  In fact if we lower the temperature enough we end up with plain old hill-climbing.</p>
<h3>probabilistically choosing a neighbour</h3>
<p>Below is the Python code that decides what probability we assign to moving from a solution with a score of <code>prev_score</code> to a solution with a score of <code>next_score</code> at the current <code>temperature</code>.</p>
<pre><code>
import math

def P(prev_score,next_score,temperature):
    if next_score > prev_score:
        return 1.0
    else:
        return math.exp( -abs(next_score-prev_score)/temperature )
</code></pre>
<p>To keep later logic simpler I&#8217;m returning <code>1.0</code> if <code>next_score</code> is better &#8211; so we&#8217;ll always choose better solutions.</p>
<p>When <code>next_score</code> is worse we create a probability based on the difference between <code>prev_score</code> and <code>next_score</code> scaled by the current temperature.  If we chart the probabilities versus the difference in scores we get (with a <code>temperature</code> of 1.0):<br />
<center><img title='difference vs. probability' src='http://psychicorigami.com/wp-content/uploads/2007/06/saprobs.gif' /></center><br />
As can be seen, for small differences (relative to the current temperature) we will have a high probability.  This then tails off very quickly, so solutions that are much worse are increasingly less likely to be chosen.</p>
<p>The net effect is that solutions that are only a little bit worse are still fairly likely to be chosen.  Much worse solutions may still be chosen, but it&#8217;s much less likely.</p>
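<p>To put some numbers on that, here is a small sketch (written for Python 3, and using the hypothetical name <code>acceptance_probability</code> for the same rule as <code>P</code>) that prints the chance of accepting a solution that is 1, 5 or 20 units worse at three different temperatures.  The temperatures and differences are just values I picked for illustration.</p>

```python
import math

def acceptance_probability(prev_score, next_score, temperature):
    """Same rule as P above: always accept improvements, otherwise
    accept with probability exp(-|difference| / temperature)."""
    if next_score > prev_score:
        return 1.0
    return math.exp(-abs(next_score - prev_score) / temperature)

# illustrative values only: how likely are we to accept a solution
# that is 1, 5 or 20 units worse than the current one?
for temperature in (10.0, 1.0, 0.1):
    probs = [acceptance_probability(0.0, -diff, temperature)
             for diff in (1, 5, 20)]
    print(temperature, ["%.4f" % p for p in probs])
```

<p>At a temperature of 10 even the 5-unit step down is accepted more often than not, whereas at 0.1 all three are rejected almost every time, which is the "plain old hill-climbing" end of the run.</p>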
<h3>the cooling schedule</h3>
<p>Temperature is a key part of simulated annealing.  How we lower the temperature over time is therefore very important.  There are a couple of possible approaches, but I&#8217;ll show the one outlined by <a href='http://citeseer.ist.psu.edu/kirkpatrick83optimization.html'>Kirkpatrick et al</a>:</p>
<pre><code>
def kirkpatrick_cooling(start_temp,alpha):
    T=start_temp
    while True:
        yield T
        T=alpha*T
</code></pre>
<p>This is a generator function that takes an initial start temperature (<code>start_temp</code>) and yields a series of temperatures, each <code>alpha</code> times the previous one, where <code>alpha</code> is less than one.  So we end up with a temperature that drops off quite quickly at first and then decreases ever more slowly towards (but never quite reaching) zero.</p>
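<p>As a rough illustration (Python 3; the helper <code>steps_to_reach</code> is my own hypothetical addition, not part of the original source), we can look at the first few temperatures and work out how many cooling steps it takes to reach a given temperature.  Since the temperature after <code>n</code> steps is <code>start_temp * alpha**n</code>, the step count follows from taking logarithms.</p>

```python
import itertools
import math

def kirkpatrick_cooling(start_temp, alpha):
    T = start_temp
    while True:
        yield T
        T = alpha * T

# first few temperatures for a deliberately aggressive alpha
first_five = list(itertools.islice(kirkpatrick_cooling(10.0, 0.5), 5))
print(first_five)  # [10.0, 5.0, 2.5, 1.25, 0.625]

def steps_to_reach(start_temp, alpha, target_temp):
    """Hypothetical helper: smallest n such that
    start_temp * alpha**n is at or below target_temp."""
    return math.ceil(math.log(target_temp / start_temp) / math.log(alpha))

# with a gentle alpha it takes tens of thousands of steps
# to cool from 10 down to 0.1
print(steps_to_reach(10.0, 0.9999, 0.1))
```

<p>This is why <code>alpha</code> needs to sit very close to one when the evaluation budget is large: a value like 0.9 would cool to practically nothing within a hundred or so steps.</p>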
<h3>remembering the best solution</h3>
<p>One other minor, but key, implementation detail is saving the best solution we find during the annealing process.</p>
<p>During hill-climbing the current solution was always the best solution found, but simulated annealing will deliberately accept worse solutions at times.  So we need to make sure we don&#8217;t just throw away the best we see.</p>
<p>To avoid complicating the algorithm itself with extra checks of scores etc., I am going to use a class to wrap the objective function.  I&#8217;ll override the <code>__call__</code> method of the class, so that I can use the instance of the class like a function &#8211; in place of the normal objective function:</p>
<pre><code>
class ObjectiveFunction:
    '''class to wrap an objective function and
    keep track of the best solution evaluated'''
    def __init__(self,objective_function):
        self.objective_function=objective_function
        self.best=None
        self.best_score=None

    def __call__(self,solution):
        score=self.objective_function(solution)
        if self.best is None or score > self.best_score:
            self.best_score=score
            self.best=solution
        return score
</code></pre>
<p>We can then access the <code>best</code> and <code>best_score</code> fields when we have finished our annealing.</p>
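<p>A quick usage sketch (Python 3) shows the wrapper at work.  The toy objective, which peaks at <code>x = 3</code>, and the list of candidates are assumptions made up purely for illustration.</p>

```python
class ObjectiveFunction:
    '''class to wrap an objective function and
    keep track of the best solution evaluated'''
    def __init__(self, objective_function):
        self.objective_function = objective_function
        self.best = None
        self.best_score = None

    def __call__(self, solution):
        score = self.objective_function(solution)
        if self.best is None or score > self.best_score:
            self.best_score = score
            self.best = solution
        return score

# toy objective (illustrative assumption): highest score at x = 3
tracked = ObjectiveFunction(lambda x: -(x - 3) ** 2)

# evaluate candidates in an arbitrary order, as a search might
for candidate in [0, 2, 3, 7, 1]:
    tracked(candidate)

# the wrapper remembers the best one, even though we moved on afterwards
print(tracked.best, tracked.best_score)  # 3 0
```

<p>The calling code never has to compare scores itself, which is what keeps the annealing loop free of extra bookkeeping.</p>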
<h3>simulated annealing itself</h3>
<p>The code below represents the simulated annealing algorithm.  In many respects it is pretty similar to <a href='http://psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/'>hill-climbing</a>, but we are also concerned with a current temperature and we have introduced a probabilistic element to choosing the next solution.</p>
<pre><code>
def anneal(init_function,move_operator,objective_function,max_evaluations,start_temp,alpha):

    # wrap the objective function (so we record the best)
    objective_function=ObjectiveFunction(objective_function)

    current=init_function()
    current_score=objective_function(current)
    num_evaluations=1

    cooling_schedule=kirkpatrick_cooling(start_temp,alpha)

    for temperature in cooling_schedule:
        done = False
        # examine moves around our current position
        for next in move_operator(current):
            if num_evaluations >= max_evaluations:
                done=True
                break

            next_score=objective_function(next)
            num_evaluations+=1

            # probabilistically accept this solution
            # always accepting better solutions
            p=P(current_score,next_score,temperature)
            if random.random() < p:
                current=next
                current_score=next_score
                break
        # see if completely finished
        if done: break

    best_score=objective_function.best_score
    best=objective_function.best

    return (num_evaluations,best_score,best)
</code></pre>
<p>The parameters are much the same as hill-climbing, but there are two extra specific to simulated annealing:</p>
<ul>
<li><code>init_function</code> - the function used to create our initial solution</li>
<li><code>move_operator</code> - the function we use to iterate over all possible "moves" for a given solution</li>
<li><code>objective_function</code> - used to assign a numerical score to a solution - how "good" the solution is</li>
<li><code>max_evaluations</code> - used to limit how much search we will perform (how many times we'll call the <code>objective_function</code>)</li>
<li><code>start_temp</code> - the initial starting temperature for annealing</li>
<li><code>alpha</code> - should be less than one.  controls how quickly the temperature reduces</li>
</ul>
<p>I am also only reducing the temperature after either accepting a new solution or evaluating all neighbours without choosing any of them.  This is done so that the temperature only decreases as we accept moves (or exhaust the neighbours of the current solution).  As that is less frequent than evaluating moves, cooling happens at a slower pace.  If we are accepting lots of moves the temperature will drop quite quickly.  If we are not accepting many moves the temperature will stay steadier - maintaining the likelihood of accepting other "worse" moves.  That latter point is useful: if we are starting to get stuck on a local maximum the temperature will barely decrease - hopefully helping us get unstuck.</p>
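<p>To see the pieces working together, here is a sketch of a Python 3 port of <code>anneal</code> run on a tiny made-up problem.  The one-dimensional <code>LANDSCAPE</code>, the <code>neighbours</code> move operator and the parameter values are all assumptions for illustration and have nothing to do with the TSP code.  The landscape has a local maximum at position 3 and the global maximum at position 10.</p>

```python
import math
import random

def P(prev_score, next_score, temperature):
    if next_score > prev_score:
        return 1.0
    return math.exp(-abs(next_score - prev_score) / temperature)

def kirkpatrick_cooling(start_temp, alpha):
    T = start_temp
    while True:
        yield T
        T = alpha * T

class ObjectiveFunction:
    def __init__(self, objective_function):
        self.objective_function = objective_function
        self.best = None
        self.best_score = None

    def __call__(self, solution):
        score = self.objective_function(solution)
        if self.best is None or score > self.best_score:
            self.best_score = score
            self.best = solution
        return score

def anneal(init_function, move_operator, objective_function,
           max_evaluations, start_temp, alpha):
    # wrap the objective function (so we record the best)
    objective_function = ObjectiveFunction(objective_function)
    current = init_function()
    current_score = objective_function(current)
    num_evaluations = 1
    for temperature in kirkpatrick_cooling(start_temp, alpha):
        done = False
        for candidate in move_operator(current):
            if num_evaluations >= max_evaluations:
                done = True
                break
            candidate_score = objective_function(candidate)
            num_evaluations += 1
            # always accepts better solutions, sometimes worse ones
            if random.random() < P(current_score, candidate_score, temperature):
                current, current_score = candidate, candidate_score
                break
        if done:
            break
    return num_evaluations, objective_function.best_score, objective_function.best

# toy problem (illustrative assumption): position x scores LANDSCAPE[x]
LANDSCAPE = [0, 1, 2, 3, 2, 1, 2, 3, 4, 5, 6, 5]

def neighbours(x):
    """Yield the positions one step left and right, in random order."""
    steps = [x - 1, x + 1]
    random.shuffle(steps)
    for n in steps:
        if 0 <= n < len(LANDSCAPE):
            yield n

evaluations, best_score, best = anneal(
    lambda: 0, neighbours, lambda x: LANDSCAPE[x],
    max_evaluations=2000, start_temp=2.0, alpha=0.99)
print(evaluations, best_score, best)
```

<p>Results vary from run to run, since the search is random.  Plain hill-climbing started at position 0 can never get past the local maximum at position 3, whereas annealing has a chance of walking down through the dip while the temperature is still high.</p>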
<h3>results</h3>
<p>It made sense to compare simulated annealing with hill-climbing, to see whether simulated annealing really helps us to stop getting stuck on local maximums.</p>
<p>I performed 100 runs of each algorithm on my randomly generated 100 city tour, once with 50000 and once with 100000 evaluations.  Both algorithms used the <code>reversed_sections</code> move operator.  For simulated annealing I chose an initial temperature and alpha that seemed to perform well.</p>
<center>
<table>
<tr>
<th>evaluations</th>
<th>algorithm</th>
<th>average</th>
<th>s.d.</th>
<th>worst</th>
<th>best</th>
</tr>
<tr>
<td>50000</td>
<td>hill-climbing</td>
<td>-4228.50</td>
<td>126.45</td>
<td>-4627.07</td>
<td>-3942.03</td>
</tr>
<tr>
<td>50000</td>
<td>annealing (<code>start_temp=10, alpha=0.9999</code>)</td>
<td>-4145.69</td>
<td>96.56</td>
<td>-4422.04</td>
<td>-3924.34</td>
</tr>
<tr>
<td>100000</td>
<td>hill-climbing</td>
<td>-4154.25</td>
<td>90.60</td>
<td>-4513.11</td>
<td>-3946.65</td>
</tr>
<tr>
<td>100000</td>
<td>annealing (<code>start_temp=10, alpha=0.99995</code>)</td>
<td>-4077.40</td>
<td>71.72</td>
<td>-4294.97</td>
<td>-3907.19</td>
</tr>
</table>
</center>
<p>These results seem to show that simulated annealing performed better than hill-climbing.  In fact it can be seen that with just 50000 evaluations, simulated annealing was able to do a slightly better job than hill-climbing with 100000 evaluations!  This makes sense, as when running hill-climbing with logging turned on I could see that after about 50000 evaluations hill-climbing was getting stuck and restarting.  With more evaluations available it was possible to push simulated annealing further still.</p>
<p>However in both cases I had to perform several test runs to find reasonable starting temperatures and alpha values to get these kinds of results.  It was quite easy to set these parameters wrong and get much worse results than with hill-climbing.</p>
<h3>conclusion</h3>
<p>Simulated annealing is a pretty reasonable improvement over hill-climbing.  For a modest amount of extra code (in this case tens of lines) we are able to address hill-climbing's fundamental weakness (getting stuck) and yield much better results.</p>
<p>However by introducing two extra parameters we have shifted some of the burden in finding good solutions to ourselves.  We have to tune these parameters carefully.  Values that are good for one problem may not work so well for another.  We end up with more "switches to flick" in the hope of making something work.</p>
<p>Next time around I will be discussing evolutionary algorithms, which pursue other ways to avoid getting stuck on local maximums and are also able to combine several solutions to explore new areas of the solution landscape.</p>
<h3>source-code</h3>
<p>Full source-code is available <a href='http://www.littlespikeyland.com/tsp/tsp_part_three.tar.gz'>here</a> as a .tar.gz file.</p>
<p>(The source-code contains more code than shown here, to handle parsing parameters passed to the program etc. I’ve not discussed this extra code here simply to save space.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychicorigami.com/2007/06/28/tackling-the-travelling-salesman-problem-simmulated-annealing/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Tackling the travelling salesman problem: a little profiling</title>
		<link>http://www.psychicorigami.com/2007/05/20/tackling-the-travelling-salesman-problem-a-little-profiling/</link>
		<comments>http://www.psychicorigami.com/2007/05/20/tackling-the-travelling-salesman-problem-a-little-profiling/#comments</comments>
		<pubDate>Sun, 20 May 2007 16:50:23 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Optimisation]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[TSP]]></category>

		<guid isPermaLink="false">http://psychicorigami.com/2007/05/20/tackling-the-travelling-salesman-problem-a-little-profiling/</guid>
		<description><![CDATA[So after implementing hill-climbing, I thought it would be a worthwhile exercise to use Python&#8217;s profile module and see what the slow parts of the code were. To do this I set-up a basic hill-climb on the 100 city data set and then called the run method of profile: import tsp import profile coords=tsp.read_coords(file('city100.txt')) init_function=lambda: [...]]]></description>
			<content:encoded><![CDATA[<p>So after <a href='http://psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/'>implementing hill-climbing</a>, I thought it would be a worthwhile exercise to use Python&#8217;s <a href='http://docs.python.org/lib/module-profile.html'>profile module</a> and see what the slow parts of the code were.  To do this I set up a basic hill-climb on the 100 city data set and then called the run method of profile:</p>
<pre><code>
import tsp
import profile

coords=tsp.read_coords(file('city100.txt'))
init_function=lambda: tsp.init_random_tour(len(coords))
matrix=tsp.cartesian_matrix(coords)
objective_function=lambda tour: -tsp.tour_length(matrix,tour)
move_operator=tsp.reversed_sections
max_iterations=10000

profile.run(
 'tsp.run_hillclimb(init_function,move_operator,objective_function,max_iterations)')
</code></pre>
<p>This yielded the following (edited for clarity/brevity) output:</p>
<pre><code>
      122433 function calls in 4.710 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.010    0.010    0.060    0.060 hillclimb.py:1(?)
     1    0.190    0.190    4.640    4.640 hillclimb.py:3(hillclimb)
     1    0.000    0.000    4.640    4.640 hillclimb.py:36(hillclimb_and_restart)
   607    0.730    0.001    1.220    0.002 random.py:252(shuffle)
     1    0.000    0.000    0.000    0.000 tsp.py:127(init_random_tour)
     1    0.000    0.000    4.700    4.700 tsp.py:132(run_hillclimb)
 10106    0.110    0.000    1.350    0.000 tsp.py:29(all_pairs)
 10000    0.440    0.000    1.790    0.000 tsp.py:40(reversed_sections)
 10000    2.180    0.000    2.470    0.000 tsp.py:82(tour_length)
</code></pre>
<p>Now this is pretty close to what I would expect &#8211; the <code>tour_length</code> function (our objective function) is taking up most of the time (2.470 seconds out of 4.710).  For optimisation problems it is very common for the objective function to be the most time-consuming part &#8211; which is why, when comparing different optimisation algorithms, a lot of attention is paid to how many times the objective function must be called.</p>
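<p>For anyone following along on a current Python, a similar report can be produced with the standard library&#8217;s <code>cProfile</code> and <code>pstats</code> modules.  The sketch below is Python 3 and profiles a stand-in workload (<code>slow_objective</code> and <code>search</code> are hypothetical names, not part of the TSP code), sorting by cumulative time rather than by name so the expensive calls float to the top.</p>

```python
import cProfile
import io
import pstats

def slow_objective(n):
    # stand-in for an expensive objective function
    return sum(i * i for i in range(n))

def search():
    # stand-in for an optimisation loop that calls the objective a lot
    return [slow_objective(2000) for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
search()
profiler.disable()

# collect the report in a string, sorted by cumulative time,
# showing only the ten most expensive entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

<p>The columns (<code>ncalls</code>, <code>tottime</code>, <code>cumtime</code> and so on) have the same meaning as in the output above.</p>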
<p>However there was one unexpectedly expensive call in there: <code>reversed_sections</code> is using 1.790 seconds!  Examining the output further shows that most of the time in <code>reversed_sections</code> is spent in <code>all_pairs</code> and most of the time in that function is spent in <code>random.shuffle</code>.  So looking at <code>all_pairs</code>:</p>
<pre><code>
def all_pairs(size,shuffle=random.shuffle):
    '''generates all i,j pairs for i,j from 0-size'''
    r1=range(size)
    r2=range(size)
    if shuffle:
        shuffle(r1)
        shuffle(r2)
    for i in r1:
        for j in r2:
            yield (i,j)
</code></pre>
<p>We see that we call <code>random.shuffle</code> twice at the start of the generator function.  However quite often, when hill-climbing, we will not run this generator fully.  If we find a better solution we will accept it and move on &#8211; creating a new generator.  So clearly shuffling two arrays of 100 elements each is overkill, if we don&#8217;t always use every element.</p>
<p>Instead we can modify <code>all_pairs</code> to only generate the random elements as it needs them:</p>
<pre><code>
def rand_seq(size):
    '''generates values in random order
    equivalent to using shuffle in random,
    without generating all values at once'''
    values=range(size)
    for i in xrange(size):
        # pick a random index into remaining values
        j=i+int(random.random()*(size-i))
        # swap the values
        values[j],values[i]=values[i],values[j]
        # yield the swapped value
        yield values[i]

def all_pairs(size):
    '''generates all i,j pairs for i,j from 0-size'''
    for i in rand_seq(size):
        for j in rand_seq(size):
            yield (i,j)
</code></pre>
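<p>The listing above is Python 2.  A Python 3 sketch of the same idea follows; it swaps in <code>random.randrange</code> for the hand-rolled index calculation (my substitution, not the original code) and checks that the lazy version still produces every value exactly once.</p>

```python
import itertools
import random

def rand_seq(size):
    '''yields 0..size-1 in random order, doing one
    random draw and one swap per value consumed'''
    values = list(range(size))
    for i in range(size):
        # pick a random index into the remaining values
        j = random.randrange(i, size)
        values[i], values[j] = values[j], values[i]
        yield values[i]

def all_pairs(size):
    '''generates all i,j pairs for i,j in 0..size-1'''
    for i in rand_seq(size):
        for j in rand_seq(size):
            yield (i, j)

# taking only a few pairs costs only a few random draws
first_three = list(itertools.islice(all_pairs(100), 3))
print(first_three)

# run to completion, rand_seq is still a full permutation
full = list(rand_seq(10))
print(sorted(full) == list(range(10)))  # True
```

<p>Note that <code>rand_seq</code> still allocates the full list of values up front.  What it avoids is the random draws and swaps for elements that are never consumed.</p>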
<p>Profiling this version gives us:</p>
<pre><code>
      82205 function calls in 3.970 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.010    0.010    0.050    0.050 hillclimb.py:1(?)
     1    0.230    0.230    3.910    3.910 hillclimb.py:3(hillclimb)
     1    0.000    0.000    3.910    3.910 hillclimb.py:36(hillclimb_and_restart)
     1    0.010    0.010    0.010    0.010 random.py:252(shuffle)
     1    0.000    0.000    0.010    0.010 tsp.py:127(init_random_tour)
     1    0.000    0.000    3.960    3.960 tsp.py:132(run_hillclimb)
 10100    0.150    0.000    0.420    0.000 tsp.py:22(all_pairs)
 10000    0.520    0.000    0.950    0.000 tsp.py:40(reversed_sections)
 10000    2.300    0.000    2.560    0.000 tsp.py:82(tour_length)
 10507    0.190    0.000    0.270    0.000 tsp.py:9(rand_seq)
</code></pre>
<p>As can be seen <code>all_pairs</code> (and <code>reversed_sections</code>) takes less of the total running time than before &#8211; leaving <code>tour_length</code> as the major bottleneck.</p>
<h3>conclusion</h3>
<p>The profile module is a pretty handy way for spotting slow points in code that you may not have realised existed.  In this case we&#8217;ve been able to speed up code that would not seem like it should be costly &#8211; giving us a modest speed boost (~15%) for not much extra effort.</p>
<p>However care is advised, as we didn&#8217;t address the slowest part of the code.  Changes to that would probably yield much larger speed improvements.  This is particularly true as our problem size increases.  With 500 cities <code>tour_length</code> represents an even higher proportion of the total running time (~90%).  As ever:</p>
<blockquote><p><a href='http://www.brainyquote.com/quotes/quotes/d/donaldknut181625.html'>&#8220;premature optimisation is the root of all evil&#8221;</a></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.psychicorigami.com/2007/05/20/tackling-the-travelling-salesman-problem-a-little-profiling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tackling the travelling salesman problem: hill-climbing</title>
		<link>http://www.psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/</link>
		<comments>http://www.psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/#comments</comments>
		<pubDate>Sat, 12 May 2007 13:56:55 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Optimisation]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[TSP]]></category>

		<guid isPermaLink="false">http://psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/</guid>
		<description><![CDATA[This is the second part in my series on the &#8220;travelling salesman problem&#8221; (TSP). Part one covered defining the TSP and utility code that will be used for the various optimisation algorithms I shall discuss. solution landscapes A common way to visualise searching for solutions in an optimisation problem, such as the TSP, is to [...]]]></description>
			<content:encoded><![CDATA[<p>This is the second part in <a href='http://psychicorigami.com/category/tsp/'>my series</a> on the &#8220;travelling salesman problem&#8221; (TSP).  <a href='http://psychicorigami.com/2007/04/17/tackling-the-travelling-salesman-problem-part-one/'>Part one</a> covered defining the TSP and utility code that will be used for the various optimisation algorithms I shall discuss.</p>
<h3>solution landscapes</h3>
<p>A common way to visualise searching for solutions in an optimisation problem, such as the TSP, is to think of the solutions existing within a &#8220;landscape&#8221;.  Better solutions exist higher up and we can take a step from one solution to another in search of better solutions.  How we make steps will depend on the &#8220;move operators&#8221; we have available and will therefore also affect how the landscape &#8220;looks&#8221;.  It will change which solutions are &#8220;adjacent&#8221; to each other.  For a simple optimisation problem we can directly visualise the solution landscape:<br />
<center><br />
<img src='http://psychicorigami.com/wp-content/uploads/2007/05/smooth.gif' alt='smooth.gif' /><br />
</center><br />
The red dot represents our current solution.  It should be pretty clear that if we simply carry on going &#8220;uphill&#8221; we&#8217;ll get to the highest point in this solution landscape.</p>
<p>If we are using evolutionary optimisation methods a solution landscape will often be referred to as a <a href='http://en.wikipedia.org/wiki/Fitness_landscape#Fitness_landscapes_in_evolutionary_optimization'>fitness landscape</a>.</p>
<h3>hill-climbing</h3>
<p><a href='http://en.wikipedia.org/wiki/Hill_climbing'>Hill-climbing</a>, pretty much the simplest of the <a href='http://en.wikipedia.org/wiki/Stochastic_optimization'>stochastic optimisation methods</a>, works like this:</p>
<ol>
<li>pick a place to start</li>
<li>take any step that goes &#8220;uphill&#8221;</li>
<li>if there are no more uphill steps, stop;<br />
   &nbsp;otherwise carry on taking uphill steps
 </li>
</ol>
<p>Metaphorically the algorithm climbs up a hill one step at a time.  It is a &#8220;greedy&#8221; algorithm and only ever takes steps that take it uphill (though it can be adapted to behave differently).  This means that it is pretty quick to get to the top of a hill, but depending on where it starts it may not get to the top of the biggest hill:<br />
<center><br />
<img src='http://psychicorigami.com/wp-content/uploads/2007/05/bumpy.gif' alt='bumpy.gif' /><br />
</center><br />
As you can see our current solution (the red dot) can only go downhill from its current position &#8211; yet it is not at the highest point in the solution landscape.</p>
<p>The &#8220;biggest&#8221; hill in the solution landscape is known as the <a href='http://mathworld.wolfram.com/GlobalMaximum.html'>global maximum</a>.  The top of any other hill is known as a <a href='http://mathworld.wolfram.com/LocalMaximum.html'>local maximum</a> (it&#8217;s the highest point in the local area).  Standard hill-climbing will tend to get stuck at the top of a local maximum, so we can modify our algorithm to restart the hill-climb if need be.  This will help hill-climbing find better hills to climb &#8211; though it&#8217;s still a random search of the initial starting points.</p>
<h3>objective and initialisation functions</h3>
<p>To get started with the hill-climbing code we need two functions:</p>
<ul>
<li>an initialisation function &#8211; that will return a random solution</li>
<li>an objective function &#8211; that will tell us how &#8220;good&#8221; a solution is</li>
</ul>
<p>For the TSP the initialisation function will just return a tour of the correct length that has the cities arranged in a random order.</p>
<p>The objective function will return the <strong>negated</strong> length of a tour/solution.  We do this because we want to <em>maximise</em> the objective function, whilst at the same time <em>minimise</em> the tour length.</p>
<p>As the hill-climbing code won&#8217;t know specifically about the TSP we need to ensure that the initialisation function takes no arguments and returns a tour of the correct length and the objective function takes one argument (the solution) and returns the negated length.</p>
<p>So assuming we have our city co-ordinates in a variable <code>coords</code> and our distance matrix in <code>matrix</code> we can define the objective function and initialisation functions as follows:</p>
<pre><code>
def init_random_tour(tour_length):
   tour=range(tour_length)
   random.shuffle(tour)
   return tour

init_function=lambda: init_random_tour(len(coords))
objective_function=lambda tour: -tour_length(matrix,tour) #note negation
</code></pre>
<p>Here we rely on <a href='http://www-128.ibm.com/developerworks/library/l-prog2.html#h1'>closures</a> to associate <code>len(coords)</code> with the <code>init_random_tour</code> function and <code>matrix</code> with the <code>tour_length</code> function.  The end result is two functions, <code>init_function</code> and <code>objective_function</code>, that are suitable for use in the hill-climbing function.</p>
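<p>To make that concrete, here is a self-contained Python 3 sketch on four cities at the corners of a rectangle.  The real <code>cartesian_matrix</code> and <code>tour_length</code> live in part one and are not shown in this article, so the versions below are simplified stand-ins written on the assumption that the matrix maps a pair of city indices to a distance and that a tour returns to its starting city.</p>

```python
import math
import random

def cartesian_matrix(coords):
    """Stand-in (assumption): dict mapping (i, j) to the
    Euclidean distance between city i and city j."""
    return {(i, j): math.hypot(x1 - x2, y1 - y2)
            for i, (x1, y1) in enumerate(coords)
            for j, (x2, y2) in enumerate(coords)}

def tour_length(matrix, tour):
    """Stand-in (assumption): length of the closed tour,
    including the leg back to the starting city."""
    total = 0.0
    for k in range(len(tour)):
        a, b = tour[k], tour[(k + 1) % len(tour)]
        total += matrix[(a, b)]
    return total

def init_random_tour(size):
    tour = list(range(size))
    random.shuffle(tour)
    return tour

coords = [(0, 0), (0, 3), (4, 3), (4, 0)]  # corners of a 4 x 3 rectangle
matrix = cartesian_matrix(coords)

# the lambdas close over coords and matrix, so callers need no TSP knowledge
init_function = lambda: init_random_tour(len(coords))
objective_function = lambda tour: -tour_length(matrix, tour)  # note negation

print(objective_function([0, 1, 2, 3]))  # -14.0 (around the perimeter)
print(objective_function([0, 2, 1, 3]))  # -18.0 (crossing both diagonals)
```

<p>The shorter tour gets the higher (less negative) score, which is exactly what a maximising search needs.</p>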
<h3>the basic hill-climb</h3>
<p>The basic hill-climb looks like this in Python:</p>
<pre><code>
def hillclimb(
    init_function,
    move_operator,
    objective_function,
    max_evaluations):
    '''
    hillclimb until either max_evaluations
    is reached or we are at a local optimum
    '''
    best=init_function()
    best_score=objective_function(best)

    num_evaluations=1

    while num_evaluations < max_evaluations:
        # examine moves around our current position
        move_made=False
        for next in move_operator(best):
            if num_evaluations >= max_evaluations:
                break

            # see if this move is better than the current
            next_score=objective_function(next)
            num_evaluations+=1
            if next_score > best_score:
                best=next
                best_score=next_score
                move_made=True
                break # depth first search

        if not move_made:
            break # we couldn't find a better move
                     # (must be at a local maximum)

    return (num_evaluations,best_score,best)
</code></pre>
<p><small>(I&#8217;ve removed some logging statements for clarity)</small></p>
<p>The parameters are as follows:</p>
<ul>
<li><code>init_function</code> &#8211; the function used to create our initial solution</li>
<li><code>move_operator</code> &#8211; the function we use to iterate over all possible &#8220;moves&#8221; for a given solution (for the TSP this will be <code>reversed_sections</code> or <code>swapped_cities</code>)</li>
<li><code>objective_function</code> &#8211; used to assign a numerical score to a solution &#8211; how &#8220;good&#8221; the solution is (as defined above for the TSP)</li>
<li><code>max_evaluations</code> &#8211; used to limit how much search we will perform (how many times we&#8217;ll call the <code>objective_function</code>)</li>
</ul>
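<p>As a small demonstration of the interface, here is a sketch of a Python 3 port of <code>hillclimb</code> run on a made-up one-dimensional problem.  The <code>LANDSCAPE</code> list and the <code>neighbours</code> move operator are assumptions for illustration only.  The landscape has a small hill peaking at position 3 and a bigger one peaking at position 10.</p>

```python
import random

def hillclimb(init_function, move_operator, objective_function, max_evaluations):
    """Python 3 port of the listing above (logging omitted)."""
    best = init_function()
    best_score = objective_function(best)
    num_evaluations = 1
    while num_evaluations < max_evaluations:
        # examine moves around our current position
        move_made = False
        for candidate in move_operator(best):
            if num_evaluations >= max_evaluations:
                break
            candidate_score = objective_function(candidate)
            num_evaluations += 1
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
                move_made = True
                break
        if not move_made:
            break  # no better neighbour: we are at a local maximum
    return num_evaluations, best_score, best

# toy problem (illustrative assumption): position x scores LANDSCAPE[x]
LANDSCAPE = [0, 1, 2, 3, 2, 1, 2, 3, 4, 5, 6, 5]

def neighbours(x):
    """Yield the positions one step left and right, in random order."""
    steps = [x - 1, x + 1]
    random.shuffle(steps)
    for n in steps:
        if 0 <= n < len(LANDSCAPE):
            yield n

evaluations, best_score, best = hillclimb(
    lambda: 0, neighbours, lambda x: LANDSCAPE[x], 1000)
print(best, best_score)  # 3 3
```

<p>Starting from position 0 the climb always stops at position 3, well short of the evaluation limit, because both neighbours of that point score lower.  That is the local maximum problem in miniature.</p>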
<h3>hill-climb with random restart</h3>
<p>With a random restart we get something like:</p>
<pre><code>
def hillclimb_and_restart(
    init_function,
    move_operator,
    objective_function,
    max_evaluations):
    '''
    repeatedly hillclimb until max_evaluations is reached
    '''
    best=None
    best_score=0

    num_evaluations=0
    while num_evaluations < max_evaluations:
        remaining_evaluations=max_evaluations-num_evaluations

        evaluated,score,found=hillclimb(
            init_function,
            move_operator,
            objective_function,
            remaining_evaluations)

        num_evaluations+=evaluated
        if score > best_score or best is None:
            best_score=score
            best=found

    return (num_evaluations,best_score,best)
</code></pre>
<p>The parameters match those of the <code>hillclimb</code> function.</p>
<p>This function simply calls <code>hillclimb</code> repeatedly until we have hit the limit specified by <code>max_evaluations</code>, whereas <code>hillclimb</code> on its own will not necessarily use all of the evaluations assigned to it.</p>
<h3>results</h3>
<p>Running the two different move operators (<code>reversed_sections</code> and <code>swapped_cities</code> &#8211; see <a href='http://psychicorigami.com/2007/04/17/tackling-the-travelling-salesman-problem-part-one/'>part one</a> for their definitions) on a 100 city tour produced some interesting differences.</p>
<p>Ten runs of 50000 evaluations (calls to the objective function) yielded:<br />
<center></p>
<table>
<tr>
<th>reversed_sections</th>
<th>swapped_cities</th>
</tr>
<tr>
<td>-4039.86</td>
<td>-6451.12</td>
</tr>
<tr>
<td>-4075.04</td>
<td>-6710.48</td>
</tr>
<tr>
<td>-4170.41</td>
<td>-6818.04</td>
</tr>
<tr>
<td>-4178.59</td>
<td>-6830.57</td>
</tr>
<tr>
<td>-4199.94</td>
<td>-6831.48</td>
</tr>
<tr>
<td>-4209.69</td>
<td>-7216.22</td>
</tr>
<tr>
<td>-4217.59</td>
<td>-7357.70</td>
</tr>
<tr>
<td>-4222.46</td>
<td>-7603.30</td>
</tr>
<tr>
<td>-4243.78</td>
<td>-7657.69</td>
</tr>
<tr>
<td>-4294.93</td>
<td>-7750.79</td>
</tr>
</table>
<p><small>(these are the scores from the objective function and represent negative tour length)</small><br />
</center></p>
<p>In this case <code>reversed_sections</code> performed much better: even its worst run (-4294.93) beat the best run of <code>swapped_cities</code> (-6451.12).  The best solution for <code>reversed_sections</code> looked like:<br />
<center><a href='http://psychicorigami.com/wp-content/uploads/2007/05/city100-reversed_sections-1.png'><img width='100%' src='http://psychicorigami.com/wp-content/uploads/2007/05/city100-reversed_sections-1.png' alt='city100-reversed_sections-1.png' /></a></center></p>
<p><br/><br />
Whereas the best for <code>swapped_cities</code> is clearly much worse:<br />
<center><a href='http://psychicorigami.com/wp-content/uploads/2007/05/city100-swapped_cities-9.png'><img width='100%' src='http://psychicorigami.com/wp-content/uploads/2007/05/city100-swapped_cities-9.png' alt='city100-swapped_cities-9.png' /></a></center></p>
<p>On these results <code>reversed_sections</code> is the better move operator.  This is most likely due to it being less &#8220;destructive&#8221; than <code>swapped_cities</code>, as it preserves entire sections of a route, yet still affects the ordering of multiple cities in one go.</p>
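<p>One rough way to put a number on &#8220;destructive&#8221; is to count how many links between neighbouring cities each move breaks.  The sketch below is only an illustration: the <code>edges</code> helper is hypothetical, the positions 2 and 6 are picked arbitrarily, and links are treated as undirected (which assumes the distance between two cities is the same in both directions).</p>

```python
def edges(tour):
    '''the set of undirected links between consecutive cities (wrapping round)'''
    n = len(tour)
    return set(frozenset((tour[i], tour[(i + 1) % n])) for i in range(n))

tour = list(range(10))
i, j = 2, 6  # arbitrary positions, for illustration only

# swap the cities at positions i and j
swapped = tour[:]
swapped[i], swapped[j] = tour[j], tour[i]

# reverse the whole section from position i to position j
reversed_section = tour[:]
reversed_section[i:j + 1] = reversed(tour[i:j + 1])

broken_by_swap = len(edges(tour) - edges(swapped))               # 4
broken_by_reversal = len(edges(tour) - edges(reversed_section))  # 2
```

<p>Here the swap replaces four links of the original tour, whereas the reversal replaces only two: the ones at either end of the reversed section.  Everything inside the section is still joined up as before, just travelled in the opposite direction.</p>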
<h3>conclusion</h3>
<p>As can be seen, hill-climbing is a very simple algorithm that can produce <em>good</em> results &#8211; provided one uses the right move operator.</p>
<p>However, it is not without its drawbacks and is prone to getting stuck at the top of &#8220;local maxima&#8221;.</p>
<p>Most of the other algorithms I will discuss later attempt to counter this weakness in hill-climbing.  The <a href='http://psychicorigami.com/2007/06/28/tackling-the-travelling-salesman-problem-simmulated-annealing/'>next algorithm</a> I will discuss (simulated annealing) is actually a pretty simple modification of hill-climbing, but gives us a much better chance at finding the global maximum for a given solution landscape.</p>
<h3>source-code</h3>
<p>Full source-code is available <a href='http://www.littlespikeyland.com/tsp/tsp_part_two.tar.gz'>here</a> as a .tar.gz file. Again some unit tests  are included, which can be run using <a href='http://somethingaboutorange.com/mrl/projects/nose/'>nosetests</a>.</p>
<p><small>(The source-code contains more code than shown here, to handle parsing parameters passed to the program etc.  I&#8217;ve not discussed this extra code here simply to save space.)</small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Tackling the travelling salesman problem: introduction</title>
		<link>http://www.psychicorigami.com/2007/04/17/tackling-the-travelling-salesman-problem-part-one/</link>
		<comments>http://www.psychicorigami.com/2007/04/17/tackling-the-travelling-salesman-problem-part-one/#comments</comments>
		<pubDate>Tue, 17 Apr 2007 18:02:04 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Optimisation]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[TSP]]></category>

		<guid isPermaLink="false">http://www.psychicorigami.com/2007/04/17/tackling-the-travelling-salesman-problem-part-one/</guid>
		<description><![CDATA[This is the first part in my series on the &#8220;travelling salesman problem&#8221; (TSP). An outline of what I plan to cover can be seen in the prologue. To kick things off here&#8217;s a quick quote: The traveling salesman problem (TSP) asks for the shortest route to visit a collection of cities and return to [...]]]></description>
			<content:encoded><![CDATA[<p>This is the first part in my series on the &#8220;travelling salesman problem&#8221; (TSP).  An outline of what I plan to cover can be seen in the <a href='http://www.psychicorigami.com/2007/04/10/tackling-the-travelling-salesman-problem-prologue/'>prologue</a>.</p>
<p>To kick things off here&#8217;s a quick quote:</p>
<blockquote><p>The traveling salesman problem (TSP) asks for the shortest route to visit a collection of cities and return to the starting point. Despite an intensive study by mathematicians, computer scientists, operations researchers, and others, over the past 50 years, it remains an open question whether or not an efficient general solution method exists. <a href='#gatech'>[1]</a></p></blockquote>
<p>The TSP is an <a href='http://mathworld.wolfram.com/NP-HardProblem.html'>NP-Hard Problem</a>.  That does not necessarily mean any one instance of the problem will be hard to solve; it just means that we do not currently have an algorithm that can give us the guaranteed best solution for all problems in &#8220;polynomial time&#8221;.  We can&#8217;t make predictions about how long it might take to find the best solution to the TSP from just looking at the data.  If one problem took 2 minutes to solve, we have no way of knowing how long a problem twice as large will take.</p>
<p>Although we might not be able to make predictions about finding the <em>best</em> solution, we often only want a <em>good</em> solution to the TSP.  We aren&#8217;t always so worried if we find a route amongst 1000 cities that is only a few miles longer than the best solution &#8211; particularly if it would take an inordinate amount of computing time to get from the good solution we already have to the best solution.</p>
<h3>stochastic optimisation</h3>
<p>As we often just want a good solution fairly quickly we can turn to <a href='http://en.wikipedia.org/wiki/Stochastic_optimization'>stochastic optimisation methods</a>.  We take randomly generated routes through the cities and incrementally improve/change them in some fashion to search for a better route.  How these changes are made depends on the algorithm being used, but there are a couple of simple approaches we can take, that I will outline here:</p>
<ul>
<li>swapping the position of two cities on a route</li>
<li>reversing the order of an entire section of a route</li>
</ul>
<p>These simple operators &#8211; so called as they &#8220;operate&#8221; on a route to create a new one &#8211; will form the initial basis of my code.  There are more complex operators, but for simplicity I shall not talk about them here.</p>
<h3>finally on with some code&#8230;.</h3>
<p>I will be using standard <a href='http://python.org/'>Python</a> lists to represent a route (or tour as I refer to it in my code &#8211; a name borrowed from graph theory) through a collection of cities.  Each city will simply be assigned a number from 0 to N-1 (where N is the number of cities) and therefore our list of cities will be a list of unique numbers between 0 and N-1.</p>
<p>We also need to specify a &#8220;distance matrix&#8221; that we can use to find out the distance between two cities.  To generate a distance matrix for a set of x,y co-ordinates the following will do nicely:<br />
<code></p>
<pre>
from math import sqrt

def cartesian_matrix(coords):
    '''create a distance matrix for the city coords
      that uses straight line distance'''
    matrix={}
    for i,(x1,y1) in enumerate(coords):
        for j,(x2,y2) in enumerate(coords):
            dx,dy=x1-x2,y1-y2
            dist=sqrt(dx*dx + dy*dy)
            matrix[i,j]=dist
    return matrix
</pre>
<p></code><br />
<code>cartesian_matrix()</code> takes a Python list of (x,y) tuples and outputs a Python dictionary that contains the distance between any two cities, keyed by the pair of city numbers:<br />
<code></p>
<pre>
>>> matrix=cartesian_matrix([(0,0),(1,0),(1,1)])
>>> print matrix
{(0, 1): 1.0, (1, 2): 1.0, (0, 0): 0.0, (2, 1): 1.0,
(1, 1): 0.0, (2, 0): 1.4142135623730951, (2, 2): 0.0,
(1, 0): 1.0, (0, 2): 1.4142135623730951}
>>> print matrix[1,2]
1.0
</pre>
<p></code><br />
Where <code>matrix[1,2]</code> gives the distance between city number 1 and city number 2.  In our case this is the same as <code>matrix[2,1]</code>, but for some TSPs it may not be (for example if there is a one way street between locations/cities that means we have to take a long way round when going in one direction).</p>
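<p>To make the one way street case concrete, here is a small sketch with a hand-written matrix.  The distances and the <code>round_trip</code> helper are made up purely for illustration; the only point is that <code>matrix[0,1]</code> and <code>matrix[1,0]</code> differ, so the same loop costs more in one direction than the other.</p>

```python
# hypothetical three-city matrix: the street from city 0 to city 1 is one way,
# so getting back from city 1 to city 0 means taking a longer road
matrix = {
    (0, 0): 0.0, (0, 1): 1.0, (0, 2): 2.0,
    (1, 0): 3.0, (1, 1): 0.0, (1, 2): 1.0,
    (2, 0): 2.0, (2, 1): 1.0, (2, 2): 0.0,
}

def round_trip(matrix, tour):
    '''sum the distances along the tour, returning to the start'''
    return sum(matrix[tour[k], tour[(k + 1) % len(tour)]]
               for k in range(len(tour)))

one_way_round = round_trip(matrix, [0, 1, 2])    # 1.0 + 1.0 + 2.0 = 4.0
other_way_round = round_trip(matrix, [2, 1, 0])  # 1.0 + 3.0 + 2.0 = 6.0
```

<p>With a symmetric matrix, such as the one <code>cartesian_matrix()</code> produces, both directions would give the same total.</p>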
<p>In addition to generating the distance matrix we will probably also want to read the city co-ordinates from a text file (one x,y per line):<br />
<code></p>
<pre>def read_coords(coord_file):
    coords=[]
    for line in coord_file:
        x,y=line.strip().split(',')
        coords.append((float(x),float(y)))
    return coords
</pre>
<p></code></p>
<p>That should be sufficient for generating distance matrices for now.  On real world problems generating a distance matrix may be more involved &#8211; you might need to take map data and calculate the actual road distance between each pair of cities.  This process can be done offline, before we start optimising our routes, and is a subject for another time.</p>
<p>Ok, so now we can read in a list of cities from a file and generate our distance matrix.  What next?  Well it would be good if we knew how long a route was:<br />
<code></p>
<pre>
def tour_length(matrix,tour):
    total=0
    num_cities=len(tour)
    for i in range(num_cities):
        j=(i+1)%num_cities
        city_i=tour[i]
        city_j=tour[j]
        total+=matrix[city_i,city_j]
    return total
</pre>
<p></code><br />
Where <code>matrix</code> is a distance matrix and <code>tour</code> is a list of cities (as integers).</p>
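<p>Putting the pieces so far together, here is a small worked example.  It repeats the three functions from above (lightly reformatted) so that it runs on its own, and uses an in-memory <code>StringIO</code> object as a stand-in for a real co-ordinate file; the four cities on the corners of a unit square are my own choice of test data.</p>

```python
from io import StringIO
from math import sqrt

def read_coords(coord_file):
    coords = []
    for line in coord_file:
        x, y = line.strip().split(',')
        coords.append((float(x), float(y)))
    return coords

def cartesian_matrix(coords):
    matrix = {}
    for i, (x1, y1) in enumerate(coords):
        for j, (x2, y2) in enumerate(coords):
            dx, dy = x1 - x2, y1 - y2
            matrix[i, j] = sqrt(dx * dx + dy * dy)
    return matrix

def tour_length(matrix, tour):
    total = 0
    num_cities = len(tour)
    for i in range(num_cities):
        j = (i + 1) % num_cities  # wraps round to return to the first city
        total += matrix[tour[i], tour[j]]
    return total

# stand-in for a co-ordinate file: the corners of a unit square
coord_file = StringIO(u"0,0\n1,0\n1,1\n0,1\n")
matrix = cartesian_matrix(read_coords(coord_file))

around_the_edge = tour_length(matrix, [0, 1, 2, 3])    # four sides: 4.0
across_the_middle = tour_length(matrix, [0, 2, 1, 3])  # two sides, two diagonals
```

<p>Going round the edge of the square gives a length of 4.0, while the tour that crosses the middle uses both diagonals and comes out at 2 + 2&#8730;2 (roughly 4.83), so it is the longer route.</p>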
<h3>implementing the operators</h3>
<p>I am going to implement the two operators as <a href='http://www.python.org/dev/peps/pep-0255/'>generator functions</a>, that will return (in a random order) all of the possible versions of a route that can be made in one step of the operator.  By using a generator function we can assess each different possibility and perhaps decide to not generate any more variations &#8211; which saves us the overhead of generating all of the combinations in one go.  Both operators rely on the following generator function:<br />
<code></p>
<pre>
import random

def all_pairs(size,shuffle=random.shuffle):
    r1=list(range(size))
    r2=list(range(size))
    if shuffle:
        shuffle(r1)
        shuffle(r2)
    for i in r1:
        for j in r2:
            yield (i,j)
</pre>
<p></code><br />
Which will generate all pairings of the numbers from 0 to <code>size</code>-1 as (i,j) tuples in a random order (this relies on the <code>random</code> module being imported). </p>
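<p>A quick sketch of how <code>all_pairs</code> behaves.  The function is repeated here so the example runs on its own, and I wrap <code>range</code> in <code>list()</code> on the assumption that it may be run under Python 3, where a bare <code>range</code> cannot be shuffled in place.</p>

```python
import random

def all_pairs(size, shuffle=random.shuffle):
    r1 = list(range(size))
    r2 = list(range(size))
    if shuffle:
        shuffle(r1)
        shuffle(r2)
    for i in r1:
        for j in r2:
            yield (i, j)

# with shuffling switched off the pairs come out in a predictable order
ordered = list(all_pairs(3, shuffle=None))

# with the default shuffle the order changes, but the set of pairs does not
shuffled = list(all_pairs(3))

# because it is a generator we can stop early without building every pair
first_two = []
for pair in all_pairs(1000):
    first_two.append(pair)
    if len(first_two) == 2:
        break
```

<p>Passing <code>shuffle=None</code> is handy in unit tests, where a fixed order makes the results easy to check.  The early <code>break</code> in the last loop is the behaviour the operators rely on: only the pairs that are actually asked for ever get generated.</p>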
<p>So each operator then looks like:</p>
<p><code></p>
<pre>
def swapped_cities(tour):
    '''generator to create all possible variations
      where two cities have been swapped'''
    for i,j in all_pairs(len(tour)):
        if i < j:
            copy=tour[:]
            copy[i],copy[j]=tour[j],tour[i]
            yield copy
</pre>
<p></code></p>
<p>and</p>
<p><code></p>
<pre>
def reversed_sections(tour):
    '''generator to return all possible variations
      where the section between two cities is reversed'''
    for i,j in all_pairs(len(tour)):
        if i != j:
            copy=tour[:]
            if i < j:
                copy[i:j+1]=reversed(tour[i:j+1])
            else:
                copy[i+1:]=reversed(tour[:j])
                copy[:j]=reversed(tour[i+1:])
            if copy != tour: # no point returning the same tour
                yield copy
</pre>
<p></code></p>
<p>Note I'm using <code>copy=tour[:]</code> to duplicate the route, rather than <code>copy=list(tour)</code> (or similar).  Although I am currently using a Python list to represent a tour I may in the future want to use something else that is merely &#8220;list-like&#8221;, in which case I could just override <code>__getitem__</code> to return an object of the appropriate type.</p>
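<p>As a rough sketch of what I mean, here is a hypothetical <code>Tour</code> class (not part of the code for this series) that stays a <code>Tour</code> when sliced.  It assumes Python 3, where slicing always goes through <code>__getitem__</code> with a <code>slice</code> object.</p>

```python
class Tour(list):
    '''hypothetical list-like tour type that survives slicing (Python 3)'''
    def __getitem__(self, key):
        result = list.__getitem__(self, key)
        if isinstance(key, slice):
            return Tour(result)  # keep the type when a slice is taken
        return result

tour = Tour([1, 2, 3])
sliced_copy = tour[:]    # a new Tour with the same cities
plain_copy = list(tour)  # always a plain list, the Tour type is lost
```

<p>With something like this in place the operators above would hand back <code>Tour</code> objects without any changes, whereas <code>copy=list(tour)</code> would quietly turn every variation into an ordinary list.</p>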
<p>Using both operators looks like:<br />
<code></p>
<pre>
>>> for tour in swapped_cities([1,2,3]):
...     print tour
...
[3, 2, 1]
[2, 1, 3]
[1, 3, 2]
>>> for tour in reversed_sections([1,2,3]):
...     print tour
...
[3, 2, 1]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[1, 3, 2]
</pre>
<p></code><br />
As you can see <code>reversed_sections</code> gives us a lot more possibilities - even with only 3 cities.  I'll compare how these two operators stack up in the next part of the series.</p>
<h3>additional code</h3>
<p>The last piece of utility code for today generates a .png of a route overlaid on top of the city co-ordinates (using <a href='http://www.pythonware.com/products/pil/'>PIL</a>):<br />
<code></p>
<pre>
from PIL import Image, ImageDraw, ImageFont

def write_tour_to_img(coords,tour,img_file):
    padding=20
    # shift all coords in a bit
    coords=[(x+padding,y+padding) for (x,y) in coords]
    maxx,maxy=0,0
    for x,y in coords:
        maxx=max(x,maxx)
        maxy=max(y,maxy)
    maxx+=padding
    maxy+=padding
    img=Image.new("RGB",(int(maxx),int(maxy)),color=(255,255,255))

    font=ImageFont.load_default()
    d=ImageDraw.Draw(img)
    num_cities=len(tour)
    for i in range(num_cities):
        j=(i+1)%num_cities
        city_i=tour[i]
        city_j=tour[j]
        x1,y1=coords[city_i]
        x2,y2=coords[city_j]
        d.line((int(x1),int(y1),int(x2),int(y2)),fill=(0,0,0))
        d.text((int(x1)+7,int(y1)-5),str(i),font=font,fill=(32,32,32))

    for x,y in coords:
        x,y=int(x),int(y)
        d.ellipse((x-5,y-5,x+5,y+5),outline=(0,0,0),fill=(196,196,196))
    del d
    img.save(img_file, "PNG")
</pre>
<p></code><br />
Which produces output along these lines:<br />
<center><br />
<img src='http://www.psychicorigami.com/wp-content/uploads/2007/04/test.png' alt='test.png' /><br />
</center></p>
<h3>conclusion</h3>
<p>Well that's it for part one.  At this point you can create a distance matrix, see how long a route is, create a few different variations of a route and create an image of a route.  The missing part is being able to generate a <em>good</em> route. I'll save that for the <a href='http://psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/'>next part</a>, where I'll discuss the simplest stochastic optimisation method: <a href='http://psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/'>Hill-climbing</a>.</p>
<h3>source-code</h3>
<p>Full source-code is available <a href='http://www.littlespikeyland.com/tsp/tsp_part_one.tar.gz'>here</a> as a .tar.gz file.  I've included some unit tests which can be run using <a href='http://somethingaboutorange.com/mrl/projects/nose/'>nosetests</a>.</p>
<h3>references</h3>
<ol>
<li><a name='gatech' href='http://www.tsp.gatech.edu/'>http://www.tsp.gatech.edu/</a></li>
<li><a href='http://mathworld.wolfram.com/NP-HardProblem.html'>http://mathworld.wolfram.com/NP-HardProblem.html</a></li>
<li><a href="http://en.wikipedia.org/wiki/Stochastic_optimization">http://en.wikipedia.org/wiki/Stochastic_optimization</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.psychicorigami.com/2007/04/17/tackling-the-travelling-salesman-problem-part-one/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Tackling the travelling salesman problem: prologue</title>
		<link>http://www.psychicorigami.com/2007/04/10/tackling-the-travelling-salesman-problem-prologue/</link>
		<comments>http://www.psychicorigami.com/2007/04/10/tackling-the-travelling-salesman-problem-prologue/#comments</comments>
		<pubDate>Tue, 10 Apr 2007 19:35:33 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Optimisation]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[TSP]]></category>

		<guid isPermaLink="false">http://www.psychicorigami.com/2007/04/10/tackling-the-travelling-salesman-problem-prologue/</guid>
		<description><![CDATA[The Travelling Salesman Problem (TSP) is a classic combinatorial optimisation problem. It also happens to be a problem I have spent various parts of my life looking at. At my first job out of uni we worked on solving real-world problems that usually involved the TSP in some fashion. Often the problems we tackled were, [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href='http://en.wikipedia.org/wiki/Travelling_salesman_problem'>Travelling Salesman Problem</a> (TSP) is a classic <a href='http://en.wikipedia.org/wiki/Combinatorial_optimization'>combinatorial optimisation</a> problem.  It also happens to be a problem I have spent various parts of my life looking at.</p>
<p>At my first job out of uni we worked on solving real-world problems that usually involved the TSP in some fashion.  Often the problems we tackled were, in effect, the TSP with extra constraints or variants along those lines.  A straightforward example is having multiple &#8220;salesmen&#8221; and trying to assign each one a route (that didn&#8217;t overlap).  The approach in that first job was mainly to use <a href='http://en.wikipedia.org/wiki/Evolutionary_algorithm'>evolutionary algorithms</a>.</p>
<p>Then during <a href='http://littlespikeyland.com/msc/'>my MSc</a> I took the course <a href='http://www.cs.bham.ac.uk/internal/modules/2004/12416.html'>Nature Inspired Optimisation</a>.  This was by far my favourite course during my MSc and saw us applying different optimisation algorithms to several different problems.  <a href='http://kennon.pleonast.com/'>Kennon</a> and I were assigned the TSP to work on and had a lot of fun implementing various different algorithms to tackle it.</p>
<p>Anyway, I thought it would be interesting to give a brief guided tour of the different algorithms we used during the MSc to tackle the travelling salesman problem.  I&#8217;m no expert, but I do have a fair bit of experience with this problem and there are some fairly straightforward algorithms out there, which are good to have in your arsenal.</p>
<p>So my intention for this series is to cover (over the next weeks/months/however long):</p>
<ul>
<li><a href='http://www.psychicorigami.com/2007/04/17/tackling-the-travelling-salesman-problem-part-one/'>Defining the Travelling Salesman Problem</a> (and writing shared utility code)</li>
<li><a href='http://www.psychicorigami.com/2007/05/12/tackling-the-travelling-salesman-problem-hill-climbing/'>Hill-climbing</a> (the simplest approach)</li>
<li><a href="http://www.psychicorigami.com/2007/06/28/tackling-the-travelling-salesman-problem-simmulated-annealing/">Simulated annealing</a></li>
<li>Evolutionary algorithms</li>
</ul>
<p>That&#8217;s the bare minimum for now.  I may well cover a few more algorithms (e.g. ant algorithms), but those three algorithms provide quite a good starting point.</p>
<p>I will be writing everything from scratch, as I go, in <a href='http://python.org/'>Python</a>.  In general I&#8217;ll be aiming for clarity of code rather than pure speed, and Python should work well for that purpose.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychicorigami.com/2007/04/10/tackling-the-travelling-salesman-problem-prologue/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

