scala> :paste // Entering paste mode (ctrl-D to finish) trait Environment { val mode: String override val toString = "Running in mode '" + mode + "'" } // Exiting paste mode, now interpreting. defined trait Environment scala> val x = new Environment { override val mode = "production" } x: java.lang.Object with Environment = Running in mode 'null' //As you can see the mode is still null so instantiating Traits this way is a bad idea. //You can however use the construct below looking like an anonymous class definition scala> class Acceptance extends {val mode = "Acceptance"} with Environment defined class Acceptance scala> val y = new Acceptance y: Acceptance = Running in mode 'Acceptance' //or you can even skip constructing a class and just create a new Object directly scala> val dev = new {val mode = "Development"} with Environment dev: java.lang.Object with Environment = Running in mode 'Development'

# Exploration into Functional Programming

## Friday, June 15, 2012

### Using early member definitions with Traits

## Monday, June 4, 2012

### Micro Scala benchmark

I wanted to check how well Scala performs doing the same test as in Python on my laptop by zipping 2 identical vectors and calculating the sum. It didn't really surprise me that Scala outperformed Python but the outcome really gives me another WTF moment. I probably made a stupid mistake somewhere but just in case here is the test executed on scala. I did a few tests with smaller vectors and those gave the same results with Python and Scala. The strange outcome is due to the result not fitting into an integer !! Changing generic type [Int] to [Long] will result in ouf-of-memory exception unfortunately. For completeness a link to the simple Profiler used.

import Profiler._ object MicroBenchMark extends App { val v1 = 1 to 9999999 //scala includes .to(x) while Python excludes .to(x) val v2 = 1 to 9999999 def vector_op(vector1: Iterable[Int], vector2: Iterable[Int], op: (Int, Int) => Int): Iterable[Int] = { vector1.zip(vector2).map(op.tupled) } profile(println("sum is " + vector_op(v1, v2, (x,y) => x + y).sum)) //python prints 99999990000000 instead of 266447232 }

sum is 266447232 Execution took 8 seconds

## Friday, June 1, 2012

### Python: Creating dictionary of adjacent elements of circular iterable

In many problems you will need to deal with circular datastructures which are most of the time presented in a simple iterable. If you need to quickly find for a specific element what the left and right adjacent elements are, you can use this generic function presented below.

We can even make it more generic if we want to specify what keys and values to use from the iterables.

#we got a list of persons sitting next to each other in a circle. #This means Robby is sitting next to Alice (left) and Valerie (right) for below example. #We want to create a quick lookup dict for who sits next to each other def getCircularLookupDictionary(iter): #generic function which returns a dict of items with format (key, (leftadjacent, rightadjacent)) l = len(iter) """ return dict( (iter[i], (iter[l - 1], iter[i+1]) ) if i == 0 else (iter[i], (iter[i - 1], iter[0]) ) if i == l -1 else (iter[i], (iter[i - 1], iter[i+1]) ) for i in range(l) ) """ #nice oneliner provided by my colleague Ivan Lagunov !! return dict((iter[i], (iter[i - 1], iter[i + 1 - l])) for i in range(l)) circle = ['Robby', 'Valerie', 'Lindsey', 'Audrey', 'Alice'] neighbours = getCircularLookupDictionary(circle) print neighbours['Robby'] print neighbours['Lindsey'] print neighbours['Alice'] assert neighbours['Robby'] == ('Alice', 'Valerie') assert neighbours['Lindsey'] == ('Valerie', 'Audrey') assert neighbours['Alice'] == ('Audrey', 'Robby')

('Alice', 'Valerie') ('Valerie', 'Audrey') ('Audrey', 'Robby')

We can even make it more generic if we want to specify what keys and values to use from the iterables.

#we got a list of persons sitting next to each other in a circle. #This means Robby is sitting next to Alice (left) and Valerie (right) for below example. #We want to create a quick lookup dict for who sits next to each other def getCircularLookupDictionary(iter, keyf, valuef): #generic function which returns a dict of items with format (key, (leftadjacent, rightadjacent)) l = len(iter) """ return dict( (keyf(iter[i]), (valuef(iter[l - 1]), valuef(iter[i+1])) ) if i == 0 else (keyf(iter[i]), (valuef(iter[i - 1]), valuef(iter[0])) ) if i == l -1 else (keyf(iter[i]), (valuef(iter[i - 1]), valuef(iter[i+1])) ) for i in range(l) ) """ #nice oneliner provided by my colleague Ivan Lagunov return dict((keyf(iter[i]), (valuef(iter[i - 1]), valuef(iter[i + 1 -l])) ) for i in range(l)) #circle contains tuples of persons (name, age) circle = [('Robby', 35), ('Valerie', 5), ('Lindsey', 9), ('Audrey', 40), ('Alice', 59)] neighbours = getCircularLookupDictionary(circle, lambda (name,age): name, lambda (name,age): age) print neighbours['Robby'] print neighbours['Lindsey'] print neighbours['Alice'] assert neighbours['Robby'] == (59, 5) #ages of Alice, Valerie assert neighbours['Lindsey'] == (5, 40) #ages of Valerie, Audrey assert neighbours['Alice'] == (40, 35) #ages of Audrey, Robby

(59, 5) (5, 40) (40, 35)

### Python: Using attributes with functions

Python offers the possibility to store attributes on a function. This can be very useful in a number of situations. Below an example that shows usage.

import inspect def viewCount(page): #returns a function which keeps track of how many times the page got viewed def f(): f.numberOfViews += 1 return f.numberOfViews f.__name__ = "ViewCount for page " + page f.numberOfViews = 0 return f viewCountOne = viewCount('index.html') viewCountTwo = viewCount('faq.html') print viewCountOne() print viewCountOne() print viewCountTwo() print viewCountOne() print viewCountTwo() print viewCountOne.__name__ print viewCountTwo.__name__ print hasattr(viewCountOne, 'numberOfViews') print inspect.isfunction(viewCountOne) print viewCountOne.__name__.find('index') != -1

1 2 1 3 2 ViewCount for page index.html ViewCount for page faq.html True True True

## Thursday, May 31, 2012

### Python performance pitfalls (?) or issues (?)

I just want to warn you that Python has its glitches performance wise. Functional programming buzzwords are immutability and lazy evaluation (reduced memory footprint). But Python is doing something wrong here one would say. Check for yourself and I would love Python gurus to explain why my attempts to go with the lazy flow resulted in killing the performance.

Let's not return an array but a generator expression.

What is really surprising is that it even takes more time now. So what are we doing still wrong?

Let's also use xrange which is lazy evaluated and see how far that gets us.

Will izip do us anygood in this case?

What about not using a lambda expression?

Ok... last hope?! Let's inline the addition instead of calling operator function.

That seemed to help a lot, so beware of using too much abstraction when performance really matters. Not making the extra number of function calls reduces the execution time a lot. So now just for fun let us not return a generator expression but an array again.

What the f@ck... The generator expression in combination with the sum function seems like total killing performance. Storing the result from the zip call in an array and next calling sum boosted performance.

This answer from stackoverflow is very useful to understand the outcome of this test. I conclude this article that it always makes good sense to profile your app and decide for yourself what is the best approach.

import cProfile def vector_op(v1,v2,op): return [op(x,y) for x,y in zip(v1,v2)] v1 = range(1, 10000000) v2 = range(1, 10000000) cProfile.run('print sum(vector_op(v1,v2, lambda x,y: x + y))')

99999990000000 10000046 function calls (10000040 primitive calls) in 59.064 seconds ncalls tottime percall cumtime percall filename:lineno(function) 9999999 25.622 0.000 25.622 0.000:1( ) 1 31.126 31.126 58.165 58.165 PythonApplication1.py:21(vector_op) 1 0.837 0.837 0.837 0.837 {sum} 1 1.417 1.417 1.417 1.417 {zip}

Let's not return an array but a generator expression.

import cProfile def vector_op_lazy(v1,v2,op): return (op(x,y) for x,y in zip(v1,v2)) v1 = range(1, 10000000) v2 = range(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2, lambda x,y: x + y))')

What is really surprising is that it even takes more time now. So what are we doing still wrong?

99999990000000 20000046 function calls (20000040 primitive calls) in 91.824 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 9999999 26.132 0.000 26.132 0.000:1( ) 10000000 46.346 0.000 72.478 0.000 PythonApplication1.py:25( ) 1 17.966 17.966 90.444 90.444 {sum} 1 1.378 1.378 1.378 1.378 {zip}

Let's also use xrange which is lazy evaluated and see how far that gets us.

import cProfile def vector_op_lazy(v1,v2,op): return (op(x,y) for x,y in zip(v1,v2)) v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2, lambda x,y: x + y))')

99999990000000 20000046 function calls (20000040 primitive calls) in 96.393 seconds ncalls tottime percall cumtime percall filename:lineno(function) 9999999 27.329 0.000 27.329 0.000:1( ) 10000000 48.976 0.000 76.305 0.000 PythonApplication1.py:27( ) 1 19.059 19.059 95.364 95.364 {sum} 1 1.028 1.028 1.028 1.028 {zip}

Will izip do us anygood in this case?

import cProfile from itertools import izip def vector_op_lazy(v1,v2,op): return (op(x,y) for x,y in izip(v1,v2)) v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2, lambda x,y: x + y))')

99999990000000 20000045 function calls (20000039 primitive calls) in 91.346 seconds ncalls tottime percall cumtime percall filename:lineno(function) 9999999 26.304 0.000 26.304 0.000:1( ) 10000000 46.759 0.000 73.063 0.000 PythonApplication1.py:27( ) 1 18.280 18.280 91.343 91.343 {sum}

What about not using a lambda expression?

import cProfile from itertools import izip def vector_op_lazy(v1,v2,op): return (op(x,y) for x,y in izip(v1,v2)) def op_add(x,y): return x + y v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2, op_add))')

99999990000000 20000045 function calls (20000039 primitive calls) in 89.603 seconds ncalls tottime percall cumtime percall filename:lineno(function) 10000000 46.009 0.000 71.799 0.000 PythonApplication1.py:27() 9999999 25.790 0.000 25.790 0.000 PythonApplication1.py:29(op_add) 1 17.804 17.804 89.603 89.603 {sum}

Ok... last hope?! Let's inline the addition instead of calling operator function.

import cProfile from itertools import izip def vector_op_lazy(v1,v2): return (x + y for x,y in izip(v1,v2)) v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2))')

99999990000000 10000046 function calls (10000040 primitive calls) in 45.078 seconds ncalls tottime percall cumtime percall filename:lineno(function) 10000000 27.925 0.000 27.925 0.000 PythonApplication1.py:27() 1 17.153 17.153 45.077 45.077 {sum}

That seemed to help a lot, so beware of using too much abstraction when performance really matters. Not making the extra number of function calls reduces the execution time a lot. So now just for fun let us not return a generator expression but an array again.

import cProfile from itertools import izip def vector_op(v1,v2): return [x + y for x,y in zip(v1,v2)] v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op(v1,v2))')

What the f@ck... The generator expression in combination with the sum function seems like total killing performance. Storing the result from the zip call in an array and next calling sum boosted performance.

99999990000000 47 function calls (41 primitive calls) in 14.603 seconds ncalls tottime percall cumtime percall filename:lineno(function) 1 12.678 12.678 13.733 13.733 PythonApplication1.py:25(vector_op) 1 0.810 0.810 0.810 0.810 {sum} 1 1.056 1.056 1.056 1.056 {zip}

This answer from stackoverflow is very useful to understand the outcome of this test. I conclude this article that it always makes good sense to profile your app and decide for yourself what is the best approach.

## Monday, May 28, 2012

### final exam CS212: Portmanteau

This is the sixth and last question from the Udacity CS212 final exam. I will update this article with my solution in 1 week. Please do not comment anything during the final exam week !!

My solution:

# Unit 6: Fun with Words """ A portmanteau word is a blend of two or more words, like 'mathelete', which comes from 'math' and 'athelete'. You will write a function to find the 'best' portmanteau word from a list of dictionary words. Because 'portmanteau' is so easy to misspell, we will call our function 'natalie' instead: natalie(['word', ...]) == 'portmanteauword' In this exercise the rules are: a portmanteau must be composed of three non-empty pieces, start+mid+end, where both start+mid and mid+end are among the list of words passed in. For example, 'adolescented' comes from 'adolescent' and 'scented', with start+mid+end='adole'+'scent'+'ed'. A portmanteau must be composed of two different words (not the same word twice). That defines an allowable combination, but which is best? Intuitively, a longer word is better, and a word is well-balanced if the mid is about half the total length while start and end are about 1/4 each. To make that specific, the score for a word w is the number of letters in w minus the difference between the actual and ideal lengths of start, mid, and end. (For the example word w='adole'+'scent'+'ed', the start,mid,end lengths are 5,5,2 and the total length is 12. The ideal start,mid,end lengths are 12/4,12/2,12/4 = 3,6,3. So the final score is 12 - abs(5-3) - abs(5-6) - abs(2-3) = 8. yielding a score of 12 - abs(5-(12/4)) - abs(5-(12/2)) - abs(2-(12/4)) = 8. The output of natalie(words) should be the best portmanteau, or None if there is none. Note (1): I got the idea for this question from Darius Bacon. Note (2): In real life, many portmanteaux omit letters, for example 'smoke' + 'fog' = 'smog'; we aren't considering those. Note (3): The word 'portmanteau' is itself a portmanteau; it comes from the French "porter" (to carry) + "manteau" (cloak), and in English meant "suitcase" in 1871 when Lewis Carroll used it in 'Through the Looking Glass' to mean two words packed into one. Note (4): the rules for 'best' are certainly subjective, and certainly should depend on more things than just letter length. In addition to programming the solution described here, you are welcome to explore your own definition of best, and use your own word lists to come up with interesting new results. Post your best ones in the discussion forum. Note (5) The test examples will involve no more than a dozen or so input words. But you could implement a method that is efficient with a larger list of words. """ def natalie(words): "Find the best Portmanteau word formed from any two of the list of words." def test_natalie(): "Some test cases for natalie" assert natalie(['adolescent', 'scented', 'centennial', 'always', 'ado']) == 'adolescented' assert natalie(['eskimo', 'escort', 'kimchee', 'kimono', 'cheese']) == 'eskimono' assert natalie(['kimono', 'kimchee', 'cheese', 'serious', 'us', 'usage']) == 'kimcheese' assert natalie(['circus', 'elephant', 'lion', 'opera', 'phantom']) == 'elephantom' assert natalie(['programmer', 'coder', 'partying', 'merrymaking']) == 'programmerrymaking' assert natalie(['int', 'intimate', 'hinter', 'hint', 'winter']) == 'hintimate' assert natalie(['morass', 'moral', 'assassination']) == 'morassassination' assert natalie(['entrepreneur', 'academic', 'doctor', 'neuropsychologist', 'neurotoxin', 'scientist', 'gist']) == 'entrepreneuropsychologist' assert natalie(['perspicacity', 'cityslicker', 'capability', 'capable']) == 'perspicacityslicker' assert natalie(['backfire', 'fireproof', 'backflow', 'flowchart', 'background', 'groundhog']) == 'backgroundhog' assert natalie(['streaker', 'nudist', 'hippie', 'protestor', 'disturbance', 'cops']) == 'nudisturbance' assert natalie(['night', 'day']) == None assert natalie(['dog', 'dogs']) == None assert natalie(['test']) == None assert natalie(['']) == None assert natalie(['ABC', '123']) == None assert natalie([]) == None return 'tests pass' print test_natalie()

My solution:

# Unit 6: Fun with Words """ A portmanteau word is a blend of two or more words, like 'mathelete', which comes from 'math' and 'athelete'. You will write a function to find the 'best' portmanteau word from a list of dictionary words. Because 'portmanteau' is so easy to misspell, we will call our function 'natalie' instead: natalie(['word', ...]) == 'portmanteauword' In this exercise the rules are: a portmanteau must be composed of three non-empty pieces, start+mid+end, where both start+mid and mid+end are among the list of words passed in. For example, 'adolescented' comes from 'adolescent' and 'scented', with start+mid+end='adole'+'scent'+'ed'. A portmanteau must be composed of two different words (not the same word twice). That defines an allowable combination, but which is best? Intuitively, a longer word is better, and a word is well-balanced if the mid is about half the total length while start and end are about 1/4 each. To make that specific, the score for a word w is the number of letters in w minus the difference between the actual and ideal lengths of start, mid, and end. (For the example word w='adole'+'scent'+'ed', the start,mid,end lengths are 5,5,2 and the total length is 12. The ideal start,mid,end lengths are 12/4,12/2,12/4 = 3,6,3. So the final score is 12 - abs(5-3) - abs(5-6) - abs(2-3) = 8. yielding a score of 12 - abs(5-(12/4)) - abs(5-(12/2)) - abs(2-(12/4)) = 8. The output of natalie(words) should be the best portmanteau, or None if there is none. Note (1): I got the idea for this question from Darius Bacon. Note (2): In real life, many portmanteaux omit letters, for example 'smoke' + 'fog' = 'smog'; we aren't considering those. Note (3): The word 'portmanteau' is itself a portmanteau; it comes from the French "porter" (to carry) + "manteau" (cloak), and in English meant "suitcase" in 1871 when Lewis Carroll used it in 'Through the Looking Glass' to mean two words packed into one. Note (4): the rules for 'best' are certainly subjective, and certainly should depend on more things than just letter length. In addition to programming the solution described here, you are welcome to explore your own definition of best, and use your own word lists to come up with interesting new results. Post your best ones in the discussion forum. Note (5) The test examples will involve no more than a dozen or so input words. But you could implement a method that is efficient with a larger list of words. """ def natalie(words): "Find the best Portmanteau word formed from any two of the list of words." #we will represent a natalie with a tuple (word, score) natalies = [] for (word1, word2) in getWordPairs(words): mid = '' endpos = 1 while word1.find(word2[0:endpos]) != -1 and endpos <= len(word2): mid = word2[0:endpos] endpos +=1 start = substringBefore(word1, mid) end = substringAfter(word2, mid) if mid == '' or len(start) == 0 or len(end) == 0: #we didn't find a natalie continue word = start + mid + end score = len(word) - abs(len(start)-(len(word)/4.0)) - abs(len(mid)-(len(word)/2.0)) - abs(len(end)-(len(word)/4.0)) natalies.append((word, score)) if len(natalies) == 0: return None max_score = max([natalie[1] for natalie in natalies]) #remark: I only return the first hit among several best matches return [ele[0] for ele in natalies if ele[1] == max_score][0] def substringAfter(text, substring): start = text.find(substring) if start == -1: return '' return text[start + len(substring):] def substringBefore(text, substring): start = text.find(substring) if start == -1: return '' return text[:start] def getWordPairs(words): from itertools import permutations perms = list(permutations(words, 2)) return perms def test_natalie(): "Some test cases for natalie" assert natalie(['adolescent', 'scented', 'centennial', 'always', 'ado']) == 'adolescented' assert natalie(['eskimo', 'escort', 'kimchee', 'kimono', 'cheese']) == 'eskimono' assert natalie(['kimono', 'kimchee', 'cheese', 'serious', 'us', 'usage']) == 'kimcheese' assert natalie(['circus', 'elephant', 'lion', 'opera', 'phantom']) == 'elephantom' assert natalie(['programmer', 'coder', 'partying', 'merrymaking']) == 'programmerrymaking' assert natalie(['int', 'intimate', 'hinter', 'hint', 'winter']) == 'hintimate' assert natalie(['morass', 'moral', 'assassination']) == 'morassassination' assert natalie(['entrepreneur', 'academic', 'doctor', 'neuropsychologist', 'neurotoxin', 'scientist', 'gist']) == 'entrepreneuropsychologist' assert natalie(['perspicacity', 'cityslicker', 'capability', 'capable']) == 'perspicacityslicker' assert natalie(['backfire', 'fireproof', 'backflow', 'flowchart', 'background', 'groundhog']) == 'backgroundhog' assert natalie(['streaker', 'nudist', 'hippie', 'protestor', 'disturbance', 'cops']) == 'nudisturbance' assert natalie(['night', 'day']) == None assert natalie(['dog', 'dogs']) == None assert natalie(['test']) == None assert natalie(['']) == None assert natalie(['ABC', '123']) == None assert natalie([]) == None return 'tests pass'

### final exam CS212: Darts probability

This is the fifth question from the Udacity CS212 final exam. I will update this article with my solution in 1 week. Please do not comment anything during the final exam week !!

My solution:

# Unit 5: Probability in the game of Darts """ In the game of darts, players throw darts at a board to score points. The circular board has a 'bulls-eye' in the center and 20 slices called sections, numbered 1 to 20, radiating out from the bulls-eye. The board is also divided into concentric rings. The bulls-eye has two rings: an outer 'single' ring and an inner 'double' ring. Each section is divided into 4 rings: starting at the center we have a thick single ring, a thin triple ring, another thick single ring, and a thin double ring. A ring/section combination is called a 'target'; they have names like 'S20', 'D20' and 'T20' for single, double, and triple 20, respectively; these score 20, 40, and 60 points. The bulls-eyes are named 'SB' and 'DB', worth 25 and 50 points respectively. Illustration (png image): http://goo.gl/i7XJ9 There are several variants of darts play; in the game called '501', each player throws three darts per turn, adding up points until they total exactly 501. However, the final dart must be in a double ring. Your first task is to write the function double_out(total), which will output a list of 1 to 3 darts that add up to total, with the restriction that the final dart is a double. See test_darts() for examples. Return None if there is no list that achieves the total. Often there are several ways to achieve a total. You must return a shortest possible list, but you have your choice of which one. For example, for total=100, you can choose ['T20', 'D20'] or ['DB', 'DB'] but you cannot choose ['T20', 'D10', 'D10']. """ def test_darts(): "Test the double_out function." assert double_out(170) == ['T20', 'T20', 'DB'] assert double_out(171) == None assert double_out(100) in (['T20', 'D20'], ['DB', 'DB']) """ My strategy: I decided to choose the result that has the highest valued target(s) first, e.g. always take T20 on the first dart if we can achieve a solution that way. If not, try T19 first, and so on. At first I thought I would need three passes: first try to solve with one dart, then with two, then with three. But I realized that if we include 0 as a possible dart value, and always try the 0 first, then we get the effect of having three passes, but we only have to code one pass. So I creted ordered_points as a list of all possible scores that a single dart can achieve, with 0 first, and then descending: [0, 60, 57, ..., 1]. I iterate dart1 and dart2 over that; then dart3 must be whatever is left over to add up to total. If dart3 is a valid element of points, then we have a solution. But the solution, is a list of numbers, like [0, 60, 40]; we need to transform that into a list of target names, like ['T20', 'D20'], we do that by defining name(d) to get the name of a target that scores d. When there are several choices, we must choose a double for the last dart, but for the others I prefer the easiest targets first: 'S' is easiest, then 'T', then 'D'. """ def double_out(total): """Return a shortest possible list of targets that add to total, where the length <= 3 and the final element is a double. If there is no solution, return None.""" # your code here """ It is easy enough to say "170 points? Easy! Just hit T20, T20, DB." But, at least for me, it is much harder to actually execute the plan and hit each target. In this second half of the question, we investigate what happens if the dart-thrower is not 100% accurate. We will use a wrong (but still useful) model of inaccuracy. A player has a single number from 0 to 1 that characterizes his/her miss rate. If miss=0.0, that means the player hits the target every time. But if miss is, say, 0.1, then the player misses the section s/he is aiming at 10% of the time, and also (independently) misses the thin double or triple ring 10% of the time. Where do the misses go? Here's the model: First, for ring accuracy. If you aim for the triple ring, all the misses go to a single ring (some to the inner one, some to the outer one, but the model doesn't distinguish between these). If you aim for the double ring (at the edge of the board), half the misses (e.g. 0.05 if miss=0.1) go to the single ring, and half off the board. (We will agree to call the off-the-board 'target' by the name 'OFF'.) If you aim for a thick single ring, it is about 5 times thicker than the thin rings, so your miss ratio is reduced to 1/5th, and of these, half go to the double ring and half to the triple. So with miss=0.1, 0.01 will go to each of the double and triple ring. Finally, for the bulls-eyes. If you aim for the single bull, 1/4 of your misses go to the double bull and 3/4 to the single ring. If you aim for the double bull, it is tiny, so your miss rate is tripled; of that, 2/3 goes to the single ring and 1/3 to the single bull ring. Now, for section accuracy. Half your miss rate goes one section clockwise and half one section counter-clockwise from your target. The clockwise order of sections is: 20 1 18 4 13 6 10 15 2 17 3 19 7 16 8 11 14 9 12 5 If you aim for the bull (single or double) and miss on rings, then the section you end up on is equally possible among all 20 sections. But independent of that you can also miss on sections; again such a miss is equally likely to go to any section and should be recorded as being in the single ring. You will need to build a model for these probabilities, and define the function outcome(target, miss), which takes a target (like 'T20') and a miss ration (like 0.1) and returns a dict of {target: probability} pairs indicating the possible outcomes. You will also define best_target(miss) which, for a given miss ratio, returns the target with the highest expected score. If you are very ambitious, you can try to find the optimal strategy for accuracy-limited darts: given a state defined by your total score needed and the number of darts remaining in your 3-dart turn, return the target that minimizes the expected number of total 3-dart turns (not the number of darts) required to reach the total. This is harder than Pig for several reasons: there are many outcomes, so the search space is large; also, it is always possible to miss a double, and thus there is no guarantee that the game will end in a finite number of moves. """ def outcome(target, miss): "Return a probability distribution of [(target, probability)] pairs." #your code here def best_target(miss): "Return the target that maximizes the expected score." #your code here def same_outcome(dict1, dict2): "Two states are the same if all corresponding sets of locs are the same." return all(abs(dict1.get(key, 0) - dict2.get(key, 0)) <= 0.0001 for key in set(dict1) | set(dict2)) def test_darts2(): assert best_target(0.0) == 'T20' assert best_target(0.1) == 'T20' assert best_target(0.4) == 'T19' assert same_outcome(outcome('T20', 0.0), {'T20': 1.0}) assert same_outcome(outcome('T20', 0.1), {'T20': 0.81, 'S1': 0.005, 'T5': 0.045, 'S5': 0.005, 'T1': 0.045, 'S20': 0.09}) assert same_outcome( outcome('SB', 0.2), {'S9': 0.01, 'S8': 0.01, 'S3': 0.01, 'S2': 0.01, 'S1': 0.01, 'DB': 0.04, 'S6': 0.01, 'S5': 0.01, 'S4': 0.01, 'S19': 0.01, 'S18': 0.01, 'S13': 0.01, 'S12': 0.01, 'S11': 0.01, 'S10': 0.01, 'S17': 0.01, 'S16': 0.01, 'S15': 0.01, 'S14': 0.01, 'S7': 0.01, 'S20': 0.01, 'SB': 0.76})

My solution:

# Unit 5: Probability in the game of Darts """ In the game of darts, players throw darts at a board to score points. The circular board has a 'bulls-eye' in the center and 20 slices called sections, numbered 1 to 20, radiating out from the bulls-eye. The board is also divided into concentric rings. The bulls-eye has two rings: an outer 'single' ring and an inner 'double' ring. Each section is divided into 4 rings: starting at the center we have a thick single ring, a thin triple ring, another thick single ring, and a thin double ring. A ring/section combination is called a 'target'; they have names like 'S20', 'D20' and 'T20' for single, double, and triple 20, respectively; these score 20, 40, and 60 points. The bulls-eyes are named 'SB' and 'DB', worth 25 and 50 points respectively. Illustration (png image): http://goo.gl/i7XJ9 There are several variants of darts play; in the game called '501', each player throws three darts per turn, adding up points until they total exactly 501. However, the final dart must be in a double ring. Your first task is to write the function double_out(total), which will output a list of 1 to 3 darts that add up to total, with the restriction that the final dart is a double. See test_darts() for examples. Return None if there is no list that achieves the total. Often there are several ways to achieve a total. You must return a shortest possible list, but you have your choice of which one. For example, for total=100, you can choose ['T20', 'D20'] or ['DB', 'DB'] but you cannot choose ['T20', 'D10', 'D10']. """ def test_darts(): "Test the double_out function." assert double_out(170) == ['T20', 'T20', 'DB'] assert double_out(171) == None assert double_out(100) in (['T20', 'D20'], ['DB', 'DB']) """ My strategy: I decided to choose the result that has the highest valued target(s) first, e.g. always take T20 on the first dart if we can achieve a solution that way. If not, try T19 first, and so on. At first I thought I would need three passes: first try to solve with one dart, then with two, then with three. But I realized that if we include 0 as a possible dart value, and always try the 0 first, then we get the effect of having three passes, but we only have to code one pass. So I creted ordered_points as a list of all possible scores that a single dart can achieve, with 0 first, and then descending: [0, 60, 57, ..., 1]. I iterate dart1 and dart2 over that; then dart3 must be whatever is left over to add up to total. If dart3 is a valid element of points, then we have a solution. But the solution, is a list of numbers, like [0, 60, 40]; we need to transform that into a list of target names, like ['T20', 'D20'], we do that by defining name(d) to get the name of a target that scores d. When there are several choices, we must choose a double for the last dart, but for the others I prefer the easiest targets first: 'S' is easiest, then 'T', then 'D'. """ def double_out(total): """Return a shortest possible list of targets that add to total, where the length <= 3 and the final element is a double. If there is no solution, return None.""" from itertools import ifilter targets = getTargets() #first we try to finish with a single shot (double out) for target in targets: if target[0] == total and target[2] == 'D': return [target[1]] #now we try to finish with two shots of which the last must be a double out for target1 in targets: for target2 in ifilter(lambda target: target[2] == 'D', targets): if target1[0] + target2[0] == total: return [target1[1], target2[1]] #so it seems we are bad at darts and are going to need 3 throws for target1 in targets: if target1[0] < total: for target2 in targets: if target1[0] + target2[0] < total: for target3 in ifilter(lambda target: target[2] == 'D', targets): if target1[0] + target2[0] + target3[0] == total: return [target1[1], target2[1], target3[1]] return None def getTargets(): #returns a sorted (high to low) array of tuples of all possible targets in form #[(60, 'T20', 'T'), (57, 'T19', 'T'), ...,(50, 'DB', 'D'), ...,(19, 'S19', 'S'),...] targets = [(3*i, 'T' + str(i), 'T') for i in range(1,21)] + [(2*i, 'D' + str(i), 'D') for i in range(1,21)] + [(i, 'S' + str(i), 'S') for i in range(1,21)] + [(25, 'SB', 'S'), (50, 'DB', 'D')] targets.sort() targets.reverse() return targets """ def getTarget(targets, restvalue): #return best target for remaining points to throw or None if we can't double out for target in targets: if target[0] < restvalue: return target if target[0] == restvalue and target[2] == 'D': return target return None """ """ It is easy enough to say "170 points? Easy! Just hit T20, T20, DB." But, at least for me, it is much harder to actually execute the plan and hit each target. In this second half of the question, we investigate what happens if the dart-thrower is not 100% accurate. We will use a wrong (but still useful) model of inaccuracy. A player has a single number from 0 to 1 that characterizes his/her miss rate. If miss=0.0, that means the player hits the target every time. But if miss is, say, 0.1, then the player misses the section s/he is aiming at 10% of the time, and also (independently) misses the thin double or triple ring 10% of the time. Where do the misses go? Here's the model: First, for ring accuracy. If you aim for the triple ring, all the misses go to a single ring (some to the inner one, some to the outer one, but the model doesn't distinguish between these). If you aim for the double ring (at the edge of the board), half the misses (e.g. 0.05 if miss=0.1) go to the single ring, and half off the board. (We will agree to call the off-the-board 'target' by the name 'OFF'.) If you aim for a thick single ring, it is about 5 times thicker than the thin rings, so your miss ratio is reduced to 1/5th, and of these, half go to the double ring and half to the triple. So with miss=0.1, 0.01 will go to each of the double and triple ring. Finally, for the bulls-eyes. If you aim for the single bull, 1/4 of your misses go to the double bull and 3/4 to the single ring. If you aim for the double bull, it is tiny, so your miss rate is tripled; of that, 2/3 goes to the single ring and 1/3 to the single bull ring. Now, for section accuracy. Half your miss rate goes one section clockwise and half one section counter-clockwise from your target. The clockwise order of sections is: 20 1 18 4 13 6 10 15 2 17 3 19 7 16 8 11 14 9 12 5 If you aim for the bull (single or double) and miss on rings, then the section you end up on is equally possible among all 20 sections. But independent of that you can also miss on sections; again such a miss is equally likely to go to any section and should be recorded as being in the single ring. You will need to build a model for these probabilities, and define the function outcome(target, miss), which takes a target (like 'T20') and a miss ration (like 0.1) and returns a dict of {target: probability} pairs indicating the possible outcomes. You will also define best_target(miss) which, for a given miss ratio, returns the target with the highest expected score. If you are very ambitious, you can try to find the optimal strategy for accuracy-limited darts: given a state defined by your total score needed and the number of darts remaining in your 3-dart turn, return the target that minimizes the expected number of total 3-dart turns (not the number of darts) required to reach the total. This is harder than Pig for several reasons: there are many outcomes, so the search space is large; also, it is always possible to miss a double, and thus there is no guarantee that the game will end in a finite number of moves. """ def outcome(target, miss): "Return a probability distribution of [(target, probability)] pairs." from itertools import product ring = ring_outcome(target,miss) section = section_outcome(target, miss) if target in ['SB', 'DB']: result = {'SB': ring['SB'] * section['B'] + ring['S'] * section['B'], 'DB': ring['DB'] * section['B']} for i in range(1,21): result['S' + str(i)] = section[str(i)] return result else: return dict((combi[0][0] + combi[1][0], combi[0][1] * combi[1][1]) for combi in product(ring.items(), section.items())) def ring_outcome(target, miss): #target = 'T20' | 'SB' | 'D5' | ... #returns dictionary of ring outcomes if target[0:2] == 'SB': #single bull return {'SB': 1 - miss, 'DB': miss * 0.25, 'S': miss * 0.75} elif target[0:2] == 'DB': #double bull return {'DB': 1 - miss * 3, 'SB': miss * 3 * 1./3, 'S': miss * 3 * 2./3} elif target[0] == 'T': #triple ring return {'T': 1 - miss, 'S': miss} elif target[0] == 'D': #double ring return {'D': 1- miss, 'S': miss * 0.5} #OFF = miss * 0.5 elif target[0] == 'S': #single ring return {'S': 1 - (miss / 5), 'D': (miss / 5) * 0.5, 'T': (miss / 5) * 0.5} else: #should not occur return {} def section_outcome(target, miss): #target = 'T20' | 'SB' | 'D5' | ... #returns dictionary of section outcomes clockwise = [20, 1, 18, 4, 13, 6, 10, 15, 2, 17, 3, 19, 7, 16, 8, 11, 14, 9, 12, 5] misstargets = dict((str(clockwise[i]), (str(5),str(1))) if i == 1 else (str(clockwise[i]), (str(12),str(20))) if i == 19 else (str(clockwise[i]), (str(clockwise[i - 1]),str(clockwise[i + 1]))) for i in range(20)) if target[0:2] in ['SB', 'DB']: d = dict((str(i), miss / 20) for i in range(1,21)) d['B'] = 1 - miss return d else: section = target[1:] leftmiss = misstargets[section][0] rightmiss = misstargets[section][1] return {section: 1 - miss, leftmiss: miss * 0.5, rightmiss: miss * 0.5} def best_target(miss): "Return the target that maximizes the expected score." #[(60, 'T20', 'T'), (57, 'T19', 'T'), ...,(50, 'DB', 'D'), ...,(19, 'S19', 'S'),...] targets = getTargets() #first we create a dictionary (target: value) e.g. ('T20': 60) valueDict = dict((target[1], target[0]) for target in targets) return max(getAverageScore(target[1], miss, valueDict) for target in targets)[1] def getAverageScore(target, miss, valueDict): #returns a tuple of format (score, target) e.g. (17.5,'T20') so we can use max function easily target_outcome = outcome(target, miss) score = 0 for entry in target_outcome.items(): miss_target = entry[0] probability = entry[1] score += probability * valueDict[miss_target] return (score, target) def same_outcome(dict1, dict2): "Two states are the same if all corresponding sets of locs are the same." return all(abs(dict1.get(key, 0) - dict2.get(key, 0)) <= 0.0001 for key in set(dict1) | set(dict2)) def test_darts2(): assert best_target(0.0) == 'T20' assert best_target(0.1) == 'T20' assert best_target(0.4) == 'T19' assert same_outcome(outcome('T20', 0.0), {'T20': 1.0}) assert same_outcome(outcome('T20', 0.1), {'T20': 0.81, 'S1': 0.005, 'T5': 0.045, 'S5': 0.005, 'T1': 0.045, 'S20': 0.09})

Subscribe to:
Posts (Atom)