# Inheritance

This notebook demonstrates the use of inheritance to extend Python’s Counter class to implement Multisets, PMFs, and suites of Bayesian hypotheses.

from __future__ import print_function, division

from collections import Counter
import numpy as np


### Counter

A Counter is a map from values to their frequencies. If you initialize a Counter with a string, you get a map from each letter to the number of times it appears. If two words are anagrams, they yield equal Counters, so you can use Counters to test anagrams.

def is_anagram(word1, word2):
"""Checks whether the words are anagrams.

word1: string
word2: string

returns: boolean
"""
return Counter(word1) == Counter(word2)

is_anagram('tachymetric', 'mccarthyite')

is_anagram('banana', 'peach')


Exercise: The Counter class inherits from dict so all methods and functions that work with a dictionary will also work with a Counter.

Read the documentation of Counter, then use a Counter to find the three most common letters in the word “pneumonoultramicroscopicsilicovolcanoconiosis”.

# Solution goes here


### Multisets

A Counter is a natural representation of a multiset, which is a set where the elements can appear more than once. You could use multisets for a game like Scrabble to see if a given set of tiles can be used to spell a given word.

Exercise: Write a definition for a class called Multiset that inherits from Counter and defines an additional method called is_subset, which should take self and other as parameters, where other is another Multiset.

It should check whether self is a subset of other; for multisets, that means that every element of self appears in other with at least the same frequency. For example, aa is a subset of aaab, but aabb is not.

# Solution goes here


The following function uses Multiset.is_subset to check whether a particular word can be spelled using a particular set of tiles.

def can_spell(word, tiles):
"""Checks whether a set of tiles can spell a word.

word: string
tiles: string

returns: boolean
"""
return Multiset(word).is_subset(Multiset(tiles))

can_spell('SYZYGY', 'AGSYYYZ')

can_spell('omelette', 'breaking a few eggs')


Optional Exercise: If you change the name of is_subset to __le__, you can use the <= operator to test whether one Multiset is a subset of another.

### Probability Mass Functions

You can also extend Counter to represent a probability mass function (PMF). A PMF is a map from possible outcomes to their probabilities. The probabilities in a PMF are “normalized” if they add up to 1 (and they are all non-negative).

The following PMF class inherits from Counter and adds the following methods:

• normalize computes the total of the frequencies and divides through, yielding probabilities that add to 1.

• __add__ enumerates all pairs of outcomes and returns a new Pmf that represents the distribution of the sum.

• render returns the outcomes and probabilities in a form ready for plotting.

class Pmf(Counter):
"""A Counter with probabilities."""

def normalize(self):
"""Normalizes the PMF so the probabilities add to 1."""
total = sum(self.values())
for key in self:
self[key] /= total

The result is the distribution of sums of outcomes from the
two distributions.

Note that this method is only correct if the selections from
the two distributions are independent; that is, if the outcome
of the first selection does not affect the probabilities of
the outcomes for the second selection.

other: Pmf

returns: new Pmf
"""
pmf = Pmf()
for key1, prob1 in self.items():
for key2, prob2 in other.items():
pmf[key1 + key2] += prob1 * prob2
return pmf

def render(self):
"""Returns outcomes and their probabilities, suitable for plotting."""
return zip(*sorted(self.items()))


As an example, we can make a Pmf object that represents a 6-sided die.

d6 = Pmf([1,2,3,4,5,6])
d6.normalize()
d6.name = 'one die'
print(d6)


Using the add operator, we can compute the distribution for the sum of two dice.

d6_twice = d6 + d6
d6_twice.name = 'two dice'

for key, prob in d6_twice.items():
print(key, prob)


Using sum or np.sum, we can compute the distribution for the sum of three dice.

# if we use the built-in sum we have to provide a Pmf additive identity
pmf_ident = Pmf([0])
d6_thrice = sum([d6]*3, pmf_ident)

# with np.sum, we don't need an identity
d6_thrice = np.sum([d6]*3)
d6_thrice.name = 'three dice'


And then plot the results (using Pmf.render)

import matplotlib.pyplot as plt
%matplotlib inline

for die in [d6, d6_twice, d6_thrice]:
xs, ys = die.render()
plt.plot(xs, ys, label=die.name, linewidth=3, alpha=0.5)

plt.xlabel('Total')
plt.ylabel('Probability')
plt.legend()
plt.show()


Exercise: Suppose you are fighting an orc who will die if he suffers 9 or more hit points of damage. You attack successfully with short sword and dagger, so you can roll a d6 and a d4 for total damage. What is the probability that you kill the orc?

# Solution goes here

# Solution goes here


### Bayesian statistics

A Suite is a Pmf that represents a set of hypotheses and their probabilities; it provides bayesian_update, which updates the probability of the hypotheses based on new data.

Suite is an abstract parent class; child classes should provide a likelihood method that evaluates the likelihood of the data under a given hypothesis. update_bayesian loops through the hypotheses, evaluates the likelihood of the data under each hypothesis, and updates the probabilities accordingly. Then it re-normalizes the PMF.

class Suite(Pmf):
"""Map from hypothesis to probability."""

def bayesian_update(self, data):
"""Performs a Bayesian update.

Note: called bayesian_update to avoid overriding dict.update

data: result of a die roll
"""
for hypo in self:
like = self.likelihood(data, hypo)
self[hypo] *= like

self.normalize()

def print_probs(self):
for hypo in sorted(self):
print(hypo, self[hypo])


As an example, I’ll use Suite to solve the “Dice Problem,” from Chapter 3 of Think Bayes.

“Suppose I have a box of dice that contains a 4-sided die, a 6-sided die, an 8-sided die, a 12-sided die, and a 20-sided die. If you have ever played Dungeons & Dragons, you know what I am talking about. Suppose I select a die from the box at random, roll it, and get a 6. What is the probability that I rolled each die?”

I’ll start by defining DiceSuite, which inherits bayesian_update from Suite and provides likelihood.

data is the observed die roll, 6 in the example.

hypo is the hypothetical number of sides on the die.

If data > hypo, that means the outcome exceeds the number of sides on the die; that’s impossible, so it has probability 0.

Otherwise, the probability of any outcome is 1/hypo, where hypo is the number of sides on the die.

class DiceSuite(Suite):

def likelihood(self, data, hypo):
"""Computes the likelihood of the data under the hypothesis.

data: result of a die roll
hypo: integer number of sides on the die
"""
if data > hypo:
return 0
else:
return 1/hypo


Now we can make a DiceSuite object that represents the possible number of sides on the die. By default, all dice have the same prior probability.

Then I update the distribution with the given outcome and print the results:

dice_suite = DiceSuite([4, 6, 8, 12, 20])
dice_suite.normalize()

dice_suite.print_probs()

dice_suite.bayesian_update(6)

dice_suite.print_probs()


As expected, the 4-sided die has been eliminated; it now has 0 probability. The 6-sided die is the most likely, but the 8-sided die is still quite possible.

Now suppose I roll the die again and get an 8. We can update the Suite again with the new data.

dice_suite.bayesian_update(8)

dice_suite.print_probs()


Now the 6-sided die has been eliminated, the 8-sided die is most likely, and there is less than a 10% chance that I am rolling a 20-sided die.

Exercise: Draw a UML class diagram that shows the relationships among all classes in this notebook, plus Counter and dict.

Exercise: Suppose you know that up to 100 sequentially-numbered raffle tickets have been sold, and you think it is equally likely that the number sold is anywhere from 1 to 100. You find a randomly discarded ticket that is number 37. What is the probability that it is the winning ticket?

Hint: Define a class named TicketSuite that inherits from Suite and provides an appropriate likelihood function. It should also define a method that computes the average probability of winning, weighted by the probability of each outcome.

Note: This exercise is pretty challenging if you are not familiar with Bayesian statistics. But if it intrigues you, consider taking Computational Bayesian Statistics.

# Solution goes here

# Solution goes here

# Solution goes here