Archive | November 2014

# PYMC potentials and priors

When I started learning Bayesian statistics I found very useful PYMC, as I needed to play with examples without having to implement MCMC myself or going through complicated integrals. A good source of information is Bayesian Methods for Hakers. But the outstanding blog by Jake VanderPlas has proven extremely enlightening, both in coding examples and underlying fundamental concepts.

PYMC implements potentials, but there are few examples of their uses. In this post I will show how priors can be implemented as potentials using two previously published examples. I hope this will show potentials, just as priors and likelihood, are just one more term in the posterior distribution that MCMC samples.

### Linear fit with non-uniform priors

Here I reproduce the linear fit from Jake Vanderplas post but using just two points:

xdata = np.array([0, 1.])
ydata = np.array([0, 2.])
edata = 1.* np.ones_like(xdata)


We will also use the same visualization functions defined in his post:

import pymc

# Create some convenience routines for plotting

def compute_sigma_level(trace1, trace2, nbins=20):
"""From a set of traces, bin by number of standard deviations"""
L, xbins, ybins = np.histogram2d(trace1, trace2, nbins)
L[L == 0] = 1E-16
logL = np.log(L)

shape = L.shape
L = L.ravel()

# obtain the indices to sort and unsort the flattened array
i_sort = np.argsort(L)[::-1]
i_unsort = np.argsort(i_sort)

L_cumsum = L[i_sort].cumsum()
L_cumsum /= L_cumsum[-1]

xbins = 0.5 * (xbins[1:] + xbins[:-1])
ybins = 0.5 * (ybins[1:] + ybins[:-1])

return xbins, ybins, L_cumsum[i_unsort].reshape(shape)

def plot_MCMC_trace(ax, xdata, ydata, trace, scatter=False, **kwargs):
"""Plot traces and contours"""
xbins, ybins, sigma = compute_sigma_level(trace[0], trace[1])
ax.contour(xbins, ybins, sigma.T, levels=[0.683, 0.955], **kwargs)
if scatter:
ax.plot(trace[0], trace[1], ',k', alpha=0.1)
ax.set_xlabel(r'$\alpha$')
ax.set_ylabel(r'$\beta$')

def plot_MCMC_model(ax, xdata, ydata, trace):
"""Plot the linear model and 2sigma contours"""
ax.plot(xdata, ydata, 'ok')

alpha, beta = trace[:2]
xfit = np.linspace(-1, 2, 10) #This is the only change from Jake's post!
yfit = alpha[:, None] + beta[:, None] * xfit
mu = yfit.mean(0)
sig = 2 * yfit.std(0)

ax.plot(xfit, mu, '-k')
ax.fill_between(xfit, mu - sig, mu + sig, color='lightgray')

ax.set_xlabel('x')
ax.set_ylabel('y')

def plot_MCMC_results(xdata, ydata, trace, colors='k'):
"""Plot both the trace and the model together"""
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
plot_MCMC_trace(ax[0], xdata, ydata, trace, True, colors=colors)
plot_MCMC_model(ax[1], xdata, ydata, trace)



First we will use uniform priors. With uniform priors one has to make sure their range is large enough to include all values of the parameters. Otherwise the sampled parameters cluster around the boundary of the uniform range. You can try this by changing the range for beta to (-1, 1).

# Define the variables needed for the routine, with their prior distributions
alpha = pymc.Uniform('alpha', -10, 10)
beta = pymc.Uniform('beta', -10, 10)

# Define the form of the model and likelihood
@pymc.deterministic
def y_model(x=xdata, alpha=alpha, beta=beta):
return alpha + beta * x
# (We define the error as 2 std)
y = pymc.Normal('y', mu=y_model, tau=1. / (edata/2) ** 2, observed=True, value=ydata)

# package the full model in a dictionary
model1 = dict(alpha=alpha, beta=beta, y_model=y_model, y=y)

# run the basic MCMC:
S = pymc.MCMC(model1)
S.sample(iter=100000, burn=50000)

# extract the traces and plot the results
pymc_trace_unifo = [S.trace('alpha')[:],
S.trace('beta')[:]]

plot_MCMC_results(xdata, ydata, pymc_trace_unifo)
print()
print("alpha mean= {:.2f}".format(pymc_trace_unifo[0].mean()))
print("beta  mean= {:.2f}".format(pymc_trace_unifo[1].mean()))


[—————–100%—————–] 100000 of 100000 complete in 16.8 sec
alpha mean= 0.00
beta mean= 1.99

Now with potentials. A uniform prior results in a constant log-p term, so we need to add the lop-p of the prior as a potential with the same effects (the constant term does not affect the MCMC sampling):

# Define the variables needed for the routine, with their prior distributions
alpha = pymc.Uniform('alpha', -10, 10)
beta = pymc.Uniform('beta', -10, 10)

#Define a potential equivalent to a prior
@pymc.potential
def pot_prior(val=beta):
return -1.5 * np.log(1 + val ** 2)

# Define the form of the model and likelihood
@pymc.deterministic
def y_model(x=xdata, alpha=alpha, beta=beta):
return alpha + beta * x

y = pymc.Normal('y', mu=y_model, tau=1. / (edata/2) ** 2, observed=True, value=ydata)

# package the full model in a dictionary
model1 = dict(alpha=alpha, beta=beta, y_model=y_model, pot_prior=pot_prior, y=y)

# run the basic MCMC: we'll do 100000 iterations to match emcee above
S = pymc.MCMC(model1)
S.sample(iter=100000, burn=50000)

# extract the traces and plot the results
pymc_trace_pot = [S.trace('alpha')[:],
S.trace('beta')[:]]

plot_MCMC_results(xdata, ydata, pymc_trace_pot)
print()
print("alpha mean= {:.2f}".format(pymc_trace_prior[0].mean()))
print("beta  mean= {:.2f}".format(pymc_trace_prior[1].mean()))


[—————–100%—————–] 100000 of 100000 complete in 18.6 sec
alpha mean= 0.31
beta mean= 1.38

I find it surprising that the average fit does not run through the data points. On the other hand this is not unexpected: our prior for beta favours values close to 0. I guess an asymmetrical error model in the likelihood could oppose this trend, but this is really beyond the scope of my knowledge and I would really appreciate comments on that.

### Ridge regression

As a second example of potentials, we’ll analyze the post by Austin Rochford, which I also enjoyed reading. It discusses, among others, Bayesian Ridge regression. Ridge regression adds a quadratic potential to the least-squares objective function to minimize. This is done to achieve regularization. A Bayesian approach reveals that this is equivalent to using a normal prior for the slope beta parameters.

I refer again to his post for explanations and more insight. In that post, instead of sampling the posterior he just finds the Maximum a posteriori (MAP) estimate. Here I just show that the same value is obtained when using uniform priors and an external potential is added ad hoc as is done in traditional Ridge regression.

from scipy.stats import norm

n = 1000

x1 = norm.rvs(0, 1, size=n)
x2 = -x1 + norm.rvs(0, 10**-3, size=n)
x3 = norm.rvs(0, 1, size=n)

X = np.column_stack([x1, x2, x3])
y = 10 * x1 + 10 * x2 + 0.1 * x3

def get_coefficients(map_):
return [{str(variable): variable.value} for variable in map_.variables if str(variable).startswith('beta')]

beta1_pot = pymc.Uniform('beta1', -100.,100.)
beta2_pot = pymc.Uniform('beta2', -100.,100.)
beta3_pot = pymc.Uniform('beta3', -100.,100.)

@pymc.potential
def pot(beta1=beta1_pot, beta2=beta2_pot, beta3=beta3_pot):
tau = 1.
return -tau/2.*(beta1**2+beta2**2+beta3**2)

@pymc.deterministic
def y_hat_pot(beta1=beta1_pot, beta2=beta2_pot, beta3=beta3_pot, x1=x1, x2=x2, x3=x3):
return beta1 * x1 + beta2 * x2 + beta3 * x3

Y_pot = pymc.Normal('Y', mu=y_hat_pot, tau=1.0, value=y, observed=True)

pot_model = pymc.Model([Y_pot, beta1_pot, beta2_pot, beta3_pot, pot])
pot_map = pymc.MAP(pot_model)
pot_map.fit()

print("With potentials:\n",get_coefficients(pot_map))

#
# With priors. Based on
# http://www.austinrochford.com/posts/2013-09-02-prior-distributions-for-bayesian-regression-using-pymc.html
#

beta1_ridge = pymc.Normal('beta1', mu=0, tau=1.0)
beta2_ridge = pymc.Normal('beta2', mu=0, tau=1.0)
beta3_ridge = pymc.Normal('beta3', mu=0, tau=1.0)

@pymc.deterministic
def y_hat_ridge(beta1=beta1_ridge, beta2=beta2_ridge, beta3=beta3_ridge, x1=x1, x2=x2, x3=x3):
return beta1 * x1 + beta2 * x2 + beta3 * x3
Y_ridge = pymc.Normal('Y', mu=y_hat_ridge, tau=1.0, value=y, observed=True)

ridge_model = pymc.Model([Y_ridge, beta1_ridge, beta2_ridge, beta3_ridge])
ridge_map = pymc.MAP(ridge_model)
ridge_map.fit()

print("With priors:\n",get_coefficients(ridge_map))


With potentials:
[{‘beta1’: array(0.004691201218377386)}, {‘beta3’: array(0.09966093745608606)}, {‘beta2’: array(0.004958704169379989)}]
With priors:
[{‘beta3’: array(0.09966049909698146)}, {‘beta1’: array(0.004779905863422812)}, {‘beta2’: array(0.005045738743703612)}]

# A periodic table of objects

… or a first step into Python objects.

Let us assume that for some large project we need an easy way to
translate atomic symbols into element names. The first thing that comes
to your mind is to use a dictionary. As this seems a bijective relation,
keys can be symbols and values names, or vice-versa. This is an
important decision as it fixes the structure of the dictionary. I
believe keys should be symbols. Let me give you a couple of reasons.
First, there is one element W, that has 2 names, Tungsten and
Wolfram. Second, you might image extending the dictionary to include
names in other languages. Therefore:


elementsDict=dict(H='hydrogen',
He='helium',
Li='lithium' ) # And so on...


Now your project has grown and you would like to also keep the atomic
number of an element. Two simple solutions are creating a second
dictionary (duplicating work is not nice and error prone) or keeping a
tuple (or a list) for the values, such as:


elementsDict=dict(H=('hydrogen', 1),
He=('helium', 2),
Li=('lithium', 3) ) # And so on...


This makes accessing to the attributes of an element ugly, as you have
to remember the arbitrary order of the attribute in the tuple. Did I say
attributes? That sounds object oriented! In fact, it is easy to
visualize an element as something (an object!) with different
attributes such as its name, its atomic symbol, its atomic weight, etc.
These attributes are of different type and do not bear an ordered
relationship among themselves as a list assumes. So let’s make an
object:

    class Element:
def __init__(self, name, atomicNumber):
self.name = name
self.atomicNumber = atomicNumber

elementsDict=dict(H=Element('hydrogen', 1),
He=Element('helium', 2),
Li=Element('lithium', 3) )


Let’s try it:


for symbol in elementsDict:
print("The atomic number of {} is {}".format(elementsDict[symbol].name,
elementsDict[symbol].atomicNumber))


The atomic number of hydrogen is 1
The atomic number of lithium is 3
The atomic number of helium is 2

At this point, you might think it would be useful for the element object
to also contain its symbol. That’s easy, because you can add attributes
to an object dynamically:


for symbol in elementsDict:
elementsDict[symbol].symbol = symbol

print("{} name is {}".format(elementsDict['H'].symbol,
elementsDict['H'].name))


H name is hydrogen

### The other way around

For the sake of completeness, imagine you have created a list of
elements with a class that contains the atomic symbol, and you then want
to create a dictionary from that list. I find this more clear as the
__init__ method then contains all the attributes. Here is how you
can do it:


class Element2:
def __init__(self, symbol, name, atomicNumber):
self.symbol = symbol
self.name = name
self.atomicNumber = atomicNumber

elements2List =[Element2('H', 'hydrogen', 1),
Element2('He', 'helium', 2),
Element2('Li', 'lithium', 3) ]
# Then create a dictionary with a comprehension:
elements2Dict = {element.symbol: element for element in elements2List}


Obviously it works as before:


for symbol in elements2Dict:
print("The atomic number of {} is {}".format(elements2Dict[symbol].name,
elements2Dict[symbol].atomicNumber))


The atomic number of hydrogen is 1
The atomic number of lithium is 3
The atomic number of helium is 2