# PYMC potentials and priors

When I started learning Bayesian statistics I found very useful PYMC, as I needed to play with examples without having to implement MCMC myself or going through complicated integrals. A good source of information is Bayesian Methods for Hakers. But the outstanding blog by Jake VanderPlas has proven extremely enlightening, both in coding examples and underlying fundamental concepts.

PYMC implements potentials, but there are few examples of their uses. In this post I will show how priors can be implemented as potentials using two previously published examples. I hope this will show potentials, just as priors and likelihood, are just one more term in the posterior distribution that MCMC samples.

### Linear fit with non-uniform priors

Here I reproduce the linear fit from Jake Vanderplas post but using just two points:

xdata = np.array([0, 1.]) ydata = np.array([0, 2.]) edata = 1.* np.ones_like(xdata)

We will also use the same visualization functions defined in his post:

import pymc # Create some convenience routines for plotting def compute_sigma_level(trace1, trace2, nbins=20): """From a set of traces, bin by number of standard deviations""" L, xbins, ybins = np.histogram2d(trace1, trace2, nbins) L[L == 0] = 1E-16 logL = np.log(L) shape = L.shape L = L.ravel() # obtain the indices to sort and unsort the flattened array i_sort = np.argsort(L)[::-1] i_unsort = np.argsort(i_sort) L_cumsum = L[i_sort].cumsum() L_cumsum /= L_cumsum[-1] xbins = 0.5 * (xbins[1:] + xbins[:-1]) ybins = 0.5 * (ybins[1:] + ybins[:-1]) return xbins, ybins, L_cumsum[i_unsort].reshape(shape) def plot_MCMC_trace(ax, xdata, ydata, trace, scatter=False, **kwargs): """Plot traces and contours""" xbins, ybins, sigma = compute_sigma_level(trace[0], trace[1]) ax.contour(xbins, ybins, sigma.T, levels=[0.683, 0.955], **kwargs) if scatter: ax.plot(trace[0], trace[1], ',k', alpha=0.1) ax.set_xlabel(r'$\alpha$') ax.set_ylabel(r'$\beta$') def plot_MCMC_model(ax, xdata, ydata, trace): """Plot the linear model and 2sigma contours""" ax.plot(xdata, ydata, 'ok') alpha, beta = trace[:2] xfit = np.linspace(-1, 2, 10) #This is the only change from Jake's post! yfit = alpha[:, None] + beta[:, None] * xfit mu = yfit.mean(0) sig = 2 * yfit.std(0) ax.plot(xfit, mu, '-k') ax.fill_between(xfit, mu - sig, mu + sig, color='lightgray') ax.set_xlabel('x') ax.set_ylabel('y') def plot_MCMC_results(xdata, ydata, trace, colors='k'): """Plot both the trace and the model together""" fig, ax = plt.subplots(1, 2, figsize=(10, 4)) plot_MCMC_trace(ax[0], xdata, ydata, trace, True, colors=colors) plot_MCMC_model(ax[1], xdata, ydata, trace)

First we will use uniform priors. With uniform priors one has to make sure their range is large enough to include all values of the parameters. Otherwise the sampled parameters cluster around the boundary of the uniform range. You can try this by changing the range for `beta`

to `(-1, 1)`

.

# Define the variables needed for the routine, with their prior distributions alpha = pymc.Uniform('alpha', -10, 10) beta = pymc.Uniform('beta', -10, 10) # Define the form of the model and likelihood @pymc.deterministic def y_model(x=xdata, alpha=alpha, beta=beta): return alpha + beta * x # (We define the error as 2 std) y = pymc.Normal('y', mu=y_model, tau=1. / (edata/2) ** 2, observed=True, value=ydata) # package the full model in a dictionary model1 = dict(alpha=alpha, beta=beta, y_model=y_model, y=y) # run the basic MCMC: S = pymc.MCMC(model1) S.sample(iter=100000, burn=50000) # extract the traces and plot the results pymc_trace_unifo = [S.trace('alpha')[:], S.trace('beta')[:]] plot_MCMC_results(xdata, ydata, pymc_trace_unifo) print() print("alpha mean= {:.2f}".format(pymc_trace_unifo[0].mean())) print("beta mean= {:.2f}".format(pymc_trace_unifo[1].mean()))

[—————–100%—————–] 100000 of 100000 complete in 16.8 sec

alpha mean= 0.00

beta mean= 1.99

Now with potentials. A uniform prior results in a constant log-p term, so we need to add the lop-p of the prior as a potential with the same effects (the constant term does not affect the MCMC sampling):

# Define the variables needed for the routine, with their prior distributions alpha = pymc.Uniform('alpha', -10, 10) beta = pymc.Uniform('beta', -10, 10) #Define a potential equivalent to a prior @pymc.potential def pot_prior(val=beta): return -1.5 * np.log(1 + val ** 2) # Define the form of the model and likelihood @pymc.deterministic def y_model(x=xdata, alpha=alpha, beta=beta): return alpha + beta * x y = pymc.Normal('y', mu=y_model, tau=1. / (edata/2) ** 2, observed=True, value=ydata) # package the full model in a dictionary model1 = dict(alpha=alpha, beta=beta, y_model=y_model, pot_prior=pot_prior, y=y) # run the basic MCMC: we'll do 100000 iterations to match emcee above S = pymc.MCMC(model1) S.sample(iter=100000, burn=50000) # extract the traces and plot the results pymc_trace_pot = [S.trace('alpha')[:], S.trace('beta')[:]] plot_MCMC_results(xdata, ydata, pymc_trace_pot) print() print("alpha mean= {:.2f}".format(pymc_trace_prior[0].mean())) print("beta mean= {:.2f}".format(pymc_trace_prior[1].mean()))

[—————–100%—————–] 100000 of 100000 complete in 18.6 sec

alpha mean= 0.31

beta mean= 1.38

I find it surprising that the average fit does not run through the data points. On the other hand this is not unexpected: our prior for `beta`

favours values close to 0. I guess an asymmetrical error model in the likelihood could oppose this trend, but this is really beyond the scope of my knowledge and I would really appreciate comments on that.

### Ridge regression

As a second example of potentials, we’ll analyze the post by Austin Rochford, which I also enjoyed reading. It discusses, among others, Bayesian Ridge regression. Ridge regression adds a quadratic potential to the least-squares objective function to minimize. This is done to achieve regularization. A Bayesian approach reveals that this is equivalent to using a normal prior for the slope `beta`

parameters.

I refer again to his post for explanations and more insight. In that post, instead of sampling the posterior he just finds the *Maximum a posteriori* (MAP) estimate. Here I just show that the same value is obtained when using uniform priors and an external potential is added *ad hoc* as is done in traditional Ridge regression.

from scipy.stats import norm n = 1000 x1 = norm.rvs(0, 1, size=n) x2 = -x1 + norm.rvs(0, 10**-3, size=n) x3 = norm.rvs(0, 1, size=n) X = np.column_stack([x1, x2, x3]) y = 10 * x1 + 10 * x2 + 0.1 * x3 def get_coefficients(map_): return [{str(variable): variable.value} for variable in map_.variables if str(variable).startswith('beta')] beta1_pot = pymc.Uniform('beta1', -100.,100.) beta2_pot = pymc.Uniform('beta2', -100.,100.) beta3_pot = pymc.Uniform('beta3', -100.,100.) @pymc.potential def pot(beta1=beta1_pot, beta2=beta2_pot, beta3=beta3_pot): tau = 1. return -tau/2.*(beta1**2+beta2**2+beta3**2) @pymc.deterministic def y_hat_pot(beta1=beta1_pot, beta2=beta2_pot, beta3=beta3_pot, x1=x1, x2=x2, x3=x3): return beta1 * x1 + beta2 * x2 + beta3 * x3 Y_pot = pymc.Normal('Y', mu=y_hat_pot, tau=1.0, value=y, observed=True) pot_model = pymc.Model([Y_pot, beta1_pot, beta2_pot, beta3_pot, pot]) pot_map = pymc.MAP(pot_model) pot_map.fit() print("With potentials:\n",get_coefficients(pot_map)) # # With priors. Based on # http://www.austinrochford.com/posts/2013-09-02-prior-distributions-for-bayesian-regression-using-pymc.html # beta1_ridge = pymc.Normal('beta1', mu=0, tau=1.0) beta2_ridge = pymc.Normal('beta2', mu=0, tau=1.0) beta3_ridge = pymc.Normal('beta3', mu=0, tau=1.0) @pymc.deterministic def y_hat_ridge(beta1=beta1_ridge, beta2=beta2_ridge, beta3=beta3_ridge, x1=x1, x2=x2, x3=x3): return beta1 * x1 + beta2 * x2 + beta3 * x3 Y_ridge = pymc.Normal('Y', mu=y_hat_ridge, tau=1.0, value=y, observed=True) ridge_model = pymc.Model([Y_ridge, beta1_ridge, beta2_ridge, beta3_ridge]) ridge_map = pymc.MAP(ridge_model) ridge_map.fit() print("With priors:\n",get_coefficients(ridge_map))

With potentials:

[{‘beta1’: array(0.004691201218377386)}, {‘beta3’: array(0.09966093745608606)}, {‘beta2’: array(0.004958704169379989)}]

With priors:

[{‘beta3’: array(0.09966049909698146)}, {‘beta1’: array(0.004779905863422812)}, {‘beta2’: array(0.005045738743703612)}]

# A periodic table of objects

… or a first step into Python objects.

Let us assume that for some large project we need an easy way to

translate atomic symbols into element names. The first thing that comes

to your mind is to use a dictionary. As this seems a bijective relation,

keys can be symbols and values names, or vice-versa. This is an

important decision as it fixes the structure of the dictionary. I

believe keys should be symbols. Let me give you a couple of reasons.

First, there is one element W, that has 2 names, *Tungsten* and

*Wolfram*. Second, you might image extending the dictionary to include

names in other languages. Therefore:

elementsDict=dict(H='hydrogen', He='helium', Li='lithium' ) # And so on...

Now your project has grown and you would like to also keep the atomic

number of an element. Two simple solutions are creating a second

dictionary (duplicating work is not nice and error prone) or keeping a

tuple (or a list) for the values, such as:

elementsDict=dict(H=('hydrogen', 1), He=('helium', 2), Li=('lithium', 3) ) # And so on...

This makes accessing to the attributes of an element ugly, as you have

to remember the arbitrary order of the attribute in the tuple. Did I say

*attributes*? That sounds object oriented! In fact, it is easy to

visualize an element as something (an *object*!) with different

attributes such as its name, its atomic symbol, its atomic weight, etc.

These attributes are of different type and do not bear an ordered

relationship among themselves as a list assumes. So let’s make an

object:

class Element: def __init__(self, name, atomicNumber): self.name = name self.atomicNumber = atomicNumber elementsDict=dict(H=Element('hydrogen', 1), He=Element('helium', 2), Li=Element('lithium', 3) )

Let’s try it:

for symbol in elementsDict: print("The atomic number of {} is {}".format(elementsDict[symbol].name, elementsDict[symbol].atomicNumber))

The atomic number of hydrogen is 1

The atomic number of lithium is 3

The atomic number of helium is 2

At this point, you might think it would be useful for the element object

to also contain its symbol. That’s easy, because you can add attributes

to an object dynamically:

for symbol in elementsDict: elementsDict[symbol].symbol = symbol print("{} name is {}".format(elementsDict['H'].symbol, elementsDict['H'].name))

H name is hydrogen

### The other way around

For the sake of completeness, imagine you have created a list of

elements with a class that contains the atomic symbol, and you then want

to create a dictionary from that list. I find this more clear as the

`__init__`

method then contains all the attributes. Here is how you

can do it:

class Element2: def __init__(self, symbol, name, atomicNumber): self.symbol = symbol self.name = name self.atomicNumber = atomicNumber elements2List =[Element2('H', 'hydrogen', 1), Element2('He', 'helium', 2), Element2('Li', 'lithium', 3) ] # Then create a dictionary with a comprehension: elements2Dict = {element.symbol: element for element in elements2List}

Obviously it works as before:

for symbol in elements2Dict: print("The atomic number of {} is {}".format(elements2Dict[symbol].name, elements2Dict[symbol].atomicNumber))

The atomic number of hydrogen is 1

The atomic number of lithium is 3

The atomic number of helium is 2

## Recent Comments