# PyMC potentials and priors

When I started learning Bayesian statistics, I found PyMC very useful, as I needed to play with examples without having to implement MCMC myself or work through complicated integrals. A good source of information is Bayesian Methods for Hackers, but the outstanding blog by Jake VanderPlas has proven extremely enlightening, both in its coding examples and in the underlying fundamental concepts.

PyMC implements potentials, but there are few examples of their use. In this post I will show how priors can be implemented as potentials, using two previously published examples. I hope this will show that potentials, just like priors and the likelihood, are simply one more additive term in the log-posterior that MCMC samples.
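To make that bookkeeping explicit, here is a minimal plain-NumPy sketch (not part of the PyMC models below) of the unnormalized log-posterior a sampler works with for the linear model used in this post: the prior, the likelihood, and any potential each contribute one additive log-p term.

```python
import numpy as np

def log_posterior(alpha, beta, xdata, ydata, edata):
    """Unnormalized log-posterior: log-prior + log-likelihood + potential."""
    # uniform prior on (-10, 10): a constant inside the range, -inf outside
    if not (-10 < alpha < 10 and -10 < beta < 10):
        return -np.inf
    # Gaussian log-likelihood (dropping constant terms)
    resid = ydata - (alpha + beta * xdata)
    log_like = -0.5 * np.sum((resid / edata) ** 2)
    # a potential is just one more additive log-p term
    potential = -1.5 * np.log(1 + beta ** 2)
    return log_like + potential

xdata = np.array([0., 1.])
ydata = np.array([0., 2.])
edata = np.ones_like(xdata)
print(log_posterior(0.0, 2.0, xdata, ydata, edata))
```

Any MCMC scheme only ever needs this sum; whether a term is labelled "prior", "likelihood" or "potential" makes no difference to the sampling.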

### Linear fit with non-uniform priors

Here I reproduce the linear fit from Jake VanderPlas's post, but using just two points:

```python
xdata = np.array([0, 1.])
ydata = np.array([0, 2.])
edata = 1. * np.ones_like(xdata)
```

We will also use the same visualization functions defined in his post:

```python
import pymc

# Create some convenience routines for plotting

def compute_sigma_level(trace1, trace2, nbins=20):
    """From a set of traces, bin by number of standard deviations"""
    L, xbins, ybins = np.histogram2d(trace1, trace2, nbins)
    L[L == 0] = 1E-16
    logL = np.log(L)

    shape = L.shape
    L = L.ravel()

    # obtain the indices to sort and unsort the flattened array
    i_sort = np.argsort(L)[::-1]
    i_unsort = np.argsort(i_sort)

    L_cumsum = L[i_sort].cumsum()
    L_cumsum /= L_cumsum[-1]

    xbins = 0.5 * (xbins[1:] + xbins[:-1])
    ybins = 0.5 * (ybins[1:] + ybins[:-1])

    return xbins, ybins, L_cumsum[i_unsort].reshape(shape)


def plot_MCMC_trace(ax, xdata, ydata, trace, scatter=False, **kwargs):
    """Plot traces and contours"""
    xbins, ybins, sigma = compute_sigma_level(trace[0], trace[1])
    ax.contour(xbins, ybins, sigma.T, levels=[0.683, 0.955], **kwargs)
    if scatter:
        ax.plot(trace[0], trace[1], ',k', alpha=0.1)
    ax.set_xlabel(r'$\alpha$')
    ax.set_ylabel(r'$\beta$')


def plot_MCMC_model(ax, xdata, ydata, trace):
    """Plot the linear model and 2sigma contours"""
    ax.plot(xdata, ydata, 'ok')

    alpha, beta = trace[:2]
    xfit = np.linspace(-1, 2, 10)  # This is the only change from Jake's post!
    yfit = alpha[:, None] + beta[:, None] * xfit
    mu = yfit.mean(0)
    sig = 2 * yfit.std(0)

    ax.plot(xfit, mu, '-k')
    ax.fill_between(xfit, mu - sig, mu + sig, color='lightgray')

    ax.set_xlabel('x')
    ax.set_ylabel('y')


def plot_MCMC_results(xdata, ydata, trace, colors='k'):
    """Plot both the trace and the model together"""
    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    plot_MCMC_trace(ax[0], xdata, ydata, trace, True, colors=colors)
    plot_MCMC_model(ax[1], xdata, ydata, trace)
```

First we will use uniform priors. With uniform priors one has to make sure their range is large enough to include all plausible values of the parameters; otherwise the sampled parameters cluster around the boundary of the uniform range. You can try this yourself by changing the range for `beta` to `(-1, 1)`.

```python
# Define the variables needed for the routine, with their prior distributions
alpha = pymc.Uniform('alpha', -10, 10)
beta = pymc.Uniform('beta', -10, 10)

# Define the form of the model and likelihood
@pymc.deterministic
def y_model(x=xdata, alpha=alpha, beta=beta):
    return alpha + beta * x

# (We define the error as 2 std)
y = pymc.Normal('y', mu=y_model, tau=1. / (edata / 2) ** 2,
                observed=True, value=ydata)

# package the full model in a dictionary
model1 = dict(alpha=alpha, beta=beta, y_model=y_model, y=y)

# run the basic MCMC:
S = pymc.MCMC(model1)
S.sample(iter=100000, burn=50000)

# extract the traces and plot the results
pymc_trace_unifo = [S.trace('alpha')[:], S.trace('beta')[:]]

plot_MCMC_results(xdata, ydata, pymc_trace_unifo)
print()
print("alpha mean= {:.2f}".format(pymc_trace_unifo[0].mean()))
print("beta mean= {:.2f}".format(pymc_trace_unifo[1].mean()))
```

```
[-----------------100%-----------------] 100000 of 100000 complete in 16.8 sec
alpha mean= 0.00
beta mean= 1.99
```

Now with potentials. A uniform prior contributes a constant log-p term, so adding the log-p of the desired prior as a potential has exactly the same effect (the constant term does not affect the MCMC sampling):
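As a quick sanity check (my own addition, not in the original posts): the potential used below is the log of the prior density p(β) ∝ (1 + β²)^(-3/2), and exponentiating it recovers a proper (normalizable) prior, as `scipy.integrate.quad` confirms.

```python
import numpy as np
from scipy.integrate import quad

# the potential: log of the unnormalized prior density (1 + beta^2)^(-3/2)
def pot_prior(val):
    return -1.5 * np.log(1 + val ** 2)

# exponentiating the potential gives back the prior density;
# its integral over the real line is finite, so the implied prior is proper
Z, _ = quad(lambda b: np.exp(pot_prior(b)), -np.inf, np.inf)
print(Z)
```

The integral evaluates to 2, so dividing by that constant would give the normalized prior; MCMC does not care about the constant.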

```python
# Define the variables needed for the routine, with their prior distributions
alpha = pymc.Uniform('alpha', -10, 10)
beta = pymc.Uniform('beta', -10, 10)

# Define a potential equivalent to a prior
@pymc.potential
def pot_prior(val=beta):
    return -1.5 * np.log(1 + val ** 2)

# Define the form of the model and likelihood
@pymc.deterministic
def y_model(x=xdata, alpha=alpha, beta=beta):
    return alpha + beta * x

y = pymc.Normal('y', mu=y_model, tau=1. / (edata / 2) ** 2,
                observed=True, value=ydata)

# package the full model in a dictionary
model1 = dict(alpha=alpha, beta=beta, y_model=y_model,
              pot_prior=pot_prior, y=y)

# run the basic MCMC:
S = pymc.MCMC(model1)
S.sample(iter=100000, burn=50000)

# extract the traces and plot the results
pymc_trace_pot = [S.trace('alpha')[:], S.trace('beta')[:]]

plot_MCMC_results(xdata, ydata, pymc_trace_pot)
print()
print("alpha mean= {:.2f}".format(pymc_trace_pot[0].mean()))
print("beta mean= {:.2f}".format(pymc_trace_pot[1].mean()))
```

```
[-----------------100%-----------------] 100000 of 100000 complete in 18.6 sec
alpha mean= 0.31
beta mean= 1.38
```

I find it surprising that the average fit does not run through the data points. On the other hand, this is not unexpected: our prior for `beta` favours values close to 0. I guess an asymmetrical error model in the likelihood could oppose this trend, but this is really beyond the scope of my knowledge and I would appreciate comments on that.

### Ridge regression

As a second example of potentials, we'll analyze the post by Austin Rochford, which I also enjoyed reading. It discusses, among other things, Bayesian ridge regression. Ridge regression adds a quadratic penalty to the least-squares objective function in order to achieve regularization. A Bayesian approach reveals that this is equivalent to using a normal prior for the slope `beta` parameters.
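To see this equivalence concretely (a small check I added, using `scipy.stats.norm`): the log-density of a Normal(0, 1/τ) prior and the quadratic ridge penalty −(τ/2)β² differ only by a normalizing constant, which both MAP and MCMC ignore.

```python
import numpy as np
from scipy.stats import norm

tau = 1.0
beta = np.linspace(-3, 3, 13)

# log-density of the Normal(mu=0, precision=tau) prior
log_prior = norm.logpdf(beta, loc=0, scale=1 / np.sqrt(tau))

# the quadratic potential added in ridge regression
potential = -tau / 2 * beta ** 2

# identical up to an additive constant, -0.5*log(2*pi/tau)
diff = log_prior - potential
print(np.allclose(diff, diff[0]))  # True
```

This is why the two model formulations below yield the same MAP estimates.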

I refer again to his post for explanations and more insight. In that post, instead of sampling the posterior he just finds the *maximum a posteriori* (MAP) estimate. Here I show that the same values are obtained when uniform priors are used and an external potential is added *ad hoc*, as is done in traditional ridge regression.

```python
from scipy.stats import norm

n = 1000

x1 = norm.rvs(0, 1, size=n)
x2 = -x1 + norm.rvs(0, 10**-3, size=n)
x3 = norm.rvs(0, 1, size=n)

X = np.column_stack([x1, x2, x3])
y = 10 * x1 + 10 * x2 + 0.1 * x3

def get_coefficients(map_):
    return [{str(variable): variable.value}
            for variable in map_.variables
            if str(variable).startswith('beta')]

# With uniform priors plus an ad hoc quadratic potential
beta1_pot = pymc.Uniform('beta1', -100., 100.)
beta2_pot = pymc.Uniform('beta2', -100., 100.)
beta3_pot = pymc.Uniform('beta3', -100., 100.)

@pymc.potential
def pot(beta1=beta1_pot, beta2=beta2_pot, beta3=beta3_pot):
    tau = 1.
    return -tau / 2. * (beta1 ** 2 + beta2 ** 2 + beta3 ** 2)

@pymc.deterministic
def y_hat_pot(beta1=beta1_pot, beta2=beta2_pot, beta3=beta3_pot,
              x1=x1, x2=x2, x3=x3):
    return beta1 * x1 + beta2 * x2 + beta3 * x3

Y_pot = pymc.Normal('Y', mu=y_hat_pot, tau=1.0, value=y, observed=True)

pot_model = pymc.Model([Y_pot, beta1_pot, beta2_pot, beta3_pot, pot])
pot_map = pymc.MAP(pot_model)
pot_map.fit()
print("With potentials:\n", get_coefficients(pot_map))

# With normal priors. Based on
# http://www.austinrochford.com/posts/2013-09-02-prior-distributions-for-bayesian-regression-using-pymc.html
beta1_ridge = pymc.Normal('beta1', mu=0, tau=1.0)
beta2_ridge = pymc.Normal('beta2', mu=0, tau=1.0)
beta3_ridge = pymc.Normal('beta3', mu=0, tau=1.0)

@pymc.deterministic
def y_hat_ridge(beta1=beta1_ridge, beta2=beta2_ridge, beta3=beta3_ridge,
                x1=x1, x2=x2, x3=x3):
    return beta1 * x1 + beta2 * x2 + beta3 * x3

Y_ridge = pymc.Normal('Y', mu=y_hat_ridge, tau=1.0, value=y, observed=True)

ridge_model = pymc.Model([Y_ridge, beta1_ridge, beta2_ridge, beta3_ridge])
ridge_map = pymc.MAP(ridge_model)
ridge_map.fit()
print("With priors:\n", get_coefficients(ridge_map))
```

```
With potentials:
[{'beta1': array(0.004691201218377386)}, {'beta3': array(0.09966093745608606)}, {'beta2': array(0.004958704169379989)}]
With priors:
[{'beta3': array(0.09966049909698146)}, {'beta1': array(0.004779905863422812)}, {'beta2': array(0.005045738743703612)}]
```
