
PYMC potentials and priors


When I started learning Bayesian statistics I found PYMC very useful, as I needed to play with examples without having to implement MCMC myself or work through complicated integrals. A good source of information is Bayesian Methods for Hackers, but the outstanding blog by Jake VanderPlas has proven extremely enlightening, both for its coding examples and for the underlying fundamental concepts.

PYMC implements potentials, but there are few examples of their use. In this post I will show how priors can be implemented as potentials, using two previously published examples. I hope this will make clear that potentials, just like priors and the likelihood, are simply one more term in the posterior distribution that MCMC samples.
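
Schematically, the log-posterior that the sampler works with is just a sum of terms, and a potential is simply one more additive term (this is notation, not PYMC code; the psi terms stand for any potentials):

$\log P(\alpha, \beta \mid D) = \log P(D \mid \alpha, \beta) + \log P(\alpha) + \log P(\beta) + \sum_k \psi_k(\alpha, \beta) + \mathrm{const}$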

 

Linear fit with non-uniform priors

Here I reproduce the linear fit from Jake VanderPlas' post, but using just two points:

import numpy as np

xdata = np.array([0, 1.])
ydata = np.array([0, 2.])
edata = 1. * np.ones_like(xdata)

We will also use the same visualization functions defined in his post:

import pymc
import matplotlib.pyplot as plt

# Create some convenience routines for plotting

def compute_sigma_level(trace1, trace2, nbins=20):
    """From a set of traces, bin by number of standard deviations"""
    L, xbins, ybins = np.histogram2d(trace1, trace2, nbins)
    L[L == 0] = 1E-16
    logL = np.log(L)

    shape = L.shape
    L = L.ravel()

    # obtain the indices to sort and unsort the flattened array
    i_sort = np.argsort(L)[::-1]
    i_unsort = np.argsort(i_sort)

    L_cumsum = L[i_sort].cumsum()
    L_cumsum /= L_cumsum[-1]
    
    xbins = 0.5 * (xbins[1:] + xbins[:-1])
    ybins = 0.5 * (ybins[1:] + ybins[:-1])

    return xbins, ybins, L_cumsum[i_unsort].reshape(shape)


def plot_MCMC_trace(ax, xdata, ydata, trace, scatter=False, **kwargs):
    """Plot traces and contours"""
    xbins, ybins, sigma = compute_sigma_level(trace[0], trace[1])
    ax.contour(xbins, ybins, sigma.T, levels=[0.683, 0.955], **kwargs)
    if scatter:
        ax.plot(trace[0], trace[1], ',k', alpha=0.1)
    ax.set_xlabel(r'$\alpha$')
    ax.set_ylabel(r'$\beta$')
    
    
def plot_MCMC_model(ax, xdata, ydata, trace):
    """Plot the linear model and 2sigma contours"""
    ax.plot(xdata, ydata, 'ok')

    alpha, beta = trace[:2]
    xfit = np.linspace(-1, 2, 10) #This is the only change from Jake's post!
    yfit = alpha[:, None] + beta[:, None] * xfit
    mu = yfit.mean(0)
    sig = 2 * yfit.std(0)

    ax.plot(xfit, mu, '-k')
    ax.fill_between(xfit, mu - sig, mu + sig, color='lightgray')

    ax.set_xlabel('x')
    ax.set_ylabel('y')


def plot_MCMC_results(xdata, ydata, trace, colors='k'):
    """Plot both the trace and the model together"""
    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    plot_MCMC_trace(ax[0], xdata, ydata, trace, True, colors=colors)
    plot_MCMC_model(ax[1], xdata, ydata, trace)

First we will use uniform priors. With uniform priors one has to make sure the range is wide enough to include all plausible values of the parameters; otherwise the sampled parameters pile up against the boundary of the uniform range. You can see this by changing the range for beta to (-1, 1).

# Define the variables needed for the routine, with their prior distributions
alpha = pymc.Uniform('alpha', -10, 10)
beta = pymc.Uniform('beta', -10, 10)

# Define the form of the model and likelihood
@pymc.deterministic
def y_model(x=xdata, alpha=alpha, beta=beta):
    return alpha + beta * x
# (We treat the error bars edata as 2 standard deviations, hence sigma = edata / 2)
y = pymc.Normal('y', mu=y_model, tau=1. / (edata/2) ** 2, observed=True, value=ydata)

# package the full model in a dictionary
model1 = dict(alpha=alpha, beta=beta, y_model=y_model, y=y)

# run the basic MCMC:
S = pymc.MCMC(model1)
S.sample(iter=100000, burn=50000)

# extract the traces and plot the results
pymc_trace_unifo = [S.trace('alpha')[:],
              S.trace('beta')[:]]

plot_MCMC_results(xdata, ydata, pymc_trace_unifo)
print()
print("alpha mean= {:.2f}".format(pymc_trace_unifo[0].mean()))
print("beta  mean= {:.2f}".format(pymc_trace_unifo[1].mean()))

[-----------------100%-----------------] 100000 of 100000 complete in 16.8 sec
alpha mean= 0.00
beta mean= 1.99
[Figure PYMC_potentials_fig2: alpha-beta posterior contours and the fitted line with 2-sigma band, uniform priors]

 

Now with potentials. The uniform priors above contribute only a constant log-p term, which does not affect the MCMC sampling, so we can keep them and add the log-p of the prior we actually want as a potential; the effect is the same as specifying that prior directly:
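
For reference, the potential defined below returns, up to an additive constant, the log-density of the symmetric slope prior used in Jake VanderPlas' post:

$P(\beta) \propto \left(1 + \beta^2\right)^{-3/2} \quad\Longrightarrow\quad \log P(\beta) = -\tfrac{3}{2} \log\left(1 + \beta^2\right) + \mathrm{const}$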

# Define the variables needed for the routine, with their prior distributions
alpha = pymc.Uniform('alpha', -10, 10)
beta = pymc.Uniform('beta', -10, 10)

#Define a potential equivalent to a prior
@pymc.potential
def pot_prior(val=beta):
    return -1.5 * np.log(1 + val ** 2)

# Define the form of the model and likelihood
@pymc.deterministic
def y_model(x=xdata, alpha=alpha, beta=beta):
    return alpha + beta * x

y = pymc.Normal('y', mu=y_model, tau=1. / (edata/2) ** 2, observed=True, value=ydata)

# package the full model in a dictionary
model1 = dict(alpha=alpha, beta=beta, y_model=y_model, pot_prior=pot_prior, y=y)

# run the basic MCMC:
S = pymc.MCMC(model1)
S.sample(iter=100000, burn=50000)

# extract the traces and plot the results
pymc_trace_pot = [S.trace('alpha')[:],
              S.trace('beta')[:]]

plot_MCMC_results(xdata, ydata, pymc_trace_pot)
print()
print("alpha mean= {:.2f}".format(pymc_trace_prior[0].mean()))
print("beta  mean= {:.2f}".format(pymc_trace_prior[1].mean()))

[-----------------100%-----------------] 100000 of 100000 complete in 18.6 sec
alpha mean= 0.31
beta mean= 1.38

[Figure PYMC_potentials_fig2: alpha-beta posterior contours and the fitted line with 2-sigma band, prior on beta implemented as a potential]

I find it surprising that the average fit does not run through the data points, and yet it is not unexpected: our prior for beta favours values close to 0, which pulls the posterior slope below the least-squares value. I guess an asymmetrical error model in the likelihood could counteract this pull, but that is really beyond the scope of my knowledge and I would appreciate comments on it.

Ridge regression

As a second example of potentials, we'll analyze the post by Austin Rochford, which I also enjoyed reading. It discusses, among other things, Bayesian ridge regression. Ridge regression adds a quadratic penalty to the least-squares objective being minimized, in order to regularize the coefficients. A Bayesian approach reveals that this is equivalent to placing a normal prior on the regression coefficients beta.
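
Written out, this is the standard equivalence (a sketch, with tau playing the role of both the ridge penalty and the prior precision, as in the code below):

$\min_{\beta}\ \tfrac{1}{2}\sum_i \left(y_i - \mathbf{x}_i^{\top}\beta\right)^2 + \tfrac{\tau}{2}\lVert\beta\rVert^2 \;\Longleftrightarrow\; \max_{\beta}\ \log P(\mathbf{y}\mid\beta) + \sum_j \log \mathcal{N}\!\left(\beta_j \mid 0, \tau^{-1}\right)$

since $\log \mathcal{N}(\beta_j \mid 0, \tau^{-1}) = -\tfrac{\tau}{2}\beta_j^2 + \mathrm{const}$, which is exactly the term the potential adds.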

I refer again to his post for explanations and more insight. In that post, instead of sampling the posterior, he just finds the maximum a posteriori (MAP) estimate. Here I show that the same values are obtained when uniform priors are used and an external quadratic potential is added ad hoc, as is done in traditional ridge regression.

from scipy.stats import norm

n = 1000

x1 = norm.rvs(0, 1, size=n)
x2 = -x1 + norm.rvs(0, 10**-3, size=n)
x3 = norm.rvs(0, 1, size=n)

X = np.column_stack([x1, x2, x3])
y = 10 * x1 + 10 * x2 + 0.1 * x3

def get_coefficients(map_):
    return [{str(variable): variable.value} for variable in map_.variables if str(variable).startswith('beta')]

beta1_pot = pymc.Uniform('beta1', -100.,100.)
beta2_pot = pymc.Uniform('beta2', -100.,100.)
beta3_pot = pymc.Uniform('beta3', -100.,100.)

@pymc.potential
def pot(beta1=beta1_pot, beta2=beta2_pot, beta3=beta3_pot):
    tau = 1.
    return -tau/2.*(beta1**2+beta2**2+beta3**2)


@pymc.deterministic
def y_hat_pot(beta1=beta1_pot, beta2=beta2_pot, beta3=beta3_pot, x1=x1, x2=x2, x3=x3):
    return beta1 * x1 + beta2 * x2 + beta3 * x3

Y_pot = pymc.Normal('Y', mu=y_hat_pot, tau=1.0, value=y, observed=True)

pot_model = pymc.Model([Y_pot, beta1_pot, beta2_pot, beta3_pot, pot])
pot_map = pymc.MAP(pot_model)
pot_map.fit()

print("With potentials:\n",get_coefficients(pot_map))

#
# With priors. Based on
# http://www.austinrochford.com/posts/2013-09-02-prior-distributions-for-bayesian-regression-using-pymc.html
#

beta1_ridge = pymc.Normal('beta1', mu=0, tau=1.0)
beta2_ridge = pymc.Normal('beta2', mu=0, tau=1.0)
beta3_ridge = pymc.Normal('beta3', mu=0, tau=1.0)

@pymc.deterministic
def y_hat_ridge(beta1=beta1_ridge, beta2=beta2_ridge, beta3=beta3_ridge, x1=x1, x2=x2, x3=x3):
    return beta1 * x1 + beta2 * x2 + beta3 * x3
Y_ridge = pymc.Normal('Y', mu=y_hat_ridge, tau=1.0, value=y, observed=True)

ridge_model = pymc.Model([Y_ridge, beta1_ridge, beta2_ridge, beta3_ridge])
ridge_map = pymc.MAP(ridge_model)
ridge_map.fit()

print("With priors:\n",get_coefficients(ridge_map))

With potentials:
[{'beta1': array(0.004691201218377386)}, {'beta3': array(0.09966093745608606)}, {'beta2': array(0.004958704169379989)}]
With priors:
[{'beta3': array(0.09966049909698146)}, {'beta1': array(0.004779905863422812)}, {'beta2': array(0.005045738743703612)}]
