Markov Chain Monte Carlo (MCMC) and density estimation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this notebook, we'll get some more experience with MCMC, using both the Metropolis-Hastings routine from the last notebook, and also with the popular emcee package. You'll do some more investigation of the results, and will touch on some basic kernel density estimation routines to look at them.\n",
"\n",
"Many of the cells in this notebook are largely filled in, but you should look through them line-by-line to make sure you understand what they are doing.\n",
"\n",
"You will need to have the emcee and corner packages installed."
]
},
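{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you need to install the packages, you can usually do so from within the notebook\n",
"# (uncomment to run; the exact command may vary with your Python setup):\n",
"#!pip install emcee corner"
]
},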
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import emcee"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by simulating data from a straight line with uncertainties (as we've done before!)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"true = # [slope,intercept]\n",
"x = np.arange(0,20)\n",
"yerr= # uncertainties\n",
"y = true[0]*x + true[1] + np.random.normal(0,yerr,size=len(x))\n",
"plt.plot(x,y,'ro')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the Bayesian analysis, define log_likelihood, log_prior, and log_posterior routines. We'll supply the parameters as an array, and then the input data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def log_likelihood(par,x,y,yerr) :\n",
" \"\"\" Likelihood for Gaussian uncertainties\n",
" \"\"\"\n",
" m,b=par \n",
" f = m*x + b\n",
" return -0.5 * np.sum(((y-f) / yerr) ** 2)\n",
"\n",
"def log_prior(par):\n",
" \"\"\" Flat prior\n",
" \"\"\"\n",
" m,b=par\n",
" if not (-10 < m < 10):\n",
" return -np.inf\n",
" if not (-10 < b < 10):\n",
" return -np.inf\n",
" return 0.0\n",
"\n",
"def log_posterior(par,x,y,yerr):\n",
" \"\"\" log posterior\n",
" \"\"\"\n",
" return log_prior(par) + log_likelihood(par,x,y,yerr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we'll supply the MCMC Metropolis routine from last time, but I've formulated it into a function"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def metropolis_step(x, y, yerr, log_posterior, theta_t, lnpost_t, step_cov):\n",
" \"\"\" Take single metropolis step and return new chain value\n",
" \"\"\"\n",
" # draw from proposal and calculate new posterior\n",
" q = np.random.multivariate_normal(theta_t, step_cov)\n",
" lp1 = log_posterior(q,x,y,yerr)\n",
" \n",
" # if posterior has value than previous step or falls within second random draw, accept\n",
" if (lp1 - lnpost_t) > np.log(np.random.rand()):\n",
" return q, lp1\n",
" \n",
" # otherwise, return old value\n",
" return theta_t, lnpost_t\n",
"\n",
"def metropolis(x, y, yerr, log_posterior, p0, step = [1.e-4,1.e-4], nstep=50000, nburn=0 ) :\n",
" \"\"\" General Metropolis MCMC routine from Foreman-Mackey notebook\n",
" Input :\n",
" x,y : independent and dependent variables\n",
" log_posterior : function to calculate log(posterior) given parameters\n",
" p0 : array of initial parameter, of length npar\n",
" step : covariance array, (npar,npar)\n",
" nstep : number of steps to take\n",
" nburn : number of initial steps to discard\n",
" \"\"\"\n",
" \n",
" lp0 = log_posterior(p0,x,y,yerr)\n",
" chain = np.empty((nstep, len(p0)))\n",
" for i in range(len(chain)):\n",
" p0, lp0 = metropolis_step(x,y,yerr,log_posterior, p0, lp0, step)\n",
" chain[i] = p0\n",
" \n",
" # Compute the acceptance fraction.\n",
" acc = float(np.any(np.diff(chain, axis=0), axis=1).sum()) / (len(chain)-1)\n",
" print(\"The acceptance fraction was: {0:.3f}\".format(acc))\n",
"\n",
" return chain\n",
"\n",
"def plotchain(chain,labels=None, nburn=0) :\n",
" \"\"\" Routine for plotting the MCMC chains\n",
" \"\"\"\n",
" npts,ndim = chain.shape\n",
" fig,ax = plt.subplots(ndim,1, figsize=(8,5))\n",
" for idim in range(ndim) :\n",
" ax[idim].plot(chain[:,idim],'k')\n",
" if labels != None :\n",
" ax[idim].set_ylabel(labels[idim])\n",
" if nburn> 0 :\n",
" ax[idim].axvline(nburn,color=\"g\",lw=2)\n",
" \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, now supply initial guess, number of steps, number of burning to discard, step size. Iterate on these until you get a reasonable looking chain and acceptance fraction: you should get acceptance fraction of 0.3 - 0.7 and chains should not show significant autocorrelation (no long coherence)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p0 = # initial guess\n",
"nburn = # number of burning to discard\n",
"nstep = # number of steps\n",
"step= np.diag() # add in step sizes in each parameter\n",
"chain = metropolis(x,y,yerr,log_posterior,p0=p0,nburn=nburn,nstep=nstep,step=step)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plotchain(chain,labels=['m','b'],nburn=nburn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show results in corner plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import corner\n",
"print(chain.shape)\n",
"plot=corner.corner(chain[nburn:, :], labels=[\"m\", \"b\"], show_titles=True, quantiles=[0.16,0.4,0.84])"
]
},
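{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numerical summary (a sketch of what corner already shows via the quantiles), we can pull the median and 16th/84th percentiles of each parameter directly from the post-burn-in samples."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# median and 68% credible interval for each parameter from the samples\n",
"for idim,label in enumerate(['m','b']) :\n",
"    lo,med,hi = np.percentile(chain[nburn:,idim],[16,50,84])\n",
"    print('{:s} = {:.3f} +{:.3f} -{:.3f}'.format(label,med,hi-med,med-lo))"
]
},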
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we're going to use the MCMC Hammer emcee() routine. This routine uses multiple \"walkers\" to investigate the posterior. The recommendation is for at least twice as many walkers as parameters you are trying to determine.\n",
" \n",
"emcee() is written using an object-oriented style: you instantiate a sampler, then use methods of that object to run the MCMC and to look at the results.\n",
" \n",
"For emcee, you have to supply a log_posterior routine with the parameter array as the first argument. If the posterior routine needs additional data to compute (e.g., the data points!), then you supply these using the\n",
"args=[] argument when you instatiate a sampler."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import emcee\n",
"\n",
"# for emcee, we just need to set the number of walkers. Recommendation is at least twice\n",
"# the number of parameters\n",
"nwalkers=4\n",
"ndim=2\n",
"nstep=10000\n",
"nburn=100\n",
"sampler = emcee.EnsembleSampler(nwalkers,ndim,log_posterior,args=[x,y,yerr])\n",
"\n",
"# starting location for each walker: this is a 2D array, with dimensions (nwalker,npar)\n",
"p0=np.array([np.random.uniform(0,5.,size=nwalkers),\n",
" np.random.uniform(0,2.,size=nwalkers)]).T\n",
"print('Initial parameters: ',p0)\n",
"\n",
"s=sampler.run_mcmc(p0,nstep)\n",
"\n",
"# print acceptance fraction which are stored in acceptance_fraction attribute and computed\n",
"# with get_autocorr_time() method\n",
"print('acceptance fraction: {:8.3f}'.format(sampler.acceptance_fraction[0]))\n",
"print('mean autocorrelation time: {:8.3f}'.format(np.mean(sampler.get_autocorr_time())))\n",
"\n",
"# the chains are saved in the chain attribute, with dimension (nwalkers, nstep, ndim)\n",
"print(sampler.chain.shape)"
]
},
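{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before flattening the walkers together, it can be instructive to look at the individual walker chains; here's a minimal sketch using the (nwalkers, nstep, ndim) chain attribute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot the trace of each walker for each parameter\n",
"fig,ax = plt.subplots(ndim,1, figsize=(8,5))\n",
"for iwalk in range(nwalkers) :\n",
"    for idim in range(ndim) :\n",
"        ax[idim].plot(sampler.chain[iwalk,:,idim],alpha=0.5)\n",
"for idim,label in enumerate(['m','b']) :\n",
"    ax[idim].set_ylabel(label)"
]
},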
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the acceptance fraction telling you? What is the autocorrelation time/length telling you?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
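{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to make the autocorrelation time concrete (a sketch): the chain contains roughly one independent sample per autocorrelation time, so the effective sample size is about nwalkers*nstep/tau."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# rough effective number of independent samples in the run\n",
"tau = np.mean(sampler.get_autocorr_time())\n",
"print('effective sample size: {:.0f}'.format(nwalkers*nstep/tau))"
]
},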
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use the autocorrelation length to discard burn-in (set to several times autocorrelation length), and to thin (set to roughly half the autocorrelation length). Experiment with different values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig,ax=plt.subplots(ndim,2)\n",
"\n",
"# The flatchain attribute puts the chains from all of the walkers together\n",
"print(sampler.flatchain.shape)\n",
"\n",
"# the get_chain() method will discard initial points and \"thin\" the chain\n",
"nburn= # set nburn\n",
"thin= # set thinning\n",
"flat_samples = sampler.get_chain(discard=nburn, thin=thin, flat=True)\n",
"\n",
"# show the raw chains on the left, and the get_chain() results on the right\n",
"for idim in range(ndim) :\n",
" ax[idim,0].plot(sampler.flatchain[:,idim])\n",
" ax[idim,1].plot(flat_samples[:,idim])\n",
" \n",
"# corner plot\n",
"plot=corner.corner(flat_samples,plot_contours=True,\n",
" labels=['mean','variance'],show_titles=True)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
Allow for uncertainties on data points"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's add a dimension and determine the scatter as well as the slope and intercept. Again, we'll do this both with Metropolis-Hastings and emcee."
]
},
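{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, with the scatter $\\sigma$ as a free parameter, the Gaussian log likelihood is\n",
"$$\\ln L = -\\frac{1}{2}\\sum_i \\left(\\frac{y_i - f_i}{\\sigma}\\right)^2 - N\\ln\\sigma + {\\rm const},$$\n",
"which is where the $-N\\ln\\sigma$ term in the code below comes from: it can no longer be dropped as a constant once $\\sigma$ is a parameter."
]
},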
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def log_likelihood(par,x,y,yerr) :\n",
" \"\"\" log(likelihood)\n",
" \"\"\"\n",
" m,b,sig=par \n",
" f = m*x + b\n",
" # need to remember to include normalizing sigma term if that is a variable!\n",
" if sig <= 0:\n",
" # check for negative sig\n",
" return -np.inf\n",
" else:\n",
" return -0.5 * np.sum(((y-f) / sig)**2) - len(x)*np.log(sig)\n",
"\n",
"def log_prior(par):\n",
" \"\"\" flat prior\n",
" \"\"\"\n",
" m,b,sig=par\n",
" if not (-10 < m < 10):\n",
" return -np.inf\n",
" if not (-10 < b < 10):\n",
" return -np.inf\n",
" if sig <= 0 :\n",
" return -np.inf\n",
" return 0.0\n",
"\n",
"def log_posterior(par,x,y,yerr):\n",
" return log_prior(par) + log_likelihood(par,x,y,yerr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set initial parameters, etc. and tune to get a good chain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p0 = # initial values\n",
"nburn = # nburn\n",
"nstep= # nsteps\n",
"step= np.diag( ... ) # step sizes\n",
"chain = metropolis(x,y,yerr,log_posterior,p0=p0,nburn=nburn,nstep=nstep,step=step)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plotchain(chain,labels=['m','b','sig'])\n",
"plot=corner.corner(chain[nburn:, :], labels=[\"m\", \"b\", \"sig\"], show_titles=True, quantiles=[0.16,0.4,0.84])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now do this case using emcee()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# for emcee, we just need to set the number of walkers. Recommendation is at least twice\n",
"# the number of parameters\n",
"ndim= # how many parameters\n",
"nwalkers= # how many walkers\n",
"nstep= # how many steps\n",
"\n",
"# instantiate the sampler\n",
"sampler = emcee.EnsembleSampler(nwalkers,ndim,log_posterior,args=[x,y,yerr])\n",
"\n",
"# starting location for each walker: this is a 2D array, with dimensions (nwalker,npar)\n",
"p0= # initial guesses\n",
"print('Initial parameters: ',p0)\n",
"\n",
"# run the sampler\n",
"s=sampler.run_mcmc(p0,nstep)\n",
"\n",
"# print acceptance fraction which are stored in acceptance_fraction attribute and computed\n",
"# with get_autocorr_time() method\n",
"print('acceptance fraction: {:8.3f}'.format(sampler.acceptance_fraction[0]))\n",
"print('mean autocorrelation time: {:8.3f}'.format(np.mean(sampler.get_autocorr_time())))\n",
"\n",
"# the chains are saved in the chain attribute, with dimension (nwalkers, nstep, ndim)\n",
"print(sampler.chain.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig,ax=plt.subplots(ndim,2)\n",
"\n",
"# The flatchain attribute puts the chains from all of the walkers together\n",
"print(sampler.flatchain.shape)\n",
"\n",
"# the get_chain() method will discard initial points and \"thin\" the chain\n",
"nburn= # use autocorrelation to set\n",
"thin= # use autocorrelation to set\n",
"flat_samples = sampler.get_chain(discard=nburn, thin=thin, flat=True)\n",
"\n",
"# show the raw chains on the left, and the get_chain() results on the right\n",
"for idim in range(ndim) :\n",
" ax[idim,0].plot(sampler.flatchain[:,idim])\n",
" ax[idim,1].plot(flat_samples[:,idim])\n",
" \n",
"# corner plot\n",
"plot=corner.corner(flat_samples,plot_contours=True,\n",
" labels=['mean','variance','sigma'],show_titles=True)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
Kernel density estimation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, let's consider kernel density estimation \n",
"We'll consider both 1D and 2D distributions.\n",
"\n",
"If we want a 1D marginal distribution, we just use the values from the axis in question: no need to explicitly marginalize, the sampling does that for us automatically!\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start with a histogram, using several of the \"rules\" described for choosing number of bins. We can use either the chain results from our Metropolis run, or the flat_samples results from the emcee run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# use matplotlib default histogram\n",
"fig,ax=plt.subplots(3,1)\n",
"for idim in range(3) :\n",
" ax[idim].hist( ... ) # sampled values of idim parameter\n",
"fig.tight_layout()\n",
"fig.suptitle('matplotlib default nbins')\n",
" \n",
"# use Scott's rule\n",
"fig,ax=plt.subplots(3,1)\n",
"for idim in range(3) :\n",
" ax[idim].hist( ... ,bins='scott') \n",
"fig.tight_layout()\n",
"fig.suptitle(\"Scott's rule\")\n",
"\n",
"# use Freedman-Diaconis rule\n",
"fig,ax=plt.subplots(3,1)\n",
"for idim in range(3) :\n",
" ax[idim].hist( ... ,bins='fd')\n",
"fig.tight_layout()\n",
"fig.suptitle('Freedman-Diaconis rule')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, now lets do a 2D histogram, say showing PDF for slope and intercept. Again, we just choose those two values of the sampled points to marginalize over sigam."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nbins= # number of bins\n",
"out=np.histogram2d(..., ... ,bins=nbins)\n",
"plt.imshow(out[0],origin='lower')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, now we get to real kernel density estimation, using scipy.status.gaussian_kde() \n",
"\n",
"You can use keyword bw_method= to set the bandwidth: the default is to use a value based on Scott's rule"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.stats import gaussian_kde\n",
"print(chain.shape)\n",
"\n",
"#instatiate a 1D KDE estimator for the first parameter (slope)\n",
"t=gaussian_kde(chain[:,0].T)\n",
"\n",
"# choose points to evaluate it at, first a 1D \n",
"xplot=np.linspace(...,...100) # choose good range for slope\n",
"plt.figure()\n",
"plt.plot(xplot,t(xplot))\n",
"\n",
"# same for second parameter (intercept)\n",
"t=gaussian_kde( ... ) # specify the second parameter\n",
"yplot=np.linspace(...,...,100) # choose good range for intercept\n",
"plt.figure()\n",
"plt.plot(t(yplot),yplot)\n",
"\n",
"# 2D estimator for slope and intercept\n",
"t=gaussian_kde( ... ) # specify first and second parameters\n",
"# evaluate on 2D grid\n",
"out=np.zeros([100,100])\n",
"for ix,x in enumerate(xplot) :\n",
" for iy,y in enumerate(yplot) :\n",
" out[iy,ix] = t([x,y])\n",
"\n",
"plt.figure()\n",
"plt.imshow(out,origin='lower',extent=[0,1,0,10],aspect='auto')\n"
]
},
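{
"cell_type": "markdown",
"metadata": {},
"source": [
"The bandwidth matters: here's a minimal sketch comparing a few bw_method choices for the slope (the scalar values are just example choices)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# compare KDE bandwidths for the slope samples\n",
"plt.figure()\n",
"for bw in [0.05, 'scott', 0.5] :\n",
"    t = gaussian_kde(chain[nburn:,0], bw_method=bw)\n",
"    plt.plot(xplot, t(xplot), label='bw_method={}'.format(bw))\n",
"plt.legend()"
]
},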
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have time, take a look at https://www.astroml.org/book_figures_1ed/chapter1/fig_S82_scatter_contour.html to see example of using greyscale/contours for dense regions in a plot, with individual points in low density regions."
]
},
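{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a minimal matplotlib-only sketch of the same idea (astroML provides a ready-made scatter_contour routine): draw contours where the samples are dense, and individual points where they are sparse. The bin count, contour levels, and density threshold are example choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 2D histogram of the slope/intercept samples\n",
"H,xe,ye = np.histogram2d(chain[nburn:,0],chain[nburn:,1],bins=50)\n",
"xc = 0.5*(xe[:-1]+xe[1:])\n",
"yc = 0.5*(ye[:-1]+ye[1:])\n",
"\n",
"plt.figure()\n",
"# contours in the dense regions\n",
"plt.contour(xc,yc,H.T,levels=[10,30,100,300])   # example levels\n",
"\n",
"# individual points only where the local density is low\n",
"ixbin = np.clip(np.digitize(chain[nburn:,0],xe)-1,0,len(xc)-1)\n",
"iybin = np.clip(np.digitize(chain[nburn:,1],ye)-1,0,len(yc)-1)\n",
"sparse = H[ixbin,iybin] < 10                    # example threshold\n",
"plt.plot(chain[nburn:,0][sparse],chain[nburn:,1][sparse],'k.',ms=1)\n",
"plt.xlabel('m')\n",
"plt.ylabel('b')"
]
},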
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 4
}