\n", "Write a routine for the prior and for the log(prior):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#routine to return prior\n", "def prior(m,s) :\n", " \"\"\" return prior for input mean and standard deviation \"\"\"\n", " return # expression for prior\n", "\n", "def log_prior(m,s) :\n", " \"\"\" return log(prior)\"\"\"\n", " return # expression for log(prior)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now calculate the posterior probability for a grid of mean and standard deviation. Again, since the likelihood may be small and underflow, we'll work with the log of the posterior. If the posterior is the product of the likelihood and prior, then the log of the posterior is:\n", "$$log(P(M|D)) = log(L) + log(prior)$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def log_posterior(y,m,s) :\n", " \"\"\" Calculate log(posterior) for given set of data and trial mean and standard deviation\n", " \"\"\"\n", " return # return expression for log(posterior) using your routines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, now let's calculate the log(posterior) over a range of trial means and standard deviations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# make a grid of L for various mean and sigma\n", "n=100 # we'll use 100 points in trial means and standard deviations (so, 10000 calculations of the posterior!)\n", "\n", "# set appropriate ranges for your trial mean and standard deviation and create a grid over which\n", "# you will compute the posterior\n", "trial_means=np.linspace(,,n) # set range of means to try from inspection of data (add start and end)\n", "trial_stds=np.linspace(,,n) # set range of standard deviations to try from inspection of data\n", "\n", "# set up a nxn grid to store the calculations\n", "log_Ppdf=np.zeros([n,n])\n", "\n", "# fill the grid\n", "for ix,m in enumerate(trial_means) :\n", " for iy,s in enumerate(trial_stds) :\n", " log_Ppdf[iy,ix] = # use your function for log(posterior) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Normalize the posterior PDF so that it sums to unity, and display it. To do so, we need to go back to likelihood, not log(likelihood) but, to avoid tbhe very small number problem, we'll add a constant to log(L) so that this isn't an issue; this is not a problem because we're just going to normalize it anyway." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#if you are using a log(Ppdf), convert back to Ppdf, but after adding a constant to avoid underflow\n", "Ppdf = np.exp(log_Ppdf - log_Ppdf.max())\n", "\n", "# normalize the likelihood to have an integral of unity. We can just use a sum because we don't really care about\n", "# the bin size, which will just introduce a scaling constant, since we're going to normalize it\n", "Ppdf /= # normalize by dividing by sum\n", "\n", "# display the posterior\n", "plt.imshow(Ppdf)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now determine the marginal distributions for the mean, and determine the \"1-sigma\" credible region by looking at the cumulative distribution function of the marginal distribution. For the marginalization, note that numpy.sum() has an axis= keyword that allows you to sum over just that axis, i.e. turning a 2D array into a 1D array (as opposed to a single number without the axis= keyword).\n", "\n", "You may want to use the [numpy.cumsum()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.cumsum.html) function for the credible region calculation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# get the marginal distribution function by marginalizing (summing) over the sigma axis\n", "mdf= # marginalize over the standard deviation\n", "\n", "# plot the marginal distribution\n", "plt.plot(mgrid,mdf)\n", "\n", "# get the cumulative distribution function of this to determined credible region\n", "cdf=mdf.cumsum()\n", "\n", "# determine the 1-sigma credible region, i.e. where cdf>0.16 and cdf<0.84\n", "m_16 = # mean value where cdf~0.16\n", "m_84 = # mean value where cdf~0.85\n", "\n", "#plot vertical lines for the credible region\n", "plt.plot([m_16,m_84],[0,0],color='r')\n", "\n", "# overplot the frequentist sample mean and its uncertainty for comparison\n", "plt.plot([sample_mean,sample_mean],plt.ylim(),color='g')\n", "plt.plot([sample_mean-sem,sample_mean-sem],plt.ylim(),color='g',ls=':')\n", "plt.plot([sample_mean+sem,sample_mean+sem],plt.ylim(),color='g',ls=':')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the frequentist perspective, the best estimate for the mean is the sample mean and the uncertainty of the mean is described by a Gaussian characterized by the standard error of the mean. How does the Bayesian posterior for the mean commpare with the frequentist perpsective?\n", "

\n", "

\n", "

\n", "