{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Bayesian dsitance from GAIA parallaxes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this exercise, we will download some data from the GAIA archive on parallaxes of stars. The parallax, $p$, of a star is half the change in the angular direction of a star as seen from opposite points of the Earth's orbit, as shown here:\n", "\n", "\n", "\n", "\n", "It can be used to determine the distance to a star: simple geometry gives \n", "$$tan(p) = 1 / d$$\n", "where $d$ is the distance in units of the earth-Sun distance, which is called an astronomical unit (and is equal to $1.496 \\times 10^{13}$ cm\n", "\n", "Since stars are very far away, the parallax is always a very small angle, so\n", "$$tan(p) \\approx p = 1/d$$\n", "where $p$ is in radians.\n", "Astronomers have defined a distance unit called the parsec, such that the distance in parsecs is:\n", "$$d({\\rm in\\ parsec}) = 1 / p ({\\rm in\\ arcsec})$$\n", "where 1 arcsec = 1/3600. degree. With this definition, \n", "$$1 parsec = 1 astronomical unit \\times 3600 arcsec/degree \\times 180 / \\pi = 3.086\\times 10^{18} cm$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The GAIA mission is currently measuring parallaxes to billions of stars. They release data to the public through a relational database in the GAIA archive. We will download some data from the database, using a Python package called astroquery . Once astroquery has been installed in your Python distribution, we can import it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import astroquery" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many quantities are available in the GAIA archive database. These are described by the database schema. You can find the schema for GAIA at the GAIA archive: click on the Search link at the top, and then the Advanced (ADQL) link, and finally expand the gaia.edr3.gaia_source description on the left by clicking the +. This will show you all of the information in this database table. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will query the database using the Astronomical Data Query Language (ADQL) which is a superset of the standard Structured Query Language (SQL). The basic format of queries is relatively simple:\n", "\n", "
\n",
    "SELECT selectList \n",
    "FROM fromClause \n",
    "[WHERE conditions] \n",
    "
\n", "\n", "Basically, the selectList gives the quantities you want to download, the fromClause gives the table where the quantities are located, and the optional WHERE clause gives some conditions for which objects you want to download.\n", "\n", "\n", "ADQL is an extension of SQL, in particular, defines some useful geometric functions, e.g. CONTAINS() can be used to select objects within some distance of some point in the sky, e.g.\n", "
\n",
    "WHERE 1=CONTAINS(POINT('ICRS', ra, dec), CIRCLE('ICRS', tablera, tabledec, radius)\n",
    "
\n", "would select points where the database coordinate variables (tablera and tabledec, but you will need to use the appropriate column names) fall within some distance of the ra/dec specified in the query.\n", "\n", "For a little more information on ADQL/SQL see https://www.cosmos.esa.int/documents/915837/915858/ADQL_handson_slides.pdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Let's write a query to get the parallaxes of all stars within 0.5 degree of the north Galactic pole. Look at the schema to get the column names for ra, dec, and parallax, and try to complete the following SQL query. Give it a quick try, then move on to the next cell to see a valid query:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cell gives and implements the query" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide_input" ] }, "outputs": [], "source": [ "from astroquery.gaia import Gaia\n", "from astropy.coordinates import SkyCoord\n", "import astropy.units as u\n", "\n", "# get RA/DEC at the Galactic pole, which has l=0, b=90\n", "coord = SkyCoord(0, 90, unit=(u.degree, u.degree), frame='galactic')\n", "ra=coord.icrs.ra.value\n", "dec=coord.icrs.dec.value\n", "\n", "# search radius\n", "rad=0.5\n", "\n", "#form the SQL query\n", "query=\"\"\"SELECT ra, dec, parallax, parallax_error\n", " FROM gaiaedr3.gaia_source\n", " WHERE 1=CONTAINS(\n", " POINT('ICRS', {:f}, {:f}),\n", " CIRCLE('ICRS',ra, dec, {:f}))\"\"\".format(ra,dec,rad)\n", "print(query)\n", "\n", "# run the query and get the results\n", "job = Gaia.launch_job_async(query)\n", "tab = job.get_results()\n", "print('Returned {:d} objects'.format(len(tab)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results are returned in an astropy table. You can display them in a convenient notebook interface. Individual columns can be referenced using tab['columnname'], e.g. tab['parallax']. If you are more familiar/comfortable with pandas data frame, tables can easily be converted to pandas." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "#display the results\n", "print('parallax of row 100: {:.3f}'.format(tab['parallax'][100]))\n", "tab.show_in_notebook(display_length=10)\n", "\n", "#if you prefer to convert to pandas, use the following\n", "#ptab = tab.to_pandas()\n", "#print(ptab['parallax'][100])\n", "#print(ptab)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's solve for Bayesian posterior probability distribution for distance to our stars. \n", "$$P(d|p) = {P(p|d)P(d)\\over P(p)}$$\n", "We will avoid any calculation of $P(p)$ and just normalize our posterior PDF.\n", "First, we need to formulate an expression for the likelihood, $P(p|d)$. For GAIA, the parallax uncertainties are expected to be distributed according to a normal distribution. For a gaussian uncertainty in measurement parallax, we then have:\n", "$$P(p|d) = {1\\over \\sigma_p\\sqrt{2\\pi}} \\exp{-0.5 (p-{1\\over d})^2 \\over \\sigma_p^2}$$\n", "Write a function that will calculate this likelihood:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def likelihood(parallax,parallax_err,distance) :\n", " \"\"\" Return likelihood of a parallax, given a distance and parallax uncertainty\n", " assume parallax in mas, and distance in kpc\n", " \"\"\"\n", " # return value of likelihood\n", " return " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have to decide what to use for a prior. One could use just a positivity constraint, which would lead to an improper prior (since the integral to infinity is infinite), or one could impose a maximum distance, perhaps motivated by the faint limit of gaia (around g=20) and the absolute magnitude of the faintest stars. \n", "
\n", "However, these would not account for the fact that in a cone search, such as we have done above, the volume element grows as $d^2$, so that could be added to the prior.\n", "
\n", "In addition, one might want to incorporate the knowledge that, looking perpendicular to the Galactic plane, the density of stars is known to drop off, roughly exponentially, let's say with a scale height of 0.5 kpc.\n", "
\n", " Bailer-Jones et al 2018 adopt a prior\n", "$$P(d) = d^2 \\exp(-d/L)$$\n", "to account for these considerations, where $L$ is a scale length that depends on Galactic longitude and latitude. Let's adopt this prior, with $L=0.5$ at the Galactic pole. Note that this is a proper prior, since the exponential drops off faster than the $d^2$ term.\n", "
\n", "Write a function to return the value of this prior:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def prior(distance,scalelength) :\n", " \"\"\" Return distance prior based on Bailer-Jones formulation\n", " \"\"\"\n", " return # return value of prior" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, now let's construct the posterior PDF. We'll input an array of distances at which to sample the PDF. Write a function to return the posterior PDF for a single star, given the measured parallax and parallax uncertainty:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "def posterior(p,perr,dist,l=0.5):\n", " \"\"\" Calculate posterior PDF P(d|p,perr) for 0\n", " ANSWER HERE: \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now write a routine to return some statistics derived from posterior PDF: the mean, mode, median, and 90% equal tailed credible region. Note the routine numpy.cumsum() for converting a PDF into a cumulative distribution function (CDF). You can then find the index where the CDF equals desired percentiles (e.g., 5, 50, and 95) and then get the distance at that index." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pdb\n", "def stats(dist,pdf) :\n", " \"\"\" return some statistics of a PDF\n", " \"\"\"\n", " mean = # expression for mean distance\n", " mode = # expression for mode\n", "\n", " cdf=np.cumsum(pdf/pdf.sum())\n", " p5 = # expression for 5th percentile distance\n", " p50 = # expression for 50th percentile distance\n", " p95 = # expression for 95th percentile distance\n", " \n", " median = dist[p50]\n", " credible_interval = [dist[p5],dist[p95]]\n", "\n", " return mean, mode, median, credible_interval" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write a function to get the posterior PDF and its statistics for a given input record, and plot them" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def pdf_stats(data) :\n", " \"\"\" Calculate posterior PDF statistics given a single data record that has 'parallax' and 'parallax_error'\n", " \"\"\"\n", "\n", " if np.isfinite(data['parallax']) :\n", " pdf= # get posterior using your routine\n", " \n", " # plot the Bayesian posterior PDF as solid lines\n", " plt.plot(dist,pdf,color='k')\n", "\n", " mean, mode, median, credible_interval = # get statistics from posterior using your routine\n", " plt.plot([mean,mean],plt.ylim(),label='mean')\n", " plt.plot([mode,mode],plt.ylim(),label='mode')\n", " plt.plot([median,median],plt.ylim(),label='median')\n", " plt.plot(credible_interval,[0,0],label='credible interval')\n", "\n", " print(mean,mode,median,credible_interval)\n", " \n", " d,derr= # get the MLE distance and uncertainty\n", " plt.plot([d,d],plt.ylim(),label='MLE distance',ls=':')\n", " plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Choose some interesting individual stars from your plot above and plot the PDFs and their statistics" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for star in [1,4,6] :\n", " plt.figure()\n", " pdf_stats(tab[star])\n", " plt.title('star {:d}'.format(star))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Discuss differences between the different statistics, and what you think are the most useful descriptive statistics.\n", "
\n", "ANSWER HERE: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.11" } }, "nbformat": 4, "nbformat_minor": 2 }