This information is available at http://astronomy.nmsu.edu/sdss_bootcamp/computing.html
Getting set up
To avoid having to make sure different software is all installed on
different computers, we will use the central astronomy server. It's
possible to install things on your laptop, but it may take some time and
effort: you'll have to discuss with your mentors whether you will work
on your laptop or another computer.
For remote usage:
- Computer needs to connect to network: AggieAir wireless
- Need to have SSH for remote login (Mac OS X: built-in; Windows: PuTTY)
- Need to have X server running on laptop (Mac OS X: XQuartz; Windows: VcXsrv?)
For standalone laptop:
- Python distribution with many common packages: Anaconda (probably Python 2.7)
- TOPCAT
- emacs editor (Mac OS X: built-in; Windows: emacs? )
Network and remote login
ssh -Y visitor@astronomy.nmsu.edu
Unix
The operating system of a computer is the system software that provides the user interface to the computer hardware. These days, there are two main families of operating systems: Windows and Unix-like operating systems. Most astronomers work on computers that use a Unix-like operating system, at least for data analysis (maybe not for basic office-like stuff, paper writing). Mac OS X is a Unix-like operating system.
Graphical vs text interface: most astronomers use text interface!
- Terminal application (Mac OS X)
- xterm (Unix/Linux)
File organization:
- files and directories
- File names (name.ext): in astronomy, people generally do NOT use spaces in names, and keep file names meaningful, but compact!
- File extensions: various conventions, e.g. .txt, .dat, .fits, .tex, .html
- some simple shortcuts: . means current directory, .. means one directory up, ~ means home directory
- disks and partitions
Unix commands (italics mean you fill in the desired quantity, square brackets mean optional parameter):
- pwd : (print working directory) shows what directory you are in
- ls [file(s)] : lists files in current directory (or recursively with -R)
- cat/more/less file(s): show contents of a file
- mkdir dir: make a subdirectory under current directory
- cd path : change directory
- cp source destination : copy a file
- mv source destination : move a file
- df : show available partitions including how much free disk space on each
Unix help:
Practice:
- log into astronomy: ssh -Y visitor@astronomy.nmsu.edu
- what directory are you in?
- create a subdirectory with your last name
- copy a file into that subdirectory
- figure out what filesystem your directory is on. How much space is currently available on that file system?
- figure out what other filesystems are accessible. Are these on the local computer or a remote computer?
- use man pages to figure out how to get df to show you free disk space in "human-readable" format
Creating/editing files:
- emacs : powerful common editor
- vi : historical main Unix editor
- nano : simple editor
- Textedit (Mac OS X); Notepad (Windows); gedit (Linux) : not generally used (not available on Unix machines)
Running programs:
- Programs are run by typing the name of the program
- the list of directories that are searched to find the program is determined by the user's PATH
- the path is stored in a special UNIX kind of variable called an environment variable
- you can set this using: setenv PATH dir1:dir2:dir3:... (tcsh syntax; in bash: export PATH=dir1:dir2:dir3:...)
- you can add to this using: setenv PATH ${PATH}:dir1:dir2:dir3:... (in bash: export PATH=${PATH}:dir1:dir2:dir3:...)
- path is searched in order
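The in-order search can be illustrated with a short sketch in Python (used here only because it is the language covered later in these notes). It creates two temporary directories that each hold a program with the same made-up name, hello_demo, and asks shutil.which which copy a given search path would find. The directory and program names are assumptions for illustration only.

```python
import os
import shutil
import stat
import tempfile


def make_program(directory, name, text):
    """Write a tiny executable shell script called `name` into `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\necho " + text + "\n")
    # Add execute permission for the owner (like chmod u+x).
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


# Two hypothetical directories, each with its own copy of hello_demo.
dir1 = tempfile.mkdtemp()
dir2 = tempfile.mkdtemp()
first = make_program(dir1, "hello_demo", "from dir1")
second = make_program(dir2, "hello_demo", "from dir2")

# The search path is a colon-separated list; the first match wins.
found = shutil.which("hello_demo", path=dir1 + os.pathsep + dir2)
reversed_found = shutil.which("hello_demo", path=dir2 + os.pathsep + dir1)

print(found == first)            # dir1 is listed first, so its copy is found
print(reversed_found == second)  # reversing the order changes the result
```

This is why the order of directories matters when you add your own directory to PATH: a program of the same name earlier in the list hides later ones.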
More UNIX utility commands:
- grep string file(s) : search for string in files
- awk : process files line-by-line
- sed : line-by-line scripting editor
- sort [-n] [--key={columnnumber}] : sorts files on a specified column value
Practice:
- Create a file that has your name in it, and the names of as many people in the bootcamp as you can remember, in one column, and some number in a second column
- Look at the contents of the file using more
- sort the file on either column using sort
- I wrote a simple program, called hello, that is located in the home directory of the visitor account
- Find it
- Run it. Can you run it from your subdirectory?
- add the home directory to the path
Shells and shell scripts:
- Concept of a shell
- Most common shells: bash and tcsh
- Can do simple scripting using shells (although this often can also
be done using scripting languages)
- you can run a shell script by:
- source scriptname
- make it executable (chmod +x scriptname), then run it like a program (with full path name, or by
putting directory in the path)
Customizing your UNIX environment:
- shell startup scripts: .bashrc and .cshrc
- alias shortname command
- setenv environmentvariablename value
Practice:
- Make your own shell script in your own subdirectory, which creates an alias to cd into your directory
(from anywhere), and adds your directory to the path.
Databases and SQL
Databases are a convenient way to store large quantities of information,
and to cross reference different types of information. The SDSS provides
a database with many of the derived quantities from the different SDSS
surveys, and provides some tools to make accessing these databases fairly
easy.
For SDSS, the main database is the Catalog Archive Server (CAS), which
compiles derived quantities from the SDSS data (not images or spectra,
but quantities that are derived from them: for example, ....?
- Database structure and organization: the database schema
- Let's look at the DR12 schema. Note Tables and Views.
- Some key tables:
- APOGEE: apogeeVisit, apogeeStar, aspcapStar
- SDSS/BOSS: specObj (which is a view)
- MaNGA: mangaDrpAll (but only in DR13Collab database! and only target information)
- Basic database queries using SQL
- SELECT what you want
- FROM table(s) in which it is stored
- (optionally) WHERE certain conditions are met
- Let's look at a relatively simple query
- SQL output can be directed into a file
- For SDSS, databases are stored centrally, and web tools are used,
but databases can be located wherever, and there are multiple tools for querying them (command line and graphical)
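The SELECT / FROM / WHERE pattern can be tried without any SDSS access using Python's built-in sqlite3 module. The sketch below builds a tiny in-memory table; the table name star, its columns, and its values are made up for illustration and are not part of the CAS schema, and the SQL dialect on the SDSS servers may differ from SQLite in some details.

```python
import sqlite3

# Build a throwaway in-memory database (hypothetical table, not an SDSS one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE star (id INTEGER, teff REAL, logg REAL)")
conn.executemany(
    "INSERT INTO star VALUES (?, ?, ?)",
    [(1, 4500.0, 2.1), (2, 5800.0, 4.4), (3, 4100.0, 1.5)],
)

# SELECT what you want, FROM the table, WHERE a condition is met.
query = "SELECT id, teff FROM star WHERE logg < 3.0 ORDER BY id"
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 4500.0), (3, 4100.0)]
conn.close()
```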
More advanced queries can extract information from multiple tables,
joining them on specified conditions; this is where the power of a relational
database really comes in
- JOIN ON table1.yyy = table2.xxx
- Some typical SDSS joins:
- SDSS/BOSS : join specobj and photoobj
- APOGEE : join apogeeStar, aspcapStar, apogeeObject
- Note example queries on SDSS web site, and also in casJobs interface!
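A join can be sketched in the same way with Python's sqlite3 module. The two tables below, spec and photo, and their columns are hypothetical stand-ins that share an objid column; they only mimic the idea of joining spectroscopic and photometric tables and do not reproduce the real specObj or photoObj schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables that share an object identifier column.
conn.execute("CREATE TABLE spec (specid INTEGER, objid INTEGER, z REAL)")
conn.execute("CREATE TABLE photo (objid INTEGER, rmag REAL)")
conn.executemany("INSERT INTO spec VALUES (?, ?, ?)",
                 [(10, 1, 0.05), (11, 2, 0.30), (12, 3, 0.12)])
conn.executemany("INSERT INTO photo VALUES (?, ?)",
                 [(1, 16.5), (2, 19.2), (3, 17.8)])

# Match rows of the two tables on objid, then filter on a photo column.
join_query = """
SELECT s.specid, s.z, p.rmag
FROM spec AS s
JOIN photo AS p ON s.objid = p.objid
WHERE p.rmag < 18.0
ORDER BY s.specid
"""
joined = conn.execute(join_query).fetchall()
print(joined)  # [(10, 0.05, 16.5), (12, 0.12, 17.8)]
conn.close()
```

Each output row combines columns from both tables, which is something neither table could provide on its own.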
Accessing the CAS database:
- SkyServer
- Database access in left column:
- some simple searches that make the SQL for you: SQS (spectroscopic query search), IRSQS (IR spectroscopy query search)
- general SQL query form: SqlSearch
- interface for larger queries: casJobs
Practice:
Data files
- text files: create/inspect with standard editor.
Columns separated by some delimiter:
- spaces
- commas (CSV files)
- binary files: more compact, but not inspectable with an editor, need
to use a program to read
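The difference between the formats can be seen with a minimal Python sketch using only the standard library. The names and numbers are made up; the csv module writes the same small table with a comma and with a space as the delimiter, and the struct module packs two numbers into a binary form that is compact but unreadable in an editor.

```python
import csv
import io
import struct

rows = [("alice", 3), ("bob", 7)]  # made-up example values

# Write the same table with two different delimiters (in memory, not on disk).
comma_text = io.StringIO()
csv.writer(comma_text, lineterminator="\n").writerows(rows)

space_text = io.StringIO()
csv.writer(space_text, delimiter=" ", lineterminator="\n").writerows(rows)

print(comma_text.getvalue())   # columns separated by commas
print(space_text.getvalue())   # columns separated by spaces

# Read the comma-separated version back; every field comes back as a string.
parsed = list(csv.reader(io.StringIO(comma_text.getvalue())))

# Binary: two 8-byte floating point numbers, readable only by a program
# that knows the layout.
packed = struct.pack("<2d", 1.5, 2.5)
values = struct.unpack("<2d", packed)
```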
Interactive languages and plotting
Much of astronomical analysis involves making plots of tabulated data.
This data may be stored in a database, or it may be stored in files.
Various tools used by astronomers:
- Python
- IDL
- TOPCAT
- others!
Pros and cons, REU mentors.
Python
Python is a programming language that has a well-developed package for
making plots (matplotlib).
Ways to run Python:
- Interactively: strongly recommend using ipython (enhanced interactive
python interface), to enable automatic plot display use: ipython --matplotlib
- Write python commands in a file (standard extension .py)
- Run the commands: python file.py
- While running ipython: %run file.py or import file (without the .py extension)
- IPython notebooks: allow you to intersperse text and python, including execution of code
Some very simple Python:
- print (in interactive mode, the value of an expression is printed automatically!)
- arithmetic; +, -, *, /, ** (be careful about variable types and truncation: in Python 2, 5/2 gives 2!)
- variables : single values, lists
- arrays (through numpy)
- initialize array using numpy.zeros([ny,nx]) (after import numpy) or numpy.ones([ny,nx])
- turn a list into an array using numpy.array(listvar)
- array arithmetic is implicit: if one does arithmetic between a constant and an array, it applies to each element automatically; if one does arithmetic between arrays, it does it element-by-element (if the shapes are not compatible, then it generates an error!)
- strings: concatenation, strip, etc.
- variables as objects: they have attributes and methods. In ipython, type {object}. and press the Tab key to view all of the possibilities
- control statements: if / else
- Note that indentation defines the blocks!
- loops: for
- functions: used to avoid repetition of code for repeated calculations, also for code organization and readability, example:
def addconst(a,const=1) :
    return a+const
addconst(5)
- Note that functions can return values that can be assigned to variables
- functions can take arguments, either positional or by keyword; keyword arguments can have default values
- Advantages of writing code in a file, rather than just doing it at command line
- it can be saved and rerun!
- it can be commented
- # at the beginning of the line indicates a comment
- text between triple-quotes (''' or """) allows for multi-line comments
- all functions should have docstrings between triple-quotes!
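The items above can be put together in one short file. This is only a sketch with made-up values; it assumes numpy is installed and uses // for the truncating division so that it behaves the same in Python 2 and Python 3.

```python
import numpy as np


def addconst(a, const=1):
    """Return a + const; const is a keyword argument with default value 1."""
    return a + const


# Integer division truncates; // does this explicitly in Python 2 and 3.
half = 5 // 2
power = 2 ** 10

# Turn a list into an array; arithmetic then applies element by element.
listvar = [1.0, 2.0, 3.0]
arr = np.array(listvar)
scaled = 2 * arr + 1              # constant applied to every element
summed = arr + np.ones(3)         # array + array with matching shapes

# Indentation defines the blocks of the loop and of the if / else.
labels = []
for value in arr:
    if value > 1.5:
        labels.append("big")
    else:
        labels.append("small")

result = addconst(5)              # positional argument, default const
result10 = addconst(5, const=10)  # keyword argument overrides the default
print(half, power, scaled, summed, labels, result, result10)
```

Saved as, say, simple.py (a hypothetical file name), this could be run with python simple.py or with %run simple.py inside ipython.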
Practice
- Start ipython interactively: ipython
- Define some numerical variables, and do some arithmetic with them
- Define some string variables, and do some operations with them
- Write some operations in a file, and then execute the file from the command line, and from within ipython
Using pre-existing python packages/modules:
- a module is a collection of routines (functions); a package may be a single module, but may also contain multiple modules
- import module
- import module as shortname
- from packagename import module
- Finding modules
- system-installed modules
- user modules: PYTHONPATH environment variable
- Frequently used modules:
- matplotlib : plotting routines
- numpy : numerical functions, provides array functionality and operations
- scipy : many scientific analysis routines
- astropy : rapidly developing set of routines for astronomical analysis/calculation
- Convenience feature of ipython: startup files in ~/.ipython/profile_default/startup
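The three import forms can be compared side by side. The sketch below uses only standard-library modules (math and json) as stand-ins, so it runs without installing anything; the same syntax applies to numpy, matplotlib, or astropy.

```python
import math                    # import module
import json.decoder as jdec    # import module as shortname
from json import decoder       # from packagename import module
import os
import sys

root = math.sqrt(16.0)

# Both names refer to the same module object, just under different names.
same_module = jdec is decoder

# Directories listed in PYTHONPATH (if it is set) are added to sys.path,
# the list of locations Python searches when importing modules.
pythonpath_dirs = os.environ.get("PYTHONPATH", "").split(os.pathsep)
search_locations = list(sys.path)
print(root, same_module, len(search_locations))
```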
Reading data files in Python:
- text files: using astropy.io.ascii or numpy.loadtxt
- from astropy.io import ascii
- var = ascii.read('filename')
- import numpy as np
- var = np.loadtxt('filename')
- FITS files: using astropy.io.fits
- from astropy.io import fits
- var = fits.open('filename')
- FITS files may have multiple extensions, so this gives a list with one FITS "header-data unit" (HDU) for each extension
- Each HDU has a header (.header) and data (.data)
- ascii.read returns an astropy Table; for FITS tables, the HDU .data is a numpy structured (record) array
- for a given column name, access values using var['colname']
- see columns automatically in ascii tables by typing variable name, or for either using the columns attribute of the returned object (for FITS, of the table HDU, e.g. var[1].columns):
var.columns
- Very useful function: numpy.where
- FITS image files
- For wavelength calibrated spectra with uniform sampling, wavelength information may be in header, e.g. CRVAL1/CDELT1/NAXIS1 or for BOSS spectra, COEFF0/COEFF1/NAXIS1. Using the header of the relevant HDU (e.g. hdr=var[0].header): wave=hdr['CRVAL1']+hdr['CDELT1']*np.arange(hdr['NAXIS1'])
- if uniform sampling is on log10(wavelength) scale: wave=10**wave
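A minimal sketch of these steps, assuming numpy is installed. To keep it self-contained, a small made-up text table is read from memory rather than from a file, and plain dictionaries stand in for FITS headers; all values are illustrative, not real SDSS data.

```python
import io
import numpy as np

# A small made-up text table standing in for a data file on disk.
text = """# id teff logg
1 4500.0 2.1
2 5800.0 4.4
3 4100.0 1.5
4 6100.0 4.2
"""

# A structured dtype gives each column a name, so values can be
# accessed as stars['colname']; lines starting with # are skipped.
dtype = [("id", int), ("teff", float), ("logg", float)]
stars = np.loadtxt(io.StringIO(text), dtype=dtype)

# numpy.where returns the indices where a condition is true.
giants = np.where(stars["logg"] < 3.0)[0]
giant_teff = stars["teff"][giants]

# Hypothetical header values (a plain dict standing in for a FITS header).
header = {"CRVAL1": 4000.0, "CDELT1": 2.0, "NAXIS1": 5}
wave = header["CRVAL1"] + header["CDELT1"] * np.arange(header["NAXIS1"])

# If the sampling is uniform in log10(wavelength), exponentiate afterwards.
logheader = {"COEFF0": 3.5, "COEFF1": 1.0e-4, "NAXIS1": 5}
loglam = logheader["COEFF0"] + logheader["COEFF1"] * np.arange(logheader["NAXIS1"])
logwave = 10 ** loglam
print(giants, giant_teff, wave, logwave)
```

With a real file, the io.StringIO(text) argument would simply be replaced by the file name.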
Practice:
- Using the data files you got from your SQL queries, read them into Python variables, both csv files and fits files
Python plotting:
- remember, for automatic plot display, start ipython with --matplotlib!
- import matplotlib.pyplot as plt
- Simple plotting and overplotting:
- plt.plot(x,y[,pointstyle]) (with pointstyle, draws points, otherwise connects points)
- example pointstyles give color ([krgbcym]) and shape ([o+.^v<>sph]) : 'ro', 'g+', 'b.', 'c^', 'yv', ...
- plt.plot(x,z[,pointstyle])
- if plot window doesn't automatically show, restart ipython with --matplotlib, or use plt.show() to see plot
- plt.clf() # to clear the plot
- Using matplotlib window functions from the icons
- Customizing plots with limits and labels:
- plt.xlim(xmin,xmax) # x limits
- plt.ylim(ymin,ymax) # y limits
- plt.xlabel(xlabel)
- plt.ylabel(ylabel)
- plt.text(x,y,text) # add text at arbitrary location
- plt.title(title) # put a title on the plot
- Subwindows
- plt.clf()
- plt.subplot(1,2,1) # (ny, nx, id)
- plt.plot(x,y)
- plt.subplot(1,2,2)
- plt.plot(x,z)
- More sophisticated plot interface, allows for multiple figures to be open at a time, more control
- fig=plt.figure()
- ax1=fig.add_subplot(1,2,1) (shorthand: fig.add_subplot(121)) # note difference in function name from above
- ax1.plot(x,y)
- ax1.set_xlim(xmin,xmax) # note difference in function name from above
- ax1.set_ylim(ymin,ymax) # note difference in function name from above
- ax2=fig.add_subplot(1,2,2)
- ax2.plot(x,y)
- ax2.cla() # to clear an axis
- fig.tight_layout() # helps if you have things that overlap between plots
- plt.draw()
- Packing more information into your plots: point colors and point sizes
- scatter: allows you to code points by color and/or size, e.g.
sc=ax.scatter(x,y,c=z,vmin=zlo,vmax=zhi,s=t)
fig.colorbar(sc)
will plot x vs y, color points according to z, size points by values in t, add colorbar
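The plotting calls above can be combined into one script. This is a sketch with made-up data, assuming numpy and matplotlib are installed. It selects the non-interactive Agg backend and saves to an in-memory buffer so that it runs without a display; in an interactive ipython --matplotlib session those two steps are not needed.

```python
import io
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; no window needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up four-dimensional data: position (x, y), color (z), size (t).
x = np.linspace(0.0, 1.0, 20)
y = x ** 2
z = np.sin(2 * np.pi * x)
t = 10 + 40 * x                  # marker areas, in points squared

fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(x, y, "ro")             # red circles, points not connected
ax1.set_xlabel("x")
ax1.set_ylabel("y")

ax2 = fig.add_subplot(1, 2, 2)
sc = ax2.scatter(x, y, c=z, vmin=-1.0, vmax=1.0, s=t)
fig.colorbar(sc, ax=ax2)         # the colorbar is created from the figure
ax2.set_xlim(0.0, 1.0)
ax2.set_ylim(0.0, 1.0)
fig.tight_layout()

# Save to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```

Note that the marker size is passed with the s keyword, and that the colorbar needs the object returned by scatter so it knows which color mapping to show.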
Practice:
- Plot up your 4-dimensional data using point color and size!
IDL
Running IDL
- Interactive use
- Writing IDL procedures and running them
IDL Help
Simple IDL
- print
- variables and arrays
- initialize arrays using var=fltarr(nx,ny) (or intarr)
- arithmetic
- strings
- conditionals
- loops
- functions (return a value) and procedures (don't return value, but can modify arguments)
Practice:
- Start IDL: idl
- Define some numerical variables, and do some arithmetic with them
- Define some string variables, and do some operations with them
- Write some operations in a file, and then execute the file from the command line, and from within IDL
Using pre-existing IDL routines:
Reading data files in IDL:
- IDL Users library readcol routine (library must be in IDL_PATH!)
- each column is read into an array variable
- readcol,'filename',c1,c2,c3,c4,c5,c6[,format='f,f,f,f,f,f'] (variable names for columns can be anything you want;
format codes include f for float (default), d for double, i for integer, a for character)
- IDL Users library mrdfits routine (library must be in IDL_PATH!)
- var=mrdfits('filename'[,ext,head=head]) reads data from specified extension into var
- table data is read into a structure variable, access elements using e.g. var.colname
- help,var to see the columns
Practice:
- As above for Python, but IDL
Simple plotting and overplotting:
- plot,x,y[,xtit=xtitle,ytit=ytitle,xr=[xmin,xmax],yr=[ymin,ymax]]
- oplot
More advanced
- plots command allows you to specify color (between 0-255) and size of a point
plots,x,y,color=255*(z-zmin)/(zmax-zmin),symsize=size
- so far as I know, for color-coded and/or size-coded points, need to plot one point at a time in a loop over array elements:
Practice:
- As above for Python, but IDL
TOPCAT
Table manipulation tool and plotter
Can install on Mac OS X laptop:
TOPCAT for Mac OS X
Simple usage:
- Load a table
- Plot columns against each other
- Code points by size (color?)
- Histogram