This information at http://astronomy.nmsu.edu/sdss_bootcamp/computing.html

Getting set up

To avoid having to install the same software on many different computers, we will use the central astronomy server. It's possible to install things on your laptop, but this may take some time and effort: discuss with your mentors whether you will work on your laptop or another computer. For remote usage:

Computer needs to connect to network: AggieAir wireless
Need to have SSH for remote login (Mac OS X: built-in; Windows: PuTTY)
Need to have X server running on laptop (Mac OS X: Xquartz; Windows: VcXsrv? )

For standalone laptop:

Python distribution with many common packages: Anaconda (probably Python 2.7)
TOPCAT
emacs editor (Mac OS X: built-in; Windows: emacs? )

Network and remote login

ssh -Y visitor@astronomy.nmsu.edu

Unix

The operating system of a computer is the system software that provides the user interface to the computer hardware. These days, there are two main families of operating systems: Windows and Unix-like operating systems. Most astronomers work on computers that use a Unix-like operating system, at least for data analysis (maybe not for basic office-like stuff, paper writing). Mac OS X is a Unix-like operating system. Graphical vs text interface: most astronomers use the text interface!

Terminal application (Mac OS X)
xterm (Unix/Linux)

File organization:

files and directories
- File names ( name.ext) : in astronomy, people generally do NOT use spaces in names, and keep file names meaningful, but compact!
- File extensions: various conventions, e.g. .txt, .dat, .fits, .tex, .html
- some simple shortcut: . means current directory, .. means one directory up, ~ means home directory
disks and partitions
- local and remote disks

Unix commands (italics mean you fill in the desired quantity, square brackets mean an optional parameter):

pwd : (print working directory) shows what directory you are in
ls [file(s)] : lists files in current directory (or recursively with -R)
cat/more/less file(s): show contents of a file
mkdir dir: make a subdirectory under current directory
cd path : change directory
cp source destination : copy a file
mv source destination : move a file
df : show available partitions including how much free disk space on each

Unix help:

man command : get information about specified command, including command options
http://freeengineer.org/learnUNIXin10minutes.html
http://www.tutorialspoint.com/unix/unix-useful-commands.htm
linux.die.net/man/
http://astronomy.nmsu.edu/holtz/a575/unix.html

Practice:

log into astronomy: ssh -Y visitor@astronomy.nmsu.edu
what directory are you in?
create a subdirectory with your last name
copy a file into that subdirectory
figure out what filesystem your directory is on. How much space is currently available on that file system?
figure out what other filesystems are accessible. Are these on the local computer or a remote computer?
use man pages to figure out how to get df to show you free disk space in "human-readable" format

Creating/editing files:

emacs : powerful common editor
vi : historical main Unix editor
nano : simple editor
Textedit (Mac OS X); Notepad (Windows); gedit (Linux) : not generally used (not available on Unix machines)

Running programs:

Programs are run by typing the name of the program
the list of directories that are searched to find the program is determined by the user's PATH
- the path is stored in a special UNIX kind of variable called an environment variable
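- you can set this using (tcsh): setenv PATH dir1:dir2:dir3:... (in bash: export PATH=dir1:dir2:dir3:...)
- you can add to this using (tcsh): setenv PATH ${PATH}:dir1:dir2:dir3:... (in bash: export PATH=${PATH}:dir1:dir2:dir3:...)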
- path is searched in order
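
To make the in-order search concrete, here is a rough Python sketch of what the shell does when you type a program name. The helper find_in_path and the two temporary directories are made up for this illustration; the real shell lookup has more details than this.

```python
import os
import tempfile


def find_in_path(program, path=None):
    """Return the first executable called `program` in a PATH-style string, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):   # directories are tried in order
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Two temporary directories that both contain a program called "hello"
# (hypothetical locations, created only for this example).
dir1 = tempfile.mkdtemp()
dir2 = tempfile.mkdtemp()
for d in (dir1, dir2):
    fname = os.path.join(d, "hello")
    with open(fname, "w") as f:
        f.write("#!/bin/sh\necho hello\n")
    os.chmod(fname, 0o755)

search_path = os.pathsep.join([dir2, dir1])
found = find_in_path("hello", search_path)
print(found)   # the copy in dir2, because dir2 comes first in the path
```

If the same program name exists in several directories, the first directory in the path wins, which is why the order in which you add directories matters.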

More UNIX utility commands:

grep string file(s) : search for string in files
awk : process files line-by-line
sed : line-by-line scripting editor
sort [-n] [--key={columnnumber}] : sorts files on a specified column value
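
For comparison, roughly the same column sort can be sketched in Python. The names and numbers are made up, and sort_rows is a hypothetical helper written for this example, not a standard function.

```python
def sort_rows(lines, column, numeric=False):
    """Sort whitespace-separated lines on a 1-based column number."""
    def key(line):
        value = line.split()[column - 1]
        # numeric=True plays the role of the -n flag of sort
        return float(value) if numeric else value
    return sorted(lines, key=key)


# Made-up two-column data: a name and a number
lines = ["clark 12", "adams 3", "baker 25"]

by_name = sort_rows(lines, 1)
by_number = sort_rows(lines, 2, numeric=True)
print(by_name)
print(by_number)
```

Without the numeric option the second column would be compared as text, so "25" would sort before "3"; that is the same distinction the -n flag makes.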

Practice:

Create a file that has your name in it, and the names of as many people in the bootcamp you can remember in one column, and some number in a second column
Look at the contents of the file using more
sort the file on either column using sort
I wrote a simple program, called hello, that is located in the home directory of the visitor account
- Find it
- Run it. Can you run it from your subdirectory?
- add the home directory to the path

Shells and shell scripts:

Concept of a shell
Most common shells: bash and tcsh
Can do simple scripting using shells (although this often can also be done using scripting languages)
you can run a shell script by:
- source scriptname
- make it executable (chmod +x scriptname), then run it like a program (with full path name, or by putting directory in the path)

Customizing your UNIX environment:

shell startup scripts: .bashrc and .cshrc
- alias shortname command
- setenv environmentvariablename value

Practice:

Make your own shell script in your own subdirectory, which creates an alias to cd into your directory (from anywhere), and adds your directory to the path.

Databases and SQL

Databases are a convenient way to store large quantities of information, and to cross reference different types of information. The SDSS provides a database with many of the derived quantities from the different SDSS surveys, and provides some tools to make accessing these databases fairly easy.

For SDSS, the main database is the Catalog Archive Server (CAS), which compiles derived quantities from the SDSS data (not images or spectra, but quantities that are derived from them: for example, ....?

Database structure and organization: the database schema
- Let's look at the DR12 schema . Note Tables and Views.
- Some key tables:
  - APOGEE: apogeeVisit, apogeeStar, aspcapStar
  - SDSS/BOSS: specObj (which is a view)
  - MaNGA: mangaDrpAll (but only in DR13Collab database! and only target information)
Basic database queries using SQL
- SELECT what you want
- FROM table(s) in which it is stored
- (optionally) WHERE certain conditions are met
- Let's look at a relatively simple query
SQL output can be directed into a file
For SDSS, databases are stored centrally, and web tools are used, but databases can be located wherever, and there are multiple tools for querying them (command line and graphical)
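
The SELECT / FROM / WHERE pattern can be tried without any SDSS account using the sqlite3 module that ships with Python. This is only a sketch: the table star and its columns are invented for the example and are not the real CAS schema, and SkyServer uses a different database engine.

```python
import sqlite3

# Toy in-memory database; table and column names are invented for this
# example and are NOT the real SDSS schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE star (id INTEGER, ra REAL, dec REAL, vhelio REAL)")
conn.executemany(
    "INSERT INTO star VALUES (?, ?, ?, ?)",
    [(1, 10.5, -1.2, 35.0), (2, 150.1, 2.3, -12.0), (3, 151.0, 2.9, 80.5)],
)

# SELECT what you want, FROM the table, WHERE the conditions are met
query = """
SELECT id, ra, dec
FROM star
WHERE ra BETWEEN 150 AND 152 AND vhelio > 0
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only the rows that satisfy every condition in the WHERE clause come back, and only the columns named after SELECT.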

More advanced queries can extract information from multiple tables, joining them on specified conditions; this is where the power of a relational database really comes in

JOIN table2 ON table1.yyy = table2.xxx
Some typical SDSS joins:
- SDSS/BOSS : join specobj and photoobj
- APOGEE : join apogeeStar, aspcapStar, apogeeObject
Note example queries on SDSS web site, and also in casJobs interface!
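
A join can be sketched the same way with Python's built-in sqlite3 module. The two tables below (spec and phot) and their columns are invented stand-ins, not the actual specObj and PhotoObj tables.

```python
import sqlite3

# Two toy tables that share an identifier; names are invented, not the SDSS schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spec (objid INTEGER, z REAL)")
conn.execute("CREATE TABLE phot (objid INTEGER, rmag REAL)")
conn.executemany("INSERT INTO spec VALUES (?, ?)", [(1, 0.05), (2, 0.12), (3, 0.30)])
conn.executemany("INSERT INTO phot VALUES (?, ?)", [(1, 17.2), (3, 19.8), (4, 21.0)])

# Combine columns from both tables for objects that appear in both
query = """
SELECT s.objid, s.z, p.rmag
FROM spec AS s
JOIN phot AS p ON s.objid = p.objid
ORDER BY s.objid
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Objects 2 and 4 drop out because each appears in only one of the two tables; a plain JOIN keeps only the rows that match on the join condition.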

Accessing the CAS database:

SkyServer
Database access in left column:
- some simple searches that make the SQL for you: SQS (spectroscopic query search), IRSQS (IR spectroscopy query search)
- general SQL query form: SqlSearch
- interface for larger queries: casJobs

Practice:

Make some SDSS query:
- get at least four output numerical quantities, possibilities:
  - SDSS/BOSS: galaxies within some region of the sky, get RA/DEC/redshift/velocity dispersion from specobj, magnitudes from PhotoObj
  - APOGEE: stars within some region of the sky, get RA/DEC/nvisits/radial velocity/vscatter from apogeeStar, stellar parameters/abundances from aspcapStar
- start with HTML query, then get both CSV and FITS output files on your computer
- Copy your output csv and fits files to your subdirectory on the astronomy server:
```
scp {filename} visitor@astronomy.nmsu.edu:{dirname}/
```

Data files

text files: create/inspect with standard editor. Columns separated by some delimiter:
- spaces
- commas (CSV files)
binary files: more compact, but not inspectable with an editor, need to use a program to read
- FITS files ( name.fits)
  - image files
  - tables
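
The text vs binary distinction can be seen with a few lines of Python. This sketch uses the standard struct module rather than FITS, and the numbers are made up; it only illustrates that binary data has a fixed size per value and needs a program to decode.

```python
import struct

values = [1.23456789, 2.5, 1000.000001]   # made-up numbers

# Text: readable in any editor, columns separated by a delimiter (here a comma)
text = ",".join(repr(v) for v in values)

# Binary: a fixed 8 bytes per double-precision value, unreadable in an editor
binary = struct.pack("<3d", *values)
decoded = list(struct.unpack("<3d", binary))

print(text)
print(len(binary))
print(decoded == values)
```

In the text form the size depends on how many digits are written out, while the binary form always uses the same number of bytes per value.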

Interactive languages and plotting

Much of astronomical analysis involves making plots of tabulated data. This data may be stored in a database, or it may be stored in files. Various tools used by astronomers:

Python
IDL
TOPCAT
others!

Pros and cons, REU mentors.

Python

Python is a programming language that has a well-developed package for making plots (matplotlib)

Ways to run Python:

Interactively: strongly recommend using ipython (enhanced interactive python interface), to enable automatic plot display use: ipython --matplotlib
Write python commands in a file (standard extension .py)
- Run the commands: python file.py
- While running ipython: %run file.py or import file (import takes the module name, without the .py extension)
iPython notebooks: allow you to intersperse text and python, including execution of code

Some very simple Python:

print (intrinsically in interactive mode!)
arithmetic; +, -, *, /, ** (be careful about variable types and truncation!)
variables : single values, lists
arrays (through numpy)
- initialize array using numpy.zeros([ny,nx]) (after import numpy) or numpy.ones([ny,nx])
- turn a list into an array using numpy.array(listvar)
- array arithmetic is implicit: if one does arithmetic between a constant and an array, it applies to each element automatically; if one does arithmetic between arrays, it does it element-by-element (if the shapes are not compatible, then it generates an error!)
strings: concatenation, strip, etc.
variables as objects: they have attributes and methods. In ipython, type {object}. and then press the TAB key to view all of the possibilities
control statements: if / else
- Note that indentation defines the blocks!
loops: for
functions: used to avoid repetition of code for repeated calculations, also for code organization and readability, example:
```
def addconst(a,const=1) :
    return a+const

addconst(5)
```
- Note that functions can return values that can be assigned to variables
- functions can take arguments, either positional or by keyword; keyword arguments can have default values
Advantages of writing code in a file, rather than just doing it at command line
- it can be saved and rerun!
- it can be commented
  - # at the beginning of the line indicates a comment
  - text between triple-quotes (''' or """) allows for multi-line comments
  - all functions should have docstrings between triple-quotes!

Practice

Start ipython interactively: ipython
Define some numerical variables, and do some arithmetic with them
Define some string variables, and do some operations with them
Write some operations in a file, and then execute the file from the command line, and from within ipython

Using pre-existing python packages/modules:

a module is a collection of routines (functions); a package may be a single module, but may also contain multiple modules
import module
import module as shortname
from packagename import module
Finding modules
- system-installed modules
- user modules: PYTHONPATH environment variable
Frequently used modules:
- matplotlib : plotting routines
- numpy : numerical functions, provides array functionality and operations
- scipy : many scientific analysis routines
- astropy : rapidly developing set of routines for astronomical analysis/calculation
Convenience feature of ipython: startup files in ~/.ipython/profile_default/startup
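
The three import forms can be compared using modules from the Python standard library, so nothing extra needs to be installed. The file name is made up for the example.

```python
import math                      # import module
import os.path as osp            # import module as shortname
from os import path              # from packagename import module

root = math.sqrt(16.0)
joined = osp.join("data", "spectra.fits")   # made-up file name
same_module = osp is path                   # both names refer to one module

print(root, joined, same_module)
```

The shortname form is what you will see most often in astronomy code, e.g. import numpy as np and import matplotlib.pyplot as plt.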

Reading data files in Python:

text files: using astropy.io.ascii or numpy.loadtxt
- from astropy.io import ascii
- var = ascii.read('filename')
- import numpy as np
- var = np.loadtxt('filename')
FITS files: using astropy.io.fits
- from astropy.io import fits
- var = fits.open('filename')
- FITS files may have multiple extensions, so this gives a list with one FITS "header-data unit" (HDU) for each extension
- Each HDU has a header (.header) and data (.data)
Both routines return table-like objects for tabular data (ascii.read returns an astropy Table; for FITS tables, the .data of the HDU is a numpy record array)
- for a given column name, access values using var['colname']
- see columns automatically in ascii tables by typing variable name, or for either using columns method on returned object: var.columns
Very useful function: numpy.where
FITS image files
- For wavelength calibrated spectra with uniform sampling, wavelength information may be in header, e.g. CRVAL1/CDELT1/NAXIS1 or for BOSS spectra, COEFF0/COEFF1/NAXIS1: wave=var.header['CRVAL1']+var.header['CDELT1']*np.arange(var.header['NAXIS1'])
- if uniform sampling is on log10(wavelength) scale: wave=10**wave
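
Here is a small sketch of reading columns, selecting with numpy.where, and building a wavelength array. To keep it self-contained it reads from an in-memory string instead of a real file and uses plain dictionaries in place of FITS headers; all of the numbers are invented.

```python
import io
import numpy as np

# Stand-in for a small two-column text file (made-up values)
text_file = io.StringIO("1.0 10.0\n2.0 20.0\n3.0 30.0\n")
data = np.loadtxt(text_file)          # 2-D array: rows x columns
x, y = data[:, 0], data[:, 1]

# numpy.where returns the indices where a condition is true
good = np.where(y > 15.0)[0]
x_good = x[good]

# Stand-in for a FITS header with linear wavelength sampling (invented numbers)
header = {"CRVAL1": 4000.0, "CDELT1": 2.0, "NAXIS1": 5}
wave = header["CRVAL1"] + header["CDELT1"] * np.arange(header["NAXIS1"])

# Same idea when the header describes log10(wavelength), as with COEFF0/COEFF1
logheader = {"COEFF0": 3.5, "COEFF1": 0.0001, "NAXIS1": 5}
logwave = 10 ** (logheader["COEFF0"] + logheader["COEFF1"] * np.arange(logheader["NAXIS1"]))

print(x_good, wave, logwave[0])
```

With a real file, header would be the .header of the relevant HDU, and the same expressions apply as long as the keywords are present.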

Practice:

Using the data files you got from your SQL queries, read them into Python variables, both csv files and fits files

Python plotting:

remember, for automatic plot display, start ipython with --matplotlib!
import matplotlib.pyplot as plt
Simple plotting and overplotting:
- plt.plot(x,y[,pointstyle]) (with pointstyle, draws points, otherwise connects points)
  - example pointstyles give color ([krgbcym]) and shape ([o+.^v<>sph]) : 'ro', 'g+', 'b.', 'c^', 'yv', ...
- plt.plot(x,z[,pointstyle])
  - if plot window doesn't automatically show, restart ipython with --matplotlib, or use plt.show() to see plot
- plt.clf() # to clear the plot
- Using matplotlib window functions from the icons
- Customizing plots with limits and labels:
  - plt.xlim(xmin,xmax) # x limits
  - plt.ylim(ymin,ymax) # y limits
  - plt.xlabel(xlabel)
  - plt.ylabel(ylabel)
  - plt.text(x,y,text) # add text at arbitrary location
  - plt.title(title) # put a title on the plot
Subwindows
- plt.clf()
- plt.subplot(1,2,1) # (ny, nx, id)
- plt.plot(x,y)
- plt.subplot(1,2,2)
- plt.plot(x,z)
More sophisticated plot interface, allows for multiple figures to be open at a time, more control
- fig=plt.figure()
- ax1=fig.add_subplot(1,2,1) (shorthand: fig.add_subplot(121)) # note difference in function name from above
- ax1.plot(x,y)
- ax1.set_xlim(xmin,xmax) # note difference in function name from above
- ax1.set_ylim(ymin,ymax) # note difference in function name from above
- ax2=fig.add_subplot(1,2,2)
- ax2.plot(x,z)
- ax2.cla() # to clear an axis
- fig.tight_layout() # helps if you have things that overlap between plots
- plt.draw()
Packing more information into your plots: point colors and point sizes
- scatter: allows you to code points by color and/or size, e.g.
```
   sc=ax.scatter(x,y,c=z,vmin=zlo,vmax=zhi,s=t)   # s sets the point sizes
   plt.colorbar(sc)                               # colorbar needs the object returned by scatter
```
  will plot x vs y, color points according to z, size points by values in t, add colorbar
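
A complete, runnable version of the figure/axes interface with a color- and size-coded scatter plot might look like the following. The data are random numbers made up for the example, and the Agg backend is selected only so the script runs without a display; in an interactive ipython --matplotlib session you would leave that line out.

```python
import io

import matplotlib
matplotlib.use("Agg")            # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up four-dimensional data: position (x, y), color value z, size value t
rng = np.random.RandomState(0)
x, y, z = rng.uniform(0, 1, (3, 50))
t = rng.uniform(10, 100, 50)     # marker areas in points^2

fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(x, y, "ro")             # red circles, no connecting line
ax1.set_xlabel("x")
ax1.set_ylabel("y")

ax2 = fig.add_subplot(1, 2, 2)
sc = ax2.scatter(x, y, c=z, vmin=0.0, vmax=1.0, s=t)
fig.colorbar(sc, ax=ax2, label="z")
ax2.set_xlim(0, 1)
ax2.set_ylim(0, 1)

fig.tight_layout()

# Write the figure as PNG into memory; use a file name instead to save to disk
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

Note that the colorbar is created from the object that scatter returns, since that object carries the mapping between z values and colors.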

Practice:

Plot up your 4 dimensional data using point color and size!

IDL

Running IDL

Interactive use
Writing IDL procedures and running them

IDL Help

Simple IDL

print
variables and arrays
- initialize arrays using var=fltarr(nx,ny) (or intarr)
arithmetic
strings
conditionals
loops
functions (return a value) and procedures (don't return value, but can modify arguments)

Practice:

Start IDL: idl
Define some numerical variables, and do some arithmetic with them
Define some string variables, and do some operations with them
Write some operations in a file, and then execute the file from the command line, and from within IDL

Using pre-existing IDL routines:

Astronomy users library
SDSS tools: idlutils, idlspec2d
Finding routines: IDL_PATH environment variable

Reading data files in IDL:

IDL Users library readcol routine (library must be in IDL_PATH!)
- each column is read into an array variable
- readcol,'filename',c1,c2,c3,c4,c5,c6[,format='f,f,f,f,f,f'] (variable names for columns can be anything you want; format letters include f for float (default), d for double, i for integer, a for character)
IDL Users library mrdfits routine (library must be in IDL_PATH!)
- var=mrdfits('filename'[,ext,head=head]) reads data from specified extension into var
- table data is read into a structure variable, access elements using e.g. var.colname
- help,var to see the columns

Practice:

As above for Python, but IDL

Simple plotting and overplotting:

plot,x,y[,xtit=xtitle,ytit=ytitle,xr=[xmin,xmax],yr=[ymin,ymax]]
oplot

More advanced

plots command allows you to specify color (between 0-255) and size of a point
```
  plots,x,y,psym=6,color=255*(z-zmin)/(zmax-zmin),symsize=size
```
so far as I know, for color-coded and/or size-coded points, need to plot one point at a time in a loop over array elements:

Practice:

As above for Python, but IDL

TOPCAT

Table manipulation tool and plotter. Can install on Mac OS X laptop: TOPCAT for Mac OS X

Simple usage:

Load a table
Plot columns against each other
- Code points by size (color?)
Histogram