Introduction to CCD images and basic CCD data reduction

CCD introduction and principles of operations

Photoelectric effect in a semiconductor. Photons excite photoelectrons, which are kept localized by electronics on the chip. Note that "input" is number of photons, "output" is number of electrons, which are related by the sensitivity (quantum efficiency) of the pixel.

Charge sensing (readout) by charge transfer through the array, first vertically to the serial register, then horizontally into the readout electronics.

Electronics: multiply input electrons by a gain factor (to optimize dynamic range), add a bias level (to avoid negative input), convert to digital via an A/D converter. "Input" is electrons, "output" is counts (also known as DN, or ADU).

Note that the bias level can vary with time/temperature, so in general, the bias level must be measured on each individual exposure. This is typically achieved by reading several "dummy" pixels after each row is read, where these "dummy" pixels act to record the current bias level. This leads to a set of columns of "dummy" pixels at the right-hand edge of every image, called the overscan. The overscan is used to derive the bias value for the frame. (Note in some cases, the bias level can actually vary during the course of the readout, in which case more sophisticated handling is required).

The physical architecture of CCDs leads to specific terminology: rows, columns, serial registers, overscan, underscan.

Note pixel-to-pixel sensitivity variations, and variation of sensitivity with wavelength.

Basic calibration

Basic calibration: bias level subtraction (subtract out bias level that was added), flat field division (compensates for pixel-to-pixel sensitivity variations).

Other possible calibration: bias pattern subtraction, dark subtraction, shutter shading division, fringe correction

Calibration data

Calibration data: obtaining biases, darks, flats. Need for multiple exposures for noise reduction (biases and darks), outlier suppression (cosmic rays, stars in sky flats). Note issues with source of flat fields: how flat are they?

Creating calibration frames: combining images

creating superbias, superdark, flat field: combining images, including normalization for flats

The best estimator of parent population mean in a least-squares, maximum likelihood sense, is the sample mean. However, the sample mean is not especially robust in the case of outliers. Outliers occur in lots of astronomical contexts, e.g., cosmic rays, filtering of stars, or just bad data. Combining images while doing the best job of rejecting outliers is a critical part of many data reduction/analysis tasks.

Some more robust estimators: median, but it produces larger error of the mean (about 25% larger ( 1.253σ/ $\sqrt{{n}}$ for normal distribution) than does the mean. Often can make use of a priori knowledge about outliers: e.g., stars and CRs are always positive. This leads to routines like maximum-pixel rejection. However, this leads to biases for all pixels without outliers. Hence, a better technique is min-max rejection; even this still leads to biases in pixels which have an outlier, and really throws away signal on others. Probably the best bet is to do n-sigma rejection, then recomputation of the mean. Problem here is that estimator of sigma can be very biased in the presence of outliers; may work better if you compute both mean and variance from the sample with the maximum value removed). Alternatively, apply using error model; for example, compute sigma from the median value and a noise model, use this to reject outliers, then average the remaining data points . Be aware of issues trying to reject stars, e.g., in twilight flats: there will always be some point in the profile at which your rejection will fail if done on a pixel-by-pixel basis.

If you don't have exposures at a common intensity, you need to normalize and potentially, consider the effect of different noise levels.

Python tools

Basic techniques for image reduction in Python are just image arithmetic and statistics (e.g. mean() and std() methods of numpy arrays). To median images, stack them into a data cube, then use numpy.median(cube,axis=0) to median them together.

You can create a cube ``in advance'', using, e.g., cube=numpy.zeros(nim,nrow,ncol), and load using: cube[0,:,:]=im1, cube[1,:,:]=im2, etc
You can create a cube ``on the fly'', using, e.g., cube=numpy.array([im1,im2,im3])
med=numpy.median(cube,axis=0)

Exercises

Our goal is construct a simple data reduction package, consisting of a single Python or IDL file, and use it to reduce the SPICAM data in /home/holtz/raw/apo/dec06/UT061215. We want to make the program modular so that it could be used for other nights, and for images from other instruments.
- Python: use ipython –matplotlib,
Write a routine that returns the data from an input file name and an input directory
Write a routine that returns a bias-subtracted array given an input data array, and bias region
- Look at one of the SPICAM frames and determine a good region in the overscan to use for bias level determination
- Test your routine on a few files
Write a routine that will return a normalized input data array
- Look at one of the SPICAM frames and determine a good region to use for normalization
- Test your routine on a few of the flat field files
Write a routine that will return the median of an input list of data arrays
- Test your routine
Write a routine that will return a flat field from an input list of files
Construct flat fields for UT061215 in g, r, i (frames starting with flat_)
- Look at the flat fields. Do you understand all of what you are seeing in the images?
- After combining, compare each individual frame with the combined frame to confirm that differences are understood (what do you expect?)
- Compare the flat fields at different wavelengths. Do you think that the CCD pixels have sensitivities that are a function of wavelength?
Write a function that will return a bias-subtracted, flat fielded image from an input file name and an input flat field
Reduce the SN17135 frames with bias subtraction and flat fielding
- How good do you think the flat fields are? Consider this by make a range of image cross-section plots, both rows and columns at different locations.
Bias structure subtraction and superbiases: is this warranted for this data set?
- Inspect the series of bias frames. Do you think there is a stable pattern to subtract?
- Construct a superbias by combining the bias frames, and look for any structure in it.
- Compare the individual biases with the superbias (by subtraction) and determine whether there is stable structure
- Do you think superbias subtraction is warranted here?
Fringing: inspect the z band frames
Write a routine to reduce a list of images, writing out the reduced frames
Write a script to look through all of the images on a night, displaying them, pausing to let inspect each one before moving on.
- get image list
  - Python:
```
import glob 
files=glob.glob(\textit{directory}+'/*.fits'
```
- loop over files and display
  - Python: import pyds9; disp=pyds9.DS9(); for file in files: disp.set("fits "=file); pdb.set_trace()

.run {filename}

Program names and locations

As we've noted, IDL routines can be either procedures or functions. These are saved in disk in files, where the convention is to use the suffix .pro for IDL routines. A given file can contain more than one procedure and/or function, but in many cases, to allow routines to be automatically found, it is convenient to have a separate file for each procedure or function. Within a file, the minimum skeleton is:

pro name,{arguments},{keywords}
end

while functions must have

function name,{arguments},{keywords}
return,value
end

When you reference a procedure by name, IDL will search a series of directories to find a file that has the same name, and if it finds one, it will load/compile the routines in that file. The series of directories to search is specified before starting IDL by setting the IDL_PATH environment variable (this is a type of UNIX system variable). All directories in this path will be searched in order: if a directory name is prefixed with a '+', then all subdirectories under that directory will also be searched. As an example, here is the IDL_PATH variable that I set in our default login

setenv IDL\_PATH +/home/local/rsi/astrpro:+/home/local/rsi/idl71/lib:+/home/local/rsi/idlutils/:+/home/local/rsi/xidl/:+/home/local/rsi/idlspec2d/:+/home/local/rsi/Spextool2:+/home/local/rsi/Instruments

If you modify a program that has already been compiled, you won't see your changes until the program gets recompiled! You can force this using:

.com {filename}

Program execution, errors, etc.

When you run IDL programs, there is a series of "levels" in which things are run: the main command line is the top level. If you call a procedure, that executes in the next level, if that procedure calls another, it executes another level deep, etc.

Within each level, variables are local, i.e. if you use the same variable name at different levels and change the value at one level, it won't get changed in another level. There are two exceptions:

if the variable has been passed to the procedure/function by reference (which is how variables in an argument list, with a few exceptions, are passed, but NOT variables in parameter keywords), then its value will be changed in the calling routine.
if the variable is in a common block, that makes it accessible at all levels

As you write and use programs, you will find that sometimes things crash, i.e fail in the middle of program execution. When this happens, IDL will tell you where it crashed, i.e. the line number of the program that is running, and if it is in a level, what programs from higher levels it was called from.

When a crash happens, IDL leaves you in the level of the crash. This way, you can inspect variables, etc. at that level. However, you won't be able to see variables, etc. at higher levels. And if you try to run programs from there, things will just keep getting deeper and deeper.

The retall command returns you to the top level. Generally, you will probably always want to use this after a program crash.

Arguments and keywords

IDL procedures and functions can receive data from a higher level via program arguments and keywords. Arguments are passed in a pre-defined order on the calling line. Keywords are specified by the form name=value and are order-independent; the special form /name is equivalent to name=1.

Within a procedure/function, you can test for the existence of an argument using

if n_elements(argument) gt 0 then ...

and the presence of a keyword by

if keyword_set(name) then ....

System commands

spawn,command,result

Some examples

Simple program showing argument and structure usage.

function rd,file,exten
  if n_elements(exten) eq 0 then exten=0
  print,'reading ', file
  if not file_test(file) then begin
    print,'Cannot find file: ',file
    return,0
  endif
  im=mrdfits(file,exten,head,/silent)
  floatim=float(im)
  a = { hdr0:head, im0: floatim} 
  return,a
end

An atv wrapper when using a structure for images, showing usage of arguments and keywords

pro mtv,disp,zrange,zero,l=l,z=z
  common imagebuffers

  if not keyword_set(zrange) and not keyword_set(l) then begin
    s=size(disp.im0)
    sky,disp.im0[s(1)*0.25:s(1)*0.75,s(2)*0.25:s(2)*0.75],sky,skysig
    print,'sky: ',sky, ' skysig: ', skysig
    atv,disp.im0,header=disp.hdr0,/linear,min=sky-5*skysig,max=sky+10*skysig
  endif else begin
    if keyword_set(l) then zrange=l
    if keyword_set(z) then zero=z else zero=0
    atv,disp.im0,header=disp.hdr0,/linear,min=zero,max=zero+zrange
  endelse
end

Simple program showing pointer usage.

common imagebuffers,buffers

buffers=ptrarr(1000,/allocate_heap)

end

pro rdbuf,buffer,file
  common imagebuffers
  print,'reading ', file
  tmp=mrdfits(file,0,head,/silent)
  floatim=float(tmp)
  ;help,floatim
  *buffers[buffer] = { hdr0:head, im0: floatim, file: file} 
end

A more flexible mtv to allow for either variables or buffer numbers:

pro mtv,a,zrange,zero,l=l,z=z
  common imagebuffers

  type=gettype(a)
  if type eq 2 then disp=*buffers[a] else disp=a
  if not keyword_set(zrange) and not keyword_set(l) then begin
    s=size(disp.im0)
    mmm,disp.im0[s(1)*0.25:s(1)*0.75,s(2)*0.25:s(2)*0.75],sky,skysig
    print,'sky: ',sky, ' skysig: ', skysig
    atv,disp.im0,header=disp.hdr0,/linear,min=sky-5*skysig,max=sky+10*skysig
  endif else begin
    if keyword_set(l) then zrange=l
    if keyword_set(z) then zero=z else zero=0
    atv,disp.im0,header=disp.hdr0,/linear,min=zero,max=zero+zrange
  endelse
end

Examples showing power of using system commands

pro rdfiles,filenames

spawn,'"ls" '+filenames,files
for i=0,n_elements(files)-1 do begin
  rdbuf,i,files[i]
  print,i,files[i]
endfor

end

Want to look at each image? Try atv/mtv and atv_activate....