Next: Plotting
Up: AY575 class notes
Previous: Presentation/communication
- computing/software is lifelong learning!
- do as little as possible "interactively", so it is repeatable
- less is more (to a point) - don't repeat work, make the tool better and
reusable
- all software should be written as if someone else will use it: document
and share
From Quora, response to a question about value of formal training
in programming vs. picking it up on your own:
" I was a self-taught programmer for about 4 years, before taking a
few courses as extras in my EE degree. I am now pursuing a MSc in
CS. So my answer is mostly observations on how I do things differently
now compared to how I did them before.
Architecting projects is a big thing. Given a list of things the
software is required to do, how would you lay out your program so
that it -
1. Does what it's supposed to do
2. Is maintainable
3. Is easily understandable for other people (and yourself in a few months)
4. Is easily extensible if the requirements change (as they always do)
5. Uses the most suitable design patterns to make the code intuitive
Usually self-taught programmers do 1., but don't pay nearly as much
attention, or only pay lip service to the other points. I thought
I knew about all those other points, but as I found out later, I
really had no idea.
Another big difference is how much trial and error. I did a lot
more trial and error back in the days, because I didn't have good
understanding about how "the whole stack" works. I used a lot of
things without bothering to truly understand how they work under
the hood.
Nowadays I almost don't do any trial and error at all. I think
through everything, and most things I actually start coding actually
work on the first try (not counting typos, etc). I know exactly how
memory management works all the way from kernel to malloc. I know
how schedulers work (having written a few for a course), and I know
what the synchronization mechanisms are, the pros and cons of each,
the usual patterns in which they are used, as well as how they are
actually implemented on the instruction level. As a self-taught
programmer, I knew how to use mutexes and that's about it. As it
turned out, I actually (poorly) reinvented a few of the other
standard mechanisms.
Self-taught programmers are also usually not used to reading a lot
of code written by other people, which is a very important skill
when working in teams.
Knowledge of the existing algorithms is another one.
When a non-trivial problem is encountered, a bad programmer dives
head first into coding a solution. A better programmer looks for
solutions, and tries them out. A good programmer looks for solutions,
analyzes them for time and space complexity as well as other
constraints, and implements the most likely one.
Most self-taught programmers start coding too early.
And like you said, things like AI and ML. Most self-taught programmers
never learn those things, because usually they only learn things
they need, and if you don't know AI and ML, they won't seem like
possible solutions to your problems, and you'll never think about
learning them. It's a chicken and eggs problem. Self-taught programmers
often don't know what they don't know. One nice thing about doing
a degree is that it almost forcefully introduces you to everything,
so by the time you are done, at least you know what you don't
know."
Compiled vs. non-compiled (interpreted) languages. The distinction is perhaps less
clear now than it was in the past, but the key point is that, for certain
applications, you may want to consider execution time.
open-source vs. proprietary
Languages commonly used in astronomy: Python, C, C++, Fortran, IDL, MATLAB.
Many other languages exist: C#, Java, JavaScript, etc.
Fortran: historical language of choice for scientific computing in the mid/late
20th century. Many codes were developed that are still in existence/development
(e.g., Anatoly's N-body and Hydro codes, synthetic spectra generation
MOOG and TURBOSPEC, stellar evolution MESA, Chris' codes, others...). In
addition, many routines coded for fast operation are available, e.g. LAPACK.
Various versions of Fortran: F66, F77, F90/F95
C: foundational language for computer science. Some astronomical
routines: SAO WCSTOOLS for astronomical coordinate system routines,
HSTPHOT/DOLPHOT for HST photometry, SLALIB for more astronomical
coordinate routines, others....
Python: major push in current astronomical software development, e.g., astropy.
IDL: historically a very close connection to astronomy, with much development
at LASP and Goddard Space Flight Center in the public domain. Marketed
through RSI, then ITT, now Exelis, as a licensed software product. Originally
written in C? Major astronomical software infrastructure exists in
IDL, e.g. for planetary and solar astrophysics (perhaps because of
NASA / GSFC). Astronomy Users Library. Major component of SDSS software,
although the modern evolution is away from it.
see programming reference table
Let's do a simple program in all languages.
Basic program structure:
- program statement
- variable declarations
- program statements
- end
Basic language:
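- f77 statements start in column 7; f95 (free form) can start in column 1.
- Comments: f95: exclamation (!), f77: C in column 1
- line continuation: f95: end the line with &; f77: a non-blank character (other than 0) in column 6 of the continuation line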
Compiling and running programs: compiling, linking, makefile, BINDIR, path
Basic program structure:
- #include files
- int main() {
- variable declarations
- program statements
- }
Comments: double-slash (//), or embed between /* and */
Compiling and running programs: compiling, linking, makefile, BINDIR, path
Introduction to makefiles (see, e.g.,
the GNU documentation
or someone else's explanations).
Explicit commands, variable filenames, rules.
Defining rules, e.g. Latex into PDF, etc
Standard rules.
Standard targets: install, objs, etc.
Using command interpreter: python vs ipython (use ipython so you can access
its features!)
Using programs:
running interactively, running as a script, running as an executable, using %run with ipython
PATH and PYTHONPATH environment variables: executables are searched for in
PATH, Python imports are searched for in PYTHONPATH and system-installed
libraries.
Comments: hash (#).
Python useful references:
Python tutorial
python4astronomers
Using command interpreter.
Using programs: running a script using @, running a program using .run,
idl -e
IDL_PATH
Comments: semicolon (;)
Simple makefile (note that indents must be TABs):
hello_f: hello_f.f95
	f95 -c hello_f.f95
	f95 -o hello_f hello_f.o
hello_c: hello_c.c
	cc -c hello_c.c
	cc -o hello_c hello_c.o
run: hello_f hello_c
	./hello_f
	./hello_c
	./hello.py
	idl -e ".run hello.pro"
see programming reference table
for syntactical details in each language.
- basic program structure
- comments and good commenting practice
- line continuation
- variables and memory, declaration and initialization.
- Basic variable types: integer, float, string.
Determining variable type in Python (type) and IDL (size,/type).
- Type conversion
- arrays and array operations: order of elements in multidimensional arrays
(column-major vs row-major). An array is a contiguous stretch of memory.
- pointers: addresses vs values
- dynamic memory allocation: Fortran ALLOCATE and C malloc
- multitype collections: structures (derived data types). Arrays of
structures. Python lists, tuples, dictionaries, and structured arrays
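The points above on variable types, type conversion, and memory order can be sketched in a few lines of Python. This is only an illustration: the helper flat_index and its arguments are hypothetical names introduced here (not a library routine), and it simply computes where element (row, col) of a 2-D array would sit in a contiguous block of memory under each ordering.

```python
# Inspect and convert the basic variable types.
i = 3
x = 2.5
s = "42"
print(type(i), type(x), type(s))

as_int = int(s)      # string -> integer
as_float = float(i)  # integer -> float
as_str = str(x)      # float -> string
truncated = int(x)   # float -> integer drops the fractional part


def flat_index(row, col, nrows, ncols, order="C"):
    """Offset of element (row, col) in a contiguous block of memory.

    order="C": row-major, last index varies fastest (C convention).
    order="F": column-major, first index varies fastest (Fortran convention).
    """
    if order == "C":
        return row * ncols + col
    return col * nrows + row


# Same element of a 2x3 array, two different places in memory.
row_major = flat_index(0, 1, nrows=2, ncols=3, order="C")
col_major = flat_index(0, 1, nrows=2, ncols=3, order="F")
print(row_major, col_major)
```

The practical consequence is that loops over a multidimensional array run fastest when the innermost loop follows the index that is contiguous in memory for the language in use.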
- mathematical operators
- string operators
- Bitmasks and logical operators.
- Vector operators.
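A minimal sketch of bitmasks in Python, assuming a made-up set of pixel-quality flags (the flag names are hypothetical, chosen only for illustration). It shows the bitwise operators used to set, test, and clear individual bits, alongside the logical operators that act on whole True/False values.

```python
# Each flag occupies its own bit, so several can be stored in one integer.
BAD_PIXEL = 1 << 0   # binary 001
SATURATED = 1 << 1   # binary 010
COSMIC_RAY = 1 << 2  # binary 100

flags = 0
flags |= SATURATED   # set a bit with bitwise OR
flags |= COSMIC_RAY
flags_set = flags

# Test a bit with bitwise AND.
is_saturated = bool(flags & SATURATED)
is_bad = bool(flags & BAD_PIXEL)

# Logical operators combine whole truth values, not individual bits.
usable_but_saturated = is_saturated and not is_bad

# Clear a bit with AND plus the bitwise complement.
flags &= ~COSMIC_RAY
```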
- conditional: if/then/else
- control loops: for and while
- I/O : formatted output.
- I/O : simple input. Reading unknown length files in Fortran and C
- I/O : higher level routines: Python astropy.io.ascii, numpy.loadtxt
(but note this only reads a single data type into an array, rather than
into a structured array, as astropy.io.ascii does). IDL read_ascii
- Binary data: note issues of N-dimensional array unwrapping, byte order
- Fortran: OPEN(lun,file,FORM='unformatted') (some compilers also accept a nonstandard FORM='binary'), READ/WRITE without format statements
- C: fopen(), fread(), fwrite()
- Python: open files in binary mode (rb, wb), struct.pack and struct.unpack to convert to and from binary data
- IDL: READ_BINARY
- FITS files
- note Python astropy.table unified I/O for multiple file types, including
ASCII and FITS!
- Databases as alternative to data files
Python4astronomers primer on reading and writing files
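The binary I/O and byte-order points above can be illustrated with the standard struct module. This is a sketch under simple assumptions: the record layout (one 32-bit integer followed by one 64-bit float) and the file name are invented for the example.

```python
import os
import struct
import tempfile

values = (1, 2.5)  # one integer and one float to store

# "<" forces little-endian byte order with no padding; ">" is big-endian.
record_format = "<id"  # i = 4-byte integer, d = 8-byte float
packed = struct.pack(record_format, *values)

# Write and read the raw bytes using binary file modes.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write(packed)
with open(path, "rb") as f:
    raw = f.read()

unpacked = struct.unpack(record_format, raw)

# Reading the same bytes with the wrong byte order gives a different number.
wrong_int = struct.unpack(">i", raw[:4])[0]
print(unpacked, wrong_int)
```

A binary file carries no description of its own layout, so the reader has to know the types, their order, and the byte order used by the writer.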
- Utility of functions:
- improve readability of code
- minimize/eliminate code repetition: more compact code and easier to change
- use same functions in multiple programs
- Syntax of functions:
- Fortran: subroutines (don't return a value, but can modify arguments) and
functions (return a value, also can modify arguments (but see INTENT)!)
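REAL FUNCTION ADD(a,b)
  IMPLICIT NONE
  ! INTENT(IN) prevents the function from modifying its arguments
  REAL, INTENT(IN) :: a,b
  ADD = a+b
END FUNCTION ADD

PROGRAM MAIN
  IMPLICIT NONE
  ! add is declared here because it is an external function
  REAL :: a=2, b=3, add
  PRINT *, ADD(a,b)
END PROGRAM MAIN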
- C: functions return or don't return value depending on function type (void or other)
#include <stdio.h>

/* argument types must be declared in the parameter list */
float add(float a, float b) {
    return a+b;
}

int main() {
    float a=2., b=3.;
    printf("%f\n", add(a,b));
    return 0;
}
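- Python: functions return values if a return statement is given, and can return more than one!
Positional and keyword arguments, default argument values.
def add(a,b) :
    return(a+b)
add(2,3)
def div(a=1,b=1) :
    return(a/b)
div(2,3)      # positional: a=2, b=3
div(3,2)      # positional: a=3, b=2
div()         # both defaults: a=1, b=1
div(a=3)      # keyword: a=3, default b=1
div(b=3)      # keyword: default a=1, b=3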
Another common default might be something like:
def func(arg=None) :
    if arg is None :
        {do something}
    else :
        {do something else}
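Two of the Python points above, returning more than one value and using None as a default, can be sketched as follows. The function names min_max and append_obs are hypothetical, introduced only for this example. One common reason for the None default is that a mutable default such as log=[] is created once, when the function is defined, and would then be shared between calls.

```python
def min_max(values):
    """Return two values at once; Python packs them into a tuple."""
    return min(values), max(values)


# The returned tuple can be unpacked directly into separate variables.
lo, hi = min_max([3, 1, 4])


def append_obs(obs, log=None):
    """Append obs to log, creating a fresh list when none is given."""
    if log is None:
        log = []
    log.append(obs)
    return log


first = append_obs("a")               # new list
second = append_obs("b")              # another new list, not shared with first
shared = append_obs("c", log=first)   # explicitly reuse an existing list
print(lo, hi, first, second)
```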
- IDL: procedures (don't return a value) and functions (return a value).
Note that IDL procedures and functions can't be entered interactively: you need
to enter them in a program file, then compile the program (.com name), then execute
using the program name: procedures take arguments after commas, functions take arguments
in parentheses. Optional/keyword arguments are allowed, but arguments are
either positional OR by keyword.
function add, a, b
  return, a+b
end
print, add(2,3)
- global and local variables
- local variables: variables defined inside of functions are only visible inside the function
- variables outside of a function may be visible automatically inside the function,
depending on language, or can be declared as global variables: Fortran modules or (old) common
blocks, C global variables (and extern), the Python global statement inside functions, IDL common blocks
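For Python specifically, the visibility rules can be sketched as below (the variable and function names are invented for the example). A module-level variable can be read inside a function without any declaration, but assigning to that name inside a function creates a new local variable unless the global statement is used.

```python
counter = 0  # module-level ("global") variable


def peek():
    # Reading a global works without any declaration.
    return counter + 1


def shadow():
    # Assignment creates a new local variable; the global is untouched.
    counter = 100
    return counter


def increment():
    # The global statement makes the assignment act on the module-level name.
    global counter
    counter += 1


peeked = peek()
shadowed = shadow()
after_shadow = counter
increment()
after_increment = counter
print(peeked, shadowed, after_shadow, after_increment)
```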
- are argument variables changed inside of procedures/functions?
- Passing by value vs passing by reference vs pass by object reference
- Passing by value essentially means that the subroutine creates a new
variable that gets assigned the value of variable that was passed; the value
of the variable in the main program can't get changed
- Passing by reference means that the subroutine gets passed the address
of the variable in the main routine, i.e. a pointer to the main routine
variable. In this case, if the value that is pointed to is changed in the
routine, it will be changed in the calling program!
- Passing by object reference means that the subroutine gets passed a copy
of the address of the variable in the main routine. If the value at this address gets
modified (i.e. the pointer itself is not modified), the value will be modified in
the main routine, but if the copy of the pointer gets assigned a new address, then any
subsequent modifications to the value at the new address will not lead
to any changes at the original address.
- In simplest forms, C passes by value (but you can explicitly pass an
address which is seen in the function as a pointer), Fortran passes by reference,
Python by object reference (passes copy of pointer), and IDL by reference (but
not for array or structure elements!)
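Python's pass-by-object-reference behavior can be seen with a short sketch (function names are hypothetical). Modifying the object that the argument refers to is visible in the caller; assigning a new object to the argument name only changes the local copy of the reference.

```python
def modify_in_place(values):
    # Changes the object that the caller's variable also refers to.
    values.append(99)


def rebind(values):
    # Points the local name at a new list; the caller's list is untouched.
    values = [0, 0, 0]
    values.append(99)
    return values


data = [1, 2, 3]
modify_in_place(data)
after_modify = list(data)  # snapshot: the caller sees the appended value

new_list = rebind(data)
after_rebind = list(data)  # snapshot: unchanged by rebind
print(after_modify, after_rebind, new_list)
```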
- Using libraries in Fortran, C, Python, IDL.
- In many cases, you may have functions that you wish to use with multiple
main programs. In this case, rather than including the same code in multiple
program files, you want to keep the code in separate "utility" files, where these
files contain only functions/subroutines, but not main programs.
- Fortran and C: linking multiple object files. If the utility routines themselves
are composed of many files, multiple object files can be stored together in an object archive
(.a files, using the ar command). If these files are centrally located they can be linked
using the -lname option on the link command, in which case the linker will
look for a file named libname.a (or a shared libname.so) in standard locations, which can be
extended using the -Ldir option (the LD_LIBRARY_PATH environment variable sets where shared
libraries are found at run time). Linking with object archives includes the
executable code from the archive (at least, the routines that are used by the program);
this duplication of executable code can be avoided using shared libraries (but this
is beyond where most users go).
- Python: modules. from and import statements. Location of files:
system installation and PYTHONPATH environment variable.
- import name
- import name.subdir
- import name.subdir as subdir
- from name import subdir
- from name import *
- Python suggestion: collect all of your Python routines under a python directory and include this
in your PYTHONPATH, so you can find all of your Python routines in one place (no need for "I know I did
this somewhere, but can't find where ...")
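A minimal sketch of how a personal utility directory becomes importable. The directory and the module name myutils_demo are hypothetical and are created on the fly so that the example is self-contained; in practice the directory would be a permanent one listed in PYTHONPATH.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for a personal "python directory" holding utility routines.
util_dir = tempfile.mkdtemp()
with open(os.path.join(util_dir, "myutils_demo.py"), "w") as f:
    f.write("def add(a, b):\n    return a + b\n")

# Directories listed in PYTHONPATH are placed in sys.path at startup;
# appending here has the same effect, for the current session only.
sys.path.append(util_dir)
importlib.invalidate_caches()

import myutils_demo             # access as myutils_demo.add
from myutils_demo import add    # access as add

total = myutils_demo.add(2, 3)
same = add(2, 3)
print(total, same)
```

Printing sys.path in an interactive session is a quick way to check where Python is actually looking for imports.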
- IDL: file naming (each procedure should live in its own file, named the same
as the procedure, in order to be found) and IDL_PATH for setting where IDL looks for procedures.
Compiling vs. running
- Examples of some libraries:
- Fortran:
LAPACK (linear algebra),
PGPLOT (plotting routines)
- C:
LAPACK (linear algebra),
PGPLOT (plotting routines)
- Python:
standard libraries, including
os (operating system routines);
add-on libraries, e.g.,
astropy (astronomy-related routines, including I/O),
numpy (numerical, including arrays),
scipy (wide range of scientific/numerical techniques),
many others
- IDL: the Astronomy Users library,
Markwardt library (esp. curve fitting),
Buie library
- traditional programming tends to think of variables and functions that
act on variables. Object-oriented programming tends to think of objects
that can have associated attributes, but also actions (methods) that may depend
on the attributes
- Compare dictionary (or structured array) with a class
- objects: attributes and methods.
- Python:
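A minimal sketch of the dictionary-versus-class comparison in Python. The class name Star, its attributes, and the numerical values are invented for illustration. The dictionary only stores data, while the class bundles the same data (attributes) with an action that depends on it (a method).

```python
# Dictionary: data only; any calculation lives outside of it.
star_dict = {"name": "demo star", "parallax_mas": 100.0}
dist_from_dict = 1000.0 / star_dict["parallax_mas"]


class Star:
    """An object with attributes and a method that uses them."""

    def __init__(self, name, parallax_mas):
        self.name = name                  # attribute
        self.parallax_mas = parallax_mas  # attribute

    def distance_pc(self):
        """Distance in parsecs from the parallax in milliarcseconds."""
        return 1000.0 / self.parallax_mas


star = Star("demo star", 100.0)
dist_from_method = star.distance_pc()
print(dist_from_dict, dist_from_method)
```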
- error handling: good code will trap errors and report the source
rather than rely on the error message when the code crashes. Note
Python try/except statements.
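One possible shape for trapping errors and reporting their source with try/except is sketched below. The function read_column and the file name are hypothetical. Each except clause catches a specific, expected failure and re-raises it with a message saying which routine, file, and line were involved.

```python
def read_column(filename):
    """Read the first whitespace-separated column of a text file as floats."""
    try:
        with open(filename) as f:
            lines = f.readlines()
    except OSError as err:
        # Report which routine failed and on which file.
        raise RuntimeError("read_column: cannot open %r: %s" % (filename, err))

    values = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            values.append(float(line.split()[0]))
        except ValueError:
            raise ValueError("read_column: %s line %d is not a number: %r"
                             % (filename, lineno, line))
    return values


# The caller can trap the error instead of crashing.
try:
    read_column("no_such_file_demo.txt")
    message = None
except RuntimeError as err:
    message = str(err)
print(message)
```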
- testing code: find a case where you know the solution and make sure your
code gets the right answer! Think of challenging cases that you can test,
or at least test for behavior in the expected direction.
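As a sketch of this testing approach, the hypothetical trapezoid routine below is checked three ways: against a case with a known solution, against a degenerate case, and for behavior in the expected direction (more steps should give a smaller error).

```python
import math


def trapezoid(func, a, b, n=1000):
    """Integrate func from a to b using the trapezoid rule with n steps."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h


# Known solution: the integral of sin(x) from 0 to pi is exactly 2.
result = trapezoid(math.sin, 0.0, math.pi)
assert abs(result - 2.0) < 1e-5

# Challenging case: a zero-width interval should integrate to zero.
assert trapezoid(math.sin, 1.0, 1.0) == 0.0

# Expected direction: more steps should give a smaller error.
coarse = abs(trapezoid(math.sin, 0.0, math.pi, n=10) - 2.0)
fine = abs(trapezoid(math.sin, 0.0, math.pi, n=100) - 2.0)
assert fine < coarse
```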
IDL: IDLDE
Python: pdb
C: ECLIPSE
- style, e.g. PEP8
guidelines, e.g., white space, indentation rules, ...
- Make your program readable. Consider white space judiciously,
recognizing tradeoff between separating program blocks and increasing
length of code. Modular code is generally easier to read and digest.
- comments: every program must be commented! Include an overview
comment at the top and clarification comments throughout the code as needed.
You don't need to comment what is apparent from reading the code if the code
is straightforward. Separate comment statements are usually preferable to
"end-of-line" comments.
- documentation and documentation tools: Sphinx, doxygen
(examples)
- Avoid hard-wired directories: consider use of environment variables
to specify root directories
- Avoid hard-wired constants, file lengths
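A small Python sketch of both points, under stated assumptions: DEMO_DATA_ROOT is a hypothetical environment variable name, the catalog path is invented, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile

# Take the root directory from the environment, with a fallback,
# rather than hard-wiring an absolute path into the code.
data_root = os.environ.get("DEMO_DATA_ROOT", os.path.expanduser("~"))
catalog_path = os.path.join(data_root, "catalogs", "stars.txt")


def count_lines(path):
    """Find the file length at run time instead of hard-wiring it."""
    with open(path) as f:
        return sum(1 for _ in f)


# Self-contained demonstration on a temporary three-line file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as f:
    f.write("1\n2\n3\n")
n_lines = count_lines(demo_path)
print(catalog_path, n_lines)
```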
- version control and package management: tagging in version control
software; modules:
handles environment variable setup, dependencies, etc. Example...
- Command line arguments
- Signal trapping
- sockets
- Parallel programming
- cross-language programming
- Consider isochrone reading code examples (isochrones.py)
- Example of astropy FITS I/O.
Jon Holtzman
2015-12-11