Isis:tutorial:slang

From Remeis-Wiki
Revision as of 13:55, 17 October 2019 by Koenig
Programming in S-Lang

Remark: This brief introduction into S-Lang is primarily a translation of the German-language introduction to S-Lang used in the Remeis astronomy lab, which was mainly written by Manfred Hanke.

Introduction

The underlying engine of Isis is S-Lang, an interpreted language that is similar to other modern scripting languages such as Perl or Python. All of these languages are "Algol-like", so if you know how to program in C or in any of these scripting languages, you should have no trouble programming in S-Lang as well.

The big advantage of having a scripting language as part of a data analysis package is that much "routine" work can be automated, which increases your efficiency. This includes loading the data sets you are working with (e.g., when you have many spectra from different instruments that need some specific ignoring and rebinning) and calculating errors. It also gives you access to all internal structures used in finding your best fit, so that you can prepare very nice figures or output your best-fit parameters in a way that is better suited to publication than the standard isis routines. Historically, many astronomers (yours truly included) did this last step in IDL. While that language is very nice, it is also very expensive, with educational licenses costing around 1000 EUR per year. It thus makes a lot of sense to move away from this and use a cheaper and more integrated approach to data analysis.

A comment to future data analysts: Scripting is very good, but do not try to script everything. Much of data analysis has to do with understanding your data set, and here it is often much better to play with the data by hand than to automate things. Get a "feel" for your data first before trusting the computer to do everything right...

A comment to the language-warriors: Often people will ask why S-Lang was chosen as the interface and not, e.g., Python. The reason is simple: because it was there. The important thing is that a scripting language is there at all. The main difficulty in learning how to program is not the programming syntax - if you think so, then you are not a good programmer - but rather learning to think in an algorithmic way. This type of thinking is difficult to learn. Learning a new syntax isn't. The author of these lines (not M. Hanke ;-) ) started his life with a simple form of Amstrad Basic, followed by Omikron Basic, PASCAL, Turbo Pascal, Fortran-77 (yes, it really is spelled "Fortran", not "FORTRAN". The only FORTRAN in existence was FORTRAN 66; since the 1977 standard, that language has been spelled "Fortran"...), Fortran-90, IDL, C, C++, Perl, JavaScript, and I am sure some more languages that I have forgotten (plus all of the assembly languages that were useful when one was still programming in assembly, i.e., 80x86, 68xxxx, and so on). Historically, all of these languages have a syntax that goes back to Algol in the 1960s, and thus at their core they are all the same. For this reason, do not worry about having to learn yet another scripting language, it's just a little bit of syntax. And, if you don't know how to program, start now. Because the languages are all the same, it does not matter that S-Lang might be seen as obscure by some people: once you know how to think algorithmically, switching over to another language won't cost you too much time. This also means that if you are applying for jobs and somebody claims that you must know Java or any other language, stay away from these jobs - knowing how to program is what makes you interesting, not the specific language...

In contrast to compiled languages such as C, C++, or Fortran, scripting languages such as IDL, Perl, or Python have the advantage that one can also work with them interactively and thus write small "programs" directly on the command line. We use this feature all the time when doing data analysis by hand.

In the following we assume that you have had at least some previous exposure to programming, and just give a list of the most important language structures.

S-Lang Language elements

S-Lang consists of the following language elements that allow you to structure your programs. Note that in S-Lang programs every statement must end with a semicolon.

Variable Declarations and Assignments

In S-Lang programs, variables must be declared (this is optional on the command line). This is done with the instruction

variable var_1, var_2, ... ;

You then assign values to a variable with

var_1=value;

where value is a valid S-Lang expression. It is possible to combine the variable declaration and assignment, e.g.,

variable a=2;

or more complicated expressions such as

var_1 = sin(a)+sqrt(25.);

Variable names may consist of any combination of the standard ASCII characters a-zA-Z0-9 as well as the underscore _ and the dollar sign $. A variable name is not allowed to start with a digit.

Data Types

Simple Data Types

S-Lang variables are dynamically typed (often loosely called "weakly typed"), that is, the type of a variable is determined by the type of whatever is assigned to it. For example

variable a=2;

means that after the assignment a is an integer. While

variable a=2.;

means that a is a floating point number. Strings are assigned with

variable a="abcd";

However, note that a variable can easily change its type, because the weak typing will mean that after the execution of

variable a="abcd"; % String_Type
a=2.3;

a will have the type Double_Type. You can check this by printing the type of the variable:

typeof(a);

Exercise 1:

Assign the result of typeof(a) to some other variable. What is the datatype of that other variable?

An aside on integer and floating point arithmetic

Note that while weak typing usually speeds up code development, it does not protect you from the pitfalls that go hand in hand with integer and floating point arithmetic. Consider the following classical example:

variable a=5;
variable b=10;
variable c=a/b;
print(c);

Note that c is 0 because of the rules of integer arithmetic (everything after the "." is cut away). The correct result is obtained when doing floating point arithmetic:

variable a=5.;
variable b=10.;
variable c=a/b;
print(c);

Even worse is the following often encountered example:

variable a=10000000;
variable b=65000000;
variable c=a*b;

Here the product is too large to be stored in an integer. Because of the rules of integer arithmetic you will get an integer overflow, and c might even be negative.

The rule in arithmetic expressions is that the "strongest" data type wins, i.e., in

variable a=10000000.0;
variable b=65000000;
variable c=a*b;

c will have the correct value, since the multiplication is performed in double precision.

If you need to be 100 percent sure that a calculation is done in a certain data type, and you have no control over whether the variables entering an expression have that type (this is, e.g., the case in functions that are called by somebody else), you can force S-Lang to convert ("typecast") a variable to a certain type:

a=double(a);
b=int(b);
c=string(c);
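As a small sketch of why this matters in functions, consider a helper that should always divide in floating point, no matter what the caller passes in. The name safe_ratio is made up for this illustration; only double and vmessage are S-Lang intrinsics.

```slang
% safe_ratio is a hypothetical helper, not part of S-Lang or isis.
% The typecasts make sure the division is done in double precision
% even if the caller passes two integers.
define safe_ratio (a, b)
{
   return double(a) / double(b);
}

variable r_int = 5 / 10;             % integer division: 0
variable r_dbl = safe_ratio(5, 10);  % floating point division: 0.5
vmessage("integer: %d, double: %f", r_int, r_dbl);
```

Without the typecasts, calling the function with two integers would silently fall back to integer arithmetic.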


Arrays and Lists

You can combine the above simple data types into more complicated ones. The most important of these are

Arrays

Arrays are ordered lists of things of the same data type and are declared using brackets:

variable arr=[1,2,3];

Content of arrays is accessed by giving the index in brackets:

variable c=arr[1];

Note that arrays are zero based, i.e., the above returns 2. It is possible to access more than one element at the same time by using an array as the argument of the brackets:

variable c=arr[[0,1]];

which produces an array containing two elements. If you want larger parts of an array, there is a very powerful "slicing" syntax that makes use of the fact that [a:b] defines the array [a,a+1,a+2,..,b] (for integers a and b with b>a):

variable b=arr[[0:1]];

(which is a somewhat silly example...).

Arrays can be multi-dimensional, but the definition is not as nice as in other scripting languages:

variable arr=Integer_Type[2,3];
arr[0,[0:2]]=[1,2,3];
arr[1,[0:2]]=[5,4,3];

Note that arrays of floating point values can be created with a very similar syntax: [a:b:c] creates an array with the values [a, a+c, a+2*c,...], such that the last value is still lower than b. Even more convenient is the syntax [a:b:#n], which creates an array of exactly n equally spaced values ranging from a to b.
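The range syntax is easiest to understand by trying it out. The following lines are a minimal sketch; the variable names are arbitrary and the values in the comments follow from the rules described above.

```slang
variable idx  = [1:10:2];          % integer range with step 2: 1, 3, 5, 7, 9
variable half = [0.0:1.0:0.25];    % 0.0, 0.25, 0.5, 0.75 (last value is below 1.0)
variable grid = [0.0:1.0:#5];      % exactly 5 values: 0.0, 0.25, 0.5, 0.75, 1.0
print(idx);
print(half);
print(grid);
```

When you need a grid with a known number of points, the [a:b:#n] form is usually the safer choice, since it does not depend on how the floating point step accumulates.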

Lists

Lists are ordered collections of things that can be of different data types. They are declared using curly brackets:

variable lis={1,2,3};

Accessing the list elements uses the standard bracket syntax:

variable a=lis[1];

Lists are important whenever you want to store different things in one variable. For example, the following is legal:

variable lis2={1,["a","b","c"],3.2};
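A short sketch of how such a mixed list can be extended and inspected, assuming an S-Lang version with list support (list_append, length, and typeof are intrinsics; the variable names are arbitrary):

```slang
variable things = {1, "two", 3.0};
list_append(things, [4,5,6]);      % lists can grow: append an array as 4th element

variable i;
for (i = 0; i < length(things); i++) {
   % string() converts the data type to a printable name
   vmessage("element %d has type %s", i, string(typeof(things[i])));
}
```

Each element keeps its own data type, which is exactly what distinguishes a list from an array.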


Operators

Binary operators combine two expressions, x and y, where x and y are constants, variables, functions and so on. The most important operators are:

  • arithmetic operators:
    • +, -, *, /: basic arithmetic operators, the usual priority rules apply,
    • ^: exponentiation (2^3 is two to the power of three),
    • mod: modulo operation
  • string concatenation is done with the + operator.
  • comparison is done with <, <=, ==, !=, >=, and >. Note that, as in all programming languages, you should never test two floating point variables for equality; this will most often not work in the way you expect...

All of these operators can be used not only on scalar values but also on arrays. They are then used on an element basis. The resulting code is very fast. For example, to add two arrays:

variable a=[1,2,3];
variable b=[6,5,3];
variable c=a+b;
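The same works for more complicated expressions and for comparisons. A minimal sketch with made-up values:

```slang
variable x = [1.0, 2.0, 3.0, 4.0];
variable y = 2.0*x^2 + 1.0;    % evaluated for every element: 3, 9, 19, 33
variable big = (y > 10.0);     % comparisons are element-wise too: 0, 0, 1, 1
print(y);
print(big);
```

No explicit loop over the array elements is needed, which is both shorter and faster than the loop version.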

As an aside, one often wants to add/subtract something from a variable. S-Lang allows the following C-like shortcuts:

a+=5;

is equivalent to

a=a+5;

and similar -=, *=, and /= (I don't think I've ever used the last one, though...).


Program flow control

Conditional execution

Conditional execution is done with the if-statement:

if ( condition ) {
   true-code;
} else {
   false-code;
}

For example:

variable a=+1;
variable b;
if ( a<0 ) {
  b=-1;
} else {
  b=+1;
}

Note that the else-branch is optional.
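Several conditions can be chained with else if. The following sketch extends the example above to three cases:

```slang
variable a = 0;
variable b;
if ( a<0 ) {
   b = -1;
} else if ( a>0 ) {
   b = +1;
} else {
   b = 0;       % reached only if neither condition above holds
}
print(b);
```

The conditions are tested from top to bottom, and only the first branch whose condition is true is executed.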

Loops

for-loop

The syntax of the for loop is

for( initialize ; condition ; increment ) {
   code ;
}

where usually in initialize a loop control variable is, well, initialized, and then incremented as long as condition is valid. An example would be

variable i;
variable npt=10;
for (i=0; i<npt; i++) {
    print (i);
}

which counts from 0 to 9 (a count down is also possible, use i--). Obviously, more than one line of code is possible...

Note: even though it is syntactically possible, never ever use anything other than an integer variable as the loop counter, unless explicitly necessary.

while loop

The while loop is done while a condition is met:

while ( condition ) {
  code ;
}

Note that if condition is not met when the while loop is hit first, code is not executed at all.

The above counting example can be implemented as follows:

variable i=0;
variable npt=10;
while ( i<npt ) {
  print(i);
  i++;
}

do...while loop

The do...while-loop is a loop where the body of the loop is executed at least once, since the condition is only tested at the end of the first passage through the loop:

do {
  code ;
} while (condition);
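As a small illustration (the numbers are made up), the following sketch halves a value until it drops to 1 or below and counts the number of steps. The body runs at least once, even if x already starts out at 1 or below:

```slang
variable x = 100.0;
variable nsteps = 0;
do {
   x = x/2.0;
   nsteps++;
} while ( x>1.0 );
vmessage("%d halvings, final value %f", nsteps, x);
```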

Functions

Functions are subroutines which execute a sequence of instructions whenever they are called (i.e., whenever their name appears in a program). Functions can, but do not have to, have arguments, i.e., variables that control the behavior of the routine.

Intrinsic functions

Note 1: More information about individual functions can be obtained with isis' help-function.

Note 2: Most simple functions also work on arrays.

Mathematical functions

  • sign functions: abs, sign, _diff, _max, _min
  • rounding functions: ceil, floor, nint, round
  • basic algebraic functions: sqr (square!), sqrt (square-root), hypot, polynom, mul2
  • exponential and logarithm: exp, expm1, log, log10, log1p
  • trigonometric functions (argument is in radian!): sin, cos, tan, asin, acos, atan, atan2
  • hyperbolic functions: sinh, cosh, tanh, asinh, acosh, atanh
  • complex numbers: Real, Imag, Conj
  • tests: isinf, isnan (nan: not a number), _ispos, _isneg, _isnonneg

Array functions

  • number of elements in an array: length
  • extrema: max, min, maxabs, minabs
  • summing array elements: sum, sumsq, cumsum
  • tests: all, any
  • get the indices for all or some elements for which a condition is met: where, wherenot, wherefirst, wherelast
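The where function is probably the most useful of these in everyday data analysis. A minimal sketch with made-up numbers:

```slang
variable flux = [0.5, 2.1, 3.7, 0.2, 4.4];
variable idx = where(flux > 2.0);   % indices of all elements above the threshold
print(idx);                         % 1, 2, 4
print(flux[idx]);                   % the selected values
vmessage("%d of %d values selected, sum = %f",
         length(idx), length(flux), sum(flux[idx]));
```

Since the result of where is an array of indices, it can be used directly inside the brackets to pick the matching elements.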

Regular Expressions

Regular expressions are extremely powerful tools to format, edit, and filter strings. In S-Lang they are handled by the function string_match and its derivatives (such as string_matches).

  • The whitespace regex \s must be replaced by an actual whitespace
  • The word regex \w must be replaced by [A-Za-z0-9_]
  • To extract individual sub-strings one can group the characters by \(\) (Note that the group parenthesis is escaped). This group can be accessed in the returned String array of string_matches. Note that the zeroth entry contains the full string.
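  • Important: The regex string has to be followed by an R, which turns off backslash processing in the string literal so that the backslashes reach the regex engine unchanged

Here are some examples: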

% Match only the result 17.56
variable matches = string_matches("The result is: 17.56", "[A-Za-z0-9_ ]+: \(\d+\.\d+\)"R);
% Match the result 17.56 but also 17
matches = string_matches("The result is: 17.56", "[A-Za-z0-9_ ]+: \(\d+\.?\d*\)"R);
variable result = matches[1];
% Match _exactly_ two digits, also matches if no decimal
matches = string_matches("dec=-02 57 75.3", "dec=\(-?\d\{2\} \d\{2\} \d\{2\}\.?\d*\)"R);
% Something more complex
variable teststr = "1  |      0.00|4U 1850-03";
variable regex  = "\(\d\) +| +\(\d+.\d+\)|\(.+ .+\)"R;
variable matches = string_matches(teststr, regex);
variable sourcename = matches[3]; % Note that matches[0] contains teststr
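In a script it is a good idea to check whether the pattern matched at all before using the groups, since string_matches returns NULL if there is no match. A sketch, with a made-up input string and variable names:

```slang
variable line = "exposure = 1234.5 s";
variable m = string_matches(line, "exposure = \(\d+\.?\d*\)"R);
if ( m == NULL ) {
   message("no match");
} else {
   % m[0] is the full match, m[1] the first group; atof converts it to a number
   vmessage("exposure: %f", atof(m[1]));
}
```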

Printing

Output is done with the print and the vmessage functions. vmessage uses a format similar to the C printf-function to format the output. Examples include:

variable a=1.2347;
vmessage("%f",a);   % print with the default precision of %f (six decimal places)

or

variable a=25;
vmessage("%05d",a); % print 5 digits, zero padded

Note: vmessage works very similarly to the function printf, which exists in many programming languages (and also in S-Lang).
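Several values can be formatted in one call, and sprintf returns the formatted string instead of printing it. A minimal sketch with made-up values:

```slang
variable name = "4U 1850-03";
variable flux = 1.2347e-9;
variable n = 7;
% %s: string, %.3e: exponential notation with 3 decimals, %03d: zero padded integer
vmessage("%s: flux = %.3e, run %03d", name, flux, n);

variable s = sprintf("%8.2f", 3.14159);   % field width 8, two decimal places
print(s);
```

sprintf is handy when the formatted text is needed later, e.g., as a plot label or a file name.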

User-defined functions

Your own functions can be defined with the syntax

define functionname (arguments) {
   code;
   :
   code;
   return value;
}

where the return value statement is optional.

Functions are very useful to structure your program. Use them liberally! An example would be that for a given data set, you write a function to load the data and do the rebinning. Additionally, giving useful names to your functions improves the readability of your code.

A sillier example is the following, which returns the sum and difference of two numbers:

define adddiff(a,b) {
   % return the sum and difference of two numbers.
   variable sum=a+b;
   variable diff=a-b;
   return [sum,diff];
}

Note the comments. It is good style to comment your code well, so that you and others can later understand what the code is doing. You should always comment your code while writing it. Do not only do it at the end because somebody told you so, but make the writing of comments part of your coding practice!
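Instead of packing the results into an array, an S-Lang function can also return several values at once. The following sketch shows this variant; the name adddiff2 is made up to keep it apart from the example above.

```slang
% adddiff2 is a hypothetical variant of adddiff that returns
% the sum and the difference as two separate values.
define adddiff2 (a, b)
{
   return a+b, a-b;
}

variable s, d;
(s, d) = adddiff2(5, 3);     % s is 8, d is 2
vmessage("sum = %d, difference = %d", s, d);
```

The array version is convenient when both results have the same type, while multiple return values also work when the results are of different types.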

Exercise 2

Write an S-Lang function 'midnight' which returns the roots of the quadratic equation [math]\displaystyle{ ax^2+bx+c=0 }[/math]. The routine should work for all possible values of a, b, and c.

Libraries

Libraries are collections of S-Lang functions. They get loaded with the statement

require("libraryname");

Afterwards, all functions in "libraryname" are available.

Isis programs

Isis programs consist of a sequence of function declarations and a main program, stored in a file that can be written with any editor of your choice. To execute a program you have several choices. In the following, let's assume the program's filename is test.sl:

 - you can execute the program from the Linux command line, by issuing the command isis test.sl;
 - if you want to execute the program from within isis (e.g., because you want to work on its output interactively), use ()=evalfile("test.sl");
 - to run a program under isis and immediately exit isis, use the "shebang" notation. For example:
#!/usr/bin/env isis

% simple counting example (counts from 0 to 9)
variable i;
variable npt=10;
for (i=0; i<npt; i++) {
    print (i);
}

then make the code executable under Linux:

hde226868:~/> chmod ugo+x ./test.sl

After this you can execute the code with

hde226868:~/> ./test.sl

The name "shebang" notation comes from the pronunciation of the hash sign '#' as "she" and the exclamation mark as "bang". Yes, really.

Exercise 3

Write an S-Lang program that loads the gratings data from Exercise 3 of Advanced Fitting Techniques, 1. The program should have functions that

  1. load the data, ignore the appropriate energy channels and rebin it. The function should return the indices of the PCA, HEXTE A, and HEXTE B data.
  2. set up the fit function and set the parameters to reasonable starting values
  3. call the above functions from a main program and perform the fit
  4. call a third function that makes a plot of the best fit with residuals. Use the same color for the HEXTE A and B data points. Hint: note that the argument passed to the plot functions is a list, and that the colors assigned to the data points through the dcol qualifier apply to the individual list elements. For example, if dcol=[1,2], then the spectra corresponding to the 2nd list element are plotted in color number 2. Since the spectra are described by a list, that list can contain arrays as list elements... In other words: call the plot functions such that all color qualifiers have only two elements.
  5. call a fourth function which calculates a 2D-error contour for [math]\displaystyle{ N_H }[/math] and [math]\displaystyle{ \Gamma }[/math] (NOTE: the relevant information for this last point is not yet there and, because of carpal tunnel syndrome, will only be available on Tuesday).