ISIS - Defining data dependent models


ISIS provides a variety of ways to fit arbitrarily complicated models to your data. One of the most straightforward approaches for 'complicated' models is tying parameters together, either via the tie-to field (which is not very robust) or by assigning the parameter value to an S-Lang expression like:

#=> 0.03*constant(1).value
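Programmatically, such a parameter function is attached with set_par_fun. As a minimal sketch (the model components and the factor 0.03 are just examples):

% tie the second constant's factor to 3% of the first one's value;
% the parameter listing then shows this as "#=> 0.03*constant(1).value"
set_par_fun ("constant(2).factor", "0.03*constant(1).value");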

Here we will explore a different way to define a fit function which is mostly, but not entirely, the same for multiple data sets.

Multiple similar data sets

Consider the following situation: a source has been observed at several different observation times, and each observation was performed by several different instruments. You therefore expect some deviations between the data sets even though they were taken at the same time, but you are confident that the difference is only a normalization factor, so the simultaneously taken data should have the same spectral shape.

Let's further assume that all spectra are sufficiently described by a power law plus additional emission line profiles. Taking the approach from above, i.e., tying parameters together, one would define the fit function as

fit_fun("powerlaw(Isis_Active_Dataset)+gauss(Isis_Active_Dataset)");

Isis_Active_Dataset is a global variable defined by ISIS and holds the id of the data set currently being handled. This works because the fit-function string is evaluated once for every data set individually. Hence, each data set gets its own instance of each model component, with the function id corresponding to the data set id.
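To see this in action, load two spectra and list the parameters; each data set gets its own instances of the model components (the file names here are hypothetical):

variable id1 = load_data ("obs_epoch1_instA.pha");  % becomes data set 1
variable id2 = load_data ("obs_epoch1_instB.pha");  % becomes data set 2
fit_fun ("powerlaw(Isis_Active_Dataset)+gauss(Isis_Active_Dataset)");
list_par;  % shows parameters of powerlaw(1), gauss(1), powerlaw(2), gauss(2)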

Let's say you have, for each of three different observation times, two observations from different instruments. In this case you would assume that, apart from the normalization, the power law parameters are the same within each observation pair, as are the line parameters, while the parameters may differ between observation times. To account for the normalization you would change the model to

fit_fun("constant(Isis_Active_Dataset)*(powerlaw(Isis_Active_Dataset)+gauss(Isis_Active_Dataset))");

for each data set and tie together all the parameters that should be equal. For this moderately simple example that would not be a tremendous problem and fitting could be done as usual. It gets more cumbersome as we add further observations from additional observation times. In other words, this approach does not scale very well.
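To make the scaling problem explicit, here is a sketch of the tying for a single simultaneous pair, using set_par_fun and the parameter names of ISIS's built-in powerlaw and gauss models:

% data sets 1 and 2 were taken at the same time: same spectral shape,
% only the overall normalization constant(1)/constant(2) may differ
set_par_fun ("powerlaw(2).norm",     "powerlaw(1).norm");
set_par_fun ("powerlaw(2).PhoIndex", "powerlaw(1).PhoIndex");
set_par_fun ("gauss(2).area",   "gauss(1).area");
set_par_fun ("gauss(2).center", "gauss(1).center");
set_par_fun ("gauss(2).sigma",  "gauss(1).sigma");
% ... and the same block again for every further simultaneous pair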

Reducing the number of available parameters

With each data set added to your initial collection, your model gains a number of additional parameters which may be completely redundant if they are only tied to already existing ones. Also, with more and more parameters that have to be tied to one another, your function becomes increasingly error-prone.

In the next step we will reduce the number of available parameters by reducing the number of function instances that are used. For this we exploit the fact that the string passed to the fit_fun call is not much more than an elaborate S-Lang eval call. To illustrate this, just type in

fit_fun("constant(1, print(\\"Hello there, I'm a message from the guts of ISIS!\\")");

and you will be greeted by the message you just typed in. Why is that? Because to set up the parameters, the string is evaluated once for each data set[1]. But this also means we can put anything that is a valid S-Lang expression inside this string and it will, well, work. Just printing out things is not necessarily the most useful thing we can do here, though. To get back to the original problem, we just have to find a way to replace Isis_Active_Dataset by something which gives us a number corresponding to each collection of data sets that were taken at the same time.

A very simple way of doing this is to attach an identifier to each data set, i.e., to store the time at which the data was recorded in the data set's metadata. This can be done by

set_dataset_metadata(id, timestamp);

where id is the data set id and timestamp is the time at which the data was taken.
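As a sketch, assume three observation epochs with two instruments each, i.e., six data sets in total (the MJD timestamps here are hypothetical):

variable times = [55001.0, 55001.0, 55432.0, 55432.0, 55867.0, 55867.0];
variable id;
_for id (1, 6, 1)
    set_dataset_metadata (id, times[id-1]);

Now we can define a function like the following one: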

variable TIME_REF = { [time1_lo, time1_hi], [time2_lo, time2_hi], [time3_lo, time3_hi] };

public define Active_Set () {
    % map the current data set onto the index of its time bin
    variable md = get_dataset_metadata(Isis_Active_Dataset);
    variable i;
    _for i (0, length(TIME_REF)-1, 1) {
        if (TIME_REF[i][0] <= md && md <= TIME_REF[i][1])
            return i + 1; % add 1 so we have no function with "0"
    }
    throw UsageError, "Active_Set: data set falls into no time bin of TIME_REF";
}

Note that the function has to be declared as public since the fit_fun string evaluation happens in a separate namespace. Further note that we use the variable TIME_REF to look up a unique number corresponding to each time bin.

With this new function we can set our fit function to

fit_fun("constant(Isis_Active_Dataset)*(powerlaw(Active_Set)+gauss(Active_Set))");

In this way no parameters need to be tied, since the only ones left are the minimum required. Also, the normalization is left free for each data set, as it may differ between all of them.
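With the hypothetical timestamps from the sketch above, the complete setup could be checked like this:

TIME_REF = { [55000.5, 55001.5], [55431.5, 55432.5], [55866.5, 55867.5] };
fit_fun ("constant(Isis_Active_Dataset)*(powerlaw(Active_Set)+gauss(Active_Set))");
list_par;   % six constant instances, but only three powerlaw and gauss instances
fit_counts;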



  1. Here we also use the fact that fit functions do not enforce a fixed number of arguments, which is why we can simply append the print call.