Fermi-LAT


The Fermi Large Area Telescope (LAT) observes the gamma-ray sky from 20 MeV up to 1 TeV. In survey mode, it scans the entire sky roughly every three hours, hence gamma-ray data are available for every position on the sky for every day of the mission. For more information on the spacecraft, please check out the official Fermi webpage at https://fermi.gsfc.nasa.gov.
Before you use the scripts, make sure you know what you're doing. If you are analysing gamma-ray data for the first time, please read the online documentation at https://fermi.gsfc.nasa.gov/ssc/data/analysis/LAT_essentials.html. Because the Fermi-LAT data are freely accessible to everyone, there is extensive documentation available online.
If you have comments or questions, please find the information on the Fermi contact person at the Remeis observatory at the bottom of the page.


Installation & preparation

To analyse data from Fermi-LAT, the Fermitools are needed. Additionally, there is a Python package called fermipy, which allows the data analysis to be done from within a Python environment. Analysis scripts written with fermipy are available to create either a spectrum or a light curve. They are globally available at $FERMITOOLS, which links to /software/Science/satscripts/fermiscripts.
To run these scripts, both the Fermitools and fermipy need to be installed, ideally in a conda environment. There are two options: using the existing conda environment, or creating the corresponding conda environment yourself. The first option is currently the recommended choice, not least because the pipeline scripts are written for Python 2.7; for the second option, you need to make sure that the correct versions of the Fermitools and fermipy are installed.

Option 1 - Using the existing conda environment (Fermitools 1.2.23 & fermipy version 0.20.0)

The pipeline scripts for a standard Fermi-LAT analysis (spectrum or light curve) are optimized to run with this conda environment. In order to invoke it, you need to source the conda installation in which the environment was created before activating the environment. To simplify this, you can copy the following into your .cshrc file:

#!/bin/tcsh
source /userdata/data/gokus/conda/miniconda2/etc/profile.d/conda.csh
alias fermipy "conda activate fermipy" 

If you already have your own conda installation that you regularly use, make sure that you create an alias instead of sourcing this conda installation by default! In case you're using a bash shell, the addition to your .bashrc file should look something like this:

#!/bin/bash
alias fermipy='conda activate /userdata/data/gokus/conda/miniconda2/envs/fermipy'

Using the command fermipy, you can now switch into the conda environment in which the necessary tools for the pipeline scripts are installed. Now you're ready to do some science!

Option 2 - Installing and creating a conda environment yourself

Both the Fermitools and fermipy are managed via GitHub; the full code, documentation, and installation instructions can be found at
Fermitools: https://github.com/fermi-lat/Fermitools-conda
fermipy: https://github.com/fermiPy/fermipy (Github), https://fermipy.readthedocs.io/en/latest/ (Documentation)
Please make sure you install the right versions of each (between versions, both packages switched from Python 2.7 to Python 3). It is recommended to read the full installation instructions provided in the documentation.
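As a rough sketch only (the channel names and version pins below are assumptions and should be checked against the installation instructions linked above), creating an environment matching the one from Option 1 could look like this:

#!/bin/bash
# Sketch: create a conda environment with the Fermitools and fermipy versions
# used by the pipeline scripts; verify channels and versions in the official docs
conda create --name fermipy -c conda-forge -c fermi python=2.7 fermitools=1.2.23
conda activate fermipy
pip install fermipy==0.20.0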

The LAT data

The data downlinked from the Fermi satellite contain information about the events/photons measured by the LAT, as well as information about the status of the spacecraft itself at any given time. Both are the main input for an analysis. The corresponding files are called photon and spacecraft files. One can acquire the data needed for a specific analysis via the online data query at https://fermi.gsfc.nasa.gov/cgi-bin/ssc/LAT/LATDataQuery.cgi, or download weekly data products for the entire sky from https://fermi.gsfc.nasa.gov/ssc/data/access/.

However, all data acquired by Fermi-LAT during its mission are also stored on the servers of the observatory, hence it is not necessary to download a specific dataset. If you are using the provided analysis scripts, they automatically look through all the data available at the observatory. All photon files are linked in /home/X-ray/Fermi/archive/photon_ft1_files.list, while all of the spacecraft data is merged into /home/X-ray/Fermi/archive/spacecraft_ft2_files.fits.
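To get a first impression of how the archive is organized, you can simply peek at both files in the shell (ftlist is part of HEASoft and is only used here for illustration):

#!/bin/tcsh
# Show the first few entries of the photon file list
head -n 3 /home/X-ray/Fermi/archive/photon_ft1_files.list

# Show the FITS extensions of the merged spacecraft file
ftlist /home/X-ray/Fermi/archive/spacecraft_ft2_files.fits H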

In the early steps of your analysis, a region of interest (ROI) around your desired source or source coordinates will be created, which contains all the sources included in the 10-year Fourth Fermi-LAT Point Source Catalog (4FGL). In addition, extended sources, which are mostly found in the Galactic plane and the Magellanic Clouds (e.g. features of the LMC, the Centaurus A lobes, supernova remnants), can be part of that ROI, too. In order to run the pipeline scripts properly, two environment variables need to be added to your .cshrc file:

#!/bin/tcsh
setenv FERMI_DIR /home/X-ray/Fermi/archive/catalogs
setenv LATEXTDIR $FERMI_DIR/4FGL_extended_sources

Alternatively for the bash shell, add the following to your .bashrc file:

#!/bin/bash
export FERMI_DIR=/home/X-ray/Fermi/archive/catalogs/
export LATEXTDIR=$FERMI_DIR/4FGL_extended_sources

In the $FERMI_DIR, you can find the current catalog as well as the templates for all extended sources.

In addition, the LAT background models needed to model the Galactic and isotropic diffuse emission can be found at /home/X-ray/Fermi/archive/LAT_background_models. These are provided and regularly updated by the Fermi-LAT collaboration and help to improve your analysis, as the background, especially at lower energies (< 1 GeV), can have a significant impact on your results.

Fermi-LAT analysis with the pipeline scripts

The scripts available at $FERMITOOLS are written to perform a standard analysis of a point source, e.g. a blazar. Please note that these analysis scripts might not provide correct results if you are analysing energies below 100 MeV, and that they have only been tested for blazar analyses.

Creating a spectrum

If you are interested in the gamma-ray spectrum of a source, the corresponding script is called make_spectrum.sl and can be called via

$FERMITOOLS/make_spectrum.sl SOURCENAME TMIN TMAX

As you can see, the script needs three arguments: the name of the source, and the start and end times of the interval for which an average gamma-ray spectrum should be computed. There are several options regarding the specifics of the analysis, which you can list via $FERMITOOLS/make_spectrum.sl --help. The full help output looks like this:

$FERMITOOLS/make_spectrum.sl --help

** General script to compute Fermi spectra **
  [Make sure you have loaded the Fermitools and Fermipy (see README file/wiki entry)]

Usage: make_spectrum.sl [options] [src] [tmin] [tmax]
Mandatory inputs:
    src:    Name of source without any blanks, either a 4FGL name or a different name if corresponding catalog is defined in options
    tmin:   in MJD, start of time range for which SED will be computed
    tmax:   in MJD, end of time range

Options for analysis:
   --known=yes/no	Is source a known Fermi source (in the 4FGL)? Important to set correctly, otherwise
			the analysis will fail! Default: YES

   --ra=degrees		RAJ2000 position of source, only needed if source is NOT in 4FGL and no other
			alternative catalog is provided with the '--cat option'!

   --dec=degrees	DEJ2000 position of source, only needed if source is NOT in 4FGL and no other
			alternative catalog is provided with the '--cat option'!

   --cat=catalog.fits	Catalog to query for source coordinates, must have columns named Source_Name,
			RAJ2000, and DEJ2000 for name and coordinates. If '--ra' and '--dec' are also given,
			the coordinates	from the catalog will be ignored. Default: 4FGL

   --roi=degrees	Size of ROI in degrees. Default: 10 [deg]

   --emin=energy	Minimum energy in MeV. Default: 100 [MeV]

   --emax=energy	Maximum energy in MeV. Default: 300000 [MeV]

   --ebin=number	number of energy bins per energy decade. Default: 10

Options for setup:
   --expath=path	Path where analysis should be saved. Default: location where script is started

As you can see, it is assumed that you are analysing a known gamma-ray source that is named in the 4FGL. If that is the case, it is easiest to give the 4FGL name (without blanks!) as the first argument to the script. If your source is not in the 4FGL, you need to add the coordinates via the --ra and --dec options.
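For example (the source name and MJD range below are placeholders, not a recommended setup), an analysis of a 4FGL source and of a source that is not in the 4FGL could look like this:

#!/bin/tcsh
# Average spectrum of a known 4FGL source between MJD 58000 and 58100
$FERMITOOLS/make_spectrum.sl 4FGLJ2202.7+4216 58000 58100

# Source not in the 4FGL: disable the catalog lookup and pass coordinates
$FERMITOOLS/make_spectrum.sl --known=no --ra=123.45 --dec=-12.34 MySource 58000 58100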


Creating a light curve

The script to compute a light curve is similarly called make_lightcurve.sl and is available via

$FERMITOOLS/make_lightcurve.sl SOURCENAME TMIN TMAX

The script needs the same arguments as the spectral analysis: the name of the source, and the start and end times of the interval for which the light curve should be computed. Again, several options are provided for specific analysis needs, which you can list via $FERMITOOLS/make_lightcurve.sl --help. The full help output looks like this:

$FERMITOOLS/make_lightcurve.sl --help

** General script to compute Fermi lightcurves **
  [Make sure you have loaded the Fermitools and Fermipy (see README file)]

Usage: make_lightcurve.sl [options] [src] [tmin] [tmax]
Mandatory inputs:
    src:    Name of source without any blanks, either a 4FGL name or a different name if corresponding catalog is defined in options
    tmin:   in MJD, start of time range for which light curve will be computed
    tmax:   in MJD, end of time range

Options for analysis:
   --known=yes/no	Is source a known Fermi source (in the 4FGL)? Important to set correctly, otherwise
			the analysis will fail! Default: YES

   --ra=degrees	RAJ2000 position of source, only needed if source is NOT in 4FGL and no other
			alternative catalog is provided with the '--cat option'!

   --dec=degrees	DEJ2000 position of source, only needed if source is NOT in 4FGL and no other
			alternative catalog is provided with the '--cat option'!

   --cat=catalog.fits	Catalog to query for source coordinates, must have columns named Source_Name,
			RAJ2000, and DEJ2000 for name and coordinates. If '--ra' and '--dec' are also given, 
			the coordinates from the catalog will be ignored. Default: 4FGL

   --roi=degrees	Size of ROI in degrees. Default: 10 [deg]

   --emin=energy	Minimum energy in MeV. Default: 100 [MeV]

   --emax=energy	Maximum energy in MeV. Default: 300000 [MeV]

   --ebin=number    number of energy bins per energy decade. Default: 10

   --timebin=number	Length of one light curve bin. Default: 7 [days] (weekly binning)
   --savebins=True/False	Save all the light curve bins? Careful, can take up a lot of space! Default: False

Options for setup:
   --expath=path	Path where analysis should be saved. Default: location where script is started

Please note that the default time binning for the light curve is one week (7 days).
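As an example (again with a placeholder source name and MJD range), a daily-binned light curve would be requested like this:

#!/bin/tcsh
# Daily-binned light curve over 100 days (the default would be weekly binning)
$FERMITOOLS/make_lightcurve.sl --timebin=1 4FGLJ2202.7+4216 58000 58100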

Performing Fermi-LAT analysis using slurm

The computation of LAT products can take several hours up to days (if you want to compute many bins for your light curve). Ideally, you should let the cluster perform these jobs for you and not run an analysis on your desktop machine, unless you want to test something. To simplify job submission on slurm, you can use the script submit_fermijob.sl, which is also available at the $FERMITOOLS location. The usage of submit_fermijob.sl looks like

$FERMITOOLS/submit_fermijob.sl ANALYSIS SOURCENAME TMIN TMAX

In addition to the name and the time range for the analysis, it is necessary to specify whether a spectral (SED) or light curve (LC) computation is requested. Calling the help on this script via $FERMITOOLS/submit_fermijob.sl --help gives a full overview of all available options:

$FERMITOOLS/submit_fermijob.sl  --help

** Script to submit a Fermi Analysis to slurm **
  This script creates a file to submit a Fermi LC or SED analysis to the slurm queue
  [Make sure you submit the job when the Fermi Environment is already loaded!]


Usage: submit_fermijob.sl [options] [analysis] [src] [tmin] [tmax]
Mandatory inputs:
    analysis:	specify whether it's a light curve [LC] or a SED analysis [SED]
    src:    Name of source as string, either a 4FGL name or a different name if corresponding catalog is defined in options
    tmin:   in MJD, start of time range for the analysis
    tmax:   in MJD, end of time range

Options for analysis:
   --known=yes/no	Is source a known Fermi source (in the 4FGL)? Important to set correctly, otherwise the analysis will fail! Default: YES

   --ra=degrees		RAJ2000 position of source, only needed if source is NOT in 4FGL and no other
			alternative catalog is provided with the '--cat option'!

   --dec=degrees	DEJ2000 position of source, only needed if source is NOT in 4FGL and no other
			alternative catalog is provided with the '--cat option'!

   --cat=catalog.fits	Catalog to query for source coordinates, must have columns named Source_Name,
			RAJ2000, and DEJ2000 for name and coordinates. If '--ra' and '--dec' are also given, the coordinates
			from the catalog will be ignored. Default: 4FGL

   --roi=degrees	Size of ROI in degrees. Default: 10 [deg]

   --emin=energy	Minimum energy in MeV. Default: 100 [MeV]

   --emax=energy	Maximum energy in MeV. Default: 300000 [MeV]

   --ebin=number	Number of energy bins per energy decade. Default: 10

   --timebin=number	Length of one light curve bin, only relevant for LC analysis! Default: 7 [days] (weekly binning)
   --savebins=True/False	Save all the light curve bins? Careful, can take up a lot of space! Default: False

Options for setup:
   --submit=yes/no	State whether you want to submit to slurm right away; Default: YES

   --walltime=hours	Walltime in hours, note that the default walltime might not be enough time for a longer lightcurve.
			If you're running into a Slurm TIMEOUT error, set a longer walltime and resubmit. Previous steps of the analysis are saved
			and don't need to be repeated unless something in the configuration setup has to be changed; Default: 10 hours

   --mem=memory		Memory per CPU, give only the number, in units of gigabyte; Default: 3GB

   --email=mailaddress	If you want to be informed about the slurm status per email, put your email address here.

   --expath=path	Path where analysis should be saved. Default: location where script is started

While most options are the same as for the make_spectrum.sl and make_lightcurve.sl scripts, there are additional options regarding your setup on slurm. In particular, the memory and walltime options need to be increased in case you want to compute a light curve with more than 50 bins.
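A submission for a year-long, weekly-binned light curve could then look like this (the walltime and memory values are only illustrative guesses, not tested recommendations):

#!/bin/tcsh
# Submit a light curve job with increased walltime and memory,
# and email notifications about the slurm status
$FERMITOOLS/submit_fermijob.sl --walltime=48 --mem=4 --email=my@email.de LC 4FGLJ2202.7+4216 58000 58365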

Q&A

Is multi-threading available for the pipeline scripts? - No, this is not possible via slurm at the moment. If you're running an analysis on your own machine with your own scripts, it is possible to activate multi-threading for the computation of a light curve. For more information, please check out https://fermipy.readthedocs.io/en/latest/advanced/lightcurve.html#optimizing-computation-speed.

How can I submit an array job? - In order to do that, you need to work from within a bash shell, not a (t)csh shell. I recommend writing a script that creates the .slurm file you want to submit. Here's an example of what such an array job can look like:

#!/bin/bash
#SBATCH --job-name LATspectra
#SBATCH --output job.out-%a
#SBATCH --error err.out-%a
#SBATCH --time 48:00:00
#SBATCH --partition=remeis
#SBATCH --ntasks=1
#SBATCH --hint=nomultithread
#SBATCH --mem-per-cpu=4G
#SBATCH --mail-type=END,FAIL,ARRAY_TASKS
#SBATCH --mail-user=my@email.de
# XXX: number of tasks in the array, YY: number of jobs running simultaneously
#SBATCH --array 0-XXX%YY

COMMAND[0]="/software/Science/satscripts/fermiscripts/make_spectrum.sl --expath=/your_path/ 4FGLJXXX.X-XXXX mjd_start mjd_stop"
COMMAND[1]="/software/Science/satscripts/fermiscripts/make_spectrum.sl --expath=/your_path/ 4FGLJYYY.Y-YYYY mjd_start mjd_stop"
COMMAND[2]="..."
...

# Execute the command that matches this job's array task index
eval ${COMMAND[$SLURM_ARRAY_TASK_ID]}

Can the scripts find previously unknown gamma-ray sources by themselves? - Yes, when running the analysis script to create a spectrum (with make_spectrum.sl), there is a routine implemented (the fermipy function find_sources) which searches for excesses in the TS map with a significance of 5 sigma or higher. The coordinates of any source found this way are then saved in the folder in which the analysis is run, to allow further inspection and a follow-up analysis.

Troubleshooting

During a Fermi data analysis, several errors can pop up that are not necessarily linked to the Fermi analysis itself. Here, I'll show those I encountered and what I have done to (hot)fix them.

Error during deletion of LC subfolders
The internal sub-routine for creating the LCs threw an error a couple of times, complaining that one of the LC bin folders was missing and that, hence, the light curve could not be assembled into one FITS file. Because I couldn't pinpoint the error in the source code, my work-around for this issue is implemented in the wrapper scripts make_spectrum.sl and make_lightcurve.sl: deleting the subfolders is now handled there and not via the routine in the source code.

Individual bins crashed and then the light curve cannot be assembled
This is a problem that I fixed manually by editing the source code in the lightcurve.py file. While I have also reported this issue to the developers, it has not been fixed yet (as of June 23, 2021). In order to still manage to create the light curve, I implemented the following hot fix. In case you are setting up your own conda environment and you come across an error related to KeyError: fit_success, here is my hot fix (added or changed lines are marked with '# ADDED' and '# CHANGED' comments):

# this snippet is nearly at the bottom of the file

        itimes = enumerate(zip(times[:-1], times[1:]))
        flux_const_for_calcTS_var = None                            # ADDED
        for i, time in itimes:

            if not 'fit_success' in mapo[i]:                        # ADDED
                self.logger.error(                                  # ADDED
                    'fit_success not found in bin %d in range %i %i.' % (i, time[0], time[1]))  # ADDED
                continue                                            # ADDED

            elif not mapo[i]['fit_success']:                        # CHANGED: 'if' -> 'elif'
                self.logger.error(
                    'Fit failed in bin %d in range %i %i.' % (i, time[0], time[1]))
                continue

            if 'flux_const' in mapo[i]:                             # ADDED
                flux_const_for_calcTS_var = mapo[i]['flux_const']   # ADDED

            for k in o.keys():

                if k == 'config':
                    continue
                if not k in mapo[i]:
                    continue
                # if (isinstance(o[k], np.ndarray) and
                #    o[k][i].shape != mapo[i][k].shape):
                #    gta.logger.warning('Incompatible shape for column %s', k)
                #    continue

                try:
                    o[k][i] = mapo[i][k]
                except:
                    pass

        systematic = kwargs.get('systematic', 0.02)

        o['ts_var'] = calcTS_var(loglike=o['loglike_fixed'],
                                 loglike_const=o['loglike_const'],
                                 flux_err=o['flux_err_fixed'],
                                 flux_const=flux_const_for_calcTS_var,  # ADDED
                                 systematic=systematic,
                                 fit_success=o['fit_success_fixed'])

        return o


Using /karl
While /karl has the benefit of storing large amounts of data, there is an issue with computing many Fermi spectra or light curves at the same time while storing the results on /karl. The resulting problems are not clearly visible from checking the log files. An error related to this can look, for example, like this:

GTBinnedAnalysis.run_gtapp(): Caught N3tip12TipExceptionE at the top level: setNumRecords could not insert rows in FITS table in extension "EVENTS" in file "/path_to_analysis/LC_analysis/ft1_00.fits" (CFITSIO ERROR 108: error reading from FITS file)

In case you need to compute a lot of spectra or light curves and want to do this on /karl, either limit the number of jobs running simultaneously to 10, or store the data on /userdata, where I have so far never encountered this problem.
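If you run these analyses as a slurm array job anyway (see the Q&A above), the throttle in the --array directive is a convenient way to enforce such a limit:

#!/bin/bash
# Run at most 10 of the 50 array tasks simultaneously
#SBATCH --array 0-49%10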


Running the original script to check what's going wrong
If you encounter a new error, you can also go ahead and start the Python script for the analysis directly in the shell. Remember to load your conda environment first. You also need a config_SED.yaml or config_LC.yaml file in the folder in which you are going to run the analysis, otherwise you won't get far. You can directly run the script, e.g. via

(fermipy)[user@computer]> $FERMITOOLS/SEDcreator_fermipy.py


Light curve fails: too many open files
Computing a long light curve with short binning (e.g. daily binning for a year-long light curve, or weekly binning for a light curve covering a few years) can result in a termination of the analysis when the computer cannot handle the number of open files anymore. I'm not entirely sure what goes wrong internally; my suggested hot fix is to split the computation of the light curve into manageable time intervals (e.g. chunks of ~100 bins). After all chunks have been computed, you can put them together using the command ftmerge:

#!/bin/tcsh

# Step 1: Create list of all light curve chunk files
ls -1 /path_to_my_analysis_results/source_name/LC_analysis/*_lightcurve.fits > /path_to_my_analysis_results/source_name.list

# /path_to_my_analysis_results/ should contain the folders of each chunk, e.g. named MJD_start_end

# Step 2: Merge all light curve files into one fits file
ftmerge @/path_to_my_analysis_results/source_name.list /path_to_my_analysis_results/lightcurve_source_name.fits clobber=yes
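For producing the chunks in the first place, a simple loop over the chunk boundaries does the job. A minimal bash sketch (the MJD boundaries, paths, and source name are placeholders; that the output folders must be created beforehand is an assumption):

#!/bin/bash
# Sketch: submit a long light curve in chunks of 700 days (100 weekly bins each)
src=4FGLJ2202.7+4216
for tmin in 55000 55700 56400 57100; do
    tmax=$((tmin + 700))
    outdir=/path_to_my_analysis_results/${src}/MJD_${tmin}_${tmax}
    mkdir -p $outdir
    $FERMITOOLS/submit_fermijob.sl --expath=$outdir LC $src $tmin $tmax
done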

No worries about losing already-computed bins in case your analysis crashes: those are not deleted when the script crashes mid-way, so the finished LC bins are still available afterwards and it is not necessary to redo the analysis for all of the light curve bins.

Contact person

The scripts and this wiki entry have been written by Andrea Gokus, who is also currently the maintainer of the LAT dataset on the Remeis server. In case of any further questions or problems, please contact her via andrea.gokus[at]fau.de.
Last status update on this wiki page: September 27, 2021