[submitted]
Kliko - The Scientific Compute Container Format
We present Kliko, a Docker based container specification for running one or multiple related compute jobs. The key concepts of Kliko is the encapsulation of data processing software into a container and the formalisation of the input, output and task parameters. Formalisation is realised by bundling a container with a Kliko file, which describes the IO and task parameters. This Kliko container can then be opened and run by a Kliko runner. The Kliko runner will parse the Kliko definition and gather the values for these parameters, for example by requesting user input or pre defined values in a script. Parameters can be various primitive types, for example float, int or the path to a file. This paper will also discuss the implementation of a support library named Kliko which can be used to create Kliko containers, parse Kliko definitions, chain Kliko containers in workflows using, for example, Luigi a workflow manager. The Kliko library can be used inside the container interact with the Kliko runner. Finally this paper will discuss two reference implementations based on Kliko: RODRIGUES, a web based Kliko container schedular and output visualiser specifically for astronomical data, and VerMeerKAT, a multi container workflow data reduction pipeline which is being used as a prototype pipeline for the commisioning of the MeerKAT radio telescope.
[ascl:2002.010]
Apercal: Pipeline for the Westerbork Synthesis Radio Telescope Apertif upgrade
Apercal is a dedicated, automated data reduction and analysis pipeline written for the Apertif (APERture Tile In Focus) upgrade to the Westerbork Synthesis Radio Telescope. This upgrade dramatically increases the field of view and survey speed of the telescope and is being used for survey observations that can produce 5 terabytes of data for each observation. Apercal uses existing and new tools and parallelization to provide the performance needed for the large volume of data produced Apertif surveys. The software is written entirely in Python and uses third–party astronomical software, such as AOFlagger (
ascl:1010.017), CASA (
ascl:1107.013), and Miriad (
ascl:1106.007), for certain tasks. Apercal is modular, making it possible to run specific modules manually instead of the full pipeline, and information can be exchanged between modules because status parameters are written and read from a python pickled dictionary file. The pipeline can also run fully automatically.