MonetDB Solutions and CWI are jointly working on a data processing pipeline for the Square Kilometre Array (SKA) project.
The SKA is an international effort to build the world’s largest radio telescope, led by the SKA Organisation from the Jodrell Bank Observatory in the UK. The SKA will conduct transformational science to improve our understanding of the Universe and the laws of fundamental physics, monitoring the sky in unprecedented detail and mapping it hundreds of times faster than any current facility.
Initiated by the SKA Organisation and Amazon Web Services (AWS), the AstroCompute in the Cloud program is designed to accelerate tools and technique development for storage and analysis of the vast data volumes produced by modern telescopes. The team has received a grant from the SKAO-AWS to develop a Continuous Integration (CI) process for testing astronomical data processing workflows, to ensure that the system can sustainably handle growth in terms of both query complexity and data volume. In addition, it can monitor that software tools and algorithm updates work as expected, both in terms of new features and performance improvements.
A critical component of a data processing pipeline is the database management system (DBMS). DBMSs are very efficient in both storing data, as well retrieving data quickly. Modern DBMSs also use common interfaces to make data access ubiquitous. This makes them great for many forms of data retrieval, including data mining, running machine learning algorithms for clustering or classification, as well as visualisation of the stored data. For the SKA AstroCompute project our team uses MonetDB, leveraging its in-line R capabilities for zero-copy data analysis and Data Vaults for no-hassle handling of external data files. The complete pipeline will run in the AWS cloud, scaling-up on incoming data or after software/algorithm updates, and scaling back down after the processing and validation is done.