Monitoring of HPC Clusters and IT Infrastructure at IT4Innovations

Abstract

The aim of this work is to implement new monitoring systems and consolidate them with those already deployed at IT4Innovations (the IT4Innovations National Supercomputing Center), in order to deliver a centralised monitoring solution for HPC clusters and infrastructure. The centralised monitoring is implemented with the Icinga2 monitoring tool. The whole solution is deployed with the configuration management tools Puppet and Ansible, depending on the location of the monitoring servers. A three-tier, clustered monitoring system has been created, and its components fulfil the requirements for high availability and load balancing. Monitoring servers are divided into zones according to the clusters or infrastructure they monitor. A centrally accessible web frontend is available to system administrators. Because the monitoring solution is deployed in a fully automated manner using configuration management tools, it can be delivered quickly into the production environment with minimal manual work.

Subject(s)

Ansible, availability, cluster, distributed monitoring, GIT, HA, high availability, HPC, Icinga, Icinga2, IT4Innovations, load-balancing, monitoring probes, monitoring, Puppet, supercomputer

Citation