EXAMON

Description

EXAMON is an open-source monitoring framework designed by UNIBO and deployed at CINECA. It is composed of four main layers: the Data Collection Layer, the Communication Layer, the Storage Layer and the Application Layer. The Data Collection Layer samples two kinds of data: i) physical data measured with sensors and ii) workload information obtained from the job dispatcher. The software components that perform this sampling are each built from two main objects, the Message Queuing Telemetry Transport (MQTT) API object and the Sensor API object. The Communication Layer is built around the MQTT protocol. The Storage Layer is based on a distributed and scalable time-series database (KairosDB) that uses a NoSQL database (Apache Cassandra) as its back-end. A dedicated MQTT subscriber (MQTT2Kairos) bridges the MQTT protocol and the KairosDB data insertion mechanism. The Application Layer consumes the data gathered by the monitoring framework, which can serve multiple purposes. For example, ML techniques can be applied to extract power/thermal predictive models or to devise online fault detection.
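The bridging step between MQTT and KairosDB can be illustrated with a minimal sketch. It is not the MQTT2Kairos implementation. It assumes a topic made of key/value segments ending in the metric name and a payload of the form "value;timestamp-in-seconds"; both layouts, the function names and the sample values are assumptions made for illustration. The only external fact relied on is that KairosDB accepts data points as JSON objects with "name", "timestamp" (milliseconds), "value" and "tags" fields. The network side (subscribing to the broker and posting to KairosDB) is left out so the sketch runs with the standard library alone.

```python
import json


def parse_topic(topic: str) -> tuple[str, dict[str, str]]:
    """Split 'key/value/.../metric' into (metric, {key: value, ...}).

    The key/value topic layout is an assumption of this sketch.
    """
    *pairs, metric = topic.split("/")
    if not metric or not pairs or len(pairs) % 2 != 0:
        raise ValueError(f"unexpected topic layout: {topic!r}")
    return metric, dict(zip(pairs[0::2], pairs[1::2]))


def parse_payload(payload: str) -> tuple[float, int]:
    """Parse 'value;timestamp_seconds' into (value, timestamp in ms).

    The 'value;timestamp' payload layout is an assumption of this sketch.
    """
    value_text, timestamp_text = payload.split(";")
    return float(value_text), round(float(timestamp_text) * 1000)


def to_datapoint(topic: str, payload: str) -> dict:
    """Build one KairosDB-style data point from one MQTT message."""
    metric, tags = parse_topic(topic)
    value, timestamp_ms = parse_payload(payload)
    return {
        "name": metric,
        "timestamp": timestamp_ms,
        "value": value,
        "tags": tags,
    }


if __name__ == "__main__":
    # Hypothetical message, for illustration only.
    topic = "org/unibo/cluster/testcluster/node/node01/plugin/demo_pub/chnl/data/temperature"
    payload = "48.5;1700000000.250"
    # KairosDB takes a JSON array of such objects as the request body.
    print(json.dumps([to_datapoint(topic, payload)], indent=2))
```

Turning the topic segments into tags keeps the location of the sensor (cluster, node, plugin) queryable in the time-series database without a separate metadata store.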

Integration

EXAMON integrates readily with any MQTT data source and can also be used to convey node management configurations. Within REGALE, the EXAMON protocol will be extended to carry node management and job management configuration settings, as well as job management and run-time monitoring values. Additional interfaces will be designed to facilitate gathering and parallel processing of the monitored data.
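As a rough illustration of what an MQTT data source has to provide, the sketch below formats one sample as a topic/payload pair. The key/value topic layout, the "value;timestamp" payload, the MqttMessage type and the make_message helper are all assumptions made for this example and are not taken from the EXAMON code base. Publishing the resulting message to a broker is left to whichever MQTT client library the data source already uses.

```python
import time
from typing import NamedTuple, Optional


class MqttMessage(NamedTuple):
    """Topic and payload ready to be handed to an MQTT client."""
    topic: str
    payload: str


def _check_segment(segment: str) -> str:
    # '/' separates topic levels; '+' and '#' are MQTT wildcards and
    # are not allowed in the topic of a published message.
    if not segment or any(ch in segment for ch in "/+#"):
        raise ValueError(f"invalid topic segment: {segment!r}")
    return segment


def make_message(location: dict[str, str], metric: str, value: float,
                 timestamp: Optional[float] = None) -> MqttMessage:
    """Format one sample as an MQTT message.

    'location' holds the key/value pairs that identify the source
    (hypothetical keys such as org, cluster, node). The timestamp is in
    seconds and defaults to the current time.
    """
    if timestamp is None:
        timestamp = time.time()
    segments = []
    for key, val in location.items():
        segments += [_check_segment(key), _check_segment(val)]
    segments.append(_check_segment(metric))
    return MqttMessage("/".join(segments), f"{value};{timestamp:.3f}")


if __name__ == "__main__":
    # Hypothetical source description, for illustration only.
    where = {"org": "unibo", "cluster": "testcluster", "node": "node01",
             "plugin": "demo_pub", "chnl": "data"}
    print(make_message(where, "temperature", 48.5))
```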

Sophistication

With the planned interface extensions, EXAMON can interface with the workload manager and the job manager, collecting their progress live and changing their behaviour to adapt to system-level decisions.

Please visit http://projects.eees.dei.unibo.it/monitoring/wordpress/ for more information on EXAMON.