Description
DCDB is a scalable monitoring framework for HPC systems that acquires both in-band and out-of-band sensor data. Its modular software architecture allows plugins to be implemented for a variety of data sources and protocols, and it is currently deployed at LRZ’s facilities. DCDB has since been extended with Wintermute, a data analytics framework that supports both online and on-demand data analysis.
Integration
DCDB can easily be extended with plugins that acquire sensor data over different protocols and from different data sources (e.g., IPMI, perf events, SNMP). DCDB also provides several interfaces that make the collected data available to users, system administrators, and external tools, which can in turn expose this information to schedulers, resource managers, runtime systems, and profiling tools. Following the same design principles, Wintermute allows different data analytics models to be developed, integrated into DCDB, and run online.
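To illustrate the plugin-based design described above, here is a minimal Python sketch that assumes a simple polling model. It is not DCDB's actual API (DCDB itself is written in C++), and every name in it (`SensorPlugin`, `StaticPlugin`, `Collector`, `MovingAverage`) is hypothetical. Each data source implements one small interface, a collector polls all registered plugins, and a moving-average operator stands in for an online analytics model of the kind Wintermute hosts.

```python
from __future__ import annotations

import time
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor: str        # hierarchical sensor name, e.g. "/node01/power" (illustrative)
    timestamp_ns: int  # acquisition time in nanoseconds since the epoch
    value: float


class SensorPlugin(ABC):
    """Hypothetical plugin interface: one implementation per protocol or data source."""

    name: str

    @abstractmethod
    def read(self) -> list[Reading]:
        """Poll the data source once and return the current readings."""


class StaticPlugin(SensorPlugin):
    """Hypothetical stand-in for a real IPMI/perf/SNMP plugin; returns fixed values."""

    def __init__(self, name: str, host: str, values: dict[str, float]) -> None:
        self.name = name
        self._host = host
        self._values = values

    def read(self) -> list[Reading]:
        now = time.time_ns()
        return [Reading(f"/{self._host}/{k}", now, v) for k, v in self._values.items()]


class Collector:
    """Hypothetical core loop: polls every registered plugin and gathers its readings."""

    def __init__(self) -> None:
        self._plugins: dict[str, SensorPlugin] = {}

    def register(self, plugin: SensorPlugin) -> None:
        self._plugins[plugin.name] = plugin

    def collect(self) -> list[Reading]:
        return [r for p in self._plugins.values() for r in p.read()]


class MovingAverage:
    """Hypothetical online operator: per-sensor mean over the last `window` values."""

    def __init__(self, window: int) -> None:
        self._buffers: dict[str, deque[float]] = defaultdict(lambda: deque(maxlen=window))

    def update(self, readings: list[Reading]) -> dict[str, float]:
        for r in readings:
            self._buffers[r.sensor].append(r.value)
        return {s: sum(b) / len(b) for s, b in self._buffers.items()}


if __name__ == "__main__":
    collector = Collector()
    collector.register(StaticPlugin("ipmi", "node01", {"power": 200.0}))
    collector.register(StaticPlugin("perf", "node01", {"instructions": 1e9}))
    analytics = MovingAverage(window=3)
    print(analytics.update(collector.collect()))
```

In this sketch, supporting a new protocol only requires another `SensorPlugin` subclass; the collector and the analytics operator stay unchanged, which mirrors the modularity the paragraph describes.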
Sophistication
DCDB will support a concise yet exhaustive characterization of HPC system components and applications through “signatures”, which loosely represent a snapshot of the state of the system and of the running code. DCDB will also be extended with closed-loop control that reacts to the monitored state by adjusting resource allocation. This control loop may use the signatures mentioned above to support decision making.
Please visit https://www.lrz.de/presse/ereignisse/2019-11-30-Monitoring-for-HPC/ for more information on DCDB.