EAR

Description

EAR is an open-source energy management framework that optimizes the energy consumption and efficiency of a cluster of interconnected nodes. To this end, EAR provides energy control, accounting, monitoring, and optimization for both the running applications and the cluster as a whole. EAR is robust and reliable, and has been in production on SuperMUC-NG (LRZ) since 2019.

At EAR’s core are two components: the EAR Daemon (EARD) and the EAR runtime library (EARL). EARL is a dynamic, transparent, and lightweight runtime library that controls and optimizes the energy consumed by MPI jobs without any application modification or user input. EARL dynamically identifies repetitive regions in parallel applications and collects basic performance and power metrics for them; this information forms the application signature. The application signature and the system signature are the inputs to the default power and time model used by EARL. EAR includes a plugin mechanism for power policies, so EARL can be extended with new policies. Power models are pluggable as well, which allows new models and approaches for power and time projections, such as neural networks, to be evaluated.

Energy accounting and power monitoring are provided by EARplug and EARD. EARD is a Linux service that runs with privileges on the compute nodes. It continuously monitors power and other relevant node metrics, such as temperature and average frequency, and reports them to the database (DB) through EARDBD, an internal EAR component that is not extended in the project. EARD uses an energy plugin to obtain energy readings. By default, EAR provides plugins for the Intel Node Manager and the Lenovo SD650 used at LRZ, which rely on the OpenIPMI driver, as well as a node energy estimation based on RAPL counters.
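To illustrate what an energy estimation based on RAPL counters involves, the following is a minimal Python sketch, not EAR's actual energy plugin. It assumes the Linux powercap interface, where a file such as /sys/class/powercap/intel-rapl:0/energy_uj holds a cumulative energy counter in microjoules and max_energy_range_uj gives the value at which that counter wraps. The names EnergySample, read_counter_uj, energy_delta_uj, and average_power_w are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class EnergySample:
    """One reading of a cumulative energy counter (hypothetical helper type)."""
    timestamp_s: float  # time of the reading, in seconds
    energy_uj: int      # cumulative counter value, in microjoules


def read_counter_uj(path: str) -> int:
    """Read an integer counter from a powercap-style sysfs file.

    Assumption: `path` points at a file such as
    /sys/class/powercap/intel-rapl:0/energy_uj, which may require privileges.
    """
    with open(path) as f:
        return int(f.read().strip())


def energy_delta_uj(prev_uj: int, curr_uj: int, max_range_uj: int) -> int:
    """Energy consumed between two readings, allowing for one wraparound."""
    if curr_uj >= prev_uj:
        return curr_uj - prev_uj
    # The counter wrapped past max_range_uj and restarted near zero.
    return (max_range_uj - prev_uj) + curr_uj


def average_power_w(prev: EnergySample, curr: EnergySample,
                    max_range_uj: int) -> float:
    """Average power in watts over the interval between two samples."""
    elapsed_s = curr.timestamp_s - prev.timestamp_s
    if elapsed_s <= 0:
        raise ValueError("samples must be taken in increasing time order")
    joules = energy_delta_uj(prev.energy_uj, curr.energy_uj, max_range_uj) / 1e6
    return joules / elapsed_s
```

RAPL counters cover processor packages and, where available, DRAM rather than the whole node, so a value computed this way is an estimate of node energy and not a direct node-level measurement.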

Integration

EAR offers an extensible set of plugins for policies, power models, application tracing, and node energy readings, and is configured through a centralized configuration file. EAR is highly configurable, and some components can be used without others to provide partial functionality.

The EAR runtime is loaded using the LD_PRELOAD mechanism. EAR works independently of the scheduler: on current HPC systems it is integrated only through the existing SLURM SPANK plugin mechanism, which makes application monitoring and management transparent to end users. Its API is simple and can be easily integrated with other schedulers. EARL is compatible with other instrumentation libraries, such as Extrae (https://tools.bsc.es/extrae), through trace plugins. The runtime uses standard profiling libraries, such as PAPI, to obtain non-privileged metrics.

The EAR Daemon is a UNIX service, independent of the scheduler. Its configuration is based on a text file and includes concepts such as privileged users and default configurations for multiple policies. It also allows specifying whether EAR is running on a production or a benchmarking cluster, running applications with predefined configurations, and more.
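The idea of pluggable policies that consume power and time projections can be sketched as follows. This is a conceptual Python illustration only: EAR's real plugins and API are not shown here, and every name in the block (Projection, POLICIES, register_policy, lowest_energy, fastest, max_slowdown) is hypothetical. Each policy receives projections for the candidate CPU frequencies, as a power and time model might produce them, and returns the one it selects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Projection:
    """Projected behaviour of an application at one CPU frequency."""
    freq_ghz: float
    time_s: float   # projected execution time
    power_w: float  # projected average power

    @property
    def energy_j(self) -> float:
        return self.time_s * self.power_w


Policy = Callable[[List[Projection]], Projection]

# Hypothetical registry that maps a policy name to its implementation.
POLICIES: Dict[str, Policy] = {}


def register_policy(name: str) -> Callable[[Policy], Policy]:
    """Decorator that adds a policy function to the registry."""
    def decorator(fn: Policy) -> Policy:
        POLICIES[name] = fn
        return fn
    return decorator


@register_policy("lowest_energy")
def lowest_energy(projections: List[Projection],
                  max_slowdown: float = 0.10) -> Projection:
    """Pick the lowest-energy frequency within a bounded slowdown.

    The slowdown is measured against the highest candidate frequency.
    """
    baseline = max(projections, key=lambda p: p.freq_ghz)
    limit_s = baseline.time_s * (1.0 + max_slowdown)
    allowed = [p for p in projections if p.time_s <= limit_s]
    return min(allowed, key=lambda p: p.energy_j)


@register_policy("fastest")
def fastest(projections: List[Projection]) -> Projection:
    """Pick the frequency with the shortest projected time."""
    return min(projections, key=lambda p: p.time_s)
```

Keeping policies behind a common call signature means a new policy can be added by registering one more function, and the same projections could come from a different power model without changing the policies themselves.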

Sophistication

In REGALE, EAR will be enhanced with various plugins to implement sophisticated policies for job performance, energy and power, as well as power models and tracing tools.

Please visit https://www.bsc.es/research-and-development/software-and-apps/software-list/ear-energy-management-framework-hpc for more information on EAR.