Responsible Partners: EDF, UGA
Melissa is an online data processing solution, developed by EDF and UGA, that couples ensemble member execution with on-the-fly statistics computation. The goal of pilot #2 is to leverage Melissa's elastic and fault-tolerant workflow to manage very large ensemble runs on a realistic industrial use case provided by EDF, aggregating the data online (in transit) and thus avoiding heavy temporary data storage (petabytes of intermediate data). We spoke with Dr. Bruno Raffin, Research Director at the National Institute for Research in Digital Science and Technology (INRIA) and co-developer of the Melissa framework.
Can you please summarize what pilot #2 is about?
Pilot #2 targets a very large scale sensitivity analysis. A sensitivity analysis identifies how the input parameters of a numerical simulation influence its outputs, and it is a fundamental part of Uncertainty Quantification (UQ). In the industrial context of EDF (power generation), this is a critical task that is often required by governmental control authorities. However, UQ techniques are generally very cumbersome, requiring numerous re-executions of the same numerical simulation with varying parameters. Within REGALE we target a very large sensitivity analysis that will require processing about 1 PB of data, an amount that is not easy to store even on a supercomputer.
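To make the idea concrete, here is a minimal, self-contained sketch of a Monte Carlo sensitivity analysis on a toy model, estimating first-order Sobol indices with the classic pick-freeze estimator. The toy model and all names are illustrative assumptions, not the EDF use case or Melissa code.

```python
# Sketch: first-order Sobol sensitivity indices via pick-freeze.
# The toy model is hypothetical; analytical indices are known, so the
# Monte Carlo estimates can be checked against (16, 4, 1) / 21.
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Toy "simulation": y = 4*x0 + 2*x1 + x2 on inputs in [-1, 1].
    return 4 * x[:, 0] + 2 * x[:, 1] + x[:, 2]

n, d = 100_000, 3
a = rng.uniform(-1, 1, size=(n, d))   # base sample of input parameters
b = rng.uniform(-1, 1, size=(n, d))   # independent resample

y_a = model(a)
var_y = y_a.var()

for i in range(d):
    # "Freeze" parameter i from sample A, resample all others from B;
    # the covariance of the two outputs isolates parameter i's effect.
    ab = b.copy()
    ab[:, i] = a[:, i]
    y_ab = model(ab)
    cov = np.mean(y_a * y_ab) - np.mean(y_a) * np.mean(y_ab)
    print(f"S_{i} ~ {cov / var_y:.3f}")
```

Note that each index requires a fresh batch of model evaluations, which is exactly why a full UQ campaign multiplies into tens of thousands of simulation runs.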
What is the research field?
This is multidisciplinary research at the convergence of UQ (Uncertainty Quantification) and HPC (High Performance Computing). UQ is a field that applies statistics in the context of numerical simulation. In our research, iterative statistics methods and asynchronous HPC techniques play a very important role.
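As an illustration of what "iterative statistics" means in practice, the sketch below updates mean and variance one result at a time (Welford's one-pass algorithm), so each simulation output can be discarded as soon as it has been processed. This is a generic textbook formulation shown for illustration, not Melissa's actual implementation.

```python
# Sketch: one-pass (iterative) mean/variance in the spirit of the
# iterative statistics used for online ensemble processing.
import numpy as np

class IterativeMoments:
    """Accumulates mean and variance of a field, one sample at a time."""
    def __init__(self, field_size):
        self.n = 0
        self.mean = np.zeros(field_size)
        self.m2 = np.zeros(field_size)   # sum of squared deviations

    def update(self, field):
        # Each incoming simulation result updates the statistics and
        # can then be thrown away: no intermediate storage needed.
        self.n += 1
        delta = field - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (field - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else np.zeros_like(self.m2)

# Usage: stream 1000 fake simulation fields through the accumulator.
rng = np.random.default_rng(1)
stats = IterativeMoments(field_size=10)
for _ in range(1000):
    stats.update(rng.normal(size=10))
print(stats.mean[:3], stats.variance[:3])
```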
What are the main challenges in your pilot?
Storing 1 PB of data on the file system is not an option (too large, too costly, too slow). We need online solutions where the data is processed as soon as it is produced, without ever being stored. This is where the Melissa framework, co-developed by EDF and UGA/INRIA, comes to the rescue. Melissa harnesses innovative approaches such as iterative statistics and a flexible, fault-tolerant, elastic architecture to enable large-scale ubiquitous sensitivity analysis, i.e. at full temporal and spatial resolution.
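The following toy sketch emulates the in-transit pattern on a single machine: several "ensemble member" processes push their fields to an aggregator that folds them into running statistics immediately, so the raw data never reaches the file system. All names are hypothetical; Melissa's real architecture is distributed, elastic, and fault tolerant, which this single-node sketch does not attempt to reproduce.

```python
# Sketch: in-transit aggregation emulated with local processes.
import multiprocessing as mp
import numpy as np

def member(run_id, queue):
    # One ensemble member: "simulate" a field per time step and send it.
    rng = np.random.default_rng(run_id)
    for step in range(5):                    # 5 time steps per run
        field = rng.normal(loc=run_id, size=8)
        queue.put((step, field))
    queue.put(None)                          # signal completion

def aggregator(queue, n_members):
    # Update a running mean per time step as results arrive; raw
    # fields are dropped right after they are folded in.
    done, count, mean = 0, {}, {}
    while done < n_members:
        msg = queue.get()
        if msg is None:
            done += 1
            continue
        step, field = msg
        count[step] = count.get(step, 0) + 1
        prev = mean.get(step, np.zeros_like(field))
        mean[step] = prev + (field - prev) / count[step]
    print("per-step means:", {s: m.mean().round(2) for s, m in mean.items()})

if __name__ == "__main__":
    q = mp.Queue()
    members = [mp.Process(target=member, args=(i, q)) for i in range(4)]
    for p in members:
        p.start()
    aggregator(q, n_members=len(members))
    for p in members:
        p.join()
```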
HPC is fundamental: large-scale sensitivity analyses are simply not possible without it. To give some numbers, the largest analysis performed with Melissa used 30K compute cores to execute 80,000 simulation runs and, at the same time, process on the fly the 288 TB of data they generated.
What do you expect from REGALE?
REGALE will enable us to push forward today's limits of sensitivity analysis and UQ. We will rely on the advanced resource allocation methods and energy monitoring tools developed within REGALE to optimize Melissa's energy consumption and push its efficiency and performance. We are preparing Melissa to leverage the full potential of the upcoming Exascale machines. This is a critical step towards developing digital twins of complex systems.