Probabilistic numerical approaches in the context of large-scale physics
LMBP - ROOM 218, Université Clermont Auvergne, June 28th 2022
Organizer: Please contact Manon Michel (LMBP, Université Clermont Auvergne) with any inquiries.
Speakers: David Hill (ISIMA/LIMOS, UCA), Jens Jasche (Stockholm University), Guilhem Lavaux (IAP, France), Athina Monemvassitis (LMBP, UCA) and Lars Röhrig (TU Dortmund and LPC, UCA)
Schedule
9.00 AM - 9.50 AM Guilhem Lavaux
The Aquila program: cosmological inference with complicated datasets
Cosmology is the study of the universe as a single physical object with specific global properties. Over the last century, scientists have tried to build a consistent picture of our universe from astronomical data sets. Unfortunately, those data sets are both incomplete and complicated to interpret.
Over the last ten years, new data assimilation techniques have been developed, made possible by advances in statistics and computer science and by a significant increase in the quantity and quality of data. The Aquila consortium intends to push the scientific analysis of those data sets to the next level. I wish to provide a panorama of the activities of the Aquila consortium. I will present the specific challenges of cosmological datasets and some of our statistical modeling techniques: from "likelihood-full" approaches with the BORG algorithm (Bayesian Origin Reconstruction from Galaxies) to "implicit-likelihood" methods that rely on the use of neural networks. Beyond pure cosmological inference, members of Aquila also validate results by correlating with other data sets. This phase of validation also sometimes provides additional constraints.
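As a toy illustration of the "implicit-likelihood" idea, the following Python sketch runs the simplest member of that family, rejection Approximate Bayesian Computation: parameters drawn from the prior are pushed through a forward simulator, and only draws whose summary statistics land close to the observed ones are kept. The Gaussian toy model, the tolerance, and all names are illustrative assumptions; the Aquila pipelines rely on far richer simulators and neural estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=200):
    # Toy forward model standing in for an expensive cosmological
    # simulator whose likelihood is intractable.
    return rng.normal(theta, 1.0, size=n)

def summary(data):
    # Compress the data set to a low-dimensional summary statistic.
    return data.mean()

s_obs = summary(simulate(1.5))   # "observed" summary; true parameter hidden

posterior, epsilon = [], 0.05
for _ in range(20_000):
    theta = rng.normal(0.0, 2.0)            # draw from the prior
    if abs(summary(simulate(theta)) - s_obs) < epsilon:
        posterior.append(theta)             # keep draws matching the data

print(f"{len(posterior)} samples, posterior mean ~ {np.mean(posterior):.2f}")
```

The accepted draws approximate the posterior without a single likelihood evaluation, which is what makes such methods attractive when the forward model is only available as a simulator.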
9.50 AM - 10.40 AM Jens Jasche
Large-scale Bayesian inference of cosmic structures in galaxy surveys
The standard model of cosmology predicts a rich phenomenology with which next-generation galaxy surveys can test the fundamental physics of the origin of cosmic structure, the accelerating cosmic expansion, and dark matter.
However, progress in the field critically depends on our ability to connect theory with observation and to infer relevant cosmological information from next-generation galaxy data. State-of-the-art data analysis methods extract information from only a limited number of statistical summaries, ignoring the significant information contained in the complex filamentary distribution of the three-dimensional cosmic matter field.
To go beyond these classical approaches, in this talk I will present our Bayesian physical forward-modeling approach, which aims to extract the full physically plausible information from cosmological large-scale-structure data. Using a physics model of structure formation, the approach infers the 3D initial conditions from which observed structures originate, maps non-linear density and velocity fields, and provides dynamic structure-formation histories, including a detailed treatment of uncertainties. A hierarchical Bayes approach paired with an efficient implementation of a Hamiltonian Monte Carlo sampler makes it possible to account for various observational systematic effects while exploring a multi-million-dimensional parameter space. The method will be illustrated through various data applications providing an unprecedented view of the dynamic evolution of the structures surrounding us. Inferred mass density fields are in agreement with, and provide complementary information to, gold-standard gravitational weak-lensing and X-ray observations. I will discuss how Bayesian forward modeling of the three-dimensional cosmic structure permits us to use the cosmic structure as a laboratory for fundamental physics and to gain insights into the cosmic origin, dark matter and dark energy phenomenology, and the nature of gravity. Finally, I will outline a new program to use inferred posterior distributions and information-theoretic concepts to devise new algorithms for optimal data acquisition and automated scientific discovery.
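As a pocket-sized illustration of the Hamiltonian Monte Carlo ingredient mentioned above, here is a minimal Python sketch of HMC with a leapfrog integrator on a toy Gaussian target. The step size, trajectory length, and target are illustrative assumptions; the BORG-style samplers in the talk operate on multi-million-dimensional, physics-constrained posteriors.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p(x):
    # Log-density of a standard Gaussian target (up to a constant).
    return -0.5 * x @ x

def grad_log_p(x):
    return -x

def hmc_step(x, eps=0.1, n_steps=20):
    # One HMC transition: resample momentum, integrate Hamiltonian
    # dynamics with leapfrog, then accept/reject to correct the
    # discretization error.
    p = rng.normal(size=x.shape)
    x_new, p_new = x.copy(), p + 0.5 * eps * grad_log_p(x)
    for i in range(n_steps):
        x_new = x_new + eps * p_new
        if i < n_steps - 1:
            p_new = p_new + eps * grad_log_p(x_new)
    p_new = p_new + 0.5 * eps * grad_log_p(x_new)
    h_old = -log_p(x) + 0.5 * p @ p
    h_new = -log_p(x_new) + 0.5 * p_new @ p_new
    return x_new if np.log(rng.random()) < h_old - h_new else x

x, samples = np.zeros(10), []
for _ in range(5_000):
    x = hmc_step(x)
    samples.append(x)
print("sample variance ~", np.var(samples))   # should approach 1
```

The appeal in high dimension is that one transition uses gradient information to propose a distant point while keeping a high acceptance rate, which is what makes million-dimensional exploration tractable.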
10.40 AM - 11.10 AM Coffee break
11.10 AM - 11.35 AM Athina Monemvassitis
On the implementation of discrete PDMP-based algorithms
Traditional Markov chain Monte Carlo methods produce samples from a target probability density by exploring it through a reversible Markov chain. In most cases, this chain satisfies the detailed balance condition thanks to the introduction of rejections. Lately, non-reversible algorithms based on continuous piecewise-deterministic Markov processes (PDMPs) have been shown to improve sampling efficiency. These algorithms produce irreversible ballistic motion while ensuring global balance through direction changes instead of rejections. Although much more efficient than the historical Metropolis-Hastings algorithm, they require either an upper bound on the gradient of the log-probability density, or knowledge of both its inverse and the zeros of its gradient. When these quantities are not directly accessible, a compromise is to discretize the continuous PDMP, at the cost of introducing direction flips and diminishing the persistent nature of the dynamics. In this talk, after introducing some properties of continuous PDMPs, I will present the discretization and then show preliminary numerical results on its efficiency with respect to its continuous counterpart and the Metropolis-Hastings algorithm.
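For concreteness, here is a minimal sketch of a continuous PDMP, the one-dimensional Zig-Zag process, on a standard Gaussian target, a case where the switching rate integrates in closed form so event times can be inverted exactly and no gradient bound is needed. This toy example is an assumption of mine, not the speaker's implementation, and the discretized variant discussed in the talk is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)

def zigzag_gaussian(n_events=50_000):
    # 1D Zig-Zag process targeting N(0, 1): the switching rate is
    # lambda(x, v) = max(0, v * x), and along a straight flight it
    # equals max(0, a + t) with a = v * x, so its integral inverts
    # exactly as tau = -a + sqrt(max(a, 0)^2 + 2 E) for E ~ Exp(1).
    x, v = 0.0, 1.0
    times, xs, vs = [0.0], [x], [v]
    for _ in range(n_events):
        a = v * x
        e = rng.exponential()                         # Exp(1) threshold
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2 * e)  # exact event time
        x += v * tau                                  # ballistic flight
        v = -v                                        # flip the velocity
        times.append(times[-1] + tau)
        xs.append(x)
        vs.append(v)
    return np.array(times), np.array(xs), np.array(vs)

times, xs, vs = zigzag_gaussian()
# Time averages along the piecewise-linear path estimate expectations;
# the mean of x on each linear segment is its midpoint value.
dt = np.diff(times)
midpoints = xs[:-1] + vs[:-1] * dt / 2
print("E[x] ~", np.sum(midpoints * dt) / times[-1])   # close to 0
```

The exact inversion used here is precisely what one loses for general targets, which is what motivates either gradient bounds or the discretization studied in the talk.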
11.35 AM - 12.00 PM Lars Röhrig
The Bayesian Analysis Toolkit and applications
This talk covers the main aspects and key features of the Bayesian Analysis Toolkit in Julia (BAT.jl), a software package for performing Bayesian inference in the Julia programming language. BAT.jl offers a variety of efficient and modular algorithms for sampling, optimization, and integration to explore posterior distributions in high-dimensional parameter spaces. After an introduction to the main aspects and algorithms of BAT.jl, a use case is presented: indirect searches for physics beyond the Standard Model.
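Since BAT.jl itself is a Julia package, the sketch below only illustrates, in Python for consistency with the other examples, the generic workflow such toolkits package: define a prior and a likelihood, then explore the posterior with a sampler (here the simplest random-walk Metropolis). The model, numbers, and names are illustrative assumptions and do not reflect BAT.jl's actual interface.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy inference problem: unknown mean `mu` of Gaussian data.
data = rng.normal(2.0, 1.0, size=50)

def log_prior(mu):
    return -0.5 * mu**2 / 10.0**2          # wide Gaussian prior, sd = 10

def log_likelihood(mu):
    return -0.5 * np.sum((data - mu) ** 2)  # unit-variance Gaussian noise

def log_posterior(mu):
    return log_prior(mu) + log_likelihood(mu)

# Random-walk Metropolis: the most basic of the samplers such
# toolkits bundle alongside HMC, nested sampling, and integrators.
mu, chain = 0.0, []
for _ in range(10_000):
    prop = mu + 0.3 * rng.normal()
    if np.log(rng.random()) < log_posterior(prop) - log_posterior(mu):
        mu = prop
    chain.append(mu)

print("posterior mean ~", np.mean(chain[2_000:]))   # close to data.mean()
```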
12.00 PM - 12.50 PM David Hill
Stochastic parallel simulations, repeatability and reproducibility: what is possible?
Parallel stochastic simulations are too often presented as non-reproducible. However, anyone wishing to produce computational work of scientific quality must pay attention to the numerical reproducibility of their simulation results. The main purpose is to obtain at least the same scientific conclusions and, when possible, the same numerical results. Significant differences can, however, be observed in the results of parallel stochastic simulations if the practitioner fails to apply best practices. Remembering that pseudo-random number generators are deterministic, it is often possible to reproduce the same numerical results for parallel stochastic simulations by implementing a rigorous method, tested up to a billion threads. An interesting point for practitioners is that this method makes it possible to check the parallel results against their sequential counterparts before scaling up, thus gaining confidence in the proposed models. This talk will present the concepts of reproducibility and repeatability from an epistemological perspective; then the state of the art for parallel random numbers and the method mentioned above will be presented in the current context of high-performance computing, where silent errors affect all Top500 machines, including the new exascale supercomputer.
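One standard recipe for the kind of reproducibility described above can be sketched with NumPy's SeedSequence mechanism: a single recorded master seed deterministically derives an independent stream per task, so a parallel run reproduces its sequential counterpart bit for bit regardless of scheduling. This is a generic illustration under my own assumptions, not the specific method tested up to a billion threads in the talk.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

ROOT_SEED = 42   # the single master seed recorded with the experiment

def worker(task_id):
    # Each task derives its own independent stream from the master seed
    # and its task id, so the numbers it draws do not depend on which
    # process runs it or on the order in which tasks are scheduled.
    seq = np.random.SeedSequence(ROOT_SEED, spawn_key=(task_id,))
    rng = np.random.default_rng(seq)
    return task_id, rng.normal(size=3).tolist()

def run(n_tasks=4, parallel=True):
    task_ids = range(n_tasks)
    if parallel:
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(worker, task_ids))
    else:
        results = [worker(i) for i in task_ids]
    return sorted(results)

if __name__ == "__main__":
    # The parallel run reproduces the sequential run exactly, so the
    # parallel code can be validated against its sequential twin
    # before scaling, as advocated in the talk.
    assert run(parallel=False) == run(parallel=True)
    print(run(parallel=True))
```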
Workshop funded by the 80|Prime project CosmoBayes, associated with SuSa.