Workshop on Recent advances in stochastic algorithms
LMBP - Amphithéâtre Hennequin (first floor), Université Clermont Auvergne, July 8th-10th 2025
Organizer: Manon Michel (LMBP, Université Clermont Auvergne). Please contact her for any inquiry.
Workshop booklet: Available here
Participants (click on title for abstract):
- Eddie Aamari (ENS Paris)
Generative diffusions and minimax estimation
The aim of this talk is to introduce generative models based on diffusions. After a brief reminder of the key concepts of stochastic calculus, we’ll detail how a time-reversed Ornstein-Uhlenbeck process can be used to transport distributions when starting from a Gaussian source. As this reversed process involves the so-called score function, we will then address the question of score learning via the minimization of an empirical contrast. Finally, we’ll discuss the stability of such a method, as well as minimax estimation rates if time permits. Notes are available here.
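As a self-contained illustration of the mechanism sketched in this abstract (ours, not the speaker’s material), the snippet below transports a standard Gaussian to a target N(2, 0.5^2) by simulating the time-reversed Ornstein-Uhlenbeck process. For a Gaussian target the score of the forward marginals is available in closed form, so no learning step is needed; all numerical choices are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target pi = N(m, s^2); the forward OU dX = -X dt + sqrt(2) dW sends pi to ~N(0, 1).
m, s = 2.0, 0.5
T, dt, n = 4.0, 0.002, 20_000

def score(x, t):
    # The forward OU marginal at time t, started from pi, is Gaussian,
    # so its score (gradient of the log-density) is exact here.
    mu_t = m * np.exp(-t)
    var_t = s**2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    return -(x - mu_t) / var_t

# Reverse-time SDE: dY = [Y + 2 * score(Y, T - u)] du + sqrt(2) dB,
# initialized from the approximate terminal law N(0, 1).
y = rng.standard_normal(n)
for k in range(int(T / dt)):
    t = T - k * dt
    y += (y + 2.0 * score(y, t)) * dt + np.sqrt(2 * dt) * rng.standard_normal(n)
```

After the reverse sweep, the empirical mean and standard deviation of `y` should be close to the target’s (2.0, 0.5); replacing the exact score by a learned one is exactly the estimation problem the talk addresses.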
- Yago Aguado (Université Clermont Auvergne)
- Marcelo Domingues (Université Clermont Auvergne)
- Florence Forbes (INRIA Grenoble)
Scalable Bayesian Experimental Design with Diffusions
Bayesian Optimal Experimental Design (BOED) is a powerful tool to reduce the cost of running a sequence of experiments. When based on the Expected Information Gain (EIG), design optimization corresponds to the maximization of some intractable expected contrast between prior and posterior distributions. Scaling this maximization to high-dimensional and complex settings has been an issue due to BOED's inherent computational complexity. In this work, we introduce a pooled posterior distribution with cost-effective sampling properties and provide tractable access to the EIG contrast maximization via a new EIG gradient expression. Diffusion-based samplers are used to compute the dynamics of the pooled posterior and ideas from bi-level optimization are leveraged to derive an efficient joint sampling-optimization loop. The resulting efficiency gain makes it possible to extend BOED to the well-tested generative capabilities of diffusion models. By incorporating generative models into the BOED framework, we expand its scope and its use in scenarios that were previously impractical. Numerical experiments and comparison with state-of-the-art methods show the potential of the approach.
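To fix ideas on what the EIG is, the toy below (ours, unrelated to the talk’s diffusion-based method) computes it for a one-parameter linear Gaussian model y = d*theta + noise, where the EIG has the closed form 0.5*log(1 + d^2/sigma^2), and compares against a nested Monte Carlo estimate; sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0                               # observation noise std

def log_lik(y, theta, d):
    # Gaussian log-likelihood of y given theta under design d
    return -0.5 * ((y - d * theta) / sigma) ** 2 - 0.5 * np.log(2 * np.pi) - np.log(sigma)

def eig_nested_mc(d, n_outer=2000, n_inner=2000):
    theta = rng.standard_normal(n_outer)              # theta ~ prior N(0, 1)
    y = d * theta + sigma * rng.standard_normal(n_outer)
    theta_in = rng.standard_normal(n_inner)           # fresh prior samples
    ll = log_lik(y[:, None], theta_in[None, :], d)    # inner log-likelihoods
    log_marg = np.logaddexp.reduce(ll, axis=1) - np.log(n_inner)
    # EIG = E[ log p(y | theta) - log p(y) ]
    return np.mean(log_lik(y, theta, d) - log_marg)

d = 2.0
eig_exact = 0.5 * np.log(1.0 + d**2 / sigma**2)       # closed form for this model
eig_mc = eig_nested_mc(d)
```

The nested estimator’s cost grows multiplicatively in the two sample sizes, which is one face of the scalability issue the abstract mentions.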
- Cédric Gerbelot (ENS Lyon)
High-dimensional optimization for the multi-spike tensor PCA problem
A core difficulty in modern machine learning is to understand the convergence of gradient-based methods in random, high-dimensional, non-convex landscapes. In this work, we study the behavior of gradient flow, Langevin dynamics and online stochastic gradient descent applied to the multi-spike tensor PCA problem, the goal of which is to recover a set of spikes from noisy observations of the corresponding tensor. The main thrust of our proof relies on a sharp control of the random part of the dynamics, followed by the analysis of a finite dimensional dynamical system, leading to both sample complexity bounds and a complete description of the set of critical points reached by the dynamics. In particular, we obtain sufficient conditions for reaching the global minimizer of the problem from uninformative initializations. At a technical level, we will put our methods, originating in probability and mathematical physics, in perspective with those used in machine learning theory and statistical physics of learning. This talk is based on joint work with Vanessa Piccolo and Gérard Ben Arous.
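The single-spike version of the problem can be stated in a few lines of code. The toy below (our own sketch, with arbitrary dimension, signal strength and asymmetric noise) recovers a planted spike v from T = lambda * v⊗v⊗v + W by projected gradient ascent on the sphere; note that we use an informative initialization for robustness, whereas the talk’s analysis concerns uninformative ones.

```python
import numpy as np

rng = np.random.default_rng(2)
d, lam = 15, 120.0

v = rng.standard_normal(d); v /= np.linalg.norm(v)   # hidden unit spike
W = rng.standard_normal((d, d, d))                   # iid Gaussian noise tensor
T = lam * np.einsum('i,j,k->ijk', v, v, v) + W       # noisy observation

# Projected gradient ascent on the sphere for f(x) = <T, x (x) x (x) x>
x = v + 0.5 * rng.standard_normal(d)                 # informative start (for illustration)
x /= np.linalg.norm(x)
for _ in range(500):
    g = np.einsum('ijk,j,k->i', T, x, x)             # contraction of T against x twice
    x = x + 0.05 * g
    x /= np.linalg.norm(x)

overlap = abs(x @ v)                                 # correlation with the spike
```

With this signal strength the final overlap is close to 1; shrinking lambda toward the algorithmic threshold makes the random part of the gradient dominate, which is the regime the sharp controls in the talk are designed for.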
- Arnaud Guillin (Université Clermont Auvergne)
Lift and flow Poincaré for convergence of irreversible Markov processes
Non-reversible Markov processes are usually known to converge to equilibrium more quickly than reversible ones targeting the same invariant measure. However, obtaining a quantitative statement is far from easy. We will adopt a lifting strategy to compare irreversible processes such as underdamped Langevin dynamics, or stochastic algorithms such as the Bouncy Particle Sampler, Zig-Zag or the Forward process. The approach is sufficiently general to handle physically relevant processes with subtle behaviors at boundaries.
(with A. Eberle, L. Hahn, F. Lörler and M. Michel)
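One of the lifted processes mentioned above, the Zig-Zag sampler, is simple enough to simulate exactly in one dimension. The sketch below (ours, purely illustrative) targets N(0, 1): the state is (position, velocity) with velocity in {-1, +1}, and switching times for the rate max(0, v*x) are drawn exactly by inversion.

```python
import numpy as np

rng = np.random.default_rng(3)

# 1D Zig-Zag for pi = N(0, 1): along a segment x + v*u, the switching rate is
# (v*x + u)^+ and the event time solves the inversion formula below exactly.
x, v, T = 0.0, 1.0, 5000.0
t, s1, s2 = 0.0, 0.0, 0.0          # elapsed time; running integrals of x and x^2

while t < T:
    a = v * x
    e = rng.exponential()
    tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)   # next switching time
    tau = min(tau, T - t)
    # closed-form integrals of x + v*u and (x + v*u)^2 over the segment
    s1 += x * tau + v * tau**2 / 2
    s2 += x**2 * tau + v * x * tau**2 + tau**3 / 3
    x += v * tau
    t += tau
    v = -v                          # deterministic velocity flip at the event

mean_est, var_est = s1 / T, s2 / T - (s1 / T) ** 2
```

The time averages converge to the moments of N(0, 1); quantifying *how fast* such irreversible dynamics converge is precisely what the lift and flow Poincaré machinery addresses.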
- Sebastiano Grazzi (Bocconi University)
Parallel Computations for Metropolis Markov chains with Picard maps
In this talk, I will present parallel algorithms for simulating zeroth-order (aka gradient-free) Metropolis Markov chains based on the Picard map. For Random Walk Metropolis Markov chains targeting log-concave distributions pi on R^d, our algorithm generates samples close to pi in O(d^0.5) parallel iterations with O(d^0.5) processors, therefore speeding up the convergence of the corresponding sequential implementation by a factor O(d^0.5). Furthermore, a modification of our algorithm generates samples from an approximate measure pi_epsilon in O(1) parallel iterations and O(d) processors. We empirically assess the performance of the proposed algorithms in high-dimensional regression problems and an epidemic model where the gradient is unavailable. This is joint work with Giacomo Zanella.
- Pierre Jacob (ESSEC)
Some progress on unbiased MCMC
Unbiased MCMC methods refer to algorithms that deliver unbiased estimators of posterior expectations. Proposed first by Peter Glynn and Chang-Han Rhee in 2014, these methods employ pairs of Markov chains that 1) marginally evolve as prescribed by an MCMC algorithm and 2) are jointly dependent in such a way that the chains coincide at a random meeting time. Designing such couplings is not obvious, but the reward, namely unbiased estimators, is appealing: it addresses both the difficulty of assessing the convergence of Markov chains, and the difficulty of using parallel processors for MCMC. This talk reviews these techniques, assuming little familiarity with classical MCMC algorithms and none with coupling methods. The last part of the talk will cover recent progress toward turning unbiased MCMC into a widely-applicable and powerful addition to the Monte Carlo computational toolbox.
- Benedict Leimkuhler (University of Edinburgh)
SamAdams: an adaptive stepsize Langevin sampling algorithm inspired by the Adam optimizer
Prelude: I will discuss the use of Langevin-style diffusions for ergodic approximation, some of the choices available in the design of numerical algorithms and the standard methods used for their analysis. I will also mention a few of the challenges in the area.
Main talk: I will present a framework for adaptive-stepsize MCMC sampling based on time-rescaled Langevin dynamics, in which the stepsize variation is dynamically driven by an additional degree of freedom. The use of an auxiliary relaxation equation allows accumulation of a moving average of a local monitor function while avoiding the need to modify the drift term in the physical system. Our algorithm is straightforward to implement and can be readily combined with any off-the-peg fixed-stepsize Langevin integrator. I will focus on a specific variant in which the stepsize is controlled by monitoring the norm of the log-posterior gradient, inspired by the Adam optimizer. As in Adam, the stepsize variation depends on the recent history of the gradient norm, which enhances stability and improves accuracy. I will discuss the application of our method to examples such as Neal’s funnel and a Bayesian neural network for classification of MNIST data. Joint work with Rene Lohmann (Edinburgh) and Peter Whalley (ETH).
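To convey the flavour of Adam-style stepsize control, the caricature below (ours; the talk’s actual method is a time-rescaled *underdamped* dynamic with an auxiliary relaxation equation, which we do not reproduce) runs an unadjusted overdamped Langevin chain on N(0, 1) where the stepsize is driven by a moving average of the squared gradient norm; all constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

grad_U = lambda z: z                     # U(z) = z^2 / 2, i.e. target N(0, 1)
h0, beta, hmax = 0.05, 0.9, 0.2
x, vbar = 0.0, 1.0                       # vbar: moving average of |grad U|^2
xs, hs = [], []
for _ in range(50_000):
    g = grad_U(x)
    vbar = beta * vbar + (1 - beta) * g * g
    h = min(h0 / np.sqrt(vbar + 1e-8), hmax)   # small steps where gradients are large
    xs.append(x); hs.append(h)
    x += -h * g + np.sqrt(2 * h) * rng.standard_normal()

xs, hs = np.array(xs), np.array(hs)
```

By construction the chain takes shorter steps in the tails, where the gradient norm is large, and longer steps near the mode; correcting the invariant measure for this time rescaling is the part handled by the talk's framework.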
- Pierre Latouche (Université Clermont Auvergne)
Importance weighted variational graph autoencoders for inference in deep latent position block models
In this presentation, I will first show how latent position models can be made compatible with block models for network analysis through the use of deep generative models. I will focus on the binary edge case and introduce a new random graph model along with a variational graph auto-encoding strategy for inference. I will discuss the identifiability of the model and explain how model selection can be performed. I will then move to the sparse discrete edge case in the same deep, block compatible, modelling framework. I will show how importance weighted variational inference can strongly improve the inference procedure over the classical variational auto-encoding strategy. I will give the importance weights used for sampling and show that, in the limit, the lower bound obtained converges to the integrated log-likelihood of the data. Through this presentation, I will emphasise the need to rely on flexible random graph models to obtain relevant loss functions.
- Cyril Letrouit (Université Paris-Saclay)
Quantitative stability of optimal transport maps
Optimal transport consists in sending a given source probability measure rho to a given target probability measure mu in an optimal way with respect to a certain cost. On bounded subsets of R^d, if the cost is given by the squared Euclidean distance and if rho is absolutely continuous, there exists a unique optimal transport map from rho to mu. Optimal transport has been widely applied across various domains in statistics: for instance to compare distributions, to define barycenters, to construct embeddings, in generative modeling, etc.
In this talk, we provide a quantitative answer to the following stability question: if mu is perturbed, can the optimal transport map from rho to mu change significantly? The answer depends on the properties of the density rho. This question takes its roots in numerical optimal transport, and has found applications to other problems like the statistical estimation of optimal transport maps, the random matching problem, or the computation of Wasserstein barycenters.
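In dimension one the optimal transport map for the quadratic cost can be computed explicitly, which makes the stability question concrete. The sketch below (ours, purely illustrative) estimates the map between two Gaussians by matching sorted samples (the empirical monotone rearrangement T = F_mu^{-1} o F_rho) and checks it against the known affine map.

```python
import numpy as np

rng = np.random.default_rng(6)

# For rho = N(m1, s1^2) and mu = N(m2, s2^2) in 1D with quadratic cost,
# the optimal map is affine: T(x) = m2 + (s2 / s1) * (x - m1).
m1, s1, m2, s2 = 0.0, 1.0, 3.0, 2.0
x = np.sort(rng.normal(m1, s1, 100_000))   # samples from rho
y = np.sort(rng.normal(m2, s2, 100_000))   # samples from mu

T_hat = y                                   # monotone rearrangement: T_hat(x[i]) ~ y[i]
T_true = m2 + (s2 / s1) * (x - m1)
err = np.mean(np.abs(T_hat - T_true))
```

Shifting mu by delta shifts this map by exactly delta everywhere; in higher dimensions, how strongly the map can react to perturbations of mu depends on the density rho, which is the quantitative question of the talk.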
- Martin Metodiev (Université Clermont Auvergne)
Easily Computed Marginal Likelihoods for Multivariate Mixture Models Using the THAMES Estimator
We present a new version of the truncated harmonic mean estimator (THAMES) for univariate or multivariate mixture models. The estimator computes the marginal likelihood from Markov chain Monte Carlo (MCMC) samples, is consistent, asymptotically normal and of finite variance. In addition, it is invariant to label switching, does not require posterior samples from hidden allocation vectors, and is easily approximated, even for an arbitrarily high number of components. Its computational efficiency is based on an asymptotically optimal ordering of the parameter space, which can in turn be used to provide useful visualisations. We test it in simulation settings where the true marginal likelihood is available analytically. It performs well against state-of-the-art competitors, even in multivariate settings with a high number of components. We demonstrate its utility for inference and model selection on univariate and multivariate data sets.
- Manon Michel (Université Clermont Auvergne)
- Martin Rouault (Université de Lille)
Monte Carlo methods with Gibbs measures
Monte Carlo methods such as MCMC are now routine in Bayesian machine learning, yet they come with slow convergence rates. Obtaining tight confidence intervals thus requires building estimators based on a large number of points, which can be prohibitive in applications where the likelihood of the Bayesian model is very expensive to evaluate. It is then desirable to obtain similar confidence intervals with a much smaller number of points, even if building those points is computationally more expensive. In this talk, I will first show how points jointly drawn from a Gibbs measure inspired by statistical physics can yield tighter confidence intervals than MCMC with the same number of points. Sampling points from this Gibbs measure, however, comes with practical issues since i) there is no known exact sampling algorithm, and ii) it relies on the evaluation of intractable quantities. I will address the second issue and show that one can incorporate approximations of those intractable quantities without any loss in the asymptotic confidence intervals.
Work in collaboration with Rémi Bardenet and Mylène Maïda.
- Andi Wang (Warwick University)
Explicit convergence rates of the underdamped Langevin diffusion under weighted and weak Poincaré inequalities
I will discuss recent advances in obtaining convergence rates for the underdamped Langevin diffusion, the backbone of many non-reversible Monte Carlo methods. After reviewing the case of the reversible overdamped Langevin diffusion with light tails, I will discuss in more detail our recent work focussing on the case of heavy tailed targets in the non-reversible setting.
- Peter Whalley (ETH Zürich)
Scalable kinetic Langevin Monte Carlo methods for Bayesian inference
We introduce Langevin Monte Carlo methods for estimating expectations of observables under high-dimensional probability measures. We discuss discretization strategies and Metropolization methods for removing bias due to discretization error.
We then present a new unbiased method for Bayesian posterior means based on kinetic Langevin dynamics that combines advanced splitting methods with enhanced gradient approximations. Our approach avoids Metropolis correction by coupling Markov chains at different discretization levels in a multilevel Monte Carlo approach. Theoretical analysis demonstrates that our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem. We prove similar results using both approximate and stochastic gradients and show that our method’s computational cost scales independently of the size of the dataset. Our numerical experiments demonstrate that our unbiased algorithm outperforms the “gold-standard” randomized Hamiltonian Monte Carlo.
We then discuss whether it is always necessary to remove discretization bias in high-dimensional models and applications of unadjusted MCMC algorithms in Bayesian Neural Network models.
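The multilevel mechanism underlying such unbiased estimators can be shown on a toy. The sketch below (ours; a generic coupling, not the talk's splitting schemes or gradient approximations) runs Euler discretizations of an Ornstein-Uhlenbeck process at steps h and h/2 driven by the same Brownian increments: the coupled correction f(X^{h/2}) - f(X^h) then has far smaller variance than the individual terms, which is what makes the randomly truncated telescoping sum over levels affordable.

```python
import numpy as np

rng = np.random.default_rng(7)

def coupled_pair(h, T=2.0, n=20_000):
    # OU dynamics dX = -X dt + sqrt(2) dW at two discretization levels,
    # sharing the same Brownian increments across levels.
    steps = int(T / h)
    xc = np.zeros(n)                  # coarse chain, step h
    xf = np.zeros(n)                  # fine chain, step h/2
    for _ in range(steps):
        dw1 = np.sqrt(h / 2) * rng.standard_normal(n)
        dw2 = np.sqrt(h / 2) * rng.standard_normal(n)
        xf += -xf * (h / 2) + np.sqrt(2) * dw1
        xf += -xf * (h / 2) + np.sqrt(2) * dw2
        xc += -xc * h + np.sqrt(2) * (dw1 + dw2)   # same Brownian increment
    return xc, xf

xc, xf = coupled_pair(0.1)
coupled_var = np.var(xf**2 - xc**2)            # variance of the level correction
indep_var = np.var(xf**2) + np.var(xc**2)      # what independent levels would cost
```

The variance of the correction shrinks as h does, so deeper (more expensive) levels contribute less and less variance, yielding finite-variance unbiased estimators without Metropolis correction.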
Organized thanks to the ANR SuSa project; official website here.