Statistical mechanics and data science.

Tuesday 25 January 2022, Institut Curie, Amphi Curie

Moderator: Gérard Ben Arous (New York)


Zoom recording, Day 1.


Afonso Bandeira (Zürich), Giulio Biroli (Paris), Marylou Gabrié (Palaiseau), Aukosh Jagannath (Waterloo), Florent Krzakala (Lausanne), Andrea Montanari (Stanford), Eric Vanden-Eijnden (New York)


Afonso Bandeira (Zürich): Non-commutative Concentration Inequalities.

Matrix concentration inequalities, such as the matrix Bernstein inequality, have played an important role in many areas of pure and applied mathematics. These inequalities are intimately related to the celebrated noncommutative Khintchine inequality of Lust-Piquard and Pisier. In the mid-2010s, Tropp improved the dimensional dependence of this inequality in certain settings by leveraging cancellations due to non-commutativity of the underlying random matrices, giving rise to the question of whether such dependence could be removed.
In this talk we leverage ideas from Free Probability to fully remove the dimensional dependence in a range of instances, yielding optimal bounds in many settings of interest. As a byproduct we develop matrix concentration inequalities that capture non-commutativity (or, to be more precise, "freeness"), improving over matrix Bernstein in a range of instances. No background knowledge of Free Probability will be assumed in the talk.
Joint work with March Boedihardjo and Ramon van Handel, more information at arXiv:2108.06312 [math.PR].
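For orientation (an addition of mine, not part of the abstract): the "dimensional dependence" at stake is the logarithmic factor in the standard matrix Bernstein bound. For independent, mean-zero, self-adjoint d x d random matrices X_1, ..., X_n with ||X_i|| <= L, one common form reads
\[
  \mathbb{E}\Big\| \sum_{i=1}^n X_i \Big\| \;\lesssim\; \sigma \sqrt{\log d} \,+\, L \log d,
  \qquad
  \sigma^2 = \Big\| \sum_{i=1}^n \mathbb{E} X_i^2 \Big\|,
\]
and the results described in the talk identify regimes in which the log d factors can be weakened or removed by exploiting freeness.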


Giulio Biroli (Paris): Renormalization Group Theory and Machine Learning. pdf

Reconstructing, or generating, high dimensional distributions starting from data is a central problem in machine learning and data science. I will present a method, the "Wavelet Inverse Renormalization Group", that combines ideas from physics (renormalization group theory) and computer science (wavelets, stable representations of operators). The Wavelet Inverse Renormalization Group makes it possible to reconstruct, very efficiently, classes of high dimensional distributions hierarchically from large to small spatial scales. I will present the method and then show its applications to data from statistical physics and cosmology. The Wavelet Inverse Renormalization Group method also provides interesting insights into the interplay between the structure of data and the architecture of deep neural networks.
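As a rough schematic of such a coarse-to-fine construction (the notation is mine, not necessarily the talk's): writing x_J for the field coarse-grained to the largest scale and \bar{x}_j for the wavelet detail coefficients separating scale j from the next coarser scale, one relies on a factorization of the form
\[
  p(x) \;=\; p(x_J) \,\prod_{j=1}^{J} p(\bar{x}_j \mid x_j),
\]
so that the coarsest field is sampled first and finer details are then generated conditionally, scale by scale.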


Marylou Gabrié (Palaiseau): Enhancing Sampling with Learning: Adaptive Monte Carlo with Normalizing Flows. pdf

In many applications in computational sciences and statistical inference, one seeks to compute expectations over complex high-dimensional distributions. These problems are often plagued by multi-modality/metastability: slow relaxation between unconnected modes leads to slow convergence of estimators of such expectations. In this talk, I will present a strategy to enhance sampling with deep generative models called Normalizing Flows. We will see how blending physics knowledge and learning is the winning cocktail for a drastic acceleration of MCMC convergence through simultaneous sampling and training.
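A minimal sketch of the kind of move such a method relies on (my own illustration, with a fixed analytic surrogate standing in for the trained normalizing flow; target, schedule, and all names below are illustrative only): local Metropolis steps are interleaved with non-local independence Metropolis-Hastings proposals drawn from an approximate model of the target, which lets the chain hop between otherwise disconnected modes.

# Sketch: local random-walk Metropolis mixed with independence proposals from a surrogate.
# In the actual adaptive scheme the surrogate is a normalizing flow trained on the chain's
# own samples as they accumulate; here it is a fixed Gaussian mixture so the script runs alone.
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # bimodal target: mixture of two well-separated unit-width Gaussians (unnormalized)
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

def surrogate_sample():
    # approximate model of the target: mixture of N(+-3.5, 1.5^2)
    center = 3.5 if rng.random() < 0.5 else -3.5
    return center + 1.5 * rng.normal()

def surrogate_logprob(x):
    return np.logaddexp(-0.5 * ((x - 3.5) / 1.5) ** 2,
                        -0.5 * ((x + 3.5) / 1.5) ** 2) - np.log(2 * 1.5 * np.sqrt(2 * np.pi))

x, samples = 4.0, []
for t in range(20000):
    if t % 10 == 0:
        # non-local move: independence MH, acceptance ratio pi(x') q(x) / (pi(x) q(x'))
        xp = surrogate_sample()
        log_alpha = (log_target(xp) - log_target(x)) + (surrogate_logprob(x) - surrogate_logprob(xp))
    else:
        # local move: symmetric random walk, standard Metropolis ratio pi(x') / pi(x)
        xp = x + 0.5 * rng.normal()
        log_alpha = log_target(xp) - log_target(x)
    if np.log(rng.random()) < log_alpha:
        x = xp
    samples.append(x)

print("fraction of samples in the right-hand mode:", np.mean(np.array(samples) > 0))

With local moves alone the chain would remain trapped in one mode for a very long time; the non-local proposals are what restore mixing, and training the surrogate on the chain itself is what couples sampling and learning in the adaptive scheme.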


Aukosh Jagannath (Waterloo): Online SGD on non-convex losses from high-dimensional inference. pdf

Stochastic gradient descent (SGD) is a popular tool in inference. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function, which is random and often non-convex. We study the performance of SGD from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. They depend only on an intrinsic property of the population loss, called the information exponent, and do not assume uniform control on the loss itself (e.g., convexity or Lipschitz-type bounds). These thresholds are polynomial in the dimension, and the precise exponent depends explicitly on the information exponent. As a consequence, we find that, except for the simplest tasks, almost all of the data is used in the initial search phase, i.e., just to get non-trivial correlation with the ground truth, and that after this phase the descent is rapid and exhibits a law of large numbers. We illustrate our approach by applying it to a wide set of inference tasks such as parameter estimation for generalized linear models and spiked tensor models, phase retrieval, online PCA, as well as supervised learning for single-layer networks with general activation functions. Joint work with G. Ben Arous (NYU Courant) and R. Gheissari (Berkeley).
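As an illustration of the online setting (a sketch of mine, with an ad hoc choice of dimension, step size, and horizon, not taken from the paper): online SGD for noiseless phase retrieval, y = (x . theta*)^2, with one fresh Gaussian sample per iteration and the iterate kept on the unit sphere. Tracking the overlap |theta_t . theta*| typically shows the two regimes described above: it hovers near the random-start value of order 1/sqrt(d) for a while, then climbs rapidly towards 1.

# Sketch: online SGD for a single-index task (phase retrieval), one fresh sample per step.
import numpy as np

rng = np.random.default_rng(1)
d = 200
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)       # unknown signal on the unit sphere

theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)                 # random start: overlap of order 1/sqrt(d)

step = 0.01 / d                                # small step size, ad hoc choice
for t in range(60001):
    x = rng.normal(size=d)                     # fresh sample at every step (online setting)
    y = (x @ theta_star) ** 2                  # noiseless phase-retrieval observation
    u = x @ theta
    grad = 4.0 * (u ** 2 - y) * u * x          # gradient of the loss ((x.theta)^2 - y)^2
    theta -= step * grad
    theta /= np.linalg.norm(theta)             # project back onto the sphere
    if t % 5000 == 0:
        print(t, abs(theta @ theta_star))      # small during the search phase, then near 1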


Florent Krzakala (Lausanne): Generalization in Machine Learning: Insights from Statistical Mechanics Models.

Working in high dimensions allows one to use powerful theoretical methods from probability theory and statistical physics to obtain precise characterizations of many simple machine learning problems. I will present and review some recent works in this direction, discuss what they teach us in the broader context of generalization, double descent, and over-parameterization in modern machine learning problems, and attempt to discuss the link between these approaches and statistical mechanics and spin glasses.
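A small illustration of double descent (my own toy example, not from the talk): ridgeless regression on random tanh features of synthetic data. As the number of features p sweeps past the number of samples n, the test error typically spikes near p = n (the interpolation threshold) and then decreases again in the overparameterized regime.

# Sketch: double descent for min-norm least squares on random nonlinear features.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test = 20, 100, 2000
w = rng.normal(size=d)                                # linear teacher
X, X_test = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
y = X @ w + 0.5 * rng.normal(size=n)                  # noisy training labels
y_test = X_test @ w                                   # clean test labels

for p in [20, 50, 80, 100, 120, 200, 400, 1000]:
    W = rng.normal(size=(d, p)) / np.sqrt(d)          # random first-layer weights (fixed)
    F, F_test = np.tanh(X @ W), np.tanh(X_test @ W)   # random nonlinear features
    a = np.linalg.pinv(F) @ y                         # min-norm least-squares fit of top layer
    test_mse = np.mean((F_test @ a - y_test) ** 2)
    print(f"p={p:5d}  test MSE={test_mse:.3f}")       # typically peaks near p = n = 100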


Andrea Montanari (Stanford): Exact asymptotics and universality for gradient flows and empirical risk minimizers. pdf

Empirical risk minimization (ERM) is the dominant paradigm in statistical learning. Optimizing the empirical risk of neural networks is a highly non-convex optimization problem but, despite this, it is routinely solved to optimality or near optimality using first order methods such as stochastic gradient descent. It has recently been argued that overparametrization plays a key role in explaining this puzzle: overparametrized models are simple to optimize, achieving vanishing or nearly vanishing training error. Surprisingly, the overparametrized models learnt by gradient-based methods appear to have good generalization properties.
I will review a few recent mathematical results towards understanding the dynamics of gradient descent algorithms and their generalization properties, focusing on the case of nonlinear models. [Based on joint work with Michael Celentano, Chen Cheng, Basil Saeed, Kangjie Zhou]
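A minimal sketch (mine, not from the talk) of the optimization side of this puzzle: full-batch gradient descent on a heavily overparameterized two-layer tanh network with arbitrary labels. Despite the non-convexity, the training loss typically drops by several orders of magnitude, i.e., the model essentially interpolates; the script says nothing about generalization, which is the subtler part of the story.

# Sketch: gradient descent on an overparameterized two-layer network (p >> n).
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 10, 5, 500                          # 10 data points, 500 hidden units
X = rng.normal(size=(n, d))
y = rng.normal(size=n)                        # arbitrary labels: a pure interpolation test
W = rng.normal(size=(p, d)) / np.sqrt(d)      # hidden-layer weights
a = rng.normal(size=p)                        # output weights
lr = 1.0

for t in range(10001):
    H = np.tanh(X @ W.T)                      # (n, p) hidden activations
    f = H @ a / np.sqrt(p)                    # network outputs, 1/sqrt(p) output scaling
    r = f - y
    loss = np.mean(r ** 2)
    grad_a = 2 * (H.T @ r) / (n * np.sqrt(p))
    grad_W = 2 * ((np.outer(r, a) * (1 - H ** 2)).T @ X) / (n * np.sqrt(p))
    a -= lr * grad_a
    W -= lr * grad_W
    if t % 2000 == 0:
        print(t, loss)                        # training loss typically falls by orders of magnitude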


Eric Vanden-Eijnden (New York): Scientific Computing in the Age of Machine Learning.

The recent success of machine learning suggests that neural networks may be capable of approximating high-dimensional functions with controllably small errors. As a result, they could outperform standard function interpolation methods that have been the workhorses of scientific computing but do not scale well with dimension. In support of this prospect, here I will review what is known about the trainability and accuracy of shallow neural networks, which offer the simplest instance of nonlinear learning in functional spaces that are fundamentally different from classic approximation spaces. The dynamics of training in these spaces can be analyzed using tools from optimal transport and statistical mechanics, which reveal when and how shallow neural networks can overcome the curse of dimensionality. I will also discuss how scientific computing problems in high dimension once thought intractable can be revisited through the lens of these results.
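One standard way to phrase the shallow-network setting referred to here (my formulation, consistent with the mean-field literature rather than taken from the abstract): a width-N network f_N(x) = (1/N) \sum_{i=1}^N \varphi(x;\theta_i) is identified, as N grows, with an integral over a measure on parameter space,
\[
  f(x) \;=\; \int \varphi(x;\theta)\, \mu(d\theta),
\]
and gradient-descent training of the parameters corresponds, in this limit, to a Wasserstein gradient flow of the risk R[\mu] over the space of measures, which is where the optimal-transport and statistical-mechanics tools mentioned in the abstract enter.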