Marie Skłodowska-Curie Postdoctoral Researcher
Department of Computing Sciences
Università Bocconi, Milan, Italy
Hello!
My name is Jérôme Garnier-Brun and I am currently a Marie Skłodowska-Curie postdoctoral research fellow at Università Bocconi in Milan, Italy. I am working with Marc Mézard in the Machine Learning & Statistical Physics group of the Department of Computing Sciences on questions related to the role of the structure of data in machine learning. I am also affiliated with the Department of Finance, where I collaborate with Claudio Tebaldi on applications of statistical physics to economics and finance.
Before that, I did my PhD jointly between École polytechnique and Capital Fund Management, within the chair of Econophysics and Complex Systems, under the supervision of Jean-Philippe Bouchaud and Michael Benzaquen. There, I mainly worked on problems at the interface between the physics of spin glasses and disordered socioeconomic systems.
Outside of research, I enjoy hiking, trail running and cycling in the summer, and skiing and ski mountaineering in the winter. I am also a poor but enthusiastic footballer and a (self-proclaimed) decent cook.
Email: jerome DOT garnier AT unibocconi DOT it
In the last few years, the success of deep learning, and in particular of Large Language Models (LLMs), has been largely driven by the availability of large datasets and the development of powerful computing hardware. However, the role played by the structure of data is still not well understood. In particular, how learning algorithms exploit this structure to achieve good performance remains a largely open question.
We tackle this question by leveraging simple, tree-based models of data. In this context, we have notably shown how transformers sequentially learn the underlying structure of the data model, and appear to implement the optimal inference algorithm.
Visualization of attention patterns in a four-layer transformer trained on tree-based hierarchical data.
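To give a concrete, purely illustrative picture of what such tree-based data can look like, the sketch below samples sequences from a toy hierarchical generative model: a root symbol is recursively expanded through random production rules, and the leaves form the observed sequence. The vocabulary size, branching factor, depth and number of interchangeable productions are arbitrary choices for illustration, not the exact setting of our papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary, illustrative parameters.
VOCAB = 8     # symbols available at every level of the tree
BRANCH = 2    # each symbol expands into BRANCH children
DEPTH = 4     # expansion steps; the leaves form the observed sequence
SYNONYMS = 2  # interchangeable productions per (level, symbol) pair

# Random production rules: each (level, symbol) pair maps to a few
# equally likely tuples of child symbols.
rules = {
    (level, s): [tuple(rng.integers(VOCAB, size=BRANCH)) for _ in range(SYNONYMS)]
    for level in range(DEPTH)
    for s in range(VOCAB)
}

def sample_sequence():
    """Expand a random root symbol down the tree and return the leaf sequence."""
    layer = [int(rng.integers(VOCAB))]
    for level in range(DEPTH):
        layer = [int(c)
                 for s in layer
                 for c in rules[(level, s)][rng.integers(SYNONYMS)]]
    return layer  # length BRANCH ** DEPTH = 16 here

print(sample_sequence())
```

Because every symbol admits several interchangeable productions, reconstructing the hidden levels of the tree from the leaves is a genuine inference problem; on tree-structured models such as this one, exact Bayes-optimal inference can be performed by belief propagation, which provides a natural benchmark for what a trained transformer achieves.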
In addition to LLMs for text, the generative AI revolution has also been enabled by so-called diffusion models for image creation. By training complex architectures to denoise images, one can indeed produce seemingly novel visuals by following a reverse diffusion process starting from pure noise. Despite their effectiveness and widespread adoption, the role of memorized training data in what these models produce is still unclear, posing important challenges for privacy and intellectual property. More generally, the way meaningful new samples are sculpted along the reverse diffusion process remains poorly understood.
We investigate these issues using the tools of statistical physics, and explore the role that the structure of data may play in this process.
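As a rough, self-contained illustration of the reverse process described above, here is a sketch of DDPM-style ancestral sampling in one dimension. To keep it minimal, the exact score of a known two-Gaussian mixture stands in for the trained denoising network; the mixture parameters, noise schedule and number of steps are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target distribution: a two-mode Gaussian mixture (hypothetical data).
WEIGHTS = np.array([0.5, 0.5])
MEANS = np.array([-2.0, 2.0])
STDS = np.array([0.3, 0.3])

# Linear variance schedule, as in DDPM; range and step count are arbitrary.
T = 500
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def score(x, t):
    """Exact score of the noised marginal p_t; stands in for a trained network."""
    ab = alpha_bars[t]
    m = np.sqrt(ab) * MEANS              # means of the noised components
    v = ab * STDS**2 + (1.0 - ab)        # variances of the noised components
    log_r = np.log(WEIGHTS) - 0.5 * (x[:, None] - m) ** 2 / v - 0.5 * np.log(v)
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)    # posterior component responsibilities
    return (r * (m - x[:, None]) / v).sum(axis=1)

# Reverse (ancestral) sampling: start from pure noise and denoise step by step.
x = rng.standard_normal(10_000)
for t in range(T - 1, -1, -1):
    z = rng.standard_normal(x.shape) if t > 0 else 0.0
    x = (x + betas[t] * score(x, t)) / np.sqrt(alphas[t]) + np.sqrt(betas[t]) * z

print(x.mean(), x.std())  # samples now concentrate around the two modes
```

By the end of the iteration, the initially unstructured noise has split into the two modes of the target distribution: a toy version of the sculpting of structure along the reverse trajectory discussed above.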