Karsten Roth

About me

I am a Student Researcher at Google DeepMind hosted by Olivier Hénaff, and an IMPRS-IS & ELLIS PhD student advised by Zeynep Akata (TU & Helmholtz Munich, Tübingen AI Center) and Oriol Vinyals (Google DeepMind).

My goal is to make models generalize better. As a result, large parts of my research focus on understanding and facilitating effective generalization across modalities and distribution shifts, with applications to multimodal foundation models, knowledge transfer, continual and contrastive learning. I am also very interested in applications to the medical and natural science domains.

I have spent time as a research scientist intern at Meta AI [Diane Bouchacourt, Pascal Vincent, Mark Ibrahim], and as a research intern at Amazon AWS [Peter Gehler, Thomas Brox], Vector [Marzyeh Ghassemi] and MILA [Joseph Paul Cohen, Yoshua Bengio].

Before ML, I completed my Master's and Bachelor's degrees in Physics at Heidelberg University, with an initial focus on Medical and Solid State Physics.

Community Service
Outstanding reviewer for CVPR 2022, ECCV 2022, CVPR 2023, ECCV 2024, ICLR 2024, NeurIPS 2024.
Reviewer for CVPR 22-24, ECCV 22/24, ICCV 21/23, NeurIPS 23/24, ICLR 24/25, ICML 24, TPAMI, MICCAI, (...)

I'm always open to collaborations or project supervisions! Just drop me a message :).



News

[Sep 2024] Two Papers Accepted @ NeurIPS 2024
Our large-scale study on continual multimodal pretraining, and reward-optimized image generation via ReNO!

[July 2024] Paper Accepted @ ECCV 2024
A supervised master's thesis project on intervention-aware concept bottleneck models has been accepted to ECCV 2024!

[June 2024] Paper Accepted @ CoLLAs 2024
Our critical investigation of rehearsal-free continual learning with pretrained models has been accepted to CoLLAs 2024!

[May 2024] Presented @ ICLR 2024
Presented two main-conference papers (general knowledge transfer (spotlight) and text-conditional image retrieval), and one supervised workshop paper at the Re-Align workshop.
Vienna, Austria

[March 2024] Winter School Amsterdam
Participant at the Winter School on Foundation Models 2024, including a poster presentation of our ICLR 2024 spotlight work on general knowledge transfer.
Amsterdam, Netherlands

[Jan 2024] Two papers accepted to ICLR 2024
Our works on general knowledge transfer between pretrained models (spotlight) and achieving training-free and interpretable compositional image retrieval were accepted to ICLR 2024 in Vienna!

[Oct 2023] New preprint on Arxiv
We reveal the existence of complementary knowledge between arbitrary models pretrained on the same data, and investigate general tools to transfer this knowledge between pretrained models while retaining base knowledge. Link.

[Oct 2023] New preprint on Arxiv
Achieving training-free and interpretable compositional image retrieval. Link.

[Oct 2023] Presented @ ICCV 2023
Presented our work on the shortcomings of LLM descriptions for open-vocabulary classification, and simple ways to improve performance instead.
Paris, France

[July 2023] One paper accepted to ICCV 2023
WaffleCLIP, our study on improving open-vocabulary image classification with large language model descriptions versus randomized ones, has been accepted to ICCV 2023 in Paris. Link.

[July 2023] Recipient of the Qualcomm Innovation Fellowship 2023
Received the Europe Qualcomm Innovation Fellowship 2023. Link.
Remote

[June 2023] New preprint on Arxiv
A Study on Improving Open-Vocabulary Image Classification with Large Language Model Descriptions versus Randomized Ones. Link.

[June 2023] New preprint on Arxiv
Improving text-to-image generation faithfulness through automated candidate selection. Link.

[May 2023] Finalist Qualcomm Innovation Fellowship
Finalist for the Qualcomm Innovation PhD Fellowship award 2023.
Remote

[May 2023] Presented @ ICLR 2023
Presented our work on disentanglement under correlations.
Kigali, Rwanda

[Jan 2023] IMPRS-IS Interview Symposium
Talk: Presented my research on representation learning under distribution shifts to PhD applicants at the IMPRS-IS PhD Interview Symposium.
Remote

[Oct 2022] Presented @ ECCV 2022
Presented our work on probabilistic prototypical contrastive learning.
Tel Aviv, Israel

[Oct 2022] ELLIS
Talked about my ELLIS PhD as part of the official ELLIS PhD Program trailer.
Germany

[July 2022] Presented @ CVPR 2022
Presented three accepted papers on language-guided contrastive learning (oral), avoiding collapse through flows in contrastive learning, and industrial anomaly detection.
New Orleans, USA

[June 2022] Finalist Qualcomm Innovation Fellowship
Finalist for the Qualcomm Innovation PhD Fellowship award 2022.
Amsterdam, Netherlands

[May 2022] Pioneer AI Center
Talk: Presentation of CVPR 2022 publications on language guidance and normalizing flows for deep metric learning.
Copenhagen, Denmark

[May 2022] EMVA Young Professional Award
Recipient of the Young Professional Award 2022, endowed by the European Machine Vision Association to honor the outstanding and innovative work of a student or young professional in the field of machine vision or image processing.
Brussels, Belgium

[March 2022] G-Research
Talk: Introduction to Deep Metric Learning and presentation of interesting research directions in this field.
Remote

[Jan 2022] Ruhr University Bochum
Talk: On Out-of-Distribution Generalization in zero-shot Deep Metric Learning.
Remote

[Oct 2021] Bright Machines
Talk: Autonomous Industrial Anomaly Detection using PatchCore.
Remote

[Jul 2021] Chaudhari Group, Stanford University
Talk: Addressing the shortcomings in Deep Metric Learning research.
Remote

[Jun/Jul 2020] MLSS Tuebingen
Participant at the Machine Learning Summer School in Tuebingen, Germany.
Remote

Selected Publications

[Model Merging][Knowledge Distillation][Knowledge Transfer]
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer
This work both motivates and studies the possibility of model merging across arbitrary architectures, highlighting that most model pairings exhibit structured knowledge differentials, and proposing an effective way to transfer these differentials from one model to the other.
Karsten Roth*, Lukas Thede*, A. Sophia Koepke, Oriol Vinyals, Olivier Hénaff, Zeynep Akata
[Spotlight] ICLR, 2024
arXiv
[Foundation Models][LLMs][Retrieval]
Vision-by-Language for Training-Free Compositional Image Retrieval
We propose a simple, easily extendable, and training-free pipeline for compositional image retrieval - i.e. retrieving an image based on a reference image and a textual modifier query - which beats training-based methods. The modular setup also allows for simple scaling and the study of scaling laws for each pipeline module.
Shyamgopal Karthik*, Karsten Roth*, Massimiliano Mancini, Zeynep Akata
ICLR, 2024
arXiv | code
[Foundation Models][LLMs][CLIP]
Waffling around for Performance: Visual Classification with Random Words and Broad Concepts
We critically study how recent approaches that use LLM descriptions to extend vision-language model performance (such as CLIP) actually affect model behaviour. Our experiments reveal limited semantic benefits, allowing us to introduce a simple alternative relying solely on structured input noise (a rough illustration follows this entry).
Karsten Roth*, Jae Myung Kim*, A. Sophia Koepke, Oriol Vinyals, Cordelia Schmid, Zeynep Akata
ICCV, 2023
arXiv | code
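For intuition only, here is a hypothetical, minimal sketch of prompting with semantics-free noise instead of LLM descriptions; the prompt template, noise construction, and function names are illustrative assumptions, not the paper's exact recipe (see the linked code for that).

```python
import random
import string

def random_descriptor(num_chars=6):
    # A semantics-free character sequence standing in for an LLM description.
    return "".join(random.choices(string.ascii_lowercase, k=num_chars))

def noisy_prompts(classname, num_descriptors=8):
    # Build several prompts per class; a CLIP-style zero-shot classifier would
    # average their text embeddings into a single class prototype.
    return [f"a photo of a {classname}, which has {random_descriptor()}."
            for _ in range(num_descriptors)]

print(noisy_prompts("golden retriever", num_descriptors=2))
```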
[Representation Learning][Out-of-Distribution]
Disentanglement of Correlated Factors via Hausdorff Factorized Support
We tackle the problem of disentangled representation learning without the unrealistic assumption of independent factors of variation, i.e. we allow for correlations in the training data. We achieve this through a novel, Hausdorff-distance-based objective that factorizes the support instead of the distribution over latents.
Karsten Roth, Mark Ibrahim, Zeynep Akata, Pascal Vincent*, Diane Bouchacourt*
ICLR, 2023
arXiv
[Foundation Models][Continual Learning]
Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning
This work showcases that foundation models degrade under continual adaptation, which can be fixed surprisingly easily by retaining a weight-interpolated momentum copy, without the need to re-integrate the momentum network back into training (a minimal sketch follows this entry).
Zafir Stojanovski*, Karsten Roth*, Zeynep Akata
[Best Paper] INTERPOLATE @ NeurIPS 2022
arXiv
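A minimal sketch of the core idea, weight-space interpolation with a momentum copy, written in PyTorch; the function name, momentum value, and stand-in model are assumptions for illustration, not the paper's exact setup.

```python
import copy
import torch

@torch.no_grad()
def update_momentum_model(model, momentum_model, tau=0.99):
    # Interpolate the momentum copy towards the currently adapted weights.
    # The momentum copy itself is never trained or re-integrated into training.
    for p, p_m in zip(model.parameters(), momentum_model.parameters()):
        p_m.mul_(tau).add_(p, alpha=1.0 - tau)

# Usage: keep a frozen, interpolated copy alongside continual fine-tuning.
model = torch.nn.Linear(512, 10)        # stands in for the adapted foundation model
momentum_model = copy.deepcopy(model)   # evaluation / deployment copy
# ... after each adaptation step:
update_momentum_model(model, momentum_model, tau=0.99)
```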
[Contrastive Learning][Uncertainty]
A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning
In this work, we propose a novel probabilistic approach to Deep Metric Learning by describing embeddings and proxies as (non-isotropic) distributions, and the problem of metric learning as that of distribution matching.
Michael Kirchhof*, Karsten Roth*, Zeynep Akata, Enkelejda Kasneci
ECCV, 2022
arXiv
[Contrastive Learning][Large Language Models]
Integrating Language Guidance into Vision-based Deep Metric Learning
We showcase the benefits of re-aligning visual similarity spaces using language semantics, without the need for additional expert supervision and with significant improvements in generalization performance.
Karsten Roth, Oriol Vinyals, Zeynep Akata
[Oral] CVPR, 2022
arXiv | code
[Representation Learning][Out-of-Distribution]
Non-isotropy Regularization for Proxy-based Deep Metric Learning
This work showcases the benefits of resolving local structures in proxy-based Deep Metric Learning without sample-to-sample relations. Doing so retains incredibly fast convergence speeds while ensuring strong generalization performance.
Karsten Roth, Oriol Vinyals, Zeynep Akata
CVPR, 2022
arXiv | code
[Anomaly Detection]
Towards Total Recall in Industrial Anomaly Detection
We develop PatchCore, an anomaly detection method for visual product inspection that is scalable, fast, extremely accurate, interpretable, and usable without expert knowledge. Using coreset memories, PatchCore has remained state-of-the-art for more than a year (a simplified sketch follows this entry).
Karsten Roth, Latha Pemula, Joaquin Zepeda, Bernhard Schoelkopf, Thomas Brox, Peter Gehler
CVPR, 2022
arXiv | code
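A heavily simplified sketch of memory-bank-based anomaly scoring in the spirit of PatchCore: the feature inputs, random subsampling (in place of greedy coreset selection), and function names are placeholder assumptions; the official code linked above is the reference implementation.

```python
import torch

def build_memory_bank(nominal_patch_features, subsample=0.1):
    # Collect patch features from nominal (defect-free) images.
    # PatchCore uses greedy coreset selection; random subsampling is only a
    # stand-in here to keep the sketch compact.
    feats = torch.cat(nominal_patch_features, dim=0)               # (N, D)
    keep = torch.randperm(len(feats))[: max(1, int(subsample * len(feats)))]
    return feats[keep]

def anomaly_score(test_patch_features, memory_bank):
    # Score a test image by its patches' distances to their nearest memory entries.
    dists = torch.cdist(test_patch_features, memory_bank)          # (P, M)
    per_patch = dists.min(dim=1).values                            # nearest-neighbour distance per patch
    return per_patch.max().item()                                  # image-level anomaly score

# Usage with random stand-in features (a real setup would use mid-level CNN patch features):
bank = build_memory_bank([torch.randn(196, 128) for _ in range(10)])
score = anomaly_score(torch.randn(196, 128), bank)
```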
[Contrastive Learning][Fairness]
Is Fairness Only Metric Deep? Evaluating and Addressing Subgroup Gaps in Deep Metric Learning
This work proposes finDML, a benchmark for fairness in non-balanced Deep Metric Learning, to characterize representation fairness. We find that bias in DML representations propagates to common downstream classification tasks, even when training data in the downstream task is re-balanced, and propose a regularizer to tackle this.
Natalie Dullerud, Karsten Roth, Kimia Hamidieh, Nicolas Papernot, Marzyeh Ghassemi
ICLR, 2022
arXiv | proceedings | bibtex
[Cell Tracking][Cell Segmentation]
Temporal control of the integrated stress response by a stochastic molecular "switch"
A five-year interdisciplinary project studying and evaluating the integrated cellular stress response.
Philipp Klein, Stefan M. Kallenberger, Hanna Roth, Karsten Roth, Thi Bach Nga Ly-Hartig, Vera Magg, Janez Aleš, Soheil Rastgou Talemi, Yu Qiang, Steffen Wolf, Olga Oleksiuk, Roma Kurilov, Barbara Di Ventura, Ralf Bartenschlager, Roland Eils, Karl Rohr, Fred A. Hamprecht, Thomas Höfer, Oliver T. Fackler, Georg Stoecklin, Alessia Ruggieri
Science Advances, 2022
Science
[Contrastive Learning]
Revisiting Training Strategies and Generalization Performance in Deep Metric Learning
A seminal project that highlights significant performance saturation in Deep Metric Learning research and its underlying reasons, as well as an initial study into structural drivers of generalization in Deep Metric Learning.
Karsten Roth*, Timo Milbich*, Samarth Sinha, Prateek Gupta, Bjoern Ommer, Joseph Paul Cohen
ICML, 2020
arXiv | proceedings | bibtex | code
[Contrastive Learning][Reinforcement Learning]
PADS: Policy-Adapted Sampling for Visual Similarity Learning
Using Reinforcement Learning, PADS introduces a tuple mining heuristic that presents the network with the right tuples to learn from at the right time.
Karsten Roth*, Timo Milbich*, Bjoern Ommer
CVPR, 2020
arXiv | proceedings | bibtex | code