Camille Couturier

Senior Applied Researcher

Efficient AI • Agentic Systems • Context Engineering • Model Understanding


About

I'm a Senior Applied Machine Learning researcher bridging research and production. After a PhD and postdoc in particle physics and cosmology, I moved into applied ML and system design, focusing on efficient inference for LLMs and retrieval-augmented systems.

My work aims to make AI systems more efficient: every token processed and every millisecond of compute affects scalability, user experience, and sustainability.

Research interests

Efficient LLM Inference

Optimizing LLM deployments for cost, latency, and sustainability: semantic caching, KV caching, and model routing.

Agentic Systems

Rethinking multi-agent systems: context engineering, memory, and tool integration.

Model Understanding

Probing the internals of transformer models to inform architecture decisions and model routing.

Selected publications

2025

LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation

A modular memory framework for multi-agent systems that decomposes task trajectories into reusable units.

D. Han, C. Couturier, D. Madrigal, X. Zhang, V. Rühle, S. Rajmohan

arXiv:2510.04851

2025

Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models

Reducing LLM inference costs by caching semantically similar queries and their contextual summaries.

C. Couturier, S. Mastorakis, H. Shen, S. Rajmohan, V. Rühle

arXiv:2505.11271

2025

Exploring How LLMs Capture and Represent Domain-Specific Knowledge

Investigating knowledge representation in LLMs across specialized domains.

M. Hipolito Garcia, C. Couturier, D. Madrigal, A. Mallick, A. Kyrillidis, R. Sim, V. Rühle, S. Rajmohan

arXiv:2504.16871

2024

Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction

M. Xia, X. Zhang, C. Couturier, G. Zheng, S. Rajmohan, V. Rühle

EMNLP 2024 Industry Track

2023

Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-Demand VMs

F. Yang, L. Wang, Z. Xu, J. Zhang, L. Li, B. Qiao, C. Couturier, et al.

ASPLOS 2023

2022

Spot Virtual Machine Eviction Prediction in Microsoft Cloud

F. Yang, B. Pang, J. Zhang, B. Qiao, L. Wang, C. Couturier, et al.

WWW 2022

For earlier publications in particle physics and cosmology (H.E.S.S. Collaboration, MIMAC), see my Google Scholar profile.

Experience

2022 – Present

Senior Applied Researcher

Microsoft – M365 Research / Efficient AI

Efficient inference of language models at scale, context engineering, agentic systems.

2020 – 2022

Applied Researcher

Microsoft

NLP, reinforcement learning, AIOps.

2019 – 2020

AI Resident

Microsoft Research – Cambridge, UK

Multi-task Bayesian optimization for RL training. Performance regression analysis tools.

2017 – 2019

Data Scientist

Booking.com – Amsterdam

Experimentation platform (A/B testing, synthetic control). Recommender systems.

2015 – 2017

Postdoctoral Researcher

CNRS / LPSC – Grenoble

Dark matter detector development (MIMAC project).

2011 – 2014

PhD in Particle Physics

Pierre and Marie Curie University – Paris

Tests of Lorentz invariance with gamma-ray observations (H.E.S.S. Collaboration).

Contact

The best way to reach me is via email: