Program
Overview
The SaTML 2026 conference will be held March 23–25, 2026, at the Technical University of Munich, Germany. The program features keynote talks, paper presentations, a poster session, and a competition track.
Location
The conference will take place at the Technical University of Munich (TUM) in Munich, Germany, in the Theresianum building, Room 602.
The campus address is Arcisstr. 21, 80333 Munich, with access from Theresienstraße 90, 80333 Munich.
Several buildings share the official address Theresienstraße 90. The entrance to the conference venue is on the south side of Theresienstraße, halfway along the block between Luisenstraße (to the west) and Arcisstraße (to the east). Please see this map for more information.
Monday, March 23, 2026
08:40–09:00
Program Chairs: Rachel Cummings and Konrad Rieck
09:00–09:40
Session Chair: Carmela Troncoso
The Continual Challenge: Differential Privacy for Evolving Datasets
Monika Henzinger (Institute of Science and Technology Austria)
10:00–11:00
Session Chair: Martin Pawelczyk
SoK: Data Minimization in Machine Learning
Robin Staab, Nikola Jovanović (ETH Zurich), Kimberly Mai (University College London), Prakhar Ganesh (McGill University / Mila), Martin Vechev (ETH Zurich), Ferdinando Fioretto (University of Virginia) and Matthew Jagielski (Anthropic)
Data minimization (DM) describes the principle of collecting only the data strictly necessary for a given task. It is a foundational principle across major data protection regulations like GDPR and CPRA. Violations of this principle have substantial real-world consequences, with regulatory actions resulting in fines reaching hundreds of millions of dollars. Notably, the relevance of data minimization is particularly pronounced in machine learning (ML) applications, which typically rely on large datasets, resulting in an emerging research area known as Data Minimization in Machine Learning (DMML). At the same time, existing work on other ML privacy and security topics often addresses concerns relevant to DMML without explicitly acknowledging the connection.
This disconnect leads to confusion among practitioners, complicating their efforts to implement DM principles and interpret the terminology, metrics, and evaluation criteria used across different research communities. To address this gap, we present the first systematization of knowledge (SoK) for DMML. We introduce a general framework for DMML, encompassing a unified data pipeline, adversarial models, and points of minimization. This framework allows us to systematically review the literature on data minimization as well as DM-adjacent methodologies whose link to DM was often overlooked. Our structured overview is designed to help practitioners and researchers effectively adopt and apply DM principles in AI/ML, by helping them identify relevant techniques and understand their underlying assumptions and trade-offs through a unified DM-centric lens.
Evaluating Black-Box Vulnerabilities with Wasserstein-Constrained Data Perturbations
Adriana Laurindo Monteiro (FGV - EMap) and Jean-Michel Loubes (Université Paul Sabatier)
The growing use of Machine Learning (ML) tools comes with critical challenges, such as limited model explainability. We propose a global explainability framework that leverages Optimal Transport and Distributionally Robust Optimization to analyze how ML algorithms respond to constrained data perturbations. We provide a model-agnostic testing bench for both regression and classification tasks with theoretical guarantees. We establish convergence results and validate the approach on examples and real-world datasets.
Homophily-aware Supervised Contrastive Counterfactual Augmented Fair Graph Neural Network
Mahdi Tavassoli Kejani (Institut de Mathématiques de Toulouse), Fadi Dornaika (University of the Basque Country UPV/EHU, IKERBASQUE, Basque Foundation for Science), Charlotte Laclau (Institut Polytechnique de Paris, France) and Jean-Michel Loubes (Institut de Mathématiques de Toulouse, Université Toulouse 3, INRIA, Projet Regalia)
In recent years, Graph Neural Networks (GNNs) have achieved remarkable success in tasks such as node classification, link prediction, and graph representation learning. However, they remain susceptible to biases that can arise not only from node attributes but also from the graph structure itself. Addressing fairness in GNNs has therefore emerged as a critical research challenge. In this work, we propose a novel model for training fairness-aware GNNs by improving the counterfactual augmented fair graph neural network framework (CAF). Specifically, our approach introduces a two-phase training strategy: in the first phase, we edit the graph to increase homophily ratio with respect to class labels while reducing homophily ratio with respect to sensitive attribute labels; in the second phase, we integrate a modified supervised contrastive loss and environmental loss into the optimization process, enabling the model to jointly improve predictive performance and fairness. Experiments on four real-world datasets demonstrate that our model outperforms CAF and several state-of-the-art graph-based learning methods in both classification accuracy and fairness metrics.
Efficient Semi-Supervised Adversarial Training via Latent Clustering-Based Data Reduction
Somrita Ghosh (Independent researcher), Yuelin Xu and Xiao Zhang (CISPA Helmholtz Center for Information Security)
Achieving high model robustness under adversarial settings is widely recognized as demanding considerable training samples. Recent works propose semi-supervised adversarial training (SSAT) methods with external unlabeled or synthetically generated data, which are the current state-of-the-art. However, SSAT requires substantial extra data to attain high robustness, resulting in prolonged training time and increased memory usage. In this paper, we propose unlabeled data reduction strategies to improve the efficiency of SSAT. Specifically, we design novel latent clustering-based techniques to select or generate a small critical subset of data samples near the model's decision boundary. While focusing on boundary-adjacent points, our methods maintain a balanced ratio between boundary and non-boundary data points to avoid overfitting. Comprehensive experiments on benchmark datasets demonstrate that our methods can significantly reduce SSAT's data requirement and computation costs while preserving its strong robustness advantages. In particular, our latent-space selection scheme based on k-means clustering and our guided DDPM fine-tuning approach with LCG-KM are the most effective, achieving nearly identical robust accuracies with $5\times$ to $10\times$ less unlabeled data and approximately $4\times$ less total runtime.
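The selection idea in this abstract (cluster in latent space, then keep a balanced mix of boundary-adjacent and interior points) might be sketched as follows. This is illustrative only: the plain-NumPy k-means, the margin-based notion of boundary proximity, and all names are assumptions of this sketch, not the authors' code.

```python
import numpy as np

def kmeans(X, k, iters=20, rng=None):
    """Tiny k-means in NumPy, used here only to partition the latent space."""
    rng = rng or np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def select_subset(latents, margins, k=4, per_cluster=3, boundary_frac=0.6):
    """Cluster latent representations, then from each cluster keep a mix of
    boundary-adjacent points (small |margin|) and interior points (large
    |margin|), keeping the ratio balanced to avoid overfitting the boundary.
    `margins` is an assumed per-sample signed distance to the decision boundary."""
    labels = kmeans(latents, k)
    picked = []
    for j in range(k):
        idx = np.where(labels == j)[0]
        order = idx[np.argsort(np.abs(margins[idx]))]   # boundary-first ordering
        n_b = int(per_cluster * boundary_frac)          # boundary-adjacent quota
        picked += list(order[:n_b]) + list(order[len(order) - (per_cluster - n_b):])
    return sorted(set(picked))
```

The `boundary_frac` knob mirrors the abstract's balanced boundary/non-boundary ratio; any real instantiation would derive margins and latents from the trained model.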
They’re Closer Than We Think: Tackling Near-OOD Problem
Shaurya Bhatnagar, Ishika Sharma, Ranjitha Prasad (Indraprastha Institute of Information Technology Delhi (IIIT-Delhi)), Vidya T (LightMetrics), Ramya Hebbalaguppe (TCS Research) and Ashish Sethi (LightMetrics)
Out-of-Distribution (OoD) detection plays a vital role in the robustness of models in real-world applications. While traditional approaches are effective at detecting samples that are significantly different from the training distribution (far-OoD), they often falter with near-OoD samples, where subtle variations in images pose a challenge for standard methods like likelihood-based detection. In practical applications, near-OoD samples are more prevalent, particularly in fine-grained tasks where instances from different classes exhibit high perceptual similarity. In these scenarios, OoD detection relies on subtle, localized features. For instance, in bird species classification, accurate OoD detection requires discerning fine-grained attributes such as beak shape, tail, and feather pattern, which exhibit substantial structural overlap across both ID and OoD classes. We propose the novel NORD-F framework to detect near-OoDs by disentangling coarse structural features from fine-grained discriminative features. We use gradient reversal-based disentangled representation learning, which helps isolate class-invariant features, allowing the classifier to employ class-specific features. We present a theoretical analysis that motivates the design of the novel architecture consisting of invariance, classification, and reconstruction branches. Empirically, we demonstrate that NORD-F outperforms well-known baselines on fine-grained datasets such as CUB, Stanford-Dogs, and Aircraft for near-OoD detection.
11:20–12:20
Session Chair: Klim Kireev
Position: Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs
Ahmed Salem, Andrew Paverd and Sahar Abdelnabi (Microsoft)
Large language models (LLMs) are typically assumed to be stateless: once a session ends, no information is carried forward unless explicitly stored. In this paper we challenge that assumption by introducing the notion of implicit memory—the ability of models to encode hidden state within their own outputs and later recover it when those outputs are reintroduced as input. We show that this mechanism, though simple, expands the design space of attacks and emergent behaviors. As a concrete demonstration, we construct a new class of temporal backdoors, namely time bombs, which activate only after a sequence of interactions satisfies hidden conditions. Beyond this case study, we analyze broader risks, including covert inter-agent communication, benchmark cheating, situational awareness amplification, and training-data poisoning via hidden encodings. Our discussion distinguishes induced forms of implicit memory, achievable today through straightforward prompting or fine-tuning, from the more speculative but concerning possibility of organic emergence under future optimization pressures. Finally, we outline directions for detection, stress-testing, and evaluation to bound current capabilities and anticipate future developments.
BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints
Waris Gill, Natalie Isak and Matthew Dressman (Microsoft)
The widespread deployment of LLMs across enterprise services has created a critical security blind spot. Organizations operate multiple LLM services handling billions of queries daily, yet regulatory compliance boundaries prevent these services from sharing threat intelligence about prompt injection attacks, the top security risk for LLMs. When an attack is detected in one service, the same threat may persist undetected in others for months, as privacy regulations prohibit sharing user prompts across compliance boundaries.
We present BinaryShield, the first privacy-preserving threat intelligence system that enables secure sharing of attack fingerprints across compliance boundaries. BinaryShield transforms suspicious prompts through a unique pipeline combining PII redaction, semantic embedding, binary quantization, and a randomized response mechanism to generate non-invertible fingerprints that preserve attack patterns while providing privacy. Our evaluations demonstrate that BinaryShield achieves an F1-score of 0.94, significantly outperforming the privacy-preserving baseline SimHash (0.77), while achieving substantial storage reduction and 38x faster similarity search compared to dense embeddings.
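The quantization and randomization stages of such a pipeline might look as follows. This is an illustrative sketch, not BinaryShield's code: the function names, the sign-based quantizer, and the privacy parameter `eps` are all assumptions.

```python
import numpy as np

def binary_fingerprint(embedding, eps=2.0, rng=None):
    """Sketch of a privacy-preserving fingerprint: sign-quantize a semantic
    embedding to bits, then flip each bit with randomized response.
    `eps` plays the role of a local-DP privacy parameter (assumed)."""
    rng = rng or np.random.default_rng()
    bits = (np.asarray(embedding) > 0).astype(np.uint8)   # binary quantization
    p_keep = np.exp(eps) / (np.exp(eps) + 1.0)            # randomized response
    flips = rng.random(bits.shape) >= p_keep
    return np.where(flips, 1 - bits, bits)

def hamming_similarity(a, b):
    """Compare two fingerprints without access to the original prompts."""
    return 1.0 - float(np.mean(a != b))
```

Matching incoming prompts against shared fingerprints would then reduce to a Hamming-space similarity search over compact bit vectors, which is where the storage and speed gains over dense embeddings come from.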
Defending Against Prompt Injection with DataFilter
Yizhu Wang, Sizhe Chen (UC Berkeley), Raghad Alkhudair, Basel Alomair (KACST) and David Wagner (UC Berkeley)
As large language model (LLM) agents are increasingly deployed to automate tasks and interact with untrusted external data, prompt injection emerges as a significant security threat. By injecting malicious instructions into the data that LLMs access, an attacker can arbitrarily override the original user task and redirect the agent toward unintended, potentially harmful actions. Existing defenses either require access to model weights (fine-tuning), incur substantial utility loss (detection-based), or demand non-trivial system redesign (system-level). Motivated by this, we propose DataFilter, a test-time model-agnostic defense that removes malicious instructions from the data before it reaches the backend LLM. DataFilter is trained with supervised fine-tuning on simulated injections and leverages both the user's instruction and the data to selectively strip adversarial content while preserving benign information. Across multiple benchmarks, DataFilter consistently reduces attack success rates to near zero while maintaining the utility of undefended models. DataFilter delivers strong security, high utility, and plug-and-play deployment, making it a practical defense for securing black-box commercial LLMs against prompt injection. Code and scripts for reproducing our results will be made publicly available.
Counterfactual Training: Teaching Models Plausible and Actionable Explanations
Patrick Altmeyer, Aleksander Buszydlik, Arie van Deursen and Cynthia C. S. Liem (Delft University of Technology)
We propose a novel training regime termed counterfactual training that leverages counterfactual explanations to increase the explanatory capacity of models. Counterfactual explanations have emerged as a popular post-hoc explanation method for opaque machine learning models: they inform how factual inputs would need to change in order for a model to produce some desired output. To be useful in real-world decision-making systems, counterfactuals should be plausible with respect to the underlying data and actionable with respect to the feature mutability constraints. Much existing research has therefore focused on developing post-hoc methods to generate counterfactuals that meet these desiderata. In this work, we instead hold models directly accountable for the desired end goal: counterfactual training employs counterfactuals during the training phase to minimize the divergence between learned representations and plausible, actionable explanations. We demonstrate empirically and theoretically that our proposed method facilitates training models that deliver inherently desirable counterfactual explanations and additionally exhibit improved adversarial robustness.
“Org-Wide, We’re Not Ready”: C-Level Lessons on Securing Generative AI Systems
Elnaz Rabieinejad Balagafsheh, Ali Dehghantanha (Cyber Science Lab, Canada Cyber Foundry, University of Guelph) and Fattane Zarrinkalam (College of Engineering, University of Guelph)
Enterprises are adopting generative AI (GenAI) faster than they can secure it. We report an empirical study of 20 Canadian Chief Information Security Officers (CISOs) that combined semi-structured interviews with a full-day, practitioner-led think tank. We ask (RQ1) how leaders prioritize GenAI threats, (RQ2) where organizations are prepared across the lifecycle, and (RQ3) where governance and assurance frameworks fall short. CISOs consistently rank three exposures as high-likelihood, high-impact: (1) data movement and leakage via everyday assistant use and downstream logs/backups, (2) prompt/model misuse that steers assistants, especially RAG-backed ones, outside intended retrieval scope, and (3) deepfake voice used for authority spoofing and urgent fraud. Readiness is strongest upstream (intake reviews, data classification/lineage, architectural zoning) and weakest at runtime: few teams have EDR-like telemetry for prompts, tool calls, or agent routing, so detection remains largely human-in-the-loop. Current frameworks are principle-heavy but procedure-light and insufficiently sector-tuned. We translate these observations into actionable controls: minimum “AI-EDR” telemetry, sector-ready governance runbooks, and a red-teaming program that moves from single-prompt tests to end-to-end exercises spanning data to tools/APIs. Our findings align investment and policy with the blast radius CISOs face today, and provide a pragmatic path from static compliance to operational assurance.
13:20–14:20
Session Chair: Fabio Pierazzi
On the Fragility of Contribution Evaluation in Federated Learning
Balázs Pejó (CrySyS Lab, Budapest Univ. of Technology and Economics & HUN-REN-BME Information Systems RG), Marcell Frank (E-Group), Krisztián Varga, Péter Veliczky (Faculty of Natural Sciences, Budapest Univ. of Technology and Economics) and Gergely Biczók (CrySyS Lab, Budapest Univ. of Technology and Economics & HUN-REN-BME Information Systems RG)
This paper investigates the fragility of contribution evaluation in federated learning, a critical mechanism for ensuring fairness and incentivizing participation. We argue that contribution scores are susceptible to significant distortions from two fundamental perspectives: architectural sensitivity and intentional manipulation. First, we explore how different model aggregation methods impact these scores. While most research assumes a basic averaging approach, we demonstrate that advanced techniques, including those designed to handle unreliable or diverse clients, can unintentionally yet significantly alter the final scores. Second, we examine the threat posed by poisoning attacks, where malicious participants strategically manipulate their model updates to either inflate their own contribution scores or reduce others'. Through extensive experiments across diverse datasets and model architectures, implemented within the Flower framework, we rigorously show that both the choice of aggregation method and the presence of attackers can substantially skew contribution scores, highlighting the need for more robust contribution evaluation schemes.
Position: Research in Collaborative Learning Does Not Serve Cross-Silo Federated Learning in Practice
Kevin Kuo, Chhavi Yadav and Virginia Smith (Carnegie Mellon University)
Cross-silo federated learning (FL) is a promising approach to enable cross-organization collaboration in machine learning model development without directly sharing private data. Despite growing organizational interest driven by data protection regulations such as GDPR and HIPAA, the adoption of cross-silo FL remains limited in practice. In this paper, we conduct an interview study to understand the practical challenges to cross-silo FL adoption. With interviews spanning a diverse set of stakeholders such as user organizations, software providers, and academic researchers, we uncover various barriers, from concerns about model performance to questions of incentives and trust between participating organizations. Our study shows that cross-silo FL faces a set of challenges that have yet to be well-captured by existing research in the area and are quite distinct from other forms of federated learning such as cross-device FL. We end with a discussion on future research directions that can help overcome these challenges.
Private Blind Model Averaging – Distributed, Non-interactive, and Convergent
Moritz Kirschte, Sebastian Meiser (University of Lubeck), Saman Ardalan (UKSH Kiel) and Esfandiar Mohammadi (University of Lubeck)
Distributed differentially private learning techniques enable a large number of users to jointly learn a model without having to first centrally collect the training data. At the same time, neither the communication between the users nor the resulting model shall leak information about the training data. This kind of learning technique can be deployed to edge devices if it can be scaled up to a large number of users, particularly if the communication is reduced to a minimum: no interaction, i.e., each party only sends a single message. The best previously known methods are based on gradient averaging, which inherently requires many synchronization rounds.
A promising non-interactive alternative to gradient averaging relies on so-called output perturbation: each user first locally finishes training and then submits its model for secure averaging without further synchronization. We analyze this paradigm, which we coin blind model averaging (BlindAvg), in the setting of convex and smooth empirical risk minimization (ERM) like a support vector machine (SVM). While the required noise scale is asymptotically the same as in the centralized setting, it is not well understood how close BlindAvg comes to centralized learning, i.e., its utility cost.
We characterize and boost the privacy-utility tradeoff of BlindAvg with two contributions:
First, we prove that BlindAvg converges toward the centralized setting under sufficiently strong L2-regularization for a non-smooth SVM learner. Second, we introduce SoftmaxReg, a novel differentially private convex and smooth ERM learner that has a better privacy-utility tradeoff than an SVM in a multi-class setting.
We evaluate our findings on three datasets (CIFAR-10, CIFAR-100, and Federated EMNIST) and provide an ablation in an artificially extreme non-IID scenario.
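A minimal sketch of the blind model averaging paradigm, using ridge regression in place of the paper's SVM/softmax learners for brevity; the function names and the noise scale `sigma` are illustrative assumptions, not the calibrated DP mechanism from the paper.

```python
import numpy as np

def local_output_perturbation(X, y, lam=1.0, sigma=0.1, rng=None):
    """One party: finish training locally (here, ridge-regularized least
    squares in closed form), then perturb the finished model's weights
    with Gaussian noise before releasing it. No further synchronization."""
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return w + rng.normal(0.0, sigma, size=d)

def blind_average(models):
    """Server side: a single non-interactive round that averages the
    noisy models submitted by all parties."""
    return np.mean(models, axis=0)
```

The appeal of the paradigm shows up even in this toy: per-party noise shrinks in the average, while each party sends exactly one message.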
SoK: Blockchain-Based Decentralized AI (DeAI)
Elizabeth Lui (FLock.io), Rui Sun (Newcastle University & University of Manchester), Vatsal Shah (FLock.io), Xihan Xiong (Imperial College London), Jiahao Sun (FLock.io), Davide Crapis (Ethereum Foundation & PIN AI), William Knottenbelt (Imperial College London) and Zhipeng Wang (University of Manchester)
Centralization enhances the efficiency of Artificial Intelligence (AI) but also introduces critical challenges, including single points of failure, inherent biases, data privacy risks, and scalability limitations. To address these issues, blockchain-based Decentralized Artificial Intelligence (DeAI) has emerged as a promising paradigm that leverages decentralization and transparency to improve the trustworthiness of AI systems. Despite rapid adoption in industry, the academic community lacks a systematic analysis of DeAI's technical foundations, opportunities, and challenges. This work presents the first Systematization of Knowledge (SoK) on DeAI, offering a formal definition and precise mathematical model, a taxonomy of existing solutions based on the AI lifecycle, and an in-depth investigation of the roles of blockchain in enabling secure and incentive-compatible collaboration. We further review security risks across the DeAI lifecycle and empirically evaluate representative mitigation techniques. Finally, we highlight open research challenges and future directions for advancing blockchain-based DeAI.
The Feature-Space Illusion: Exposing Practical Vulnerabilities in Blockchain GNN Fraud Detection
François Frankart, Thibault Simonetto, Maxime Cordy, Orestis Papageorgiou, Nadia Pocher and Gilbert Fridgen (University of Luxembourg)
Graph Neural Networks are becoming essential for detecting fraudulent transactions on Ethereum. Yet, their robustness against realistic adversaries remains unexplored. We identify a fundamental gap: existing adversarial machine learning assumes arbitrary feature manipulation, while blockchain adversaries face an inverse feature-mapping problem—they must synthesize costly, cryptographically-valid transactions that produce desired perturbations through deterministic feature extraction. We present the first adversarial framework tailored to blockchain's constraints. Our gradient-guided search exploits partial differentiability: leveraging GNN gradients to identify promising directions, then employing derivative-free optimization to synthesize concrete transactions. For fraud rings controlling multiple accounts, we introduce a probability-weighted objective that naturally prioritizes evasion bottlenecks. Evaluating on real Ethereum transactions reveals architectural vulnerabilities with immediate security implications. Attention mechanisms fail significantly—GATv2 suffers a 78.4% attack success rate with merely 2-3 transactions costing negligible amounts relative to fraud proceeds. Remarkably, GraphSAGE exhibits both superior detection (F1=0.905) and robustness (85.2% resistance), suggesting sampling-based aggregation inherently produces more stable decision boundaries than adaptive attention. As GNN-based detection becomes critical DeFi infrastructure, our work exposes the urgent need for architectures explicitly designed for adversarial resilience under real-world constraints. We release our framework and the ETHFRAUD-30K dataset to enable rigorous security evaluation of deployed systems.
14:40–15:20
Session Chair: Thorsten Eisenhofer
🏁 Privacy-Preserving Database Systems CTF Competition
🏁 Agentic System Capture-the-Flag Competition
15:40–17:00
Session Chair: Pavel Laskov
Smudged Fingerprints: A Systematic Evaluation of the Robustness of AI Image Fingerprints
Kai Yao and Marc Juarez (University of Edinburgh)
Model fingerprint detection techniques have emerged as a promising approach for attributing AI-generated images to their source models, with high detection accuracy in clean settings. Yet, their robustness under adversarial conditions remains largely unexplored. We present the first systematic security evaluation of these techniques, formalizing threat models that encompass both white- and black-box access and two attack goals: fingerprint removal, which erases identifying traces to evade attribution, and fingerprint forgery, which seeks to cause misattribution to a target model. We implement five attack strategies and evaluate 14 representative fingerprinting methods across RGB, frequency, and learned-feature domains on 8 state-of-the-art image generators. Our experiments reveal a pronounced gap between clean and adversarial performance. Removal attacks are highly effective, often achieving success rates above 80% in white-box settings and over 50% under constrained black-box access. While forgery is more challenging than removal, its success significantly varies across targeted models. We also identify a utility-robustness trade-off: methods with the highest attribution accuracy are often vulnerable to attacks, whereas more robust approaches tend to be less accurate. Notably, residual- and manifold-based fingerprints show comparatively stronger black-box resilience than others. These findings highlight the urgent need for developing model fingerprinting techniques that are robust in adversarial settings.
Optimal Robust Recourse with $L^p$-Bounded Model Change
Phone Kyaw, Kshitij Kayastha and Shahin Jabbari (Drexel University)
Recourse provides individuals who received undesirable labels (e.g., denied a loan) from algorithmic decision-making systems with a minimum-cost improvement suggestion to achieve the desired outcome. However, in practice, models often get updated to reflect changes in the data distribution or environment, invalidating the recourse recommendations (i.e., following the recourse will not lead to the desirable outcome). The robust recourse literature addresses this issue by providing a framework for computing recourses whose validity is resilient to slight changes in the model. However, since the optimization problem of computing robust recourse is non-convex (even for linear models), most of the current approaches do not have any theoretical guarantee on the optimality of the recourse. Recent work by~\citet{KayasthaGJ24} provides the first \emph{provably} optimal algorithm for robust recourse with respect to generalized linear models when the model changes are measured using the $L^{\infty}$ norm. However, using the $L^{\infty}$ norm can lead to recourse solutions with a high price. To address this shortcoming, we consider more constrained model changes defined by the $L^p$ norm, where $p\geq 1$ but $p\neq \infty$, and provide a new algorithm that provably computes the optimal robust recourse for generalized linear models. Empirically, for both linear and non-linear models, we demonstrate that our algorithm achieves a significantly lower price of recourse (up to several orders of magnitude) compared to prior work and also exhibits a better trade-off between the implementation cost of recourse and its validity. Our empirical analysis also illustrates that our approach provides more sparse recourses compared to prior work and remains resilient to post-processing approaches that guarantee feasibility.
Are Robust LLM Fingerprints Adversarially Robust?
Anshul Nasery (University of Washington), Edoardo Contente (Sentient Research), Alkin Kaz, Pramod Viswanath (Princeton University) and Sewoong Oh (University of Washington)
Model fingerprinting has emerged as a promising paradigm for claiming model ownership. However, robustness evaluation of these schemes has mostly focused on benign perturbations such as incremental fine-tuning, model merging, and prompting. Lack of systematic investigations into adversarial robustness against a malicious adversary leaves current systems vulnerable.
To fill this gap, we first define a concrete, practical threat model against model fingerprinting. We then take a critical look at existing model fingerprinting schemes to identify their fundamental vulnerabilities. This leads to adaptive adversarial attacks tailored for each vulnerability, that can bypass model authentication completely for several fingerprinting schemes while maintaining high utility of the model for the rest of the users.
Our work encourages fingerprint designers to adopt adversarial robustness by design. We end with recommendations for future fingerprinting methods.
Cascading Robustness Verification: Toward Efficient Model-Agnostic Certification
Mohammadreza Maleki (Toronto Metropolitan University), Rushendra Sidibomma (University of Minnesota Twin Cities), Arman Adibi (Augusta University) and Reza Samavi (Toronto Metropolitan University)
Certifying the robustness of neural networks (NNs) against adversarial examples remains a major challenge in trustworthy machine learning. Providing formal guarantees that inputs remain robust against all adversarial attacks within a perturbation budget often requires solving non-convex optimization problems. Hence, incomplete verifiers, such as those based on linear programming (LP) or semidefinite programming (SDP), are widely used because they scale efficiently and substantially reduce the cost of robustness verification compared to complete methods. We identify the limitations of relying on a single incomplete verifier, which can underestimate robustness due to \textit{false negatives} arising from loose approximations or \textit{misalignment} between training and verification methods. In this work, we propose \textit{Cascading Robustness Verification (CRV)}, which goes beyond an engineering improvement by exposing fundamental limitations of existing robustness metrics and introducing a framework that enhances both reliability and efficiency in verification. CRV is a model-agnostic verifier, meaning that its robustness guarantees are independent of the model's training process. The key insight behind the CRV framework is that when using multiple verification methods, an input is certifiably robust as long as one method verifies the input as robust. Rather than relying solely on a single verifier with a fixed constraint set, CRV progressively applies multiple verifiers to balance the tightness of the bound and computational cost. Starting with the least expensive method, CRV halts as soon as an input is certified as robust; otherwise, it proceeds to more expensive methods. For each computationally expensive method, we introduce a \textit{Stepwise Relaxation (SR)} strategy, which incrementally adds more constraints and checks for certification at each step, thereby avoiding unnecessary computation.
Our theoretical analysis demonstrates that CRV consistently achieves equal or higher verified accuracy across all settings compared to powerful but computationally expensive incomplete verifiers in the cascade, such as SDP-based methods, while significantly reducing verification overhead. Empirical results confirm that CRV certifies at least as many inputs as benchmark approaches, while improving runtime efficiency by up to \(\sim 90\%\).
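The cascade logic this abstract describes can be sketched in a few lines. This is an illustrative sketch only; the verifier names and the boolean interface are assumptions, not the CRV implementation.

```python
def cascading_verify(x, verifiers):
    """Run incomplete verifiers from cheapest to most expensive and stop
    at the first one that certifies the input as robust.

    `verifiers` is a list of (name, verify_fn) pairs assumed ordered by
    cost; each verify_fn returns True (certified) or False (inconclusive)."""
    for name, verify_fn in verifiers:
        if verify_fn(x):
            return name   # certified: skip the more expensive verifiers
    return None           # no verifier in the cascade could certify x
```

Because incomplete verifiers are sound, one True answer suffices for certification; cascading changes only the cost, not the guarantee.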
On the Robustness of Tabular Foundation Models: Test-Time Attacks and In-Context Defenses
Mohamed Djilani, Thibault Simonetto, Karim Tit, Florian Tambon (University of Luxembourg), Salah Ghamizi (Luxembourg Institute of Health), Maxime Cordy and Mike Papadakis (University of Luxembourg)
Recent tabular foundation models (FMs), such as TabPFN and TabICL, leverage in-context learning to achieve strong performance without gradient updates or fine-tuning. However, their robustness to adversarial manipulation remains largely unexplored. In this work, we present a comprehensive study of the adversarial vulnerabilities of tabular FMs, focusing on both their fragility to targeted test-time attacks and their potential misuse as adversarial tools. We show on three benchmarks in finance, cybersecurity, and healthcare that small, structured perturbations to test inputs can significantly degrade prediction accuracy, even when the training context remains fixed.
Additionally, we demonstrate that tabular FMs can be repurposed to generate evasion attacks that transfer to conventional models such as random forests and XGBoost, and to a lesser extent to deep tabular models.
To improve tabular FMs, we formulate the robustification problem as an optimisation of the weights (adversarial fine-tuning) or of the context (adversarial in-context learning). We introduce an in-context adversarial training strategy that incrementally replaces the context with adversarially perturbed instances, without updating model weights. Our approach improves robustness across multiple tabular benchmarks. Together, these findings position tabular FMs as both a target and a source of adversarial threats, highlighting the urgent need for robust training and evaluation practices in this emerging paradigm.
Certifiably Robust RAG against Retrieval Corruption
Chong Xiang (NVIDIA), Tong Wu, Zexuan Zhong (Princeton University), David Wagner (University of California, Berkeley), Danqi Chen and Prateek Mittal (Princeton University)
Retrieval-augmented generation (RAG) is susceptible to retrieval corruption attacks, where malicious passages injected into retrieval results can lead to inaccurate model responses. We propose RobustRAG, the first defense framework with certifiable robustness against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we isolate passages into disjoint groups, generate LLM responses based on the concatenated passages from each isolated group, and then securely aggregate these responses for a robust output. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG achieves certifiable robustness: for certain queries in our evaluation datasets, we can formally certify non-trivial lower bounds on response quality---even against an adaptive attacker with full knowledge of the defense and the ability to arbitrarily inject a bounded number of malicious passages. We evaluate RobustRAG on the tasks of open-domain question-answering and free-form long text generation and demonstrate its effectiveness across three datasets and three LLMs.
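The isolate-then-aggregate strategy can be illustrated with a toy keyword-voting aggregator. This is a sketch under assumptions: `answer_fn` stands in for an LLM call, and the grouping and voting thresholds are placeholders, not the paper's certified algorithms.

```python
from collections import Counter

def robust_rag_keyword_vote(passages, answer_fn, group_size=1, min_votes=2):
    """Isolate retrieved passages into disjoint groups, get one answer per
    group, and keep only answer keywords that enough groups agree on.
    A bounded number of corrupted passages can corrupt only a bounded
    number of groups, so their keywords fall below the vote threshold."""
    groups = [passages[i:i + group_size]
              for i in range(0, len(passages), group_size)]
    votes = Counter()
    for group in groups:
        keywords = set(answer_fn("\n".join(group)).lower().split())
        votes.update(keywords)                # each group votes once per keyword
    return {kw for kw, n in votes.items() if n >= min_votes}
```

In this toy setup, an injected passage confined to one group contributes at most one vote, which is the intuition behind the certifiable lower bounds the abstract mentions.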
17:00–20:00
The poster session will be held in the hall in front of the conference room. Posters will be presented for all accepted submissions, numbered #1–#123.