Adversarial Robustness and Interpretability: Where Empirical ML Meets Formal Guarantees
Date:
In this talk, I will present my research on adversarial robustness and interpretability from an empirical machine learning perspective. Adversarial examples reveal systematic failure modes of neural networks and highlight the limitations of current evaluation practices, both for robustness and for explanation methods such as saliency maps. Across image classification, 3D perception, and physically grounded representations, adversarial analysis can be understood as a form of stress testing that produces concrete counterexamples to implicitly assumed system properties. Building on these observations, I will argue that many difficulties in assessing robustness, interpretability, and accountability stem from the absence of explicit specifications and formal guarantees. While empirical adversarial methods are effective at discovering violations, they do not by themselves clarify which properties should be satisfied or how they can be enforced. I will therefore discuss why formal methods, such as specification, verification, and counterexample-guided refinement, are a natural complement to adversarial machine learning, and outline potential directions for bridging these communities. The aim of the talk is to invite discussion on how empirical failure analysis can be integrated into workflows that support verifiable and accountable AI systems.
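To make the "stress testing" framing concrete, here is a minimal sketch of a one-step gradient-sign attack (in the style of FGSM) against a toy linear classifier. All names, weights, and the epsilon value are illustrative assumptions, not material from the talk; the point is only that a small, systematically chosen perturbation yields a concrete counterexample to the implicit property "nearby inputs receive the same label".

```python
# Hedged illustration: FGSM-style stress test on a toy linear classifier.
# Everything below (weights, epsilon, the hinge-loss choice) is an
# illustrative assumption, not a method presented in the talk.
import numpy as np

def fgsm_linear(x, w, y, eps):
    """One-step sign-gradient attack on a linear scorer f(x) = w.x.

    y is the true label in {-1, +1}. For the hinge loss
    L = max(0, 1 - y * f(x)), the input gradient in the active
    region is -y * w, so the attack steps along sign(-y * w).
    """
    grad = -y * w                 # dL/dx where the hinge is active
    return x + eps * np.sign(grad)

# A point the classifier labels correctly ...
w = np.array([1.0, -2.0])
x, y = np.array([0.5, -0.5]), +1
assert np.sign(w @ x) == y        # f(x) = 1.5 > 0

# ... becomes a counterexample after a small L-infinity perturbation.
x_adv = fgsm_linear(x, w, y, eps=0.8)
print(np.sign(w @ x_adv))         # prediction flips to -1
```

In the talk's terms, `x_adv` is exactly the kind of concrete counterexample that empirical adversarial methods produce, and that a formal specification (e.g. local robustness within an epsilon-ball) would make explicit and checkable.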
