Posts by Collection

portfolio

publications

NeRFail: Neural Radiance Fields-Based Multiview Adversarial Attack

Published in Proceedings of the 38th AAAI Conference on Artificial Intelligence, 2024

This paper is about adversarial robustness on NeRF. Read more

Recommended citation: Jiang, W., Zhang, H., Wang, X., Guo, Z., & Wang, H. (2024, March). NeRFail: Neural radiance fields-based multiview adversarial attack. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, No. 19, pp. 21197-21205). https://ojs.aaai.org/index.php/AAAI/article/view/30113

Traceability and accountability by construction

Published in International Symposium on Leveraging Applications of Formal Methods, 2024

This paper is about how to build an accountable and traceable AI system through cryptographic signatures. Read more

Recommended citation: Wenzel, J., Köhl, M. A., Sterz, S., Zhang, H., Schmidt, A., Fetzer, C., & Hermanns, H. (2024, October). Traceability and accountability by construction. In International Symposium on Leveraging Applications of Formal Methods (pp. 258-280). Cham: Springer Nature Switzerland. https://link.springer.com/chapter/10.1007/978-3-031-75387-9_16

Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds

Published in International Symposium on Dependable Software Engineering: Theories, Tools, and Applications (SETTA 2024), 2024

This paper is about adversarial robustness on 3D point clouds. Read more

Recommended citation: Zhang, H., Cheng, L., He, Q., Huang, W., Li, R., Sicre, R., ... & Zhang, L. (2024, November). Eidos: Efficient, imperceptible adversarial 3D point clouds. In International Symposium on Dependable Software Engineering: Theories, Tools, and Applications (pp. 310-326). Singapore: Springer Nature Singapore. https://link.springer.com/chapter/10.1007/978-981-96-0602-3_17

Saliency Maps Give a False Sense of Explanability to Image Classifiers: An Empirical Evaluation across Methods and Metrics

Published in The 16th Asian Conference on Machine Learning (Conference Track), 2024

This paper is about an empirical evaluation across saliency methods and the corresponding explainability metrics. Read more

Recommended citation: Zhang, H., Figueroa, F. T., & Hermanns, H. (2024). Saliency Maps Give a False Sense of Explanability to Image Classifiers: An Empirical Evaluation across Methods and Metrics. In The 16th Asian Conference on Machine Learning (Conference Track). https://raw.githubusercontent.com/mlresearch/v260/main/assets/zhang25a/zhang25a.pdf

Revisiting Transferable Adversarial Images: Systemization, Evaluation, and New Insights

Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

This paper is about the systematization and evaluation of transferable adversarial images for image classification. Read more

Recommended citation: Zhao, Z., Zhang, H., Li, R., Sicre, R., Amsaleg, L., Backes, M., ... & Shen, C. (2025). Revisiting Transferable Adversarial Images: Systemization, Evaluation, and New Insights. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://ieeexplore.ieee.org/abstract/document/11164808/

talks

A Quick Tour: Deep Learning in Adversarial Context

Published:

I was invited to give a tutorial on adversarial attacks in deep learning for researchers attending the workshop. In this session, I introduced the background, motivation, and fundamental concepts of the problem before discussing classic work on adversarial attacks and presenting state-of-the-art advancements, including our work on adversarial attacks on images. I also provided an overview of existing tools for running attacks. Read more

Adversarial Robustness in Deep Learning

Published:

I was invited to give a talk on adversarial attacks in deep learning for bachelor’s students. In this session, I introduced the background, motivation, and fundamental concepts of the problem before discussing classic work on adversarial attacks and presenting state-of-the-art advancements, including our work on adversarial attacks on images. Read more

Adversarial Attack in 3D Representation

Published:

I was invited to give a talk on adversarial attacks in 3D representation for master’s students. In this session, I introduced the background, motivation, and fundamental concepts of the problem before presenting our work on adversarial attacks targeting 3D point clouds and Neural Radiance Fields (NeRF). Read more

Unveiling AI: Exploring Neural Networks and the Journey Towards Interpretability

Published:

In this invited talk, we introduced the fundamentals of interpretability in neural networks, aiming to make the topic accessible to university students new to the field. We explored why interpretability is essential, discussed key methods for analyzing neural networks, and highlighted how these insights can pave the way for impactful research. The session aimed to inspire and equip students with the knowledge to embark on their AI research journey. Read more

Building Trustworthy AI from the view of Adversarial Robustness and Explainability

Published:

As artificial intelligence (AI) systems become increasingly integrated into safety-critical and high-stakes domains, ensuring their trustworthiness has emerged as a central research challenge. Two foundational pillars of trustworthy AI are adversarial robustness and explainability. Adversarial robustness addresses the vulnerability of machine learning models to carefully crafted perturbations that can cause erroneous or manipulated outputs, exposing critical weaknesses in reliability and security. Explainability, on the other hand, seeks to make AI systems transparent and interpretable, enabling stakeholders to understand, validate, and contest model decisions. In this presentation, I will introduce my research framed around these two core pillars. Read more

Adversarial Robustness and Interpretability: Where Empirical ML Meets Formal Guarantees

Published:

In this talk, I will present my research on adversarial robustness and interpretability from an empirical machine learning perspective. Adversarial examples reveal systematic failure modes of neural networks and highlight the limitations of current evaluation practices, both for robustness and for explanation methods such as saliency maps. Across image classification, 3D perception, and physically grounded representations, adversarial analysis can be understood as a form of stress testing that produces concrete counterexamples to implicitly assumed system properties. Building on these observations, I will argue that many difficulties in assessing robustness, interpretability, and accountability stem from the absence of explicit specifications and formal guarantees. While empirical adversarial methods are effective at discovering violations, they do not by themselves clarify which properties should be satisfied or how they can be enforced. I will therefore discuss why formal methods, such as specification, verification, and counterexample-guided refinement, are a natural complement to adversarial machine learning, and outline potential directions for bridging these communities. The aim of the talk is to invite discussion on how empirical failure analysis can be integrated into workflows that support verifiable and accountable AI systems. Read more

teaching

Exploring Explainability in Machine Learning

Master Seminar, Saarland University, 2024

This seminar course delves into the crucial and evolving field of explainability in machine learning (ML). As ML models become increasingly complex and integral to various domains, understanding how these models make decisions is essential. This course will explore different methodologies for interpreting ML models, including rule-based, attribution-based, example-based, prototype-based, hidden semantics-based, and counterfactual-based approaches. Through a combination of paper readings, discussions, and presentations, students will gain a comprehensive understanding of the challenges and advancements in making ML models transparent and interpretable. Read more