Maciej Chrabąszcz

Research interests: AI Safety, Explainable Artificial Intelligence, AI Red Teaming
Maciej Chrabąszcz works on AI security. His research develops methods for analyzing the behavior of AI models, with the goal of finding cases in which a model acts inappropriately or unexpectedly. He also uses explainable AI (XAI) methods to trace what causes these behaviors, which helps determine when an AI model is malfunctioning and why. Maciej is a PhD student at the doctoral school of the Warsaw University of Technology.
Selected Publications
Articles
Karolina Seweryn, Anna Kołos, Agnieszka Karlińska, Katarzyna Lorenc, Katarzyna Dziewulska, Maciej Chrabąszcz, Aleksandra Krasnodębska, Paula Betscher, Zofia Cieślińska, Katarzyna Kowol, Julia Moska, Dawid Motyka, Paweł Walkowiak, Bartosz Żuk, Arkadiusz Janz, "PLLuM-Align: Polish Preference Dataset for Large Language Model Alignment", Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, 23890–23919.
Aleksandra Krasnodębska, Maciej Chrabąszcz, Wojciech Kusa, "Rainbow-Teaming for the Polish Language: A Reproducibility Study", Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025), 2025, 155–165.