Research
I am broadly interested in large language models and vision–language models, with a particular focus on
post-training and model evaluation. My work stress-tests existing models through extensive
benchmarking to elucidate the limitations of different architectural designs and training paradigms.
Recent Papers
Vision Language Models are Biased
An Vo,
Khai-Nguyen Nguyen,
Mohammad Reza Taesiri,
Vy Tuong Dang,
Anh Totti Nguyen, and
Daeyoung Kim
arXiv preprint, 2025
[Website]
B-score: Detecting biases in large language models using response history
An Vo,
Mohammad Reza Taesiri,
Daeyoung Kim, and
Anh Totti Nguyen
International Conference on Machine Learning (ICML), 2025
[Website]
Understanding Generative AI Capabilities in Everyday Image Editing Tasks
Mohammad Reza Taesiri,
Brandon Collins,
Logan Bolton,
Viet Dac Lai,
Franck Dernoncourt,
Trung Bui, and
Anh Totti Nguyen
arXiv preprint, 2025
[Website]
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
Mohammad Reza Taesiri,
Abhijay Ghildyal,
Saman Zadtootaghaj,
Nabajeet Barman, and
Cor-Paul Bezemer
arXiv preprint, 2025
[Website]
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
Tin Nguyen,
Logan Bolton,
Mohammad Reza Taesiri, and
Anh Nguyen
arXiv preprint, 2025
[Website]
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
Jonathan Roberts,
Mohammad Reza Taesiri, and
others
arXiv preprint, 2025
[Website]
[Github]
[Dataset]
VideoGameBunny: Towards vision assistants for video games
Mohammad Reza Taesiri and
Cor-Paul Bezemer
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025 (Oral Presentation)
[Website]
[Model]
[Dataset]
Vision language models are blind
Pooyan Rahmanzadehgervi,
Logan Bolton,
Mohammad Reza Taesiri, and
Anh Nguyen
Asian Conference on Computer Vision (ACCV), 2024 (Oral Presentation)
[Website]
[Code]
[Dataset]
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans
Giang Nguyen,
Valerie Chen,
Mohammad Reza Taesiri, and
Anh Nguyen
Transactions on Machine Learning Research (TMLR), 2024
[Website]
[Github]
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri,
Tianjun Feng,
Anh Nguyen, and
Cor-Paul Bezemer
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[Website]
[Code]
[Dataset]
Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy
Giang Nguyen,
Mohammad Reza Taesiri,
Sunnie S. Y. Kim, and
Anh Nguyen
CVPR Explainable AI for Computer Vision Workshop, 2024
[Github]
ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification
a.k.a. "Zoom is what you need: An empirical study of the power of zoom and spatial biases in image classification"
Mohammad Reza Taesiri,
Giang Nguyen,
Sarra Habchi,
Cor-Paul Bezemer, and
Anh Nguyen
Conference on Neural Information Processing Systems (NeurIPS), 2023
[Website]
[Code]
[Dataset]
[Dataset-4K]
Visual correspondence-based explanations improve AI robustness and human-AI team accuracy
Mohammad Reza Taesiri*,
Giang Nguyen*, and
Anh Nguyen
(* denotes equal contribution)
Conference on Neural Information Processing Systems (NeurIPS), 2022
[Website]
[Live Demo]
[Code]
[Video]
CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning
Mohammad Reza Taesiri,
Finlay Macklon, and
Cor-Paul Bezemer
Mining Software Repositories (MSR) Conference, 2022
[Website]
[Code]
[Live Demo]
[Dataset]
Large Language Models are Pretty Good Zero-Shot Video Game Bug Detectors
Mohammad Reza Taesiri,
Finlay Macklon,
Yihe Wang,
Hengshuo Shen, and
Cor-Paul Bezemer
arXiv preprint, 2022
[Website]
[Code]
[Dataset]