230
Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification
arXiv:2512.16921v1 Announce Type: new
Abstract: Conventional evaluation methods for multimodal LLMs (MLLMs) lack interpretability and are often insufficient to fully disclose significant capability gaps across models. To address this, we introduce AuditDM, an automated framework that actively disco…
Abstract: Conventional evaluation methods for multimodal LLMs (MLLMs) lack interpretability and are often insufficient to fully disclose significant capability gaps across models. To address this, we introduce AuditDM, an automated framework that actively disco…