llm-jp-eval-mm: An Automatic Evaluation Framework for Japanese Vision-Language Models
The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025) · March 2025
A toolkit for consistently and automatically evaluating Japanese vision-language models across multiple multimodal tasks.
BibTeX
@inproceedings{maeda2025llm-jp-eval-mm,
author = {前田, 航希 and 杉浦, 一瑳 and 小田, 悠介 and 栗田, 修平 and 岡崎, 直観},
month = mar,
booktitle = {言語処理学会第31回年次大会 (NLP2025)},
title = {{llm-jp-eval-mm: 日本語視覚言語モデルの自動評価基盤}},
year = {2025}
}
Abstract
Research on vision-language models (VLMs) is advancing rapidly, but evaluation frameworks for Japanese vision-language (V&L) tasks remain inadequate. This paper introduces llm-jp-eval-mm, a toolkit for systematically evaluating Japanese multimodal tasks. It unifies six existing Japanese multimodal tasks, enabling consistent benchmarking across multiple metrics. The toolkit is publicly available, with the aim of facilitating the continuous improvement and evaluation of Japanese VLMs.
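To illustrate the kind of multi-task, multi-metric aggregation the abstract describes, here is a minimal, self-contained sketch. All names (`evaluate`, `exact_match`, the toy tasks) are hypothetical and do not reflect llm-jp-eval-mm's actual API; the sketch only shows the general pattern of scoring each task's predictions under every registered metric.

```python
# Hypothetical sketch of multi-task, multi-metric evaluation.
# Names are illustrative only, not llm-jp-eval-mm's real interface.
from statistics import mean

def exact_match(preds, refs):
    # Fraction of predictions that match the reference exactly.
    return mean(1.0 if p == r else 0.0 for p, r in zip(preds, refs))

def evaluate(preds, refs, metrics):
    """Score one task's predictions under every metric."""
    return {name: fn(preds, refs) for name, fn in metrics.items()}

metrics = {"exact_match": exact_match}

# Two toy "tasks", each a (predictions, references) pair.
tasks = {
    "task_a": (["犬", "猫"], ["犬", "鳥"]),
    "task_b": (["赤", "青"], ["赤", "青"]),
}

report = {name: evaluate(p, r, metrics) for name, (p, r) in tasks.items()}
print(report)  # per-task scores for every metric
```

A real harness would additionally handle model inference, prompt formatting, and generation-based metrics (e.g. LLM-as-a-judge), but the unified report structure above is the core idea.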