Multimodal Evaluation & Metrics
Building reliable evaluation protocols for vision-language models, with a focus on grounding and cultural nuance in Japanese contexts.
Doctoral student exploring multimodal vision-and-language systems, evaluation metrics, and context-aware captioning.
Curating instruction datasets and training recipes that keep open-weight models aligned and deployable.
Teaching models to produce and consume structured outputs such as captions, diagrams, and graphs for real-world tasks.
arXiv preprint · 2026
A diagnostic benchmark for evaluating Japanese scene text understanding in vision-language models across STVQA, receipt KIE, and handwriting OCR.
The 32nd Annual Meeting of the Association for Natural Language Processing (NLP2026) · 2026
Proposes JaWildText, a dataset for jointly evaluating character recognition and downstream tasks on real-world Japanese images. It comprises three tasks — dense STVQA, receipt KIE, and handwritten-text OCR — and evaluations of public VLMs and OCR-specialized models show that Japanese text-reading performance still has substantial room for improvement. Recipient of the NLP2026 Committee Special Award.
arXiv preprint · 2026
Proposes a framework for evaluating human-like multi-image spatial reasoning in multimodal large language models, analyzing how models establish correspondences across images and translate spatial understanding into actions.
The 1st Workshop on Multilingual and Equitable Language Technologies (MELT) · 2025
The 2nd Conference on Language Modeling (COLM) · 2025
Sakana AI
Applied Research Engineer (Internship)
Jan 2026 – Present
National Institute of Informatics
Research Assistant
Jun 2024 – Present
Cierpa & Company
Engineering Internship
Jul 2024 – Mar 2025
RIKEN AIP
Research Part-timer
Jul 2022 – Aug 2024
OMRON SINIC X
Research Internship
Jun 2023 – Jan 2024
Tokyo Institute of Technology
Research Assistant
Dec 2021 – Dec 2022
NTT Research Institute
Summer Internship
Aug 2022 – Sep 2022
Future Corporation
Strategic AI Group
Feb 2021 – Mar 2022
2024 – Present
Vision and Language, Evaluation
Advisor: Naoaki Okazaki
Focusing on novel evaluation metrics for multimodal systems and cross-modal representation learning.
2022 – 2024
Vision and Language: Image Captioning
Advisor: Naoaki Okazaki
Developed context-aware image captioning models that generate descriptions based on user preferences.
2018 – 2022
NLP: Grammatical Error Correction
Advisors: Naoaki Okazaki, Masahiro Kaneko (Mentor)
Created improved evaluation metrics for grammatical error correction systems.