Papers

Publications, preprints, and presentations, grouped by venue type. Each entry links to a detail page with metadata, an abstract, and a downloadable PDF.

International Conferences

From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models

arXiv preprint · 2026

Masanari Ohi, Koki Maeda, Ryuto Koike, Daisuke Oba, Nakamasa Inoue, Naoaki Okazaki

PDF

Building instruction-tuning datasets from human-written instructions with open-weight large language models

The 2nd Conference on Language Modeling (COLM) · 2025

Youmi Ma, Sakae Mizuki, Kazuki Fujii, Taishi Nakamura, Masanari Ohi, Hinari Shimada, Taihei Shiotani, Koshiro Saito, Koki Maeda, Kakeru Hattori, Takumi Okamoto, Shigeki Ishida, Rio Yokota, Hiroya Takamura, Naoaki Okazaki

Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model

Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations) · 2025

A method for rapidly constructing multimodal datasets tailored to Japanese vision-language models.

Keito Sasagawa, Koki Maeda, Issa Sugiura, Shuhei Kurita, Naoaki Okazaki, Daisuke Kawahara

PDF Code

LegalViz: Legal Text Visualization by Text To Diagram Generation

Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) · 2025

An approach to visualizing the structure of complex legal documents by generating diagrams from text.

Eri Onami, Taiki Miyanishi, Koki Maeda, Shuhei Kurita

PDF

COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset as a Vision-Language Benchmark

Proceedings of The 18th European Conference on Computer Vision (ECCV 2024) · 2024

A vision-language dataset built from unedited overhead-view procedural cooking videos.

Koki Maeda, Tosho Hirasawa, Atsushi Hashimoto, Jun Harashima, Leszek Rybicki, Yusuke Fukasawa, Yoshitaka Ushiku

PDF Code

Query-based Image Captioning from Multi-context 360-degree Images

Findings of the Association for Computational Linguistics: EMNLP 2023 · 2023

An image captioning approach that uses queries and multi-context 360-degree imagery.

Koki Maeda, Shuhei Kurita, Taiki Miyanishi, Naoaki Okazaki

PDF Code

DueT: Image-Text Contrastive Transfer Learning with Dual-adapter Tuning

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) · 2023

Vision-language models like CLIP show strong zero-shot performance but struggle when fine-tuned on downstream tasks due to overfitting. This paper proposes DueT (Dual-adapter Tuning), which uses separate adapters for uni-modal and cross-modal features to prevent overfitting while maintaining the pre-trained knowledge. The method introduces contrastive learning between adapted and original features, achieving state-of-the-art results on multiple vision-language benchmarks. DueT demonstrates significant improvements over existing adapter-based methods, particularly in few-shot scenarios where overfitting is most problematic.

Taku Hasegawa, Kyosuke Nishida, Koki Maeda, Kuniko Saito

PDF

IMPARA: Impact-Based Metric for GEC Using Parallel Data

Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022) · 2022

An impact-based evaluation metric for grammatical error correction (GEC) that uses parallel data.

Koki Maeda, Masahiro Kaneko, Naoaki Okazaki

PDF Code

Domestic Conferences

Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs

The 1st Workshop on Multilingual and Equitable Language Technologies (MELT) · 2025

Koshiro Saito, Sakae Mizuki, Masanari Ohi, Taishi Nakamura, Taihei Shiotani, Koki Maeda, Youmi Ma, Kakeru Hattori, Kazuki Fujii, Takumi Okamoto, Shigeki Ishida, Hiroya Takamura, Rio Yokota, Naoaki Okazaki