Curriculum Vitae

Research Focus

Exploring the intersection of computer vision and natural language processing, with a focus on multimodal evaluation metrics and context-aware image captioning.

Designing instruction-tuned multimodal LLMs and evaluation pipelines that balance academic rigor with deployability.
Building curated Japanese-language datasets for captioning, reasoning, and cultural understanding in vision-language systems.

Education

2024 – Present

Ph.D., Tokyo Institute of Technology

Vision and Language, Evaluation

Advisor: Naoaki Okazaki

Focusing on novel evaluation metrics for multimodal systems and cross-modal representation learning.

2022 – 2024

M.Eng., Tokyo Institute of Technology

Vision and Language: Image Captioning

Advisor: Naoaki Okazaki

Developed context-aware image captioning models that generate descriptions based on user preferences.

2018 – 2022

B.Eng., Tokyo Institute of Technology

NLP: Grammatical Error Correction

Advisors: Naoaki Okazaki, Masahiro Kaneko (Mentor)

Created improved evaluation metrics for grammatical error correction systems.

Experience

Jan 2026 – Present

Applied Research Engineer (Internship), Sakana AI

Jun 2024 – Present

Research Assistant, National Institute of Informatics

Jul 2024 – Mar 2025

Engineering Internship, Cierpa & Company

Jul 2022 – Aug 2024

Research Part-timer, RIKEN AIP

Vision and Language research at the Language Information Access Technology Team.

Jun 2023 – Jan 2024

Research Internship, OMRON SINIC X

Dec 2021 – Dec 2022

Research Assistant, Tokyo Institute of Technology

Aug 2022 – Sep 2022

Summer Internship, NTT Research Institute

Parameter-efficient pre-training methods for CLIP in Vision and Language tasks.

Feb 2021 – Mar 2022

Strategic AI Group, Future Corporation

SNS data analysis and annotation guideline creation.

Publications

International Conferences & Workshops

Masanari Ohi, Koki Maeda, Ryuto Koike, Daisuke Oba, Nakamasa Inoue, Naoaki Okazaki. From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models · February 2026 · arXiv preprint · 15 pages · Double column
Koshiro Saito, Sakae Mizuki, Masanari Ohi, Taishi Nakamura, Taihei Shiotani, Koki Maeda, Youmi Ma, Kakeru Hattori, Kazuki Fujii, Takumi Okamoto, Shigeki Ishida, Hiroya Takamura, Rio Yokota, Naoaki Okazaki. Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs · Montreal, Canada · October 2025 · The 1st Workshop on Multilingual and Equitable Language Technologies (MELT) · pages (to appear) · 21 pages · Double column
Youmi Ma, Sakae Mizuki, Kazuki Fujii, Taishi Nakamura, Masanari Ohi, Hinari Shimada, Taihei Shiotani, Koshiro Saito, Koki Maeda, Kakeru Hattori, Takumi Okamoto, Shigeki Ishida, Rio Yokota, Hiroya Takamura, Naoaki Okazaki. Building instruction-tuning datasets from human-written instructions with open-weight large language models · Montreal, Canada · October 2025 · The 2nd Conference on Language Modeling (COLM) · pages (to appear) · 17 pages · Single column
Keito Sasagawa, Koki Maeda, Issa Sugiura, Shuhei Kurita, Naoaki Okazaki, Daisuke Kawahara. Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model · 2025 · Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations) · pages (to appear) · 15 pages · Double column
Eri Onami, Taiki Miyanishi, Koki Maeda, Shuhei Kurita. LegalViz: Legal Text Visualization by Text To Diagram Generation · 2025 · Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) · 20 pages · Double column
Koki Maeda, Tosho Hirasawa, Atsushi Hashimoto, Jun Harashima, Leszek Rybicki, Yusuke Fukasawa, Yoshitaka Ushiku. COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset as a Vision-Language Benchmark · 2024 · Proceedings of The 18th European Conference on Computer Vision (ECCV 2024) · 22 pages · Single column
Koki Maeda, Shuhei Kurita, Taiki Miyanishi, Naoaki Okazaki. Query-based Image Captioning from Multi-context 360-degree Images · 2023 · Findings of the Association for Computational Linguistics: EMNLP 2023 · pages 6940–6954 · 15 pages · Double column
Koki Maeda, Masahiro Kaneko, Naoaki Okazaki. IMPARA: Impact-Based Metric for GEC Using Parallel Data · 2022 · Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022) · pages 3578–3588 · 11 pages · Double column

Domestic Conferences

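前田航希, 杉浦一瑳, 小田悠介, 栗田修平, 岡崎直観. llm-jp-eval-mm: 日本語視覚言語モデルの自動評価基盤 [llm-jp-eval-mm: An Automatic Evaluation Framework for Japanese Vision-Language Models] · March 2025 · The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025) · 6 pages · Double column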
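前田航希, 長谷川騎平, 栗田修平, 小田悠介, 徳久良子, 岡崎直観. 日本の文化常識・日常生活知識理解のための視覚言語ベンチマーク MECHA-Ja の構築 [Constructing MECHA-Ja: A Vision-Language Benchmark for Understanding Japanese Cultural Commonsense and Everyday-Life Knowledge] · 2024 · IPSJ SIG Technical Reports, 263rd Meeting of the Special Interest Group on Natural Language Processing (2024-NL-263) · pages 1–7 · 7 pages · Single column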
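笹川慶人, 前田航希, 杉浦一瑳, 栗田修平, 岡崎直観, 河原大輔. LLM-jp-3 VILA: 日本語マルチモーダルデータセット及び強力な日本語マルチモーダルモデルの構築 [LLM-jp-3 VILA: Constructing Japanese Multimodal Datasets and a Strong Japanese Multimodal Model] · March 2025 · The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025) · 6 pages · Double column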
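大南英理, 宮西大樹, 前田航希, 栗田修平. 多言語での判例事実概要からの法的関係性のグラフ可視化 [Graph Visualization of Legal Relationships from Multilingual Case Fact Summaries] · March 2025 · The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025) · 6 pages · Double column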
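服部翔, 岡崎直観, 水木栄, 藤井一喜, 中村泰士, 大井聖也, 塩谷泰平, 齋藤幸史郎, Youmi Ma, 前田航希, 岡本拓己, 石田茂樹, 横田理央, 高村大也. Swallowコーパスv2: 教育的な日本語ウェブコーパスの構築 [Swallow Corpus v2: Building an Educational Japanese Web Corpus] · March 2025 · The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025) · 6 pages · Double column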
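服部翔, 水木栄, 藤井一喜, 中村泰士, 塩谷泰平, 植木快, 新妻巧朗, 川畑輝, 田森秀明, Youmi Ma, 前田航希, 大井聖也, 齋藤幸史郎, 岡本拓己, 石田茂樹, 横田理央, 高村大也, 岡崎直観. 新聞記事からつくる時事と社会に強い日本語LLM [Building a Japanese LLM Strong in Current Affairs and Society from Newspaper Articles] · March 2025 · The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025) · 6 pages · Double column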
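Youmi Ma, 水木栄, 藤井一喜, 中村泰士, 大井聖也, 島田比奈理, 塩谷泰平, 齋藤幸史郎, 前田航希, 服部翔, 岡本拓己, 石田茂樹, 横田理央, 高村大也, 岡崎直観. 模倣学習による大規模言語モデルの指示チューニング [Instruction Tuning of Large Language Models via Imitation Learning] · March 2025 · The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025) · 6 pages · Double column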
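橋本敦史, 前田航希, 平澤寅庄, 原島純, RYBICKI Leszek, 深澤祐援, 牛久祥孝. 調理作業理解のための言語資源付き固定視点映像データセットの構築 [Constructing a Fixed-Viewpoint Video Dataset with Language Resources for Understanding Cooking Activities] · 2024 · The 38th Annual Conference of the Japanese Society for Artificial Intelligence, 2024 · 4 pages · Double column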
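前田航希, 栗田修平, 宮西大樹, 岡崎直観. 視覚的文脈を利用した視覚言語モデルによる画像キャプション生成自動評価手法 [An Automatic Evaluation Method for Image Captioning Using Vision-Language Models with Visual Context] · March 2024 · The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024) · pages 1996–2001 · 6 pages · Double column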
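前田航希, 栗田修平, 宮西大樹. QuIC-360°: 360°画像に対するクエリ指向画像説明文生成のためのデータセット構築 [QuIC-360°: Constructing a Dataset for Query-Oriented Image Captioning on 360° Images] · March 2023 · The 29th Annual Meeting of the Association for Natural Language Processing (NLP2023) · pages 3013–3018 · 6 pages · Double column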
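前田航希, 金子正弘, 岡崎直観. IMPARA: パラレルデータにおける修正の影響度に基づいた文法誤り訂正の自動評価法 [IMPARA: An Automatic Evaluation Method for Grammatical Error Correction Based on the Impact of Corrections in Parallel Data] · March 2022 · The 28th Annual Meeting of the Association for Natural Language Processing (NLP2022) · pages 328–333 · 6 pages · Double column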

Invited Talks & Workshops

評価の観点から見る国産VLMの現状 [The Current State of Japanese-Built VLMs from an Evaluation Perspective] · Japanese Symposium on Open Large Language Models · Tokyo, Japan · 2025-11-26 · Poster Presentation, Oral Session

Awards & Fellowships

Young Scientist Award, ANLP (2025)
Committee Special Awarded Paper, ANLP (2025, 2023)
Program for Development of Co-creative Experts towards Top-level AI Research (Science Tokyo BOOST) for Science and Engineering fields (2024–2027)
Awarded Paper, ANLP (2022)

Skills

Programming

Python, PyTorch, Java

Research Areas

Computer Vision, Natural Language Processing, Multimodal Learning, Image Captioning, Evaluation Metrics

Languages

Japanese (Native), English (Professional)
