Koki Maeda

Doctoral Student · Vision and Language Researcher

Exploring the intersection of computer vision and natural language processing, with a focus on multimodal evaluation metrics and context-aware image captioning.

Research Themes

Multimodal Evaluation & Metrics

Building reliable evaluation protocols for vision-language models, with a focus on grounding and cultural nuance in Japanese contexts.

Instruction-Tuned Multimodal LLMs

Curating instruction datasets and training recipes that keep open-weight models aligned and deployable.

Structured Generation & Understanding

Teaching models to produce and consume structured outputs such as captions, diagrams, and graphs for real-world tasks.

Publications

Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model

Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations) · 2025

Methodology for quickly constructing multimodal datasets tailored for Japanese vision-language models.

Keito Sasagawa, Koki Maeda, Issa Sugiura, Shuhei Kurita, Naoaki Okazaki, Daisuke Kawahara

LegalViz: Legal Text Visualization by Text To Diagram Generation

Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) · 2025

Novel approach for visualizing complex legal document structures through diagram generation from text.

Eri Onami, Taiki Miyanishi, Koki Maeda, Shuhei Kurita

Talks & Workshops

  1. The Current State of Japanese-Developed VLMs from an Evaluation Perspective (評価の観点から見る国産VLMの現状) · Japanese Symposium on Open Large Language Models · Tokyo, Japan · 2025-11-26

Experience

  • Sakana AI · Applied Research Engineer (Internship) · Jan 2026 – Present
  • National Institute of Informatics · Research Assistant · Jun 2024 – Present
  • Cierpa & Company · Engineering Internship · Jul 2024 – Mar 2025
  • RIKEN AIP · Research Part-timer · Jul 2022 – Aug 2024
  • OMRON SINIC X · Research Internship · Jun 2023 – Jan 2024
  • Tokyo Institute of Technology · Research Assistant · Dec 2021 – Dec 2022
  • NTT Research Institute · Summer Internship · Aug 2022 – Sep 2022
  • Future Corporation · Strategic AI Group · Feb 2021 – Mar 2022

Education

2024 – Present

Ph.D., Tokyo Institute of Technology

Vision and Language, Evaluation

Advisor: Naoaki Okazaki

Focusing on novel evaluation metrics for multimodal systems and cross-modal representation learning.

2022 – 2024

M.Eng., Tokyo Institute of Technology

Vision and Language: Image Captioning

Advisor: Naoaki Okazaki

Developed context-aware image captioning models that generate descriptions based on user preferences.

2018 – 2022

B.Eng., Tokyo Institute of Technology

NLP: Grammatical Error Correction

Advisors: Naoaki Okazaki, Masahiro Kaneko (Mentor)

Created improved evaluation metrics for grammatical error correction systems.

Awards & Fellowships

  • Young Scientist Award, ANLP (2025)
  • Committee Special Awarded Paper, ANLP (2025, 2023)
  • Program for Development of Co-creative Experts towards Top-level AI Research (Science Tokyo BOOST) for Science and Engineering fields (2024–2027)
  • Awarded Paper, ANLP (2022)

Skills & Expertise

Programming

  • Python
  • PyTorch
  • Java

Research Areas

  • Computer Vision
  • Natural Language Processing
  • Multimodal Learning
  • Image Captioning
  • Evaluation Metrics

Languages

  • Japanese (Native)
  • English (Professional)