Koki Maeda

Doctoral Student · Vision and Language Researcher

Exploring the intersection of computer vision and natural language processing, with a focus on multimodal evaluation metrics and context-aware image captioning.

Research Themes

Multimodal Evaluation & Metrics

Building reliable evaluation protocols for vision-language models, with a focus on grounding and cultural nuance in Japanese contexts.

Instruction-Tuned Multimodal LLMs

Curating instruction datasets and training recipes that keep open-weight models aligned and deployable.

Structured Generation & Understanding

Teaching models to produce and consume structured outputs such as captions, diagrams, and graphs for real-world tasks.

Publications

JaWildText: A Real-World Image Dataset for Evaluating Japanese Text Recognition Performance

The 32nd Annual Meeting of the Association for Natural Language Processing (NLP2026) · 2026

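We propose JaWildText, a dataset for jointly evaluating text recognition and downstream tasks on real-world Japanese images. It covers three tasks: dense STVQA, receipt KIE, and handwritten-text OCR. Evaluating publicly available VLMs and OCR-specialized models shows that Japanese text-reading performance still has room for improvement. This paper received the NLP2026 Committee Special Award.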

Koki Maeda, Naoaki Okazaki

Talks & Workshops

  1. The Current State of Japanese-Developed VLMs from an Evaluation Perspective · Japanese Symposium on Open Large Language Models · Tokyo, Japan · 2025-11-26

Experience

  • Sakana AI · Applied Research Engineer (Internship) · Jan 2026 – Present
  • National Institute of Informatics · Research Assistant · Jun 2024 – Present
  • Cierpa & Company · Engineering Internship · Jul 2024 – Mar 2025
  • RIKEN AIP · Research Part-timer · Jul 2022 – Aug 2024
  • OMRON SINIC X · Research Internship · Jun 2023 – Jan 2024
  • Tokyo Institute of Technology · Research Assistant · Dec 2021 – Dec 2022
  • NTT Research Institute · Summer Internship · Aug 2022 – Sep 2022
  • Future Corporation · Strategic AI Group · Feb 2021 – Mar 2022

Education

2024 – Present

Ph.D., Tokyo Institute of Technology

Vision and Language, Evaluation

Advisor: Naoaki Okazaki

Focusing on novel evaluation metrics for multimodal systems and cross-modal representation learning.

2022 – 2024

M.Eng., Tokyo Institute of Technology

Vision and Language: Image Captioning

Advisor: Naoaki Okazaki

Developed context-aware image captioning models that generate descriptions based on user preferences.

2018 – 2022

B.Eng., Tokyo Institute of Technology

NLP: Grammatical Error Correction

Advisors: Naoaki Okazaki, Masahiro Kaneko (Mentor)

Created improved evaluation metrics for grammatical error correction systems.

Awards & Fellowships

  • Young Scientist Award, ANLP (2025)
  • Committee Special Awarded Paper, ANLP (2026, 2025, 2023)
  • Program for Development of Co-creative Experts towards Top-level AI Research (Science Tokyo BOOST) for Science and Engineering fields (2024–2027)
  • Awarded Paper, ANLP (2022)

Skills & Expertise

Programming

  • Python
  • PyTorch
  • Java

Research Areas

  • Computer Vision
  • Natural Language Processing
  • Multimodal Learning
  • Image Captioning
  • Evaluation Metrics

Languages

  • Japanese (Native)
  • English (Professional)