Query-based Image Captioning from Multi-context 360-degree Images

Koki Maeda, Shuhei Kurita, Taiki Miyanishi, Naoaki Okazaki

Findings of the Association for Computational Linguistics: EMNLP 2023 · December 2023

A query-based image captioning task and dataset in which short textual queries select which context of a multi-context 360-degree image to describe.

BibTeX

@inproceedings{maeda2023quic360,
  title = {Query-based Image Captioning from Multi-context 360-degree Images},
  author = {Koki Maeda and Shuhei Kurita and Taiki Miyanishi and Naoaki Okazaki},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages = {6940--6954},
  year = {2023},
  address = {Singapore},
  publisher = {Association for Computational Linguistics}
}

Abstract

This paper introduces Query-based Image Captioning (QuIC) for 360-degree images, where a query (a word or short phrase) specifies the context to describe. This task is more challenging than conventional image captioning, as it requires fine-grained scene understanding to select content consistent with the user’s intent expressed in the query. We constructed a dataset comprising 3,940 360-degree images and 18,459 manually annotated query–caption pairs. Experiments demonstrate that image captioning models fine-tuned on our dataset generate more diverse and controllable captions covering multiple contexts of 360-degree images.