Query-based Image Captioning from Multi-context 360-degree Images
Findings of the Association for Computational Linguistics: EMNLP 2023 · December 2023
Introduces Query-based Image Captioning (QuIC), a task in which a short query specifies which context of a 360-degree image the caption should describe.
BibTeX
@inproceedings{maeda2023quic360,
  title     = {Query-based Image Captioning from Multi-context 360-degree Images},
  author    = {Koki Maeda and Shuhei Kurita and Taiki Miyanishi and Naoaki Okazaki},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages     = {6940--6954},
  year      = {2023},
  address   = {Singapore},
  publisher = {Association for Computational Linguistics}
}
Abstract
This paper introduces Query-based Image Captioning (QuIC) for 360-degree images, where a query (words or short phrases) specifies the context to describe. This task is more challenging than conventional image captioning, as it requires fine-grained scene understanding to select content consistent with the user’s intent based on the query. We constructed a dataset comprising 3,940 360-degree images and 18,459 pairs of queries and captions annotated manually. Experiments demonstrate that fine-tuning image captioning models on our dataset can generate more diverse and controllable captions from multiple contexts of 360-degree images.