From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models
arXiv preprint · February 2026
BibTeX
@misc{ohi2026hatch,
  title={From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models},
  author={Masanari Ohi and Koki Maeda and Ryuto Koike and Daisuke Oba and Nakamasa Inoue and Naoaki Okazaki},
  year={2026},
  eprint={2602.08735},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}