A Patch-Level Region-Aware Module with a Multi-Label Framework for Remote Sensing Image Captioning
Recent Transformer-based works can generate high-quality captions for remote sensing images (RSIs). However, these methods generally feed global or grid visual features to a Transformer-based captioning model for associating cross-modal information, which limits performance. In this work, we investigate unexplored ideas for remote sensing im