Regressing Transformers for Data-efficient Visual Place Recognition

  • Maria Leyva-Vallina*
  • Nicola Strisciuglio
  • Nicolai Petkov
*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Visual place recognition is a critical task in computer vision, especially for localization and navigation systems. Existing methods often rely on contrastive learning: image descriptors are trained so that similar images lie close together in a latent space and dissimilar images lie farther apart. However, this approach struggles to ensure that descriptor distances accurately reflect image similarity, particularly when trained with binary pairwise labels, and it typically requires complex re-ranking strategies. This work introduces a fresh perspective by framing place recognition as a regression problem, using camera field-of-view overlap as the similarity ground truth for learning. By optimizing image descriptors to align directly with graded similarity labels, this approach improves ranking without expensive re-ranking, offers data-efficient training, and generalizes well across several benchmark datasets.
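The contrast between binary contrastive training and regression onto graded labels can be illustrated with a minimal sketch in plain Python. It assumes cosine similarity between descriptors, a classic margin-based contrastive loss, and a squared-error regression loss; these choices, the margin value, and the function names (`contrastive_loss`, `regression_loss`, `rank_database`) are illustrative assumptions, not the paper's actual formulation or code. The graded overlap label is taken as a given number in [0, 1]; computing field-of-view overlap from camera poses is outside this sketch.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("descriptors must be non-zero vectors")
    return dot / (norm_a * norm_b)


def contrastive_loss(a: Sequence[float], b: Sequence[float],
                     is_match: bool, margin: float = 0.5) -> float:
    """Margin-based contrastive loss with a binary label (hypothetical baseline).

    Matching pairs are pulled together; non-matching pairs are pushed apart
    only until their Euclidean distance exceeds the margin.
    """
    d = math.dist(a, b)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2


def regression_loss(a: Sequence[float], b: Sequence[float], overlap: float) -> float:
    """Squared error between descriptor similarity and a graded label in [0, 1].

    Assumption: the graded label is the camera field-of-view overlap of the pair.
    """
    return (cosine_similarity(a, b) - overlap) ** 2


def rank_database(query: Sequence[float],
                  database: List[Sequence[float]]) -> List[int]:
    """Rank database descriptors by similarity to the query, highest first,
    with no re-ranking stage."""
    scores = [(i, cosine_similarity(query, d)) for i, d in enumerate(database)]
    return [i for i, _ in sorted(scores, key=lambda t: t[1], reverse=True)]


if __name__ == "__main__":
    query = [1.0, 0.0]
    partial = [1.0, 1.0]  # a pair that only partially overlaps the query view
    # A binary label must call the pair either a match or a non-match.
    print("contrastive, match:    ", contrastive_loss(query, partial, True))
    print("contrastive, non-match:", contrastive_loss(query, partial, False))
    # A graded label (hypothetical overlap of 0.4) expresses partial similarity.
    print("regression, overlap 0.4:", regression_loss(query, partial, 0.4))
    print("ranking:", rank_database(query, [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]))
```

In this toy example the partially overlapping pair either gets pulled fully together or contributes nothing at all under the binary loss, while the regression loss pushes its similarity toward the intermediate value 0.4. If similarities track graded overlap this way, sorting by similarity alone yields a meaningful ranking, which is the intuition behind avoiding a separate re-ranking step.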

Original language: English
Title of host publication: 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
Publisher: IEEE
Pages: 15898-15904
Number of pages: 7
ISBN (Electronic): 979-8-3503-8457-4
ISBN (Print): 979-8-3503-8458-1
DOIs
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Robotics and Automation, ICRA 2024 - Yokohama, Japan
Duration: 13-May-2024 to 17-May-2024

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
ISSN (Print): 1050-4729

Conference

Conference: 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
Country/Territory: Japan
City: Yokohama
Period: 13/05/2024 to 17/05/2024