DINO-Mix enhancing visual place recognition with foundational vision model and feature mixing

Middelberg, S., Sattler, T., Untzelmann, O. & Kobbelt, L. Scalable 6-DOF Localization on Mobile Devices. in (eds. Fleet, D., Pajdla, T., Schiele, B. & Tuytelaars, T.) vol. 8690 268–283 (2014).Suenderhauf, N. et al. Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free. in Robotics: Science and Systems XI (Robotics: Science and Systems Foundation, doi: (2015). https://doi.org/10.15607/RSS.2015.XI.022Chaabane, M., Gueguen, L., Trabelsi, A., Beveridge, R. & O’Hara, S. End-to-end Learning Improves Static Object Geo-localization from Video. in Ieee Winter Conference on Applications of Computer Vision Wacv 2021 2062–2071 (Ieee, New York, 2021). doi: (2021). https://doi.org/10.1109/WACV48630.2021.00211Wilson, D. et al. Object Tracking and Geo-localization from Street images. Remote Sens. 14, 2575 (2022).Article
ADS

Google Scholar
Agarwal, S., Snavely, N., Simon, I., Seitz, S. M. & Szeliski, R. Building Rome in a Day. in IEEE 12th International Conference on Computer Vision (ICCV) 72–79 (2009). doi: (2009). https://doi.org/10.1109/ICCV.2009.5459148Acampora, G., Anastasio, P., Risi, M., Tortora, G. & Vitiello, A. Automatic Event Geo-Location in Twitter. IEEE Access. 8, 128213–128223 (2020).Article

Google Scholar
Lowe, D. Distinctive image features from Scale-Invariant keypoints. Int. J. Comput. Vision. 60, 91–110 (2004).Article

Google Scholar
Dalal, N. & Triggs, B. Histograms of Oriented Gradients for Human Detection. in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) vol. 1 886–893 (IEEE, San Diego, CA, USA, 2005). (2005).Bay, H., Tuytelaars, T. & Van Gool, L. S. U. R. F. Speeded up robust features. in Computer Vision – ECCV 2006 (eds Leonardis, A., Bischof, H. & Pinz, A.) vol 3951 404–417 (Springer Berlin Heidelberg, Berlin, Heidelberg, (2006).Chapter

Google Scholar
Rublee, E., Rabaud, V., Konolige, K. & Bradski, G. O. R. B. An efficient alternative to SIFT or SURF. in International Conference on Computer Vision 2564–2571 (IEEE, Barcelona, Spain, 2011). doi: (2011). https://doi.org/10.1109/ICCV.2011.6126544Tang, K., Li, F. F. & Koller, D. Learning latent temporal structure for complex event detection. in IEEE Conference on Computer Vision and Pattern Recognition 1250–1257 (IEEE, Providence, RI, 2012). doi: (2012). https://doi.org/10.1109/cvpr.2012.6247808Jegou, H., Douze, M., Schmid, C. & Perez, P. Aggregating local descriptors into a compact image representation. in IEEE Computer Society Conference on Computer Vision and Pattern Recognition 3304–3311 (IEEE, San Francisco, CA, USA, 2010). doi: (2010). https://doi.org/10.1109/cvpr.2010.5540039Jegou, H. et al. Aggregating local image descriptors into Compact codes. IEEE Trans. Pattern Anal. Mach. Intell. 34, 1704–1716 (2012).Article
PubMed

Google Scholar
Xu, M. Queensland University of Technology,. Bridging the divide between visual place recognition and SLAM. doi: (2023). https://doi.org/10.5204/thesis.eprints.240786Kanjilal, R. & Uysal, I. Rich learning representations for human activity recognition: how to empower deep feature learning for biological time series. J. Biomed. Inf. 134, 104180 (2022).Article

Google Scholar
Costa, Y., Oliveira, L., Koerich, A. & Gouyon, F. Music genre recognition using gabor filters and LPQ texture descriptors. in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (ed Ruiz-Shulcloper, J.) (2013). & Sanniti Di Baja, G.) vol. 8259 67–74 (Springer Berlin Heidelberg, Berlin, Heidelberg.Chapter

Google Scholar
Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T. & Sivic, J. NetVLAD: CNN Architecture for weakly supervised Place Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40, 1437–1451 (2018).Article
PubMed

Google Scholar
Radenovic, F., Tolias, G., Chum, O. & Fine-Tuning, C. N. N. Image Retrieval with no human annotation. IEEE Trans. Pattern Anal. Mach. Intell. 41, 1655–1668 (2019).Article
PubMed

Google Scholar
Berton, G., Masone, C. & Caputo, B. Rethinking Visual Geo-localization for Large-Scale Applications. in IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR 2022) 4868–4878 (IEEE Computer Soc, 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720 – 1264 USA, 2022). doi: (2022). https://doi.org/10.1109/CVPR52688.2022.00483Ali-Bey, A., Chaib-Draa, B. & Giguere, P. MixVPR: Feature Mixing for Visual Place Recognition. in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2997–3006 (IEEE, Waikoloa, HI, USA, 2023). doi: (2023). https://doi.org/10.1109/wacv56688.2023.00301Oquab, M. et al. DINOv2: Learning Robust Visual Features without Supervision. Preprint at (2023). https://doi.org/10.48550/arxiv.2304.07193Tolstikhin, I. O. et al. MLP-Mixer: An all-MLP Architecture for Vision. in Advances in Neural Information Processing Systems (eds Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P. S. & Vaughan, J. W.) vol. 34 24261–24272 (Curran Associates, Inc., (2021).Masone, C. & Caputo, B. A. Survey on Deep Visual Place Recognition. IEEE Access. 9, 19516–19547 (2021).Article

Google Scholar
Zhang, W. & Kosecka, J. Image Based Localization in Urban Environments. in Third International Symposium on 3D Data Processing, Visualization, and Transmission, Proceedings (eds. Pollefeys, M. & Daniilidis, K.) 33–40Chapel Hill, NC, USA, doi: (2007). https://doi.org/10.1109/3dpvt.2006.80Martin, A., Fischler, Robert, C. & Bolles Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM. 24, 381–395 (1981).Article
MathSciNet

Google Scholar
Zamir, A. R. & Shah, M. Accurate image localization based on Google maps Street View. in Computer Vision – ECCV 2010 (eds Daniilidis, K., Maragos, P. & Paragios, N.) vol. 6314 255–268 (Springer, (2010).Zamir, A. R., Ardeshir, S. & Shah, M. GPS-Tag Refinement Using Random Walks with an Adaptive Damping Factor. in IEEE Conference on Computer Vision and Pattern Recognition 4280–4287 (IEEE, Columbus, OH, USA, 2014). doi: (2014). https://doi.org/10.1109/CVPR.2014.545Noh, H., Araujo, A., Sim, J., Weyand, T. & Han, B. Large-Scale Image Retrieval with Attentive Deep Local Features. in IEEE International Conference on Computer Vision (ICCV) 3476–3485 (IEEE, Venice, 2017). doi: (2017). https://doi.org/10.1109/ICCV.2017.374Ng, T., Balntas, V., Tian, Y. & Mikolajczyk, K. S. O. L. A. R. Second-Order Loss and Attention for Image Retrieval. in Computer Vision–ECCV 2020: 16th European Conference Part XXV 16 (eds. Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-M.) 253–270Springer International Publishing, Glasgow, UK, (2020).Chu, T. Y., Chen, Y. M., Huang, L., Xu, Z. G. & Tan, H. Y. A Grid feature-point selection method for large-Scale Street View Image Retrieval based on deep local features. Remote Sens. 12, 3978 (2020).Article
ADS

Google Scholar
Chu, T. Y. et al. IEEE, Waikoloa, HI, USA,. Street View Image Retrieval with Average Pooling Features. in IGARSS 2020–2020 IEEE International Geoscience and Remote Sensing Symposium 1205–1208 doi: (2020). https://doi.org/10.1109/IGARSS39084.2020.9323667Yan, L. Q., Cui, Y. M., Chen, Y. J. & Liu, D. F. Hierarchical Attention Fusion for Geo-Localization. in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) 2220–2224 (IEEE, New York, 2021). doi: (2021). https://doi.org/10.1109/ICASSP39728.2021.9414517Chu, T. Y. et al. A news picture geo-localization pipeline based on deep learning and street view images. Int. J. Digit. Earth. 15, 1485–1505 (2022).Article
ADS

Google Scholar
Tolias, G., Jenicek, T. & Chum, O. Learning and Aggregating Deep Local descriptors for Instance-Level Recognition. in Computer Vision – ECCV 2020 (eds Vedaldi, A., Bischof, H., Brox, T. & Frahm, J. M.) 460–477 (Springer International Publishing, Cham, (2020).Chapter

Google Scholar
Mishkin, D., Perdoch, M. & Matas, J. Place Recognition with WxBS Retrieval. in CVPR 2015 Workshop on Visual Place Recognition in Changing Environments vol. 30 9Boston, USA, (2015).Kim, H. J., Dunn, E. & Frahm, J. M. Learned Contextual Feature Reweighting for Image Geo-Localization. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3251–3260 (IEEE, Honolulu, HI, 2017). doi: (2017). https://doi.org/10.1109/CVPR.2017.346Yu, J., Zhu, C. Y., Zhang, J., Huang, Q. M. & Tao, D. C. Spatial pyramid-enhanced NetVLAD with Weighted Triplet loss for Place Recognition. IEEE Trans. Neural Netw. Learn. Syst. 31, 661–674 (2020).Article
PubMed

Google Scholar
Khaliq, A., Milford, M. & Garg, S. MultiRes-NetVLAD: augmenting Place Recognition Training with Low-Resolution Imagery. IEEE Robot Autom. Lett.7, 3882–3889 (2022).Article

Google Scholar
Liu, L., Li, H. D. & Dai, Y. C. Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization. in IEEE/CVF International Conference on Computer Vision (ICCV) 2570–2579 (IEEE, Seoul, Korea (South), 2019). doi: (2019). https://doi.org/10.1109/iccv.2019.00266Ge, Y., xiao, Wang, H., bo, Zhu, F., Zhao, R. & Li, H. Sheng. Self-supervising Fine-grained Region Similarities for Large-scale Image Localizationvol. 12349 369–386 (Springer International Publishing, 2020).Ali-bey, A., Chaib-draa, B. & Giguère, P. GSV-Cities: toward Appropriate supervised Visual Place Recognition. Neurocomputing. 513, 194–203 (2022).Article

Google Scholar
Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for Image Recognition at Scale. in doi: (2021). https://doi.org/10.48550/arXiv.2010.11929Kirillov, A. et al. Segment Anything. Preprint at (2023). http://arxiv.org/abs/2304.02643Wang, R. T. et al. Transformer-Based Place Recognition with Multi-Level Attention Aggregation. in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 13638–13647 (IEEE, New Orleans, LA, USA, 2022). doi: (2022). https://doi.org/10.1109/cvpr52688.2022.01328Torii, A., Sivic, J., Okutomi, M. & Pajdla, T. Visual Place Recognition with repetitive structures. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2346–2359 (2015).Article
PubMed

Google Scholar
Torii, A., Arandjelovic, R., Sivic, J., Okutomi, M. & Pajdla, T. 24/7 Place Recognition by View Synthesis. IEEE Trans. Pattern Anal. Mach. Intell. 40, 257–271 (2018).Article
PubMed

Google Scholar
Sunderhauf, N., Neubert, P. & Protzel, P. Are we there yet? Challenging SeqSLAM on a 3000 km Journey Across All Four Seasons. in.Ruder, S. An overview of gradient descent optimization algorithms. Preprint at.https://doi.org/10.48550/arXiv.1609.04747 (2017).Article

Google Scholar
Hermans, A., Beyer, L. & Leibe, B. In Defense of the Triplet Loss for Person Re-Identification. in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)IEEE, (2018).Wang, X., Han, X., Huang, W., Dong, D. & Scott, M. R. Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning. in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 5017–5025 (IEEE, Long Beach, CA, USA, 2019). doi: (2019). https://doi.org/10.1109/CVPR.2019.00516Huang, G. S., Zhou, Y., Hu, X. F., Zhao, L. Y. & Zhang, C. L. A survey of the Research Progress in Image Geo- localization. J. Geo-information Sci. 25, 1336–1362 (2023).
Google Scholar
Yandex, A. B. & Lempitsky, V. Aggregating Deep Convolutional Features for Image Retrieval. in IEEE International Conference on Computer Vision (ICCV) 1269–1277 (IEEE, Santiago, Chile, 2015). doi: (2015). https://doi.org/10.1109/iccv.2015.150Razavian, A. S., Sullivan, J., Carlsson, S. & Maki, A. Visual Instance Retrieval with Deep Convolutional Networks. ITE Trans. Media Technol. Appl. 4, 251–258 (2016).
Google Scholar
Tolias, G., Sicre, R. & Jégou, H. Particular object retrieval with integral max-pooling of CNN activations. Preprint at (2016). http://arxiv.org/abs/1511.05879Kordopatis-Zilos, G., Galopoulos, P., Papadopoulos, S. & Kompatsiaris, I. Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale location estimation. in (2021).

DINO-Mix enhancing visual place recognition with foundational vision model and feature mixing

DRCTdb: disease-related cell type analysis to decode cell type effect and underlying regulatory mechanisms

Development and validation of a novel liquid-liquid phase separation gene signature for bladder cancer

Visualization obesity risk prediction system based on machine learning

PepNet: an interpretable neural network for anti-inflammatory and antimicrobial peptides prediction using a pre-trained protein language model

Distant metastasis patterns among lung cancer subtypes and impact of primary tumor resection on survival in metastatic lung cancer using SEER database

Hot Topics

DRCTdb: disease-related cell type analysis to decode cell type effect and underlying regulatory mechanisms

Development and validation of a novel liquid-liquid phase separation gene signature for bladder cancer

Visualization obesity risk prediction system based on machine learning

Related Articles

Balancing Act: Pregnancy and Bipolar Disorder

Cohesion at the cellular level: flexible yet stable

Gut bacteria influence responses to immunotherapy in patients with asbestos related cancer

Quick Links

Must Read

DRCTdb: disease-related cell type analysis to decode cell type effect and underlying regulatory mechanisms

Development and validation of a novel liquid-liquid phase separation gene signature for bladder cancer

Visualization obesity risk prediction system based on machine learning

PepNet: an interpretable neural network for anti-inflammatory and antimicrobial peptides prediction using a pre-trained protein language model

Popular Articles

DRCTdb: disease-related cell type analysis to decode cell type effect and underlying regulatory mechanisms

Development and validation of a novel liquid-liquid phase separation gene signature for bladder cancer

Visualization obesity risk prediction system based on machine learning