Fine-Grained Image Analysis (FGIA) – Papers, Codes and Datasets

Table of contents

  1. Introduction

  2. Survey papers

  3. Benchmark datasets

  4. Fine-grained image recognition

    1. Fine-grained recognition by localization-classification subnetworks

    2. Fine-grained recognition by end-to-end feature encoding

    3. Fine-grained recognition with external information

      1. Fine-grained recognition with web data / auxiliary data

      2. Fine-grained recognition with multi-modality data

      3. Fine-grained recognition with humans in the loop

  5. Fine-grained image retrieval

    1. Unsupervised with pre-trained models

    2. Supervised with metric learning

  6. Fine-grained image generation

    1. Generating from fine-grained image distributions

    2. Generating from text descriptions

  7. Future directions of FGIA

    1. Automatic fine-grained models

    2. Fine-grained few-shot learning

    3. Fine-grained hashing

    4. FGIA within more realistic settings

  8. Leaderboard

1. Introduction


This homepage lists representative papers, codes, and datasets on deep learning based fine-grained image analysis, including fine-grained image recognition, fine-grained image retrieval, fine-grained image generation, etc. If you have any questions, please feel free to leave a message.

2. Survey papers


A Survey on Deep Learning-based Fine-Grained Object Classification and Semantic Segmentation

Bo Zhao, Jiashi Feng, Xiao Wu, and Shuicheng Yan. International Journal of Automation and Computing, 2017.

3. Benchmark datasets


Summary of popular fine-grained image datasets. "BBox" indicates whether the dataset provides object bounding box annotations, "Part anno." means key part localizations are provided, "HRCHY" corresponds to hierarchical labels, "ATR" represents attribute labels (e.g., wing color, male, female, etc.), and "Texts" indicates whether fine-grained text descriptions of the images are supplied.

| Dataset name | Year | Meta-class | Images | Categories | BBox | Part anno. | HRCHY | ATR | Texts |
|---|---|---|---|---|---|---|---|---|---|
| Oxford flower | 2008 | Flowers | 8,189 | 102 |  |  |  |  | ✓ |
| CUB200 | 2011 | Birds | 11,788 | 200 | ✓ | ✓ |  | ✓ | ✓ |
| Stanford Dog | 2011 | Dogs | 20,580 | 120 | ✓ |  |  |  |  |
| Stanford Car | 2013 | Cars | 16,185 | 196 | ✓ |  |  |  |  |
| FGVC Aircraft | 2013 | Aircrafts | 10,000 | 100 | ✓ |  | ✓ |  |  |
| Birdsnap | 2014 | Birds | 49,829 | 500 | ✓ | ✓ |  | ✓ |  |
| NABirds | 2015 | Birds | 48,562 | 555 | ✓ | ✓ |  |  |  |
| DeepFashion | 2016 | Clothes | 800,000 | 1,050 | ✓ | ✓ |  | ✓ |  |
| Fru92 | 2017 | Fruits | 69,614 | 92 |  |  | ✓ |  |  |
| Veg200 | 2017 | Vegetable | 91,117 | 200 |  |  | ✓ |  |  |
| iNat2017 | 2017 | Plants & Animals | 859,000 | 5,089 | ✓ |  | ✓ |  |  |
| RPC | 2019 | Retail products | 83,739 | 200 | ✓ |  | ✓ |  |  |
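
For reference, CUB200 (the test bed used in the leaderboard below) ships its annotations as plain-text index files. The following is a minimal loading sketch in PyTorch; it assumes the standard CUB_200_2011/ directory layout (images/, images.txt, image_class_labels.txt, train_test_split.txt) and is only an illustration, not code from any listed paper.

```python
# Minimal CUB-200-2011 loading sketch (illustration only, not code from any
# listed paper). It assumes the standard CUB_200_2011/ layout with images/,
# images.txt, image_class_labels.txt and train_test_split.txt.
import os
from PIL import Image
from torch.utils.data import Dataset


class CUB200(Dataset):
    def __init__(self, root, train=True, transform=None):
        self.root = root
        self.transform = transform

        def read_index(name):
            # Each metadata file stores one "<image_id> <value>" pair per line.
            with open(os.path.join(root, name)) as f:
                return dict(line.split(maxsplit=1) for line in f if line.strip())

        paths = read_index("images.txt")               # id -> relative image path
        labels = read_index("image_class_labels.txt")  # id -> class id in 1..200
        split = read_index("train_test_split.txt")     # id -> "1" (train) / "0" (test)

        wanted = "1" if train else "0"
        self.samples = [
            (os.path.join(root, "images", paths[i].strip()), int(labels[i]) - 1)
            for i in paths if split[i].strip() == wanted
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

A torchvision transform that resizes images to the input resolution reported in the leaderboard (e.g., 448x448) can be passed via `transform`.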

4. Fine-grained image recognition


Fine-grained recognition by localization-classification subnetworks

Fine-grained recognition by end-to-end feature encoding

Fine-grained recognition with external information

   Fine-grained recognition with web data / auxiliary data

   Fine-grained recognition with multi-modality data

   Fine-grained recognition with humans in the loop

5. Fine-grained image retrieval


Unsupervised with pre-trained models

Supervised with metric learning
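
As a rough illustration of the supervised setting, the sketch below trains an embedding with a triplet loss; the backbone choice, margin, and normalization are assumptions made for the example, not taken from any specific paper above.

```python
# Rough metric-learning sketch for supervised fine-grained retrieval
# (illustration only; backbone, margin and sampling are assumptions,
# not taken from a specific paper above).
import torch
import torch.nn.functional as F
from torchvision import models

# ImageNet-pretrained ResNet-50 as the embedding backbone (torchvision >= 0.13 API).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()                 # keep the 2048-d pooled feature
triplet = torch.nn.TripletMarginLoss(margin=0.2)  # pull positives in, push negatives away


def triplet_step(anchor, positive, negative):
    # anchor/positive share a fine-grained category; negative comes from another one.
    a = F.normalize(backbone(anchor), dim=1)
    p = F.normalize(backbone(positive), dim=1)
    n = F.normalize(backbone(negative), dim=1)
    return triplet(a, p, n)
```

At test time, retrieval is then nearest-neighbour search over the L2-normalized embeddings.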

6. Fine-grained image generation


Generating from fine-grained image distributions

Generating from text descriptions

7. Future directions of FGIA


Automatic fine-grained models

Fine-grained few-shot learning

Fine-grained hashing

FGIA within more realistic settings

8. Leaderboard


This section is continually updated. Since CUB200-2011 is the most widely used fine-grained dataset, we list the fine-grained recognition leaderboard with it as the test bed.

| Method | Publication | BBox | Part | External information | Base model | Image resolution | Accuracy |
|---|---|---|---|---|---|---|---|
| PB R-CNN | ECCV 2014 |  |  |  | Alex-Net | 224x224 | 73.9% |
| MaxEnt | NIPS 2018 |  |  |  | GoogLeNet | TBD | 74.4% |
| PB R-CNN | ECCV 2014 | ✓ |  |  | Alex-Net | 224x224 | 76.4% |
| PS-CNN | CVPR 2016 | ✓ | ✓ |  | CaffeNet | 454x454 | 76.6% |
| MaxEnt | NIPS 2018 |  |  |  | VGG-16 | TBD | 77.0% |
| Mask-CNN | PR 2018 |  | ✓ |  | Alex-Net | 448x448 | 78.6% |
| PC | ECCV 2018 |  |  |  | ResNet-50 | TBD | 80.2% |
| DeepLAC | CVPR 2015 | ✓ | ✓ |  | Alex-Net | 227x227 | 80.3% |
| MaxEnt | NIPS 2018 |  |  |  | ResNet-50 | TBD | 80.4% |
| Triplet-A | CVPR 2016 | ✓ |  | Manual labour | GoogLeNet | TBD | 80.7% |
| Multi-grained | ICCV 2015 |  |  | WordNet etc. | VGG-19 | 224x224 | 81.7% |
| Krause et al. | CVPR 2015 | ✓ |  |  | CaffeNet | TBD | 82.0% |
| Multi-grained | ICCV 2015 | ✓ |  | WordNet etc. | VGG-19 | 224x224 | 83.0% |
| TS | CVPR 2016 |  |  |  | VGGD+VGGM | 448x448 | 84.0% |
| Bilinear CNN | ICCV 2015 |  |  |  | VGGD+VGGM | 448x448 | 84.1% |
| STN | NIPS 2015 |  |  |  | GoogLeNet+BN | 448x448 | 84.1% |
| LRBP | CVPR 2017 |  |  |  | VGG-16 | 224x224 | 84.2% |
| PDFS | CVPR 2016 |  |  |  | VGG-16 | TBD | 84.5% |
| Xu et al. | ICCV 2015 | ✓ | ✓ | Web data | CaffeNet | 224x224 | 84.6% |
| Cai et al. | ICCV 2017 |  |  |  | VGG-16 | 448x448 | 85.3% |
| RA-CNN | CVPR 2017 |  |  |  | VGG-19 | 448x448 | 85.3% |
| MaxEnt | NIPS 2018 |  |  |  | Bilinear CNN | TBD | 85.3% |
| PC | ECCV 2018 |  |  |  | Bilinear CNN | TBD | 85.6% |
| CVL | CVPR 2017 |  |  | Texts | VGG | TBD | 85.6% |
| Mask-CNN | PR 2018 |  | ✓ |  | VGG-16 | 448x448 | 85.7% |
| GP-256 | ECCV 2018 |  |  |  | VGG-16 | 448x448 | 85.8% |
| KP | CVPR 2017 |  |  |  | VGG-16 | 224x224 | 86.2% |
| T-CNN | IJCAI 2018 |  |  |  | ResNet | 224x224 | 86.2% |
| MA-CNN | ICCV 2017 |  |  |  | VGG-19 | 448x448 | 86.5% |
| MaxEnt | NIPS 2018 |  |  |  | DenseNet-161 | TBD | 86.5% |
| DeepKSPD | ECCV 2018 |  |  |  | VGG-19 | 448x448 | 86.5% |
| OSME+MAMC | ECCV 2018 |  |  |  | ResNet-101 | 448x448 | 86.5% |
| StackDRL | IJCAI 2018 |  |  |  | VGG-19 | 224x224 | 86.6% |
| DFL-CNN | CVPR 2018 |  |  |  | VGG-16 | 448x448 | 86.7% |
| PC | ECCV 2018 |  |  |  | DenseNet-161 | TBD | 86.9% |
| KERL | IJCAI 2018 |  |  | Attributes | VGG-16 | 224x224 | 87.0% |
| HBP | ECCV 2018 |  |  |  | VGG-16 | 448x448 | 87.1% |
| Mask-CNN | PR 2018 |  | ✓ |  | ResNet-50 | 448x448 | 87.3% |
| DFL-CNN | CVPR 2018 |  |  |  | ResNet-50 | 448x448 | 87.4% |
| NTS-Net | ECCV 2018 |  |  |  | ResNet-50 | 448x448 | 87.5% |
| HSnet | CVPR 2017 | ✓ | ✓ |  | GoogLeNet+BN | TBD | 87.5% |
| MetaFGNet | ECCV 2018 |  |  | Auxiliary data | ResNet-34 | TBD | 87.6% |
| CrossX | ICCV 2019 |  |  |  | ResNet-50 | 448x448 | 87.7% |
| DCL | CVPR 2019 |  |  |  | ResNet-50 | 448x448 | 87.8% |
| TASN | CVPR 2019 |  |  |  | ResNet-50 | 448x448 | 87.9% |
| Ge et al. | CVPR 2019 |  |  |  | GoogLeNet+BN | Shorter side is 800 px | 90.4% |
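
Several of the entries above (e.g., Bilinear CNN, HBP, KP) belong to the end-to-end feature-encoding family. The sketch below shows the core bilinear-pooling operation over convolutional features; it is a simplified single-stream variant for illustration, not a reproduction of the two-stream VGGD+VGGM model or of any accuracy number in the table.

```python
# Sketch of bilinear pooling over convolutional features, the core operation
# behind end-to-end encoding entries such as Bilinear CNN. Simplified
# single-stream VGG-16 variant for illustration only.
import torch
import torch.nn.functional as F
from torchvision import models


class BilinearPoolingNet(torch.nn.Module):
    def __init__(self, num_classes=200):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        self.features = vgg.features                        # conv features, 512 channels
        self.classifier = torch.nn.Linear(512 * 512, num_classes)

    def forward(self, x):                                   # x: (N, 3, 448, 448)
        f = self.features(x)                                # (N, 512, H, W)
        n, c, h, w = f.shape
        f = f.reshape(n, c, h * w)
        # Outer product of channel descriptors, averaged over spatial locations.
        b = torch.bmm(f, f.transpose(1, 2)) / (h * w)       # (N, 512, 512)
        b = b.reshape(n, c * c)
        b = torch.sign(b) * torch.sqrt(b.abs() + 1e-10)     # signed square-root
        return self.classifier(F.normalize(b, dim=1))       # L2-normalize, then classify
```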
