Leaderboard of Object Detection Algorithms Evaluated on the COCO Dataset


AP50

Rank Model box AP AP50 Paper Code Result Year Tags
1 SwinV2-G (HTC++) 63.1 Swin Transformer V2: Scaling Up Capacity and Resolution Link 2021 Swin-Transformer
2 Florence-CoSwin-H 62.4 Florence: A New Foundation Model for Computer Vision 2021 Swin-Transformer
3 GLIP (Swin-L, multi-scale) 61.5 79.5 Grounded Language-Image Pre-training 2021 multiscale
Vision Language
Dynamic Head
BERT-Base
4 Soft Teacher + Swin-L (HTC++, multi-scale) 61.3 End-to-End Semi-Supervised Object Detection with Soft Teacher 2021 multiscale
Swin-Transformer
5 DyHead (Swin-L, multi scale, self-training) 60.6 78.5 Dynamic Head: Unifying Object Detection Heads with Attentions 2021 multiscale
Swin-Transformer
6 Dual-Swin-L (HTC, multi-scale) 60.1 CBNetV2: A Composite Backbone Network Architecture for Object Detection 2021 multiscale
Swin-Transformer
7 Dual-Swin-L (HTC, single-scale) 59.4 CBNetV2: A Composite Backbone Network Architecture for Object Detection 2021 Swin-Transformer
8 Focal-L (DyHead, multi-scale) 58.9 Focal Self-attention for Local-Global Interactions in Vision Transformers 2021 multiscale
Focal-Transformer
9 DyHead (Swin-L, multi scale) 58.7 77.1 Dynamic Head: Unifying Object Detection Heads with Attentions 2021 multiscale
Swin-Transformer
10 Swin-L (HTC++, multi scale) 58.7 Swin Transformer: Hierarchical Vision Transformer using Shifted Windows 2021 multiscale
Swin-Transformer
11 Focal-L (HTC++, multi-scale) 58.4 Focal Self-attention for Local-Global Interactions in Vision Transformers 2021 multiscale
12 Swin-L (HTC++, single scale) 57.7 Swin Transformer: Hierarchical Vision Transformer using Shifted Windows 2021 single scale
Swin-Transformer
13 YOLOR-D6 (1280, single-scale, 34 fps) 57.3 75.0 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
14 SOLQ (Swin-L, single) 56.5 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
15 YOLOR-E6 (1280, single-scale, 45 fps) 56.4 74.1 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
16 CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) 56.4 74.0 Probabilistic two-stage detection 2021 single scale
FPN
DCN
17 QueryInst (single-scale) 56.1 75.9 Instances as Queries 2021
18 YOLOv4-P7 with TTA 55.8 73.2 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
19 DetectoRS (ResNeXt-101-64x4d, multi-scale) 55.7 74.2 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
multiscale
20 YOLOR-W6 (1280, single-scale, 66 fps) 55.5 73.2 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
21 YOLOv4-P7 CSP-P7 (single-scale, 16 fps) 55.4 73.3 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
22 CSP-p6 + Mish (multi-scale) 55.2 72.9 Mish: A Self Regularized Non-Monotonic Activation Function 2019 multiscale
23 YOLOv4-P6 with TTA 54.9 72.6 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
24 Cascade Eff-B7 NAS-FPN (1280) 54.8 Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation 2020 single scale
NAS-FPN
25 DetectoRS (ResNeXt-101-32x4d, multi-scale) 54.7 73.5 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
multiscale
26 YOLOv4-P6 CSP-P6 (single-scale, 32 fps) 54.3 72.3 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
27 SpineNet-190 (1280, with Self-training on OpenImages, single-scale) 54.3 Rethinking Pre-training and Self-training 2020 single scale
28 UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) 54.1 71.6 USB: Universal-Scale Object Detection Benchmark 2021 multiscale
DCN
29 EfficientDet-D7 (single-scale) 53.7 72.4 EfficientDet: Scalable and Efficient Object Detection 2019 single scale
30 PAA (ResNeXt-152-32x8d + DCN, multi-scale) 53.5 71.6 Probabilistic Anchor Assignment with IoU Prediction for Object Detection 2020 ResNeXt
multiscale
DCN
31 LSNet (Res2Net-101+ DCN, multi-scale) 53.5 71.1 Location-Sensitive Visual Recognition with Cross-IOU Loss 2021 multiscale
DCN
32 ResNeSt-200 (multi-scale) 53.3 72.0 ResNeSt: Split-Attention Networks 2020 multiscale
33 Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) 53.3 71.9 CBNet: A Novel Composite Backbone Network Architecture for Object Detection 2019 multiscale
34 DetectoRS (ResNeXt-101-32x4d, single-scale) 53.3 71.6 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
single scale
35 GFLV2 (Res2Net-101, DCN, multiscale) 53.3 70.9 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 multiscale
DCN
36 RelationNet++ (ResNeXt-64x4d-101-DCN) 52.7 RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder 2020 ResNeXt
DCN
37 YOLOv4-P5 with TTA 52.5 70.3 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
38 Deformable DETR (ResNeXt-101+DCN) 52.3 71.9 Deformable DETR: Deformable Transformers for End-to-End Object Detection 2020 ResNeXt
DCN
39 GCNet (ResNeXt-101 + DCN + cascade + GC r4) 52.3 70.9 Global Context Networks 2020 ResNeXt
DCN
GCN
40 RetinaNet (SpineNet-190, 1280x1280) 52.1 71.8 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
41 RepPoints v2 (ResNeXt-101, DCN, multi-scale) 52.1 70.1 RepPoints V2: Verification Meets Regression for Object Detection 2020 ResNeXt
multiscale
DCN
42 AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM) 51.9 70.4 Attention-guided Context Feature Pyramid Network for Object Detection 2020 ResNeXt
multiscale
FPN
43 OTA (ResNeXt-101+DCN, multiscale) 51.5 68.6 OTA: Optimal Transport Assignment for Object Detection 2021
44 UniverseNet-20.08d (Res2Net-101, DCN, single-scale) 51.3 70.0 USB: Universal-Scale Object Detection Benchmark 2021 single scale
DCN
45 TSD (SENet154-DCN, multi-scale) 51.2 71.9 Revisiting the Sibling Head in Object Detector 2020 multiscale
DCN
46 YOLOX-X (Modified CSP v5) 51.2 69.6 YOLOX: Exceeding YOLO Series in 2021 2021 YOLO
47 RetinaNet (SpineNet-143, 1280x1280) 50.7 70.4 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
48 ATSS (ResNeXt-64x4d-101+DCN, multi-scale) 50.7 68.9 Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection 2019 ResNeXt
multiscale
DCN
49 NAS-FPN (AmoebaNet-D, learned aug) 50.7 Learning Data Augmentation Strategies for Object Detection 2019 FPN
50 GFLV2 (Res2Net-101, DCN) 50.6 69 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 DCN
51 aLRP Loss (ResNeXt-101-64x4d, DCN, multiscale test) 50.2 70.3 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
multiscale
DCN
52 FreeAnchor + SEPC (DCN, ResNeXt-101-64x4d) 50.1 69.8 Scale-Equalizing Pyramid Convolution for Object Detection 2020 ResNeXt
DCN
53 D2Det (ResNet-101-DCN, multi-scale test) 50.1 69.4 D2Det: Towards High Quality Object Detection and Instance Segmentation 2020 multiscale
DCN
ResNet
54 Dynamic R-CNN (ResNet-101-DCN, multi-scale) 50.1 68.3 Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training 2020 multiscale
DCN
ResNet
55 TSD (ResNet-101-Deformable, Image Pyramid) 49.4 69.6 Revisiting the Sibling Head in Object Detector 2020 ResNet
56 RepPoints v2 (ResNeXt-101, DCN) 49.4 68.9 RepPoints V2: Verification Meets Regression for Object Detection 2020 ResNeXt
DCN
57 CPNDet (Hourglass-104, multi-scale) 49.2 67.3 Corner Proposal Network for Anchor-free, Two-stage Object Detection 2020 multiscale
58 GFLV2 (ResNeXt-101, 32x4d, DCN) 49 67.6 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNeXt
DCN
59 aLRP Loss (ResNeXt-101-64x4d, DCN, single scale) 48.9 69.3 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
single scale
DCN
60 UniverseNet-20.08 (Res2Net-50, DCN, single-scale) 48.8 67.5 USB: Universal-Scale Object Detection Benchmark 2021 single scale
DCN
61 SOLQ (ResNet101, single scale) 48.7 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
62 RetinaNet (SpineNet-96, 1024x1024) 48.6 68.4 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
63 TridentNet (ResNet-101-Deformable, Image Pyramid) 48.4 69.7 Scale-Aware Trident Networks for Object Detection 2019 ResNet
64 GCNet (ResNeXt-101 + DCN + cascade + GC r4) 48.4 67.6 GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond 2019 ResNeXt
DCN
GCN
65 GFLV2 (ResNet-101-DCN) 48.3 66.5 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 DCN
ResNet
66 GFL (X-101-32x4d-DCN, single-scale) 48.2 67.4 Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection 2020 ResNeXt
single scale
DCN
67 ISTR (ResNet101-FPN-3x, single-scale) 48.1 ISTR: End-to-End Instance Segmentation with Transformers 2021
68 aLRP Loss (ResNeXt-101-64x4d, single scale) 47.8 68.4 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
single scale
69 MatrixNet Corners (ResNet-152, multi-scale) 47.8 66.2 Matrix Nets: A New Deep Architecture for Object Detection 2019 multiscale
ResNet
70 SOLQ (ResNet50, single scale) 47.8 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
71 SAPD (ResNeXt-101, single-scale) 47.4 67.4 Soft Anchor-Point Object Detection 2019 ResNeXt
single scale
72 PANet (ResNeXt-101, multi-scale) 47.4 67.2 Path Aggregation Network for Instance Segmentation 2018 ResNeXt
multiscale
73 HTC (HRNetV2p-W48) 47.3 65.9 Deep High-Resolution Representation Learning for Visual Recognition 2019
74 HTC (ResNeXt-101-FPN) 47.1 63.9 Hybrid Task Cascade for Instance Segmentation 2019 ResNeXt
FPN
75 CenterNet511 (Hourglass-104, multi-scale) 47.0 64.5 CenterNet: Keypoint Triplets for Object Detection 2019 multiscale
76 MAL (ResNeXt101, multi-scale) 47.0 Multiple Anchor Learning for Visual Object Detection 2019 ResNeXt
multiscale
77 ISTR (ResNet50-FPN-3x) 46.8 ISTR: End-to-End Instance Segmentation with Transformers 2021 FPN
ResNet
78 RetinaNet (SpineNet-49, 896x896) 46.7 66.3 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
79 RPDet (ResNet-101-DCN, multi-scale) 46.5 67.4 RepPoints: Point Set Representation for Object Detection 2019 multiscale
DCN
ResNet
80 HoughNet (MS) 46.4 65.1 HoughNet: Integrating near and long-range evidence for bottom-up object detection 2020 multiscale
81 PPDet (ResNeXt-101-FPN, multiscale) 46.3 64.8 Reducing Label Noise in Anchor-Free Object Detection 2020 ResNeXt
multiscale
FPN
82 GFLV2 (ResNet-101) 46.2 64.3 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNet
83 SNIPER (ResNet-101) 46.1 67.0 SNIPER: Efficient Multi-Scale Training 2018 ResNet
84 Mask R-CNN (HRNetV2p-W48 + cascade) 46.1 64.0 Deep High-Resolution Representation Learning for Visual Recognition 2019
85 DCNv2 (ResNet-101, multi-scale) 46.0 67.9 Deformable ConvNets v2: More Deformable, Better Results 2018 multiscale
DCN
ResNet
86 Gaussian-FCOS 46 Localization Uncertainty Estimation for Anchor-Free Object Detection 2020
87 Cascade R-CNN-FPN (ResNet-101, map-guided) 45.9 64.2 InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting 2019 FPN
ResNet
88 MAL (ResNeXt101, single-scale) 45.9 Multiple Anchor Learning for Visual Object Detection 2019 ResNeXt
single scale
89 CenterMask+VoVNetV2-99 (single-scale) 45.8 64.5 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
90 D-RFCN + SNIP (DPN-98 with flip, multi-scale) 45.7 67.3 An Analysis of Scale Invariance in Object Detection - SNIP 2017 multiscale
91 YOLOv4 (CD53) 45.5 64.1 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
92 PP-YOLO (608x608) 45.2 65.2 PP-YOLO: An Effective and Efficient Implementation of Object Detector 2020 YOLO
93 AC-FPN Cascade R-CNN (ResNet-101, single scale) 45 64.4 Attention-guided Context Feature Pyramid Network for Object Detection 2019 single scale
FPN
ResNet
94 FreeAnchor (ResNeXt-101) 44.8 64.3 FreeAnchor: Learning to Match Anchors for Visual Object Detection 2019 ResNeXt
95 FCOS (ResNeXt-64x4d-101-FPN 4 + improvements) 44.7 64.1 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
96 CenterMask+VoVNet2-57 (single-scale) 44.7 63.1 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
97 FSAF (ResNeXt-101, multi-scale) 44.6 65.2 Feature Selective Anchor-Free Module for Single-Shot Object Detection 2019 ResNeXt
multiscale
98 aLRP Loss (ResNeXt-101, DCN, 500 scale) 44.6 65.0 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
DCN
99 CenterMask + X-101-32x8d (single-scale) 44.6 63.4 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
100 RetinaNet (SpineNet-49, 640x640) 44.3 63.8 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
101 YOLOF-DC5 44.3 62.9 You Only Look One-level Feature 2021 YOLO
102 GFLV2 (ResNet-50) 44.3 62.3 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNet
103 InterNet (ResNet-101-FPN, multi-scale) 44.2 67.5 Feature Intertwiner for Object Detection 2019 multiscale
FPN
ResNet
104 M2Det (VGG-16, multi-scale) 44.2 64.6 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 multiscale
105 Faster R-CNN (LIP-ResNet-101-MD w FPN) 43.9 65.7 LIP: Local Importance-based Pooling 2019 FPN
106 M2Det (ResNet-101, multi-scale) 43.9 64.4 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 multiscale
ResNet
107 YOLOv3 @800 + ASFF* (Darknet-53) 43.9 64.1 Learning Spatial Fusion for Single-Shot Object Detection 2019 YOLO
108 FoveaBox (ResNeXt-101) 43.9 63.5 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
109 ExtremeNet (Hourglass-104, multi-scale) 43.7 60.5 Bottom-up Object Detection by Grouping Extreme and Center Points 2019 multiscale
110 YOLOv4-608 43.5 65.7 YOLOv4: Optimal Speed and Accuracy of Object Detection 2020 single scale
YOLO
111 SNIPER (ResNet-50) 43.5 65.0 SNIPER: Efficient Multi-Scale Training 2018 ResNet
112 CenterNet (HRNetV2-W48) 43.5 Deep High-Resolution Representation Learning for Visual Recognition 2019
113 D-RFCN + SNIP (ResNet-101, multi-scale) 43.4 65.5 An Analysis of Scale Invariance in Object Detection - SNIP 2017 multiscale
ResNet
114 Grid R-CNN (ResNeXt-101-FPN) 43.2 63.0 Grid R-CNN 2018 ResNeXt
FPN
115 FCOS (ResNeXt-101-64x4d-FPN) 43.2 62.8 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
116 CornerNet-Saccade (Hourglass-104, multi-scale) 43.2 CornerNet-Lite: Efficient Keypoint Based Object Detection 2019 multiscale
117 Libra R-CNN (ResNeXt-101-FPN) 43.0 64 Libra R-CNN: Towards Balanced Learning for Object Detection 2019 ResNeXt
FPN
118 RPDet (ResNet-101-DCN) 42.8 65.0 RepPoints: Point Set Representation for Object Detection 2019 DCN
ResNet
119 SpineNet-49 (640, RetinaNet, single-scale) 42.8 62.3 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019 single scale
120 Cascade R-CNN (ResNet-101-FPN+, cascade) 42.8 62.1 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
121 Cascade R-CNN 42.8 62.1 Cascade R-CNN: High Quality Object Detection and Instance Segmentation 2019
122 TridentNet (ResNet-101) 42.7 63.6 Scale-Aware Trident Networks for Object Detection 2019 ResNet
123 FCOS (ResNeXt-32x8d-101-FPN) 42.7 62.2 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
124 RetinaMask (ResNeXt-101-FPN-GN) 42.6 62.5 RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free 2019 ResNeXt
FPN
125 TAL + TAP 42.5 60.3 TOOD: Task-aligned One-stage Object Detection 2021
126 Faster R-CNN (HRNetV2p-W48) 42.4 63.6 Deep High-Resolution Representation Learning for Visual Recognition 2019
127 HSD (Rest101, 768x768, single-scale test) 42.3 61.2 Hierarchical Shot Detector 2019 single scale
128 CornerNet511 (Hourglass-104, multi-scale) 42.1 57.8 CornerNet: Detecting Objects as Paired Keypoints 2018 multiscale
129 FoveaBox (ResNeXt-101) 42.1 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
130 FCOS (HRNet-W32-5l) 42.0 60.4 FCOS: Fully Convolutional One-Stage Object Detection 2019
131 RefineDet512+ (ResNet-101) 41.8 62.9 Single-Shot Refinement Neural Network for Object Detection 2017 ResNet
132 GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101) 41.6 62.8 Gradient Harmonized Single-stage Detector 2018 FPN
133 CenterNet-DLA (DLA-34, multi-scale) 41.6 Objects as Points 2019 multiscale
134 RetinaNet (SpineNet-49S, 640x640) 41.5 60.5 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
135 RPDet (ResNet-101) 41 62.9 RepPoints: Point Set Representation for Object Detection 2019 ResNet
136 M2Det (VGG-16, single-scale) 41.0 59.7 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 single scale
137 FSAF (ResNet-101, single-scale) 40.9 61.5 Feature Selective Anchor-Free Module for Single-Shot Object Detection 2019 single scale
ResNet
138 RetinaNet (ResNeXt-101-FPN) 40.8 61.1 Focal Loss for Dense Object Detection 2017 ResNeXt
FPN
139 Cascade R-CNN (ResNet-50-FPN+, cascade) 40.6 59.9 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
140 Faster R-CNN (Cascade RPN) 40.6 58.9 Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution 2019
141 ResNet-50-DW-DPN (Deformable Kernels) 40.6 Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation 2019 ResNet
142 IoU-Net 40.6 Acquisition of Localization Confidence for Accurate Object Detection 2018
143 FCOS (HRNetV2p-W48) 40.5 59.3 Deep High-Resolution Representation Learning for Visual Recognition 2019
144 ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS 40.4 Bounding Box Regression with Uncertainty for Accurate Object Detection 2018 FPN
ResNet
145 RDSNet (ResNet-101, RetinaNet, mask, MBRM) 40.3 60.1 RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation 2019 ResNet
146 ExtremeNet (Hourglass-104, single-scale) 40.2 55.5 Bottom-up Object Detection by Grouping Extreme and Center Points 2019 single scale
147 Mask R-CNN (ResNet-101-FPN, CBN) 40.1 60.5 Cross-Iteration Batch Normalization 2020 FPN
ResNet
148 Fast R-CNN (Cascade RPN) 40.1 59.4 Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution 2019
149 Mask R-CNN (ResNeXt-101-FPN) 39.8 62.3 Mask R-CNN 2017 ResNeXt
FPN
150 GA-Faster-RCNN 39.8 59.2 Region Proposal by Guided Anchoring 2019
151 FPN (ResNet101 backbone) 39.5 ChainerCV: a Library for Deep Learning in Computer Vision 2017 FPN
ResNet
152 RetinaMask (ResNet-50-FPN) 39.4 58.6 RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free 2019 FPN
ResNet
153 PP-YOLO (320x320) 39.3 59.3 PP-YOLO: An Effective and Efficient Implementation of Object Detector 2020 YOLO
154 AA-ResNet-10 + RetinaNet 39.2 Attention Augmented Convolutional Networks 2019
155 MAL (ResNet50, single-scale) 39.2 Multiple Anchor Learning for Visual Object Detection 2019 single scale
ResNet
156 RetinaNet (ResNet-101-FPN) 39.1 59.1 Focal Loss for Dense Object Detection 2017 FPN
ResNet
157 Cascade R-CNN (ResNet-101-FPN+) 38.8 61.1 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
158 M2Det (ResNet-101, single-scale) 38.8 59.4 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 single scale
ResNet
159 SaccadeNet (DLA-34-DCN) 38.5 55.6 SaccadeNet: A Fast and Accurate Object Detector 2020 DCN
160 Mask R-CNN (ResNet-101-FPN) 38.2 60.3 Mask R-CNN 2017 FPN
ResNet
161 WSMA-Seg 38.1 Segmentation is All You Need 2019
162 Faster R-CNN + FPN + CGD 37.9 Compact Global Descriptor for Neural Networks 2019 FPN
163 CornerNet511 (Hourglass-52, single-scale) 37.8 53.7 CornerNet: Detecting Objects as Paired Keypoints 2018 single scale
164 RefineDet512+ (VGG-16) 37.6 58.7 Single-Shot Refinement Neural Network for Object Detection 2017
165 DeformConv-R-FCN (Aligned-Inception-ResNet) 37.5 58.0 Deformable Convolutional Networks 2017
166 Faster R-CNN (ImageNet+300M) 37.4 58 Revisiting Unreasonable Effectiveness of Data in Deep Learning Era 2017
167 Mask R-CNN (Bottleneck-injected ResNet-50, FPN) 36.9 torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation 2020 FPN
ResNet
168 Faster R-CNN + TDM 36.8 Beyond Skip Connections: Top-Down Modulation for Object Detection 2016
169 Cascade R-CNN (ResNet-50-FPN+) 36.5 59 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
170 RefineDet512 (ResNet-101) 36.4 57.5 Single-Shot Refinement Neural Network for Object Detection 2017 ResNet
171 Faster R-CNN + FPN 36.2 Feature Pyramid Networks for Object Detection 2016 FPN
172 Faster R-CNN (Bottleneck-injected ResNet-50 and FPN) 35.9 torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation 2020 FPN
ResNet
173 Faster R-CNN (box refinement, context, multi-scale testing) 34.9 Deep Residual Learning for Image Recognition 2015 multiscale
174 Faster R-CNN 34.7 Speed/accuracy trade-offs for modern convolutional object detectors 2016
175 CornerNet-Squeeze 34.4 CornerNet-Lite: Efficient Keypoint Based Object Detection 2019
176 MultiPath Network 33.2 A MultiPath Network for Object Detection 2016
177 ION 33.1 55.7 Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks 2015
178 RefineDet512 (VGG-16) 33 54.5 Single-Shot Refinement Neural Network for Object Detection 2017
179 YOLOv3 + Darknet-53 33.0 YOLOv3: An Incremental Improvement 2018 YOLO
180 SSD512 28.8 48.5 SSD: Single Shot MultiBox Detector 2015
181 MnasFPN (MobileNetV2) 26.1 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
182 ESPNetv2-512 26.0 ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network 2018
183 MnasFPN (MobileNetV3) 25.5 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
184 MnasFPN (MNASNet-B1) 24.6 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
185 MnasFPN x0.7 (MobileNetV2) 23.8 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
186 MobileNet-v1-SSD-300x300+CGD 21.4 Compact Global Descriptor for Neural Networks 2019
187 Fast-RCNN 19.7 Fast R-CNN 2015
188 MobileNet 19.3 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications 2017
189 DAT-S (RetinaNet) 69.6 Vision Transformer with Deformable Attention 2022
190 CenterMask-VoVNet99 (multi-scale) 68.3 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 multiscale
191 Mask R-CNN (HRNetV2p-W32 + cascade) 62.5 Deep High-Resolution Representation Learning for Visual Recognition 2019
192 FoveaBox (ResNeXt-101) 61.9 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
193 VirTex Mask R-CNN (ResNet-50-FPN) 61.7 VirTex: Learning Visual Representations from Textual Annotations 2020 FPN
ResNet
194 CenterMask + ResNet101 61.6 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 ResNet
195 PAFNet (ResNet50-vd) 59.8 PAFNet: An Efficient Anchor-Free Object Detector Guidance 2021 ResNet
196 IoU-Net+EnergyRegression 58.5 Energy-Based Models for Deep Probabilistic Regression 2019
197 Cascade R-CNN (HRNetV2p-W48) Deep High-Resolution Representation Learning for Visual Recognition 2019
198 ISTR (ResNet50-FPN-3x, single-scale) ISTR: End-to-End Instance Segmentation with Transformers 2021
199 FoveaBox (ResNeXt-101) FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
200 EfficientDet-D7x (single-scale) EfficientDet: Scalable and Efficient Object Detection 2019 single scale

AP75

Rank Model box AP AP75 Paper Code Result Year Tags
1 SwinV2-G (HTC++) 63.1 Swin Transformer V2: Scaling Up Capacity and Resolution Link 2021 Swin-Transformer
2 Florence-CoSwin-H 62.4 Florence: A New Foundation Model for Computer Vision 2021 Swin-Transformer
3 GLIP (Swin-L, multi-scale) 61.5 67.7 Grounded Language-Image Pre-training 2021 multiscale
Vision Language
Dynamic Head
BERT-Base
4 Soft Teacher + Swin-L (HTC++, multi-scale) 61.3 End-to-End Semi-Supervised Object Detection with Soft Teacher 2021 multiscale
Swin-Transformer
5 DyHead (Swin-L, multi scale, self-training) 60.6 66.6 Dynamic Head: Unifying Object Detection Heads with Attentions 2021 multiscale
Swin-Transformer
6 Dual-Swin-L (HTC, multi-scale) 60.1 CBNetV2: A Composite Backbone Network Architecture for Object Detection 2021 multiscale
Swin-Transformer
7 Dual-Swin-L (HTC, single-scale) 59.4 CBNetV2: A Composite Backbone Network Architecture for Object Detection 2021 Swin-Transformer
8 Focal-L (DyHead, multi-scale) 58.9 Focal Self-attention for Local-Global Interactions in Vision Transformers 2021 multiscale
Focal-Transformer
9 DyHead (Swin-L, multi scale) 58.7 64.5 Dynamic Head: Unifying Object Detection Heads with Attentions 2021 multiscale
Swin-Transformer
10 Swin-L (HTC++, multi scale) 58.7 Swin Transformer: Hierarchical Vision Transformer using Shifted Windows 2021 multiscale
Swin-Transformer
11 Focal-L (HTC++, multi-scale) 58.4 Focal Self-attention for Local-Global Interactions in Vision Transformers 2021 multiscale
12 Swin-L (HTC++, single scale) 57.7 Swin Transformer: Hierarchical Vision Transformer using Shifted Windows 2021 single scale
Swin-Transformer
13 YOLOR-D6 (1280, single-scale, 34 fps) 57.3 62.7 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
14 SOLQ (Swin-L, single) 56.5 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
15 YOLOR-E6 (1280, single-scale, 45 fps) 56.4 61.6 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
16 CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) 56.4 61.6 Probabilistic two-stage detection 2021 single scale
FPN
DCN
17 QueryInst (single-scale) 56.1 61.9 Instances as Queries 2021
18 YOLOv4-P7 with TTA 55.8 61.2 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
19 DetectoRS (ResNeXt-101-64x4d, multi-scale) 55.7 61.1 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
multiscale
20 YOLOR-W6 (1280, single-scale, 66 fps) 55.5 60.6 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
21 YOLOv4-P7 CSP-P7 (single-scale, 16 fps) 55.4 60.7 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
22 CSP-p6 + Mish (multi-scale) 55.2 60.5 Mish: A Self Regularized Non-Monotonic Activation Function 2019 multiscale
23 YOLOv4-P6 with TTA 54.9 60.2 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
24 Cascade Eff-B7 NAS-FPN (1280) 54.8 Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation 2020 single scale
NAS-FPN
25 DetectoRS (ResNeXt-101-32x4d, multi-scale) 54.7 60.1 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
multiscale
26 YOLOv4-P6 CSP-P6 (single-scale, 32 fps) 54.3 59.5 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
27 SpineNet-190 (1280, with Self-training on OpenImages, single-scale) 54.3 Rethinking Pre-training and Self-training 2020 single scale
28 UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) 54.1 59.9 USB: Universal-Scale Object Detection Benchmark 2021 multiscale
DCN
29 EfficientDet-D7 (single-scale) 53.7 EfficientDet: Scalable and Efficient Object Detection 2019 single scale
30 PAA (ResNeXt-152-32x8d + DCN, multi-scale) 53.5 59.1 Probabilistic Anchor Assignment with IoU Prediction for Object Detection 2020 ResNeXt
multiscale
DCN
31 LSNet (Res2Net-101+ DCN, multi-scale) 53.5 59.2 Location-Sensitive Visual Recognition with Cross-IOU Loss 2021 multiscale
DCN
32 ResNeSt-200 (multi-scale) 53.3 58.0 ResNeSt: Split-Attention Networks 2020 multiscale
33 Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) 53.3 58.5 CBNet: A Novel Composite Backbone Network Architecture for Object Detection 2019 multiscale
34 DetectoRS (ResNeXt-101-32x4d, single-scale) 53.3 58.5 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
single scale
35 GFLV2 (Res2Net-101, DCN, multiscale) 53.3 59.2 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 multiscale
DCN
36 RelationNet++ (ResNeXt-64x4d-101-DCN) 52.7 RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder 2020 ResNeXt
DCN
37 YOLOv4-P5 with TTA 52.5 58 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
38 Deformable DETR (ResNeXt-101+DCN) 52.3 58.1 Deformable DETR: Deformable Transformers for End-to-End Object Detection 2020 ResNeXt
DCN
39 GCNet (ResNeXt-101 + DCN + cascade + GC r4) 52.3 56.9 Global Context Networks 2020 ResNeXt
DCN
GCN
40 RetinaNet (SpineNet-190, 1280x1280) 52.1 56.5 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
41 RepPoints v2 (ResNeXt-101, DCN, multi-scale) 52.1 57.5 RepPoints V2: Verification Meets Regression for Object Detection 2020 ResNeXt
multiscale
DCN
42 AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM) 51.9 57 Attention-guided Context Feature Pyramid Network for Object Detection 2020 ResNeXt
multiscale
FPN
43 OTA (ResNeXt-101+DCN, multiscale) 51.5 57.1 OTA: Optimal Transport Assignment for Object Detection 2021
44 UniverseNet-20.08d (Res2Net-101, DCN, single-scale) 51.3 55.8 USB: Universal-Scale Object Detection Benchmark 2021 single scale
DCN
45 TSD (SENet154-DCN, multi-scale) 51.2 56.0 Revisiting the Sibling Head in Object Detector 2020 multiscale
DCN
46 YOLOX-X (Modified CSP v5) 51.2 55.7 YOLOX: Exceeding YOLO Series in 2021 2021 YOLO
47 RetinaNet (SpineNet-143, 1280x1280) 50.7 54.9 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
48 ATSS (ResNeXt-64x4d-101+DCN, multi-scale) 50.7 56.3 Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection 2019 ResNeXt
multiscale
DCN
49 NAS-FPN (AmoebaNet-D, learned aug) 50.7 Learning Data Augmentation Strategies for Object Detection 2019 FPN
50 GFLV2 (Res2Net-101, DCN) 50.6 55.3 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 DCN
51 aLRP Loss (ResNeXt-101-64x4d, DCN, multiscale test) 50.2 53.9 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
multiscale
DCN
52 FreeAnchor + SEPC (DCN, ResNeXt-101-64x4d) 50.1 54.3 Scale-Equalizing Pyramid Convolution for Object Detection 2020 ResNeXt
DCN
53 D2Det (ResNet-101-DCN, multi-scale test) 50.1 54.9 D2Det: Towards High Quality Object Detection and Instance Segmentation 2020 multiscale
DCN
ResNet
54 Dynamic R-CNN (ResNet-101-DCN, multi-scale) 50.1 55.6 Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training 2020 multiscale
DCN
ResNet
55 TSD (ResNet-101-Deformable, Image Pyramid) 49.4 54.4 Revisiting the Sibling Head in Object Detector 2020 ResNet
56 RepPoints v2 (ResNeXt-101, DCN) 49.4 53.4 RepPoints V2: Verification Meets Regression for Object Detection 2020 ResNeXt
DCN
57 CPNDet (Hourglass-104, multi-scale) 49.2 53.7 Corner Proposal Network for Anchor-free, Two-stage Object Detection 2020 multiscale
58 GFLV2 (ResNeXt-101, 32x4d, DCN) 49 53.5 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNeXt
DCN
59 aLRP Loss (ResNeXt-101-64x4d, DCN, single scale) 48.9 52.5 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
single scale
DCN
60 UniverseNet-20.08 (Res2Net-50, DCN, single-scale) 48.8 53.0 USB: Universal-Scale Object Detection Benchmark 2021 single scale
DCN
61 SOLQ (ResNet101, single scale) 48.7 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
62 RetinaNet (SpineNet-96, 1024x1024) 48.6 52.5 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
63 TridentNet (ResNet-101-Deformable, Image Pyramid) 48.4 53.5 Scale-Aware Trident Networks for Object Detection 2019 ResNet
64 GCNet (ResNeXt-101 + DCN + cascade + GC r4) 48.4 52.7 GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond 2019 ResNeXt
DCN
GCN
65 GFLV2 (ResNet-101-DCN) 48.3 52.8 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 DCN
ResNet
66 GFL (X-101-32x4d-DCN, single-scale) 48.2 52.6 Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection 2020 ResNeXt
single scale
DCN
67 ISTR (ResNet101-FPN-3x, single-scale) 48.1 ISTR: End-to-End Instance Segmentation with Transformers 2021
68 aLRP Loss (ResNeXt-101-64x4d, single scale) 47.8 51.1 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
single scale
69 MatrixNet Corners (ResNet-152, multi-scale) 47.8 52.3 Matrix Nets: A New Deep Architecture for Object Detection 2019 multiscale
ResNet
70 SOLQ (ResNet50, single scale) 47.8 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
71 SAPD (ResNeXt-101, single-scale) 47.4 51.1 Soft Anchor-Point Object Detection 2019 ResNeXt
single scale
72 PANet (ResNeXt-101, multi-scale) 47.4 51.8 Path Aggregation Network for Instance Segmentation 2018 ResNeXt
multiscale
73 HTC (HRNetV2p-W48) 47.3 51.2 Deep High-Resolution Representation Learning for Visual Recognition 2019
74 HTC (ResNeXt-101-FPN) 47.1 44.7 Hybrid Task Cascade for Instance Segmentation 2019 ResNeXt
FPN
75 CenterNet511 (Hourglass-104, multi-scale) 47.0 50.7 CenterNet: Keypoint Triplets for Object Detection 2019 multiscale
76 MAL (ResNeXt101, multi-scale) 47.0 Multiple Anchor Learning for Visual Object Detection 2019 ResNeXt
multiscale
77 ISTR (ResNet50-FPN-3x) 46.8 ISTR: End-to-End Instance Segmentation with Transformers 2021 FPN
ResNet
78 RetinaNet (SpineNet-49, 896x896) 46.7 50.6 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
79 RPDet (ResNet-101-DCN, multi-scale) 46.5 50.9 RepPoints: Point Set Representation for Object Detection 2019 multiscale
DCN
ResNet
80 HoughNet (MS) 46.4 50.7 HoughNet: Integrating near and long-range evidence for bottom-up object detection 2020 multiscale
81 PPDet (ResNeXt-101-FPN, multiscale) 46.3 51.6 Reducing Label Noise in Anchor-Free Object Detection 2020 ResNeXt
multiscale
FPN
82 GFLV2 (ResNet-101) 46.2 50.5 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNet
83 SNIPER (ResNet-101) 46.1 51.6 SNIPER: Efficient Multi-Scale Training 2018 ResNet
84 Mask R-CNN (HRNetV2p-W48 + cascade) 46.1 50.3 Deep High-Resolution Representation Learning for Visual Recognition 2019
85 DCNv2 (ResNet-101, multi-scale) 46.0 50.8 Deformable ConvNets v2: More Deformable, Better Results 2018 multiscale
DCN
ResNet
86 Gaussian-FCOS 46 Localization Uncertainty Estimation for Anchor-Free Object Detection 2020
87 Cascade R-CNN-FPN (ResNet-101, map-guided) 45.9 50 InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting 2019 FPN
ResNet
88 MAL (ResNeXt101, single-scale) 45.9 Multiple Anchor Learning for Visual Object Detection 2019 ResNeXt
single scale
89 CenterMask+VoVNetV2-99 (single-scale) 45.8 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
90 D-RFCN + SNIP (DPN-98 with flip, multi-scale) 45.7 51.1 An Analysis of Scale Invariance in Object Detection - SNIP 2017 multiscale
91 YOLOv4 (CD53) 45.5 49.5 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
92 PP-YOLO (608x608) 45.2 49.9 PP-YOLO: An Effective and Efficient Implementation of Object Detector 2020 YOLO
93 AC-FPN Cascade R-CNN (ResNet-101, single scale) 45 49 Attention-guided Context Feature Pyramid Network for Object Detection 2019 single scale
FPN
ResNet
94 FreeAnchor (ResNeXt-101) 44.8 48.4 FreeAnchor: Learning to Match Anchors for Visual Object Detection 2019 ResNeXt
95 FCOS (ResNeXt-64x4d-101-FPN 4 + improvements) 44.7 48.4 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
96 CenterMask+VoVNet2-57 (single-scale) 44.7 48.6 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
97 FSAF (ResNeXt-101, multi-scale) 44.6 48.6 Feature Selective Anchor-Free Module for Single-Shot Object Detection 2019 ResNeXt
multiscale
98 aLRP Loss (ResNeXt-101, DCN, 500 scale) 44.6 47.5 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
DCN
99 CenterMask + X-101-32x8d (single-scale) 44.6 48.4 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
100 RetinaNet (SpineNet-49, 640x640) 44.3 47.6 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
101 YOLOF-DC5 44.3 47.5 You Only Look One-level Feature 2021 YOLO
102 GFLV2 (ResNet-50) 44.3 48.5 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNet
103 InterNet (ResNet-101-FPN, multi-scale) 44.2 51.1 Feature Intertwiner for Object Detection 2019 multiscale
FPN
ResNet
104 M2Det (VGG-16, multi-scale) 44.2 49.3 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 multiscale
105 Faster R-CNN (LIP-ResNet-101-MD w FPN) 43.9 48.1 LIP: Local Importance-based Pooling 2019 FPN
106 M2Det (ResNet-101, multi-scale) 43.9 48 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 multiscale
ResNet
107 YOLOv3 @800 + ASFF* (Darknet-53) 43.9 49.2 Learning Spatial Fusion for Single-Shot Object Detection 2019 YOLO
108 FoveaBox (ResNeXt-101) 43.9 47.7 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
109 ExtremeNet (Hourglass-104, multi-scale) 43.7 47.0 Bottom-up Object Detection by Grouping Extreme and Center Points 2019 multiscale
110 YOLOv4-608 43.5 47.3 YOLOv4: Optimal Speed and Accuracy of Object Detection 2020 single scale
YOLO
111 SNIPER (ResNet-50) 43.5 48.6 SNIPER: Efficient Multi-Scale Training 2018 ResNet
112 CenterNet (HRNetV2-W48) 43.5 46.5 Deep High-Resolution Representation Learning for Visual Recognition 2019
113 D-RFCN + SNIP (ResNet-101, multi-scale) 43.4 48.4 An Analysis of Scale Invariance in Object Detection - SNIP 2017 multiscale
ResNet
114 Grid R-CNN (ResNeXt-101-FPN) 43.2 46.6 Grid R-CNN 2018 ResNeXt
FPN
115 FCOS (ResNeXt-101-64x4d-FPN) 43.2 46.6 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
116 CornerNet-Saccade (Hourglass-104, multi-scale) 43.2 CornerNet-Lite: Efficient Keypoint Based Object Detection 2019 multiscale
117 Libra R-CNN (ResNeXt-101-FPN) 43.0 47 Libra R-CNN: Towards Balanced Learning for Object Detection 2019 ResNeXt
FPN
118 RPDet (ResNet-101-DCN) 42.8 46.3 RepPoints: Point Set Representation for Object Detection 2019 DCN
ResNet
119 SpineNet-49 (640, RetinaNet, single-scale) 42.8 46.1 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019 single scale
120 Cascade R-CNN (ResNet-101-FPN+, cascade) 42.8 46.3 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
121 Cascade R-CNN 42.8 46.3 Cascade R-CNN: High Quality Object Detection and Instance Segmentation 2019
122 TridentNet (ResNet-101) 42.7 46.5 Scale-Aware Trident Networks for Object Detection 2019 ResNet
123 FCOS (ResNeXt-32x8d-101-FPN) 42.7 46.1 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
124 RetinaMask (ResNeXt-101-FPN-GN) 42.6 46.0 RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free 2019 ResNeXt
FPN
125 TAL + TAP 42.5 46.4 TOOD: Task-aligned One-stage Object Detection 2021
126 Faster R-CNN (HRNetV2p-W48) 42.4 46.4 Deep High-Resolution Representation Learning for Visual Recognition 2019
127 HSD (Rest101, 768x768, single-scale test) 42.3 46.9 Hierarchical Shot Detector 2019 single scale
128 CornerNet511 (Hourglass-104, multi-scale) 42.1 45.3 CornerNet: Detecting Objects as Paired Keypoints 2018 multiscale
129 FoveaBox (ResNeXt-101) 42.1 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
130 FCOS (HRNet-W32-5l) 42.0 45.3 FCOS: Fully Convolutional One-Stage Object Detection 2019
131 RefineDet512+ (ResNet-101) 41.8 45.7 Single-Shot Refinement Neural Network for Object Detection 2017 ResNet
132 GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101) 41.6 44.2 Gradient Harmonized Single-stage Detector 2018 FPN
133 CenterNet-DLA (DLA-34, multi-scale) 41.6 Objects as Points 2019 multiscale
134 RetinaNet (SpineNet-49S, 640x640) 41.5 44.6 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
135 RPDet (ResNet-101) 41 44.3 RepPoints: Point Set Representation for Object Detection 2019 ResNet
136 M2Det (VGG-16, single-scale) 41.0 45 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 single scale
137 FSAF (ResNet-101, single-scale) 40.9 44 Feature Selective Anchor-Free Module for Single-Shot Object Detection 2019 single scale
ResNet
138 RetinaNet (ResNeXt-101-FPN) 40.8 44.1 Focal Loss for Dense Object Detection 2017 ResNeXt
FPN
139 Cascade R-CNN (ResNet-50-FPN+, cascade) 40.6 44 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
140 Faster R-CNN (Cascade RPN) 40.6 44.5 Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution 2019
141 ResNet-50-DW-DPN (Deformable Kernels) 40.6 Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation 2019 ResNet
142 IoU-Net 40.6 Acquisition of Localization Confidence for Accurate Object Detection 2018
143 FCOS (HRNetV2p-W48) 40.5 Deep High-Resolution Representation Learning for Visual Recognition 2019
144 ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS 40.4 Bounding Box Regression with Uncertainty for Accurate Object Detection 2018 FPN
ResNet
145 RDSNet (ResNet-101, RetinaNet, mask, MBRM) 40.3 43 RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation 2019 ResNet
146 ExtremeNet (Hourglass-104, single-scale) 40.2 43.2 Bottom-up Object Detection by Grouping Extreme and Center Points 2019 single scale
147 Mask R-CNN (ResNet-101-FPN, CBN) 40.1 44.1 Cross-Iteration Batch Normalization 2020 FPN
ResNet
148 Fast R-CNN (Cascade RPN) 40.1 43.8 Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution 2019
149 Mask R-CNN (ResNeXt-101-FPN) 39.8 43.4 Mask R-CNN 2017 ResNeXt
FPN
150 GA-Faster-RCNN 39.8 43.5 Region Proposal by Guided Anchoring 2019
151 FPN (ResNet101 backbone) 39.5 ChainerCV: a Library for Deep Learning in Computer Vision 2017 FPN
ResNet
152 RetinaMask (ResNet-50-FPN) 39.4 42.3 RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free 2019 FPN
ResNet
153 PP-YOLO (320x320) 39.3 42.7 PP-YOLO: An Effective and Efficient Implementation of Object Detector 2020 YOLO
154 AA-ResNet-10 + RetinaNet 39.2 Attention Augmented Convolutional Networks 2019
155 MAL (ResNet50, single-scale) 39.2 Multiple Anchor Learning for Visual Object Detection 2019 single scale
ResNet
156 RetinaNet (ResNet-101-FPN) 39.1 42.3 Focal Loss for Dense Object Detection 2017 FPN
ResNet
157 Cascade R-CNN (ResNet-101-FPN+) 38.8 41.9 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
158 M2Det (ResNet-101, single-scale) 38.8 41.7 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 single scale
ResNet
159 SaccadeNet (DLA-34-DCN) 38.5 41.4 SaccadeNet: A Fast and Accurate Object Detector 2020 DCN
160 Mask R-CNN (ResNet-101-FPN) 38.2 41.7 Mask R-CNN 2017 FPN
ResNet
161 WSMA-Seg 38.1 Segmentation is All You Need 2019
162 Faster R-CNN + FPN + CGD 37.9 Compact Global Descriptor for Neural Networks 2019 FPN
163 CornerNet511 (Hourglass-52, single-scale) 37.8 40.1 CornerNet: Detecting Objects as Paired Keypoints 2018 single scale
164 RefineDet512+ (VGG-16) 37.6 40.8 Single-Shot Refinement Neural Network for Object Detection 2017
165 DeformConv-R-FCN (Aligned-Inception-ResNet) 37.5 Deformable Convolutional Networks 2017
166 Faster R-CNN (ImageNet+300M) 37.4 40.1 Revisiting Unreasonable Effectiveness of Data in Deep Learning Era 2017
167 Mask R-CNN (Bottleneck-injected ResNet-50, FPN) 36.9 torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation 2020 FPN
ResNet
168 Faster R-CNN + TDM 36.8 Beyond Skip Connections: Top-Down Modulation for Object Detection 2016
169 Cascade R-CNN (ResNet-50-FPN+) 36.5 39.2 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN;
ResNet
170 RefineDet512 (ResNet-101) 36.4 39.5 Single-Shot Refinement Neural Network for Object Detection 2017 ResNet
171 Faster R-CNN + FPN 36.2 Feature Pyramid Networks for Object Detection 2016 FPN
172 Faster R-CNN (Bottleneck-injected ResNet-50 and FPN) 35.9 torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation 2020 FPN;
ResNet
173 Faster R-CNN (box refinement, context, multi-scale testing) 34.9 Deep Residual Learning for Image Recognition 2015 multiscale
174 Faster R-CNN 34.7 Speed/accuracy trade-offs for modern convolutional object detectors 2016
175 CornerNet-Squeeze 34.4 CornerNet-Lite: Efficient Keypoint Based Object Detection 2019
176 MultiPath Network 33.2 A MultiPath Network for Object Detection 2016
177 ION 33.1 34.6 Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks 2015
178 RefineDet512 (VGG-16) 33 35.5 Single-Shot Refinement Neural Network for Object Detection 2017
179 YOLOv3 + Darknet-53 33.0 YOLOv3: An Incremental Improvement 2018 YOLO
180 SSD512 28.8 30.3 SSD: Single Shot MultiBox Detector 2015
181 MnasFPN (MobileNetV2) 26.1 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
182 ESPNetv2-512 26.0 ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network 2018
183 MnasFPN (MobileNetV3) 25.5 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
184 MnasFPN (MNASNet-B1) 24.6 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
185 MnasFPN x0.7 (MobileNetV2) 23.8 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
186 MobileNet-v1-SSD-300x300+CGD 21.4 Compact Global Descriptor for Neural Networks 2019
187 Fast-RCNN 19.7 Fast R-CNN 2015
188 MobileNet 19.3 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications 2017
189 DAT-S (RetinaNet) 51.2 Vision Transformer with Deformable Attention 2022
190 CenterMask-VoVNet99 (multi-scale) 53.2 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 multiscale
191 Mask R-CNN (HRNetV2p-W32 + cascade) 48.6 Deep High-Resolution Representation Learning for Visual Recognition 2019
192 FoveaBox (ResNeXt-101) 45.2 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
193 VirTex Mask R-CNN (ResNet-50-FPN) 44.8 VirTex: Learning Visual Representations from Textual Annotations 2020 FPN;
ResNet
194 CenterMask + ResNet101 46.9 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 ResNet
195 PAFNet (ResNet50-vd) 45.3 PAFNet: An Efficient Anchor-Free Object Detector Guidance 2021 ResNet
196 IoU-Net+EnergyRegression 41.8 Energy-Based Models for Deep Probabilistic Regression 2019
197 Cascade R-CNN (HRNetV2p-W48) 48.6 Deep High-Resolution Representation Learning for Visual Recognition 2019
198 ISTR (ResNet50-FPN-3x, single-scale) ISTR: End-to-End Instance Segmentation with Transformers 2021
199 FoveaBox (ResNeXt-101) FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
200 EfficientDet-D7x (single-scale) EfficientDet: Scalable and Efficient Object Detection 2019 single scale

APS

Rank Model box AP APS Paper Code Result Year Tags
1 SwinV2-G (HTC++) 63.1 Swin Transformer V2: Scaling Up Capacity and Resolution Link 2021 Swin-Transformer
2 Florence-CoSwin-H 62.4 Florence: A New Foundation Model for Computer Vision 2021 Swin-Transformer
3 GLIP (Swin-L, multi-scale) 61.5 45.3 Grounded Language-Image Pre-training 2021 multiscale;
Vision Language;
Dynamic Head;
BERT-Base
4 Soft Teacher + Swin-L (HTC++, multi-scale) 61.3 End-to-End Semi-Supervised Object Detection with Soft Teacher 2021 multiscale;
Swin-Transformer
5 DyHead (Swin-L, multi scale, self-training) 60.6 Dynamic Head: Unifying Object Detection Heads with Attentions 2021 multiscale;
Swin-Transformer
6 Dual-Swin-L (HTC, multi-scale) 60.1 CBNetV2: A Composite Backbone Network Architecture for Object Detection 2021 multiscale
Swin-Transformer
7 Dual-Swin-L (HTC, single-scale) 59.4 CBNetV2: A Composite Backbone Network Architecture for Object Detection 2021 Swin-Transformer
8 Focal-L (DyHead, multi-scale) 58.9 Focal Self-attention for Local-Global Interactions in Vision Transformers 2021 multiscale
Focal-Transformer
9 DyHead (Swin-L, multi scale) 58.7 41.7 Dynamic Head: Unifying Object Detection Heads with Attentions 2021 multiscale
Swin-Transformer
10 Swin-L (HTC++, multi scale) 58.7 Swin Transformer: Hierarchical Vision Transformer using Shifted Windows 2021 multiscale
Swin-Transformer
11 Focal-L (HTC++, multi-scale) 58.4 Focal Self-attention for Local-Global Interactions in Vision Transformers 2021 multiscale
12 Swin-L (HTC++, single scale) 57.7 Swin Transformer: Hierarchical Vision Transformer using Shifted Windows 2021 single scale
Swin-Transformer
13 YOLOR-D6 (1280, single-scale, 34 fps) 57.3 40.4 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
14 SOLQ (Swin-L, single) 56.5 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
15 YOLOR-E6 (1280, single-scale, 45 fps) 56.4 39.1 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
16 CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) 56.4 38.7 Probabilistic two-stage detection 2021 single scale
FPN
DCN
17 QueryInst (single-scale) 56.1 37.4 Instances as Queries 2021
18 YOLOv4-P7 with TTA 55.8 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
19 DetectoRS (ResNeXt-101-64x4d, multi-scale) 55.7 37.7 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
multiscale
20 YOLOR-W6 (1280, single-scale, 66 fps) 55.5 37.6 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
21 YOLOv4-P7 CSP-P7 (single-scale, 16 fps) 55.4 38.1 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
22 CSP-p6 + Mish (multi-scale) 55.2 37.6 Mish: A Self Regularized Non-Monotonic Activation Function 2019 multiscale
23 YOLOv4-P6 with TTA 54.9 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
24 Cascade Eff-B7 NAS-FPN (1280) 54.8 Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation 2020 single scale
NAS-FPN
25 DetectoRS (ResNeXt-101-32x4d, multi-scale) 54.7 37.4 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
multiscale
26 YOLOv4-P6 CSP-P6 (single-scale, 32 fps) 54.3 36.6 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
27 SpineNet-190 (1280, with Self-training on OpenImages, single-scale) 54.3 Rethinking Pre-training and Self-training 2020 single scale
28 UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) 54.1 35.8 USB: Universal-Scale Object Detection Benchmark 2021 multiscale
DCN
29 EfficientDet-D7 (single-scale) 53.7 EfficientDet: Scalable and Efficient Object Detection 2019 single scale
30 PAA (ResNeXt-152-32x8d + DCN, multi-scale) 53.5 36.0 Probabilistic Anchor Assignment with IoU Prediction for Object Detection 2020 ResNeXt
multiscale
DCN
31 LSNet (Res2Net-101 + DCN, multi-scale) 53.5 35.2 Location-Sensitive Visual Recognition with Cross-IOU Loss 2021 multiscale
DCN
32 ResNeSt-200 (multi-scale) 53.3 35.1 ResNeSt: Split-Attention Networks 2020 multiscale
33 Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) 53.3 35.5 CBNet: A Novel Composite Backbone Network Architecture for Object Detection 2019 multiscale
34 DetectoRS (ResNeXt-101-32x4d, single-scale) 53.3 33.9 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
single scale
35 GFLV2 (Res2Net-101, DCN, multiscale) 53.3 35.7 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 multiscale
DCN
36 RelationNet++ (ResNeXt-64x4d-101-DCN) 52.7 RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder 2020 ResNeXt
DCN
37 YOLOv4-P5 with TTA 52.5 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
38 Deformable DETR (ResNeXt-101+DCN) 52.3 34.4 Deformable DETR: Deformable Transformers for End-to-End Object Detection 2020 ResNeXt
DCN
39 GCNet (ResNeXt-101 + DCN + cascade + GC r4) 52.3 Global Context Networks 2020 ResNeXt
DCN
GCN
40 RetinaNet (SpineNet-190, 1280x1280) 52.1 35.4 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
41 RepPoints v2 (ResNeXt-101, DCN, multi-scale) 52.1 34.5 RepPoints V2: Verification Meets Regression for Object Detection 2020 ResNeXt;
multiscale
DCN
42 AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM) 51.9 34.2 Attention-guided Context Feature Pyramid Network for Object Detection 2020 ResNeXt
multiscale
FPN
43 OTA (ResNeXt-101+DCN, multiscale) 51.5 34.1 OTA: Optimal Transport Assignment for Object Detection 2021
44 UniverseNet-20.08d (Res2Net-101, DCN, single-scale) 51.3 31.7 USB: Universal-Scale Object Detection Benchmark 2021 single scale
DCN
45 TSD (SENet154-DCN, multi-scale) 51.2 33.8 Revisiting the Sibling Head in Object Detector 2020 multiscale
DCN
46 YOLOX-X (Modified CSP v5) 51.2 31.2 YOLOX: Exceeding YOLO Series in 2021 2021 YOLO
47 RetinaNet (SpineNet-143, 1280x1280) 50.7 33.6 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
48 ATSS (ResNeXt-64x4d-101+DCN, multi-scale) 50.7 33.2 Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection 2019 ResNeXt
multiscale
DCN
49 NAS-FPN (AmoebaNet-D, learned aug) 50.7 34.2 Learning Data Augmentation Strategies for Object Detection 2019 FPN
50 GFLV2 (Res2Net-101, DCN) 50.6 31.3 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 DCN
51 aLRP Loss (ResNeXt-101-64x4d, DCN, multiscale test) 50.2 32.0 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
multiscale
DCN
52 FreeAnchor + SEPC (DCN, ResNeXt-101-64x4d) 50.1 31.3 Scale-Equalizing Pyramid Convolution for Object Detection 2020 ResNeXt
DCN
53 D2Det (ResNet-101-DCN, multi-scale test) 50.1 32.7 D2Det: Towards High Quality Object Detection and Instance Segmentation 2020 multiscale
DCN
ResNet
54 Dynamic R-CNN (ResNet-101-DCN, multi-scale) 50.1 32.8 Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training 2020 multiscale
DCN
ResNet
55 TSD (ResNet-101-Deformable, Image Pyramid) 49.4 32.7 Revisiting the Sibling Head in Object Detector 2020 ResNet
56 RepPoints v2 (ResNeXt-101, DCN) 49.4 30.3 RepPoints V2: Verification Meets Regression for Object Detection 2020 ResNeXt
DCN
57 CPNDet (Hourglass-104, multi-scale) 49.2 31.0 Corner Proposal Network for Anchor-free, Two-stage Object Detection 2020 multiscale
58 GFLV2 (ResNeXt-101, 32x4d, DCN) 49 29.7 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNeXt
DCN
59 aLRP Loss (ResNeXt-101-64x4d, DCN, single scale) 48.9 30.8 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
single scale
DCN
60 UniverseNet-20.08 (Res2Net-50, DCN, single-scale) 48.8 30.1 USB: Universal-Scale Object Detection Benchmark 2021 single scale
DCN
61 SOLQ (ResNet101, single scale) 48.7 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
62 RetinaNet (SpineNet-96, 1024x1024) 48.6 32 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
63 TridentNet (ResNet-101-Deformable, Image Pyramid) 48.4 31.8 Scale-Aware Trident Networks for Object Detection 2019 ResNet
64 GCNet (ResNeXt-101 + DCN + cascade + GC r4) 48.4 GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond 2019 ResNeXt
DCN
GCN
65 GFLV2 (ResNet-101-DCN) 48.3 28.8 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 DCN
ResNet
66 GFL (X-101-32x4d-DCN, single-scale) 48.2 29.2 Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection 2020 ResNeXt
single scale
DCN
67 ISTR (ResNet101-FPN-3x, single-scale) 48.1 28.7 ISTR: End-to-End Instance Segmentation with Transformers 2021
68 aLRP Loss (ResNeXt-101-64x4d, single scale) 47.8 30.2 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
single scale
69 MatrixNet Corners (ResNet-152, multi-scale) 47.8 29.7 Matrix Nets: A New Deep Architecture for Object Detection 2019 multiscale
ResNet
70 SOLQ (ResNet50, single scale) 47.8 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
71 SAPD (ResNeXt-101, single-scale) 47.4 28.1 Soft Anchor-Point Object Detection 2019 ResNeXt
single scale
72 PANet (ResNeXt-101, multi-scale) 47.4 30.1 Path Aggregation Network for Instance Segmentation 2018 ResNeXt
multiscale
73 HTC (HRNetV2p-W48) 47.3 28.0 Deep High-Resolution Representation Learning for Visual Recognition 2019
74 HTC (ResNeXt-101-FPN) 47.1 22.8 Hybrid Task Cascade for Instance Segmentation 2019 ResNeXt
FPN
75 CenterNet511 (Hourglass-104, multi-scale) 47.0 28.9 CenterNet: Keypoint Triplets for Object Detection 2019 multiscale
76 MAL (ResNeXt101, multi-scale) 47.0 Multiple Anchor Learning for Visual Object Detection 2019 ResNeXt
multiscale
77 ISTR (ResNet50-FPN-3x) 46.8 ISTR: End-to-End Instance Segmentation with Transformers 2021 FPN
ResNet
78 RetinaNet (SpineNet-49, 896x896) 46.7 29.1 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
79 RPDet (ResNet-101-DCN, multi-scale) 46.5 30.3 RepPoints: Point Set Representation for Object Detection 2019 multiscale
DCN
ResNet
80 HoughNet (MS) 46.4 29.1 HoughNet: Integrating near and long-range evidence for bottom-up object detection 2020 multiscale
81 PPDet (ResNeXt-101-FPN, multiscale) 46.3 31.4 Reducing Label Noise in Anchor-Free Object Detection 2020 ResNeXt
multiscale
FPN
82 GFLV2 (ResNet-101) 46.2 27.8 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNet
83 SNIPER (ResNet-101) 46.1 29.6 SNIPER: Efficient Multi-Scale Training 2018 ResNet
84 Mask R-CNN (HRNetV2p-W48 + cascade) 46.1 27.1 Deep High-Resolution Representation Learning for Visual Recognition 2019
85 DCNv2 (ResNet-101, multi-scale) 46.0 27.8 Deformable ConvNets v2: More Deformable, Better Results 2018 multiscale
DCN
ResNet
86 Gaussian-FCOS 46 Localization Uncertainty Estimation for Anchor-Free Object Detection 2020
87 Cascade R-CNN-FPN (ResNet-101, map-guided) 45.9 26.3 InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting 2019 FPN
ResNet
88 MAL (ResNeXt101, single-scale) 45.9 Multiple Anchor Learning for Visual Object Detection 2019 ResNeXt
single scale
89 CenterMask+VoVNetV2-99 (single-scale) 45.8 27.8 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
90 D-RFCN + SNIP (DPN-98 with flip, multi-scale) 45.7 29.3 An Analysis of Scale Invariance in Object Detection - SNIP 2017 multiscale
91 YOLOv4 (CD53) 45.5 27 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
92 PP-YOLO (608x608) 45.2 26.3 PP-YOLO: An Effective and Efficient Implementation of Object Detector 2020 YOLO
93 AC-FPN Cascade R-CNN (ResNet-101, single scale) 45 26.9 Attention-guided Context Feature Pyramid Network for Object Detection 2019 single scale
FPN
ResNet
94 FreeAnchor (ResNeXt-101) 44.8 27 FreeAnchor: Learning to Match Anchors for Visual Object Detection 2019 ResNeXt
95 FCOS (ResNeXt-64x4d-101-FPN 4 + improvements) 44.7 27.6 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
96 CenterMask+VoVNet2-57 (single-scale) 44.7 27.1 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
97 FSAF (ResNeXt-101, multi-scale) 44.6 29.7 Feature Selective Anchor-Free Module for Single-Shot Object Detection 2019 ResNeXt
multiscale
98 aLRP Loss (ResNeXt-101, DCN, 500 scale) 44.6 24.6 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
DCN
99 CenterMask + X-101-32x8d (single-scale) 44.6 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
100 RetinaNet (SpineNet-49, 640x640) 44.3 25.9 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
101 YOLOF-DC5 44.3 24.0 You Only Look One-level Feature 2021 YOLO
102 GFLV2 (ResNet-50) 44.3 26.8 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNet
103 InterNet (ResNet-101-FPN, multi-scale) 44.2 27.2 Feature Intertwiner for Object Detection 2019 multiscale
FPN
ResNet
104 M2Det (VGG-16, multi-scale) 44.2 29.2 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 multiscale
105 Faster R-CNN (LIP-ResNet-101-MD w FPN) 43.9 25.4 LIP: Local Importance-based Pooling 2019 FPN
106 M2Det (ResNet-101, multi-scale) 43.9 29.6 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 multiscale
ResNet
107 YOLOv3 @800 + ASFF* (Darknet-53) 43.9 27.0 Learning Spatial Fusion for Single-Shot Object Detection 2019 YOLO
108 FoveaBox (ResNeXt-101) 43.9 26.8 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
109 ExtremeNet (Hourglass-104, multi-scale) 43.7 24.1 Bottom-up Object Detection by Grouping Extreme and Center Points 2019 multiscale
110 YOLOv4-608 43.5 26.7 YOLOv4: Optimal Speed and Accuracy of Object Detection 2020 single scale
YOLO
111 SNIPER (ResNet-50) 43.5 26.1 SNIPER: Efficient Multi-Scale Training 2018 ResNet
112 CenterNet (HRNetV2-W48) 43.5 22.2 Deep High-Resolution Representation Learning for Visual Recognition 2019
113 D-RFCN + SNIP (ResNet-101, multi-scale) 43.4 27.2 An Analysis of Scale Invariance in Object Detection - SNIP 2017 multiscale
ResNet
114 Grid R-CNN (ResNeXt-101-FPN) 43.2 25.1 Grid R-CNN 2018 ResNeXt
FPN
115 FCOS (ResNeXt-101-64x4d-FPN) 43.2 26.5 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
116 CornerNet-Saccade (Hourglass-104, multi-scale) 43.2 24.4 CornerNet-Lite: Efficient Keypoint Based Object Detection 2019 multiscale
117 Libra R-CNN (ResNeXt-101-FPN) 43.0 25.3 Libra R-CNN: Towards Balanced Learning for Object Detection 2019 ResNeXt
FPN
118 RPDet (ResNet-101-DCN) 42.8 24.9 RepPoints: Point Set Representation for Object Detection 2019 DCN
ResNet
119 SpineNet-49 (640, RetinaNet, single-scale) 42.8 23.7 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019 single scale
120 Cascade R-CNN (ResNet-101-FPN+, cascade) 42.8 23.7 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
121 Cascade R-CNN 42.8 23.7 Cascade R-CNN: High Quality Object Detection and Instance Segmentation 2019
122 TridentNet (ResNet-101) 42.7 23.9 Scale-Aware Trident Networks for Object Detection 2019 ResNet
123 FCOS (ResNeXt-32x8d-101-FPN) 42.7 26.0 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
124 RetinaMask (ResNeXt-101-FPN-GN) 42.6 24.8 RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free 2019 ResNeXt
FPN
125 TAL + TAP 42.5 TOOD: Task-aligned One-stage Object Detection 2021
126 Faster R-CNN (HRNetV2p-W48) 42.4 24.9 Deep High-Resolution Representation Learning for Visual Recognition 2019
127 HSD (Rest101, 768x768, single-scale test) 42.3 22.8 Hierarchical Shot Detector 2019 single scale
128 CornerNet511 (Hourglass-104, multi-scale) 42.1 20.8 CornerNet: Detecting Objects as Paired Keypoints 2018 multiscale
129 FoveaBox (ResNeXt-101) 42.1 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
130 FCOS (HRNet-W32-5l) 42.0 25.4 FCOS: Fully Convolutional One-Stage Object Detection 2019
131 RefineDet512+ (ResNet-101) 41.8 25.6 Single-Shot Refinement Neural Network for Object Detection 2017 ResNet
132 GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101) 41.6 22.3 Gradient Harmonized Single-stage Detector 2018 FPN
133 CenterNet-DLA (DLA-34, multi-scale) 41.6 21.5 Objects as Points 2019 multiscale
134 RetinaNet (SpineNet-49S, 640x640) 41.5 23.3 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
135 RPDet (ResNet-101) 41 23.6 RepPoints: Point Set Representation for Object Detection 2019 ResNet
136 M2Det (VGG-16, single-scale) 41.0 22.1 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 single scale
137 FSAF (ResNet-101, single-scale) 40.9 24 Feature Selective Anchor-Free Module for Single-Shot Object Detection 2019 single scale
ResNet
138 RetinaNet (ResNeXt-101-FPN) 40.8 24.1 Focal Loss for Dense Object Detection 2017 ResNeXt
FPN
139 Cascade R-CNN (ResNet-50-FPN+, cascade) 40.6 22.6 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
140 Faster R-CNN (Cascade RPN) 40.6 22.0 Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution 2019
141 ResNet-50-DW-DPN (Deformable Kernels) 40.6 24.6 Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation 2019 ResNet
142 IoU-Net 40.6 Acquisition of Localization Confidence for Accurate Object Detection 2018
143 FCOS (HRNetV2p-W48) 40.5 23.4 Deep High-Resolution Representation Learning for Visual Recognition 2019
144 ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS 40.4 Bounding Box Regression with Uncertainty for Accurate Object Detection 2018 FPN
ResNet
145 RDSNet (ResNet-101, RetinaNet, mask, MBRM) 40.3 22.1 RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation 2019 ResNet
146 ExtremeNet (Hourglass-104, single-scale) 40.2 20.4 Bottom-up Object Detection by Grouping Extreme and Center Points 2019 single scale
147 Mask R-CNN (ResNet-101-FPN, CBN) 40.1 35.8 Cross-Iteration Batch Normalization 2020 FPN
ResNet
148 Fast R-CNN (Cascade RPN) 40.1 22.1 Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution 2019
149 Mask R-CNN (ResNeXt-101-FPN) 39.8 22.1 Mask R-CNN 2017 ResNeXt
FPN
150 GA-Faster-RCNN 39.8 21.8 Region Proposal by Guided Anchoring 2019
151 FPN (ResNet101 backbone) 39.5 ChainerCV: a Library for Deep Learning in Computer Vision 2017 FPN
ResNet
152 RetinaMask (ResNet-50-FPN) 39.4 21.9 RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free 2019 FPN
ResNet
153 PP-YOLO (320x320) 39.3 16.7 PP-YOLO: An Effective and Efficient Implementation of Object Detector 2020 YOLO
154 AA-ResNet-10 + RetinaNet 39.2 Attention Augmented Convolutional Networks 2019
155 MAL (ResNet50, single-scale) 39.2 Multiple Anchor Learning for Visual Object Detection 2019 single scale
ResNet
156 RetinaNet (ResNet-101-FPN) 39.1 21.8 Focal Loss for Dense Object Detection 2017 FPN
ResNet
157 Cascade R-CNN (ResNet-101-FPN+) 38.8 21.3 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
158 M2Det (ResNet-101, single-scale) 38.8 20.5 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 single scale
ResNet
159 SaccadeNet (DLA-34-DCN) 38.5 19.2 SaccadeNet: A Fast and Accurate Object Detector 2020 DCN
160 Mask R-CNN (ResNet-101-FPN) 38.2 20.1 Mask R-CNN 2017 FPN
ResNet
161 WSMA-Seg 38.1 Segmentation is All You Need 2019
162 Faster R-CNN + FPN + CGD 37.9 Compact Global Descriptor for Neural Networks 2019 FPN
163 CornerNet511 (Hourglass-52, single-scale) 37.8 17.0 CornerNet: Detecting Objects as Paired Keypoints 2018 single scale
164 RefineDet512+ (VGG-16) 37.6 22.7 Single-Shot Refinement Neural Network for Object Detection 2017
165 DeformConv-R-FCN (Aligned-Inception-ResNet) 37.5 19.4 Deformable Convolutional Networks 2017
166 Faster R-CNN (ImageNet+300M) 37.4 17.5 Revisiting Unreasonable Effectiveness of Data in Deep Learning Era 2017
167 Mask R-CNN (Bottleneck-injected ResNet-50, FPN) 36.9 torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation 2020 FPN
ResNet
168 Faster R-CNN + TDM 36.8 Beyond Skip Connections: Top-Down Modulation for Object Detection 2016
169 Cascade R-CNN (ResNet-50-FPN+) 36.5 20.3 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN;
ResNet
170 RefineDet512 (ResNet-101) 36.4 16.6 Single-Shot Refinement Neural Network for Object Detection 2017 ResNet
171 Faster R-CNN + FPN 36.2 Feature Pyramid Networks for Object Detection 2016 FPN
172 Faster R-CNN (Bottleneck-injected ResNet-50 and FPN) 35.9 torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation 2020 FPN;
ResNet
173 Faster R-CNN (box refinement, context, multi-scale testing) 34.9 Deep Residual Learning for Image Recognition 2015 multiscale
174 Faster R-CNN 34.7 Speed/accuracy trade-offs for modern convolutional object detectors 2016
175 CornerNet-Squeeze 34.4 CornerNet-Lite: Efficient Keypoint Based Object Detection 2019
176 MultiPath Network 33.2 A MultiPath Network for Object Detection 2016
177 ION 33.1 14.5 Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks 2015
178 RefineDet512 (VGG-16) 33 16.3 Single-Shot Refinement Neural Network for Object Detection 2017
179 YOLOv3 + Darknet-53 33.0 YOLOv3: An Incremental Improvement 2018 YOLO
180 SSD512 28.8 SSD: Single Shot MultiBox Detector 2015
181 MnasFPN (MobileNetV2) 26.1 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
182 ESPNetv2-512 26.0 ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network 2018
183 MnasFPN (MobileNetV3) 25.5 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
184 MnasFPN (MNASNet-B1) 24.6 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
185 MnasFPN x0.7 (MobileNetV2) 23.8 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
186 MobileNet-v1-SSD-300x300+CGD 21.4 Compact Global Descriptor for Neural Networks 2019
187 Fast-RCNN 19.7 Fast R-CNN 2015
188 MobileNet 19.3 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications 2017
189 DAT-S (RetinaNet) 32.3 Vision Transformer with Deformable Attention 2022
190 CenterMask-VoVNet99 (multi-scale) 32.4 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 multiscale
191 Mask R-CNN (HRNetV2p-W32 + cascade) Deep High-Resolution Representation Learning for Visual Recognition 2019
192 FoveaBox (ResNeXt-101) FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
193 VirTex Mask R-CNN (ResNet-50-FPN) VirTex: Learning Visual Representations from Textual Annotations 2020 FPN;
ResNet
194 Centermask + ResNet101 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 ResNet
195 PAFNet (ResNet50-vd) 22.8 PAFNet: An Efficient Anchor-Free Object Detector Guidance 2021 ResNet
196 IoU-Net+EnergyRegression Energy-Based Models for Deep Probabilistic Regression 2019
197 Cascade R-CNN (HRNetV2p-W48) 26.0 Deep High-Resolution Representation Learning for Visual Recognition 2019
198 ISTR (ResNet50-FPN-3x, single-scale) 27.8 ISTR: End-to-End Instance Segmentation with Transformers 2021
199 FoveaBox (ResNeXt-101) 24.9 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
200 EfficientDet-D7x (single-scale) EfficientDet: Scalable and Efficient Object Detection 2019 single scale

Rank Model box AP AP50 AP75 APS APM APL AP Extra Training Data Paper Code Result Year Tags
1 SwinV2-G (HTC++) 63.1 Swin Transformer V2: Scaling Up Capacity and Resolution Link 2021 Swin-Transformer
2 Florence-CoSwin-H 62.4 Florence: A New Foundation Model for Computer Vision 2021 Swin-Transformer
3 GLIP (Swin-L, multi-scale) 61.5 79.5 67.7 45.3 64.9 75.0 Grounded Language-Image Pre-training 2021 multiscale;
Vision Language;
Dynamic Head;
BERT-Base
4 Soft Teacher + Swin-L (HTC++, multi-scale) 61.3 End-to-End Semi-Supervised Object Detection with Soft Teacher 2021 multiscale;
Swin-Transformer
5 DyHead (Swin-L, multi scale, self-training) 60.6 78.5 66.6 64.0 74.2 Dynamic Head: Unifying Object Detection Heads with Attentions 2021 multiscale;
Swin-Transformer
6 Dual-Swin-L (HTC, multi-scale) 60.1 CBNetV2: A Composite Backbone Network Architecture for Object Detection 2021 multiscale
Swin-Transformer
7 Dual-Swin-L (HTC, single-scale) 59.4 CBNetV2: A Composite Backbone Network Architecture for Object Detection 2021 Swin-Transformer
8 Focal-L (DyHead, multi-scale) 58.9 Focal Self-attention for Local-Global Interactions in Vision Transformers 2021 multiscale
Focal-Transformer
9 DyHead (Swin-L, multi scale) 58.7 77.1 64.5 41.7 62.0 72.8 Dynamic Head: Unifying Object Detection Heads with Attentions 2021 multiscale
Swin-Transformer
10 Swin-L (HTC++, multi scale) 58.7 Swin Transformer: Hierarchical Vision Transformer using Shifted Windows 2021 multiscale
Swin-Transformer
11 Focal-L (HTC++, multi-scale) 58.4 Focal Self-attention for Local-Global Interactions in Vision Transformers 2021 multiscale
12 Swin-L (HTC++, single scale) 57.7 Swin Transformer: Hierarchical Vision Transformer using Shifted Windows 2021 single scale
Swin-Transformer
13 YOLOR-D6 (1280, single-scale, 34 fps) 57.3 75.0 62.7 40.4 61.2 69.2 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
14 SOLQ (Swin-L, single) 56.5 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
15 YOLOR-E6 (1280, single-scale, 45 fps) 56.4 74.1 61.6 39.1 60.1 68.2 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
16 CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) 56.4 74.0 61.6 38.7 59.7 68.6 Probabilistic two-stage detection 2021 single scale
FPN
DCN
17 QueryInst (single-scale) 56.1 75.9 61.9 37.4 58.9 70.3 Instances as Queries 2021
18 YOLOv4-P7 with TTA 55.8 73.2 61.2 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
19 DetectoRS (ResNeXt-101-64x4d, multi-scale) 55.7 74.2 61.1 37.7 58.4 68.1 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
multiscale
20 YOLOR-W6 (1280, single-scale, 66 fps) 55.5 73.2 60.6 37.6 59.5 67.7 You Only Learn One Representation: Unified Network for Multiple Tasks 2021 single scale
YOLO
21 YOLOv4-P7 CSP-P7 (single-scale, 16 fps) 55.4 73.3 60.7 38.1 59.5 67.4 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
22 CSP-p6 + Mish (multi-scale) 55.2 72.9 60.5 37.6 59.0 66.9 Mish: A Self Regularized Non-Monotonic Activation Function 2019 multiscale
23 YOLOv4-P6 with TTA 54.9 72.6 60.2 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
24 Cascade Eff-B7 NAS-FPN (1280) 54.8 Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation 2020 single scale
NAS-FPN
25 DetectoRS (ResNeXt-101-32x4d, multi-scale) 54.7 73.5 60.1 37.4 57.3 66.4 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
multiscale
26 YOLOv4-P6 CSP-P6 (single-scale, 32 fps) 54.3 72.3 59.5 36.6 58.2 65.5 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
27 SpineNet-190 (1280, with Self-training on OpenImages, single-scale) 54.3 Rethinking Pre-training and Self-training 2020 single scale
28 UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) 54.1 71.6 59.9 35.8 57.2 67.4 USB: Universal-Scale Object Detection Benchmark 2021 multiscale
DCN
29 EfficientDet-D7 (single-scale) 53.7 72.4 57.0 66.3 EfficientDet: Scalable and Efficient Object Detection 2019 single scale
30 PAA (ResNext-152-32x8d + DCN, multi-scale) 53.5 71.6 59.1 36.0 56.3 66.9 Probabilistic Anchor Assignment with IoU Prediction for Object Detection 2020 ResNeXt
multiscale
DCN
31 LSNet (Res2Net-101+ DCN, multi-scale) 53.5 71.1 59.2 35.2 56.4 65.8 Location-Sensitive Visual Recognition with Cross-IOU Loss 2021 multiscale
DCN
32 ResNeSt-200 (multi-scale) 53.3 72.0 58.0 35.1 56.2 66.8 ResNeSt: Split-Attention Networks 2020 multiscale
33 Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) 53.3 71.9 58.5 35.5 55.8 66.7 CBNet: A Novel Composite Backbone Network Architecture for Object Detection 2019 multiscale
34 DetectoRS (ResNeXt-101-32x4d, single-scale) 53.3 71.6 58.5 33.9 56.5 66.9 DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution 2020 ResNeXt
single scale
35 GFLV2 (Res2Net-101, DCN, multiscale) 53.3 70.9 59.2 35.7 56.1 65.6 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 multiscale
DCN
36 RelationNet++ (ResNeXt-64x4d-101-DCN) 52.7 RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder 2020 ResNeXt
DCN
37 YOLOv4-P5 with TTA 52.5 70.3 58 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 multiscale
YOLO
38 Deformable DETR (ResNeXt-101+DCN) 52.3 71.9 58.1 34.4 54.4 65.6 Deformable DETR: Deformable Transformers for End-to-End Object Detection 2020 ResNeXt
DCN
39 GCNet (ResNeXt-101 + DCN + cascade + GC r4) 52.3 70.9 56.9 Global Context Networks 2020 ResNeXt
DCN
GCN
40 RetinaNet (SpineNet-190, 1280x1280) 52.1 71.8 56.5 35.4 55 63.6 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
41 RepPoints v2 (ResNeXt-101, DCN, multi-scale) 52.1 70.1 57.5 34.5 54.6 63.6 RepPoints V2: Verification Meets Regression for Object Detection 2020 ResNeXt;
multiscale
DCN
42 AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM) 51.9 70.4 57 34.2 54.8 64.7 Attention-guided Context Feature Pyramid Network for Object Detection 2020 ResNeXt
multiscale
FPN
43 OTA (ResNeXt-101+DCN, multiscale) 51.5 68.6 57.1 34.1 53.7 64.1 OTA: Optimal Transport Assignment for Object Detection 2021
44 UniverseNet-20.08d (Res2Net-101, DCN, single-scale) 51.3 70.0 55.8 31.7 55.3 64.9 USB: Universal-Scale Object Detection Benchmark 2021 single scale
DCN
45 TSD (SENet154-DCN,multi-scale) 51.2 71.9 56.0 33.8 54.8 64.2 Revisiting the Sibling Head in Object Detector 2020 multiscale
DCN
46 YOLOX-X (Modified CSP v5) 51.2 69.6 55.7 31.2 56.1 66.1 YOLOX: Exceeding YOLO Series in 2021 2021 YOLO
47 RetinaNet (SpineNet-143, 1280x1280) 50.7 70.4 54.9 33.6 53.9 62.1 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
48 ATSS (ResNeXt-64x4d-101+DCN,multi-scale) 50.7 68.9 56.3 33.2 52.9 62.4 Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection 2019 ResNeXt
multiscale
DCN
49 NAS-FPN (AmoebaNet-D, learned aug) 50.7 34.2 55.5 64.5 Learning Data Augmentation Strategies for Object Detection 2019 FPN
50 GFLV2 (Res2Net-101, DCN) 50.6 69 55.3 31.3 54.3 63.5 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 DCN
51 aLRP Loss (ResNext-101-64x4d, DCN, multiscale test) 50.2 70.3 53.9 32.0 53.1 63.0 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
multiscale
DCN
52 FreeAnchor + SEPC (DCN, ResNext-101-64x4d) 50.1 69.8 54.3 31.3 53.3 63.7 Scale-Equalizing Pyramid Convolution for Object Detection 2020 ResNeXt
DCN
53 D2Det (ResNet-101-DCN, multi-scale test) 50.1 69.4 54.9 32.7 52.7 62.1 D2Det: Towards High Quality Object Detection and Instance Segmentation 2020 multiscale
DCN
ResNet
54 Dynamic R-CNN (ResNet-101-DCN, multi-scale) 50.1 68.3 55.6 32.8 53.0 61.2 Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training 2020 multiscale
DCN
ResNet
55 TSD (ResNet-101-Deformable, Image Pyramid) 49.4 69.6 54.4 32.7 52.5 61.0 Revisiting the Sibling Head in Object Detector 2020 ResNet
56 RepPoints v2 (ResNeXt-101, DCN) 49.4 68.9 53.4 30.3 52.1 62.3 RepPoints V2: Verification Meets Regression for Object Detection 2020 ResNeXt
DCN
57 CPNDet (Hourglass-104, multi-scale) 49.2 67.3 53.7 31.0 51.9 62.4 Corner Proposal Network for Anchor-free, Two-stage Object Detection 2020 multiscale
58 GFLV2 (ResNeXt-101, 32x4d, DCN) 49 67.6 53.5 29.7 52.4 61.4 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNeXt
DCN
59 aLRP Loss (ResNext-101-64x4d, DCN, single scale) 48.9 69.3 52.5 30.8 51.5 62.1 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
single scale
DCN
60 UniverseNet-20.08 (Res2Net-50, DCN, single-scale) 48.8 67.5 53.0 30.1 52.3 61.1 USB: Universal-Scale Object Detection Benchmark 2021 single scale
DCN
61 SOLQ (ResNet101, single scale) 48.7 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
62 RetinaNet (SpineNet-96, 1024x1024) 48.6 68.4 52.5 32 52.3 62 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
63 TridentNet (ResNet-101-Deformable, Image Pyramid) 48.4 69.7 53.5 31.8 51.3 60.3 Scale-Aware Trident Networks for Object Detection 2019 ResNet
64 GCNet (ResNeXt-101 + DCN + cascade + GC r4) 48.4 67.6 52.7 GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond 2019 ResNeXt
DCN
GCN
65 GFLV2 (ResNet-101-DCN) 48.3 66.5 52.8 28.8 51.9 60.7 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 DCN
ResNet
66 GFL (X-101-32x4d-DCN, single-scale) 48.2 67.4 52.6 29.2 51.7 60.2 Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection 2020 ResNeXt
single scale
DCN
67 ISTR (ResNet101-FPN-3x, single-scale) 48.1 28.7 50.4 61.5 ISTR: End-to-End Instance Segmentation with Transformers 2021
68 aLRP Loss (ResNext-101-64x4d, single scale) 47.8 68.4 51.1 30.2 50.8 59.1 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
single scale
69 MatrixNet Corners (ResNet-152, multi-scale) 47.8 66.2 52.3 29.7 50.4 60.7 Matrix Nets: A New Deep Architecture for Object Detection 2019 multiscale
ResNet
70 SOLQ (ResNet50, single scale) 47.8 SOLQ: Segmenting Objects by Learning Queries 2021 Transformer
single scale
71 SAPD (ResNeXt-101, single-scale) 47.4 67.4 51.1 28.1 50.3 61.5 Soft Anchor-Point Object Detection 2019 ResNeXt
single scale
72 PANet (ResNeXt-101, multi-scale) 47.4 67.2 51.8 30.1 51.7 60.0 Path Aggregation Network for Instance Segmentation 2018 ResNeXt
multiscale
73 HTC (HRNetV2p-W48) 47.3 65.9 51.2 28.0 49.7 59.8 Deep High-Resolution Representation Learning for Visual Recognition 2019
74 HTC (ResNeXt-101-FPN) 47.1 63.9 44.7 22.8 43.9 54.6 Hybrid Task Cascade for Instance Segmentation 2019 ResNeXt
FPN
75 CenterNet511 (Hourglass-104, multi-scale) 47.0 64.5 50.7 28.9 49.9 58.9 CenterNet: Keypoint Triplets for Object Detection 2019 multiscale
76 MAL (ResNeXt101, multi-scale) 47.0 Multiple Anchor Learning for Visual Object Detection 2019 ResNeXt
multiscale
77 ISTR (ResNet50-FPN-3x) 46.8 ISTR: End-to-End Instance Segmentation with Transformers 2021 FPN
ResNet
78 RetinaNet (SpineNet-49, 896x896) 46.7 66.3 50.6 29.1 50.1 61.7 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
79 RPDet (ResNet-101-DCN, multi-scale) 46.5 67.4 50.9 30.3 49.7 57.1 RepPoints: Point Set Representation for Object Detection 2019 multiscale
DCN
ResNet
80 HoughNet (MS) 46.4 65.1 50.7 29.1 48.5 58.1 HoughNet: Integrating near and long-range evidence for bottom-up object detection 2020 multiscale
81 PPDet (ResNeXt-101-FPN, multiscale) 46.3 64.8 51.6 31.4 49.9 56.4 Reducing Label Noise in Anchor-Free Object Detection 2020 ResNeXt
multiscale
FPN
82 GFLV2 (ResNet-101) 46.2 64.3 50.5 27.8 49.9 57 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNet
83 SNIPER (ResNet-101) 46.1 67.0 51.6 29.6 48.9 58.1 SNIPER: Efficient Multi-Scale Training 2018 ResNet
84 Mask R-CNN (HRNetV2p-W48 + cascade) 46.1 64.0 50.3 27.1 48.6 58.3 Deep High-Resolution Representation Learning for Visual Recognition 2019
85 DCNv2 (ResNet-101, multi-scale) 46.0 67.9 50.8 27.8 49.1 59.5 Deformable ConvNets v2: More Deformable, Better Results 2018 multiscale
DCN
ResNet
86 Gaussian-FCOS 46 Localization Uncertainty Estimation for Anchor-Free Object Detection 2020
87 Cascade R-CNN-FPN (ResNet-101, map-guided) 45.9 64.2 50 26.3 49 58.6 InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting 2019 FPN
ResNet
88 MAL (ResNeXt101, single-scale) 45.9 Multiple Anchor Learning for Visual Object Detection 2019 ResNeXt
single scale
89 CenterMask+VoVNetV2-99 (single-scale) 45.8 64.5 27.8 48.3 57.6 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
90 D-RFCN + SNIP (DPN-98 with flip, multi-scale) 45.7 67.3 51.1 29.3 48.8 57.1 An Analysis of Scale Invariance in Object Detection - SNIP 2017 multiscale
91 YOLOv4 (CD53) 45.5 64.1 49.5 27 49 56.7 Scaled-YOLOv4: Scaling Cross Stage Partial Network 2020 single scale
YOLO
92 PP-YOLO (608x608) 45.2 65.2 49.9 26.3 47.8 57.2 PP-YOLO: An Effective and Efficient Implementation of Object Detector 2020 YOLO
93 AC-FPN Cascade R-CNN (ResNet-101, single scale) 45 64.4 49 26.9 47.7 56.6 Attention-guided Context Feature Pyramid Network for Object Detection 2019 single scale
FPN
ResNet
94 FreeAnchor (ResNeXt-101) 44.8 64.3 48.4 27 47.9 56 FreeAnchor: Learning to Match Anchors for Visual Object Detection 2019 ResNeXt
95 FCOS (ResNeXt-64x4d-101-FPN 4 + improvements) 44.7 64.1 48.4 27.6 47.5 55.6 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
96 CenterMask+VoVNet2-57 (single-scale) 44.7 63.1 48.6 27.1 55.9 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
97 FSAF (ResNeXt-101, multi-scale) 44.6 65.2 48.6 29.7 47.1 54.6 Feature Selective Anchor-Free Module for Single-Shot Object Detection 2019 ResNeXt
multiscale
98 aLRP Loss (ResNext-101, DCN, 500 scale) 44.6 65.0 47.5 24.6 48.1 58.3 A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection 2020 ResNeXt
DCN
99 CenterMask + X-101-32x8d (single-scale) 44.6 63.4 48.4 47.2 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 single scale
100 RetinaNet (SpineNet-49, 640x640) 44.3 63.8 47.6 25.9 47.7 61.1 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
101 YOLOF-DC5 44.3 62.9 47.5 24.0 48.5 60.4 You Only Look One-level Feature 2021 YOLO
102 GFLV2 (ResNet-50) 44.3 62.3 48.5 26.8 47.7 54.1 Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection 2020 ResNet
103 InterNet (ResNet-101-FPN, multi-scale) 44.2 67.5 51.1 27.2 50.3 57.7 Feature Intertwiner for Object Detection 2019 multiscale
FPN
ResNet
104 M2Det (VGG-16, multi-scale) 44.2 64.6 49.3 29.2 47.9 55.1 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 multiscale
105 Faster R-CNN (LIP-ResNet-101-MD w FPN) 43.9 65.7 48.1 25.4 46.7 56.3 LIP: Local Importance-based Pooling 2019 FPN
106 M2Det (ResNet-101, multi-scale) 43.9 64.4 48 29.6 49.6 54.3 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 multiscale
ResNet
107 YOLOv3 @800 + ASFF* (Darknet-53) 43.9 64.1 49.2 27.0 46.6 53.4 Learning Spatial Fusion for Single-Shot Object Detection 2019 YOLO
108 FoveaBox (ResNeXt-101) 43.9 63.5 47.7 26.8 46.9 55.6 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
109 ExtremeNet (Hourglass-104, multi-scale) 43.7 60.5 47.0 24.1 46.9 57.6 Bottom-up Object Detection by Grouping Extreme and Center Points 2019 multiscale
110 YOLOv4-608 43.5 65.7 47.3 26.7 46.7 53.3 YOLOv4: Optimal Speed and Accuracy of Object Detection 2020 single scale
YOLO
111 SNIPER (ResNet-50) 43.5 65.0 48.6 26.1 46.3 56.0 SNIPER: Efficient Multi-Scale Training 2018 ResNet
112 CenterNet (HRNetV2-W48) 43.5 46.5 22.2 57.8 Deep High-Resolution Representation Learning for Visual Recognition 2019
113 D-RFCN + SNIP (ResNet-101, multi-scale) 43.4 65.5 48.4 27.2 46.5 54.9 An Analysis of Scale Invariance in Object Detection - SNIP 2017 multiscale
ResNet
114 Grid R-CNN (ResNeXt-101-FPN) 43.2 63.0 46.6 25.1 46.5 55.2 Grid R-CNN 2018 ResNeXt
FPN
115 FCOS (ResNeXt-101-64x4d-FPN) 43.2 62.8 46.6 26.5 46.2 53.3 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
116 CornerNet-Saccade (Hourglass-104, multi-scale) 43.2 24.4 44.6 57.3 CornerNet-Lite: Efficient Keypoint Based Object Detection 2019 multiscale
117 Libra R-CNN (ResNeXt-101-FPN) 43.0 64 47 25.3 45.6 54.6 Libra R-CNN: Towards Balanced Learning for Object Detection 2019 ResNeXt
FPN
118 RPDet (ResNet-101-DCN) 42.8 65.0 46.3 24.9 46.2 54.7 RepPoints: Point Set Representation for Object Detection 2019 DCN
ResNet
119 SpineNet-49 (640, RetinaNet, single-scale) 42.8 62.3 46.1 23.7 45.2 57.3 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019 single scale
120 Cascade R-CNN (ResNet-101-FPN+, cascade) 42.8 62.1 46.3 23.7 45.5 55.2 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
121 Cascade R-CNN 42.8 62.1 46.3 23.7 45.5 55.2 Cascade R-CNN: High Quality Object Detection and Instance Segmentation 2019
122 TridentNet (ResNet-101) 42.7 63.6 46.5 23.9 46.6 56.6 Scale-Aware Trident Networks for Object Detection 2019 ResNet
123 FCOS (ResNeXt-32x8d-101-FPN) 42.7 62.2 46.1 26.0 45.6 52.6 FCOS: Fully Convolutional One-Stage Object Detection 2019 ResNeXt
FPN
124 RetinaMask (ResNeXt-101-FPN-GN) 42.6 62.5 46.0 24.8 45.6 53.8 RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free 2019 ResNeXt
FPN
125 TAL + TAP 42.5 60.3 46.4 TOOD: Task-aligned One-stage Object Detection 2021
126 Faster R-CNN (HRNetV2p-W48) 42.4 63.6 46.4 24.9 44.6 53.0 Deep High-Resolution Representation Learning for Visual Recognition 2019
127 HSD (Rest101, 768x768, single-scale test) 42.3 61.2 46.9 22.8 47.3 55.9 Hierarchical Shot Detector 2019 single scale
128 CornerNet511 (Hourglass-104, multi-scale) 42.1 57.8 45.3 20.8 44.8 56.7 CornerNet: Detecting Objects as Paired Keypoints 2018 multiscale
129 FoveaBox (ResNeXt-101) 42.1 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
130 FCOS (HRNet-W32-5l) 42.0 60.4 45.3 25.4 45.0 51.0 FCOS: Fully Convolutional One-Stage Object Detection 2019
131 RefineDet512+ (ResNet-101) 41.8 62.9 45.7 25.6 45.1 54.1 Single-Shot Refinement Neural Network for Object Detection 2017 ResNet
132 GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101) 41.6 62.8 44.2 22.3 45.1 55.3 Gradient Harmonized Single-stage Detector 2018 FPN
133 CenterNet-DLA (DLA-34, multi-scale) 41.6 21.5 43.9 56.0 Objects as Points 2019 multiscale
134 RetinaNet (SpineNet-49S, 640x640) 41.5 60.5 44.6 23.3 45 58 SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization 2019
135 RPDet (ResNet-101) 41 62.9 44.3 23.6 44.1 51.7 RepPoints: Point Set Representation for Object Detection 2019 ResNet
136 M2Det (VGG-16, single-scale) 41.0 59.7 45 22.1 46.5 53.8 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 single scale
137 FSAF (ResNet-101, single-scale) 40.9 61.5 44 24 44.2 51.3 Feature Selective Anchor-Free Module for Single-Shot Object Detection 2019 single scale
ResNet
138 RetinaNet (ResNeXt-101-FPN) 40.8 61.1 44.1 24.1 44.2 51.2 Focal Loss for Dense Object Detection 2017 ResNeXt
FPN
139 Cascade R-CNN (ResNet-50-FPN+, cascade) 40.6 59.9 44 22.6 42.7 52.1 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
140 Faster R-CNN (Cascade RPN) 40.6 58.9 44.5 22.0 42.8 52.6 Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution 2019
141 ResNet-50-DW-DPN (Deformable Kernels) 40.6 24.6 43.9 53.3 Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation 2019 ResNet
142 IoU-Net 40.6 Acquisition of Localization Confidence for Accurate Object Detection 2018
143 FCOS (HRNetV2p-W48) 40.5 59.3 23.4 42.6 51.0 Deep High-Resolution Representation Learning for Visual Recognition 2019
144 ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS 40.4 Bounding Box Regression with Uncertainty for Accurate Object Detection 2018 FPN
ResNet
145 RDSNet (ResNet-101, RetinaNet, mask, MBRM) 40.3 60.1 43 22.1 43.5 51.5 RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation 2019 ResNet
146 ExtremeNet (Hourglass-104, single-scale) 40.2 55.5 43.2 20.4 43.2 53.1 Bottom-up Object Detection by Grouping Extreme and Center Points 2019 single scale
147 Mask R-CNN (ResNet-101-FPN, CBN) 40.1 60.5 44.1 35.8 57.3 38.5 Cross-Iteration Batch Normalization 2020 FPN
ResNet
148 Fast R-CNN (Cascade RPN) 40.1 59.4 43.8 22.1 42.4 51.6 Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution 2019
149 Mask R-CNN (ResNeXt-101-FPN) 39.8 62.3 43.4 22.1 43.2 51.2 Mask R-CNN 2017 ResNeXt
FPN
150 GA-Faster-RCNN 39.8 59.2 43.5 21.8 42.6 50.7 Region Proposal by Guided Anchoring 2019
151 FPN (ResNet101 backbone) 39.5 ChainerCV: a Library for Deep Learning in Computer Vision 2017 FPN
ResNet
152 RetinaMask (ResNet-50-FPN) 39.4 58.6 42.3 21.9 42.0 51.0 RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free 2019 FPN
ResNet
153 PP-YOLO (320x320) 39.3 59.3 42.7 16.7 41.4 57.8 PP-YOLO: An Effective and Efficient Implementation of Object Detector 2020 YOLO
154 AA-ResNet-10 + RetinaNet 39.2 Attention Augmented Convolutional Networks 2019
155 MAL (ResNet50, single-scale) 39.2 Multiple Anchor Learning for Visual Object Detection 2019 single scale
ResNet
156 RetinaNet (ResNet-101-FPN) 39.1 59.1 42.3 21.8 42.7 50.2 Focal Loss for Dense Object Detection 2017 FPN
ResNet
157 Cascade R-CNN (ResNet-101-FPN+) 38.8 61.1 41.9 21.3 41.8 49.8 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN
ResNet
158 M2Det (ResNet-101, single-scale) 38.8 59.4 41.7 20.5 43.9 53.4 M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network 2018 single scale
ResNet
159 SaccadeNet (DLA-34-DCN) 38.5 55.6 41.4 19.2 42.1 50.6 SaccadeNet: A Fast and Accurate Object Detector 2020 DCN
160 Mask R-CNN (ResNet-101-FPN) 38.2 60.3 41.7 20.1 41.1 50.2 Mask R-CNN 2017 FPN
ResNet
161 WSMA-Seg 38.1 Segmentation is All You Need 2019
162 Faster R-CNN + FPN + CGD 37.9 Compact Global Descriptor for Neural Networks 2019 FPN
163 CornerNet511 (Hourglass-52, single-scale) 37.8 53.7 40.1 17.0 39.0 50.5 CornerNet: Detecting Objects as Paired Keypoints 2018 single scale
164 RefineDet512+ (VGG-16) 37.6 58.7 40.8 22.7 40.3 48.3 Single-Shot Refinement Neural Network for Object Detection 2017
165 DeformConv-R-FCN (Aligned-Inception-ResNet) 37.5 58.0 19.4 40.1 52.5 Deformable Convolutional Networks 2017
166 Faster R-CNN (ImageNet+300M) 37.4 58 40.1 17.5 41.1 51.2 Revisiting Unreasonable Effectiveness of Data in Deep Learning Era 2017
167 Mask R-CNN (Bottleneck-injected ResNet-50, FPN) 36.9 torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation 2020 FPN
ResNet
168 Faster R-CNN + TDM 36.8 Beyond Skip Connections: Top-Down Modulation for Object Detection 2016
169 Cascade R-CNN (ResNet-50-FPN+) 36.5 59 39.2 20.3 38.8 46.4 Cascade R-CNN: Delving into High Quality Object Detection 2017 FPN;
ResNet
170 RefineDet512 (ResNet-101) 36.4 57.5 39.5 16.6 39.9 51.4 Single-Shot Refinement Neural Network for Object Detection 2017 ResNet
171 Faster R-CNN + FPN 36.2 Feature Pyramid Networks for Object Detection 2016 FPN
172 Faster R-CNN (Bottleneck-injected ResNet-50 and FPN) 35.9 torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation 2020 FPN;
ResNet
173 Faster R-CNN (box refinement, context, multi-scale testing) 34.9 Deep Residual Learning for Image Recognition 2015 multiscale
174 Faster R-CNN 34.7 Speed/accuracy trade-offs for modern convolutional object detectors 2016
175 CornerNet-Squeeze 34.4 CornerNet-Lite: Efficient Keypoint Based Object Detection 2019
176 MultiPath Network 33.2 A MultiPath Network for Object Detection 2016
177 ION 33.1 55.7 34.6 14.5 35.2 47.2 Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks 2015
178 RefineDet512 (VGG-16) 33 54.5 35.5 16.3 36.3 44.3 Single-Shot Refinement Neural Network for Object Detection 2017
179 YOLOv3 + Darknet-53 33.0 YOLOv3: An Incremental Improvement 2018 YOLO
180 SSD512 28.8 48.5 30.3 SSD: Single Shot MultiBox Detector 2015
181 MnasFPN (MobileNetV2) 26.1 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
182 ESPNetv2-512 26.0 ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network 2018
183 MnasFPN (MobileNetV3) 25.5 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
184 MnasFPN (MNASNet-B1) 24.6 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
185 MnasFPN x0.7 (MobileNetV2) 23.8 MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices 2019 FPN
186 MobileNet-v1-SSD-300x300+CGD 21.4 Compact Global Descriptor for Neural Networks 2019
187 Fast-RCNN 19.7 Fast R-CNN 2015
188 MobileNet 19.3 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications 2017
189 DAT-S (RetinaNet) 69.6 51.2 32.3 51.8 63.4 47.9 Vision Transformer with Deformable Attention 2022
190 CenterMask-VoVNet99 (multi-scale) 68.3 53.2 32.4 60.0 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 multiscale
191 Mask R-CNN (HRNetV2p-W32 + cascade) 62.5 48.6 56.3 Deep High-Resolution Representation Learning for Visual Recognition 2019
192 FoveaBox (ResNeXt-101) 61.9 45.2 46.8 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
193 VirTex Mask R-CNN (ResNet-50-FPN) 61.7 44.8 VirTex: Learning Visual Representations from Textual Annotations 2020 FPN;
ResNet
194 Centermask + ResNet101 61.6 46.9 CenterMask : Real-Time Anchor-Free Instance Segmentation 2019 ResNet
195 PAFNet (ResNet50-vd) 59.8 45.3 22.8 45.8 59.2 PAFNet: An Efficient Anchor-Free Object Detector Guidance 2021 ResNet
196 IoU-Net+EnergyRegression 58.5 41.8 Energy-Based Models for Deep Probabilistic Regression 2019
197 Cascade R-CNN (HRNetV2p-W48) 48.6 26.0 47.3 56.3 Deep High-Resolution Representation Learning for Visual Recognition 2019
198 ISTR (ResNet50-FPN-3x, single-scale) 27.8 48.7 59.9 ISTR: End-to-End Instance Segmentation with Transformers 2021
199 FoveaBox (ResNeXt-101) 24.9 FoveaBox: Beyond Anchor-based Object Detector 2019 ResNeXt
200 EfficientDet-D7x (single-scale) 57.9 EfficientDet: Scalable and Efficient Object Detection 2019 single scale
posted @ 2022-02-16 18:12  Xu_Lin