| Finding Task-Relevant Features for Few-Shot Learning by Category Traversal |
| Edge-Labeling Graph Neural Network for Few-Shot Learning |
| Generating Classification Weights With GNN Denoising Autoencoders for Few-Shot Learning |
| Kervolutional Neural Networks |
| Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem |
| On the Structural Sensitivity of Deep Convolutional Networks to the Directions of Fourier Basis Functions |
| Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource Utilization |
| Hardness-Aware Deep Metric Learning |
| Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation |
| Learning Loss for Active Learning |
| Striking the Right Balance With Uncertainty |
| AutoAugment: Learning Augmentation Strategies From Data |
| SDRSAC: Semidefinite-Based Randomized Approach for Robust Point Cloud Registration Without Correspondences |
| BAD SLAM: Bundle Adjusted Direct RGB-D SLAM |
| Revealing Scenes by Inverting Structure From Motion Reconstructions |
| Strand-Accurate Multi-View Hair Capture |
| DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation |
| Pushing the Boundaries of View Extrapolation With Multiplane Images |
| GA-Net: Guided Aggregation Net for End-To-End Stereo Matching |
| Real-Time Self-Adaptive Deep Stereo |
| LAF-Net: Locally Adaptive Fusion Networks for Stereo Confidence Estimation |
| NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences |
| Coordinate-Free Carlsson-Weinshall Duality and Relative Multi-View Geometry |
| Deep Reinforcement Learning of Volume-Guided Progressive View Inpainting for 3D Point Scene Completion From a Single Depth Image |
| Video Action Transformer Network |
| Timeception for Complex Action Recognition |
| STEP: Spatio-Temporal Progressive Learning for Video Action Detection |
| Relational Action Forecasting |
| Long-Term Feature Banks for Detailed Video Understanding |
| Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes |
| What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment |
| MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation |
| 2.5D Visual Sound |
| Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model |
| Gaussian Temporal Awareness Networks for Action Localization |
| Efficient Video Classification Using Fewer Frames |
| Parsing R-CNN for Instance-Level Human Analysis |
| Large Scale Incremental Learning |
| TopNet: Structural Point Cloud Decoder |
| Perceive Where to Focus: Learning Visibility-Aware Part-Level Features for Partial Person Re-Identification |
| Meta-Transfer Learning for Few-Shot Learning |
| Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation |
| Deep RNN Framework for Visual Sequential Applications |
| Graph-Based Global Reasoning Networks |
| SSN: Learning Sparse Switchable Normalization via SparsestMax |
| Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition |
| Learning to Generate Synthetic Data via Compositing |
| Divide and Conquer the Embedding Space for Metric Learning |
| Latent Space Autoregression for Novelty Detection |
| Attending to Discriminative Certainty for Domain Adaptation |
| Feature Denoising for Improving Adversarial Robustness |
| Selective Kernel Networks |
| On Implicit Filter Level Sparsity in Convolutional Neural Networks |
| FlowNet3D: Learning Scene Flow in 3D Point Clouds |
| Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks |
| Co-Occurrent Features in Semantic Segmentation |
| Bag of Tricks for Image Classification With Convolutional Neural Networks |
| Learning Channel-Wise Interactions for Binary Convolutional Neural Networks |
| Knowledge Adaptation for Efficient Semantic Segmentation |
| Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness Against Adversarial Attack |
| Invariance Matters: Exemplar Memory for Domain Adaptive Person Re-Identification |
| Dissecting Person Re-Identification From the Viewpoint of Viewpoint |
| Learning to Reduce Dual-Level Discrepancy for Infrared-Visible Person Re-Identification |
| Progressive Feature Alignment for Unsupervised Domain Adaptation |
| Feature-Level Frankenstein: Eliminating Variations for Discriminative Recognition |
| Learning a Deep ConvNet for Multi-Label Classification With Partial Labels |
| Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression |
| Densely Semantically Aligned Person Re-Identification |
| Generalising Fine-Grained Sketch-Based Image Retrieval |
| Adapting Object Detectors via Selective Cross-Domain Alignment |
| Cyclic Guidance for Weakly Supervised Joint Detection and Segmentation |
| Thinking Outside the Pool: Active Training Image Creation for Relative Attributes |
| Generalizable Person Re-Identification by Domain-Invariant Mapping Network |
| Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification |
| Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification |
| Unsupervised Open Domain Recognition by Semantic Discrepancy Minimization |
| Weakly Supervised Person Re-Identification |
| PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud |
| Automatic Adaptation of Object Detectors to New Domains Using Self-Training |
| Deep Sketch-Shape Hashing With Segmented 3D Stochastic Viewing |
| Generative Dual Adversarial Network for Generalized Zero-Shot Learning |
| Query-Guided End-To-End Person Search |
| Libra R-CNN: Towards Balanced Learning for Object Detection |
| Learning a Unified Classifier Incrementally via Rebalancing |
| Feature Selective Anchor-Free Module for Single-Shot Object Detection |
| Bottom-Up Object Detection by Grouping Extreme and Center Points |
| Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples |
| SCOPS: Self-Supervised Co-Part Segmentation |
| Unsupervised Moving Object Detection via Contextual Information Separation |
| Pose2Seg: Detection Free Human Instance Segmentation |
| DrivingStereo: A Large-Scale Dataset for Stereo Matching in Autonomous Driving Scenarios |
| PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3D Object Understanding |
| A Dataset and Benchmark for Large-Scale Multi-Modal Face Anti-Spoofing |
| Unsupervised Learning of Consensus Maximization for 3D Vision Problems |
| VizWiz-Priv: A Dataset for Recognizing the Presence and Purpose of Private Visual Information in Images Taken by Blind People |
| Structural Relational Reasoning of Point Clouds |
| MVF-Net: Multi-View 3D Face Morphable Model Regression |
| Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction |
| Guided Stereo Matching |
| Unsupervised Event-Based Learning of Optical Flow, Depth, and Egomotion |
| Modeling Local Geometric Structure of 3D Point Clouds Using Geo-CNN |
| 3D Point Capsule Networks |
| GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving |
| Single-Image Piece-Wise Planar 3D Reconstruction via Associative Embedding |
| 3DN: 3D Deformation Network |
| HorizonNet: Learning Room Layout With 1D Representation and Pano Stretch Data Augmentation |
| Deep Fitting Degree Scoring Network for Monocular 3D Object Detection |
| Pushing the Envelope for RGB-Based Dense 3D Hand Pose Estimation via Neural Rendering |
| Self-Supervised Learning of 3D Human Pose Using Multi-View Geometry |
| FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation From a Single Image |
| Dense 3D Face Decoding Over 2500FPS: Joint Texture & Shape Convolutional Mesh Decoders |
| Does Learning Specific Features for Related Parts Help Human Pose Estimation? |
| Linkage Based Face Clustering via Graph Convolution Network |
| Towards High-Fidelity Nonlinear 3D Face Morphable Model |
| RegularFace: Deep Face Recognition via Exclusive Regularization |
| BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation |
| GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction |
| Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition With Multimodal Training |
| Learning to Reconstruct People in Clothing From a Single RGB Camera |
| Distilled Person Re-Identification: Towards a More Scalable System |
| A Perceptual Prediction Framework for Self Supervised Event Segmentation |
| COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis |
| Recurrent Attentive Zooming for Joint Crowd Counting and Precise Localization |
| An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition |
| Graph Convolutional Label Noise Cleaner: Train a Plug-And-Play Action Classifier for Anomaly Detection |
| MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment |
| Less Is More: Learning Highlight Detection From Video Duration |
| DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition |
| AdaFrame: Adaptive Frame Selection for Fast Video Recognition |
| Spatio-Temporal Video Re-Localization by Warp LSTM |
| Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization |
| Unsupervised Deep Tracking |
| Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers |
| Fast Online Object Tracking and Segmentation: A Unifying Approach |
| Object Tracking by Reconstruction With View-Specific Discriminative Correlation Filters |
| SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints |
| Leveraging Shape Completion for 3D Siamese Tracking |
| Target-Aware Deep Tracking |
| Spatiotemporal CNN for Video Object Segmentation |
| Towards Rich Feature Discovery With Class Activation Maps Augmentation for Person Re-Identification |
| Wide-Context Semantic Image Extrapolation |
| End-To-End Time-Lapse Video Synthesis From a Single Outdoor Image |
| GIF2Video: Color Dequantization and Temporal Interpolation of GIF Images |
| Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis |
| Pluralistic Image Completion |
| Salient Object Detection With Pyramid Attention and Salient Edges |
| Latent Filter Scaling for Multimodal Unsupervised Image-To-Image Translation |
| Attention-Aware Multi-Stroke Style Transfer |
| Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks |
| Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting |
| Example-Guided Style-Consistent Image Synthesis From Semantic Labeling |
| MirrorGAN: Learning Text-To-Image Generation by Redescription |
| Light Field Messaging With Deep Photographic Steganography |
| Im2Pencil: Controllable Pencil Illustration From Photographs |
| When Color Constancy Goes Wrong: Correcting Improperly White-Balanced Images |
| Beyond Volumetric Albedo – A Surface Optimization Framework for Non-Line-Of-Sight Imaging |
| Reflection Removal Using a Dual-Pixel Sensor |
| Practical Coding Function Design for Time-Of-Flight Imaging |
| Meta-SR: A Magnification-Arbitrary Network for Super-Resolution |
| Multispectral and Hyperspectral Image Fusion by MS/HS Fusion Net |
| Learning Attraction Field Representation for Robust Line Segment Detection |
| Blind Super-Resolution With Iterative Kernel Correction |
| Video Magnification in the Wild Using Fractional Anisotropy in Temporal Distribution |
| Attentive Feedback Network for Boundary-Aware Salient Object Detection |
| Heavy Rain Image Restoration: Integrating Physics Model and Conditional Adversarial Learning |
| Learning to Calibrate Straight Lines for Fisheye Image Rectification |
| Camera Lens Super-Resolution |
| Frame-Consistent Recurrent Video Deraining With Dual-Level Flow |
| Deep Plug-And-Play Super-Resolution for Arbitrary Blur Kernels |
| Sea-Thru: A Method for Removing Water From Underwater Images |
| Deep Network Interpolation for Continuous Imagery Effect Transition |
| Spatially Variant Linear Representation Models for Joint Filtering |
| Toward Convolutional Blind Denoising of Real Photographs |
| Towards Real Scene Super-Resolution With Raw Images |
| ODE-Inspired Network Design for Single Image Super-Resolution |
| Blind Image Deblurring With Local Maximum Gradient Prior |
| Attention-Guided Network for Ghost-Free High Dynamic Range Imaging |
| Searching for a Robust Neural Architecture in Four GPU Hours |
| Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction |
| Adaptively Connected Neural Networks |
| CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency |
| Temporal Cycle-Consistency Learning |
| Predicting Future Frames Using Retrospective Cycle GAN |
| Density Map Regression Guided Detection Network for RGB-D Crowd Counting and Localization |
| TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning |
| Learning Semantic Segmentation From Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach |
| Attentive Single-Tasking of Multiple Tasks |
| Deep Metric Learning to Rank |
| End-To-End Multi-Task Learning With Attention |
| Self-Supervised Learning via Conditional Motion Propagation |
| Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence |
| All About Structure: Adapting Structural Information Across Domains for Boosting Semantic Segmentation |
| Iterative Reorganization With Weak Spatial Constraints: Solving Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning |
| Revisiting Self-Supervised Visual Representation Learning |
| It’s Not About the Journey; It’s About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning |
| Actively Seeking and Learning From Live Data |
| Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing |
| Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks |
| Scene Graph Generation With External Knowledge and Image Reconstruction |
| Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval |
| MUREL: Multimodal Relational Reasoning for Visual Question Answering |
| Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering |
| Information Maximizing Visual Question Generation |
| Learning to Detect Human-Object Interactions With Knowledge |
| Learning Words by Drawing Images |
| Factor Graph Attention |
| Reducing Uncertainty in Undersampled MRI Reconstruction With Active Acquisition |
| ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification |
| ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape |
| Collaborative Learning of Semi-Supervised Segmentation and Classification for Medical Images |
| Biologically-Constrained Graphs for Global Connectomics Reconstruction |
| P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification |
| Elastic Boundary Projection for 3D Medical Image Segmentation |
| SIXray: A Large-Scale Security Inspection X-Ray Benchmark for Prohibited Item Discovery in Overlapping Images |
| Noise2Void - Learning Denoising From Single Noisy Images |
| Joint Discriminative and Generative Learning for Person Re-Identification |
| Unsupervised Person Re-Identification by Soft Multilabel Learning |
| Learning Context Graph for Person Search |
| Gradient Matching Generative Networks for Zero-Shot Learning |
| Doodle to Search: Practical Zero-Shot Sketch-Based Image Retrieval |
| Zero-Shot Task Transfer |
| C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection |
| Weakly Supervised Learning of Instance Segmentation With Inter-Pixel Relations |
| Attention-Based Dropout Layer for Weakly Supervised Object Localization |
| Domain Generalization by Solving Jigsaw Puzzles |
| Transferrable Prototypical Networks for Unsupervised Domain Adaptation |
| Blending-Target Domain Adaptation by Adversarial Meta-Adaptation Networks |
| ELASTIC: Improving CNNs With Dynamic Scaling Policies |
| ScratchDet: Training Single-Shot Object Detectors From Scratch |
| SFNet: Learning Object-Aware Semantic Correspondence |
| Deep Metric Learning Beyond Binary Supervision |
| Learning to Cluster Faces on an Affinity Graph |
| C2AE: Class Conditioned Auto-Encoder for Open-Set Recognition |
| Shapes and Context: In-The-Wild Image Synthesis & Manipulation |
| Semantics Disentangling for Text-To-Image Generation |
| Semantic Image Synthesis With Spatially-Adaptive Normalization |
| Progressive Pose Attention Transfer for Person Image Generation |
| Unsupervised Person Image Generation With Semantic Parsing Transformation |
| DeepView: View Synthesis With Learned Gradient Descent |
| Animating Arbitrary Objects via Deep Motion Transfer |
| Textured Neural Avatars |
| IM-Net for High Resolution Video Frame Interpolation |
| Homomorphic Latent Space Interpolation for Unpaired Image-To-Image Translation |
| Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation |
| Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping |
| DeepVoxels: Learning Persistent 3D Feature Embeddings |
| Inverse Path Tracing for Joint Material and Lighting Estimation |
| The Visual Centrifuge: Model-Free Layered Video Representations |
| Label-Noise Robust Generative Adversarial Networks |
| DLOW: Domain Flow for Adaptation and Generalization |
| CollaGAN: Collaborative GAN for Missing Image Data Imputation |
| d-SNE: Domain Adaptation Using Stochastic Neighborhood Embedding |
| Taking a Closer Look at Domain Shift: Category-Level Adversaries for Semantics Consistent Domain Adaptation |
| ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation |
| ContextDesc: Local Descriptor Augmentation With Cross-Modality Context |
| Large-Scale Long-Tailed Recognition in an Open World |
| AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations Rather Than Data |
| SDC - Stacked Dilated Convolution: A Unified Descriptor Network for Dense Matching Tasks |
| Learning Correspondence From the Cycle-Consistency of Time |
| AE2-Nets: Autoencoder in Autoencoder Networks |
| Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach |
| Learning Spatial Common Sense With Geometry-Aware Recurrent Networks |
| Structured Knowledge Distillation for Semantic Segmentation |
| Scan2CAD: Learning CAD Model Alignment in RGB-D Scans |
| Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation |
| Tell Me Where I Am: Object-Level Scene Context Prediction |
| Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation |
| Supervised Fitting of Geometric Primitives to 3D Point Clouds |
| Do Better ImageNet Models Transfer Better? |
| Gotta Adapt 'Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild |
| Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift |
| Circulant Binary Convolutional Networks: Enhancing the Performance of 1-Bit DCNNs With Circulant Back Propagation |
| DeFusionNET: Defocus Blur Detection via Recurrently Fusing and Refining Multi-Scale Deep Features |
| Deep Virtual Networks for Memory Efficient Inference of Multiple Tasks |
| Universal Domain Adaptation |
| Improving Transferability of Adversarial Examples With Input Diversity |
| Sequence-To-Sequence Domain Adaptation Network for Robust Text Image Recognition |
| Hybrid-Attention Based Decoupled Metric Learning for Zero-Shot Image Retrieval |
| Learning to Sample |
| Few-Shot Learning via Saliency-Guided Hallucination of Samples |
| Variational Convolutional Neural Network Pruning |
| Towards Optimal Structured CNN Pruning via Generative Adversarial Learning |
| Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression |
| Fully Quantized Network for Object Detection |
| MnasNet: Platform-Aware Neural Architecture Search for Mobile |
| Student Becoming the Master: Knowledge Amalgamation for Joint Scene Parsing, Depth Estimation, and More |
| K-Nearest Neighbors Hashing |
| Learning RoI Transformer for Oriented Object Detection in Aerial Images |
| Snapshot Distillation: Teacher-Student Optimization in One Generation |
| Geometry-Aware Distillation for Indoor Semantic Segmentation |
| LiveSketch: Query Perturbations for Guided Sketch-Based Visual Search |
| Bounding Box Regression With Uncertainty for Accurate Object Detection |
| OCGAN: One-Class Novelty Detection Using GANs With Constrained Latent Representations |
| Learning Metrics From Teachers: Compact Networks for Image Embedding |
| Activity Driven Weakly Supervised Object Detection |
| Separate to Adapt: Open Set Domain Adaptation via Progressive Separation |
| Layout-Graph Reasoning for Fashion Landmark Detection |
| DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs |
| Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks |
| Region Proposal by Guided Anchoring |
| Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation |
| Learning to Transfer Examples for Partial Domain Adaptation |
| Generalized Zero-Shot Recognition Based on Visually Semantic Embedding |
| Towards Visual Feature Translation |
| Amodal Instance Segmentation With KINS Dataset |
| Global Second-Order Pooling Convolutional Networks |
| Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification From the Bottom Up |
| NetTailor: Tuning the Architecture, Not Just the Weights |
| Learning-Based Sampling for Natural Image Matting |
| Learning Unsupervised Video Object Segmentation Through Visual Attention |
| 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks |
| Pyramid Feature Attention Network for Saliency Detection |
| Co-Saliency Detection via Mask-Guided Fully Convolutional Networks With Multi-Scale Label Smoothing |
| SAIL-VOS: Semantic Amodal Instance Level Video Object Segmentation - A Synthetic Dataset and Baselines |
| Learning Instance Activation Maps for Weakly Supervised Instance Segmentation |
| Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation |
| Box-Driven Class-Wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation |
| Dual Attention Network for Scene Segmentation |
| InverseRenderNet: Learning Single Image Inverse Rendering |
| A Variational Auto-Encoder Model for Stochastic Point Processes |
| Unifying Heterogeneous Classifiers With Distillation |
| Assessment of Faster R-CNN in Man-Machine Collaborative Search |
| OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge |
| NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction |
| Spectral Metric for Dataset Complexity Assessment |
| ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding |
| VERI-Wild: A Large Dataset and a New Method for Vehicle Re-Identification in the Wild |
| 3D Local Features for Direct Pairwise Registration |
| HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds |
| GPSfM: Global Projective SFM Using Algebraic Constraints on Multi-View Fundamental Matrices |
| Group-Wise Correlation Stereo Network |
| Multi-Level Context Ultra-Aggregation for Stereo Matching |
| Large-Scale, Metric Structure From Motion for Unordered Light Fields |
| Understanding the Limitations of CNN-Based Absolute Camera Pose Regression |
| DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image |
| Modeling Point Clouds With Self-Attention and Gumbel Subset Sampling |
| Learning With Batch-Wise Optimal Transport Loss for 3D Shape Recognition |
| DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion |
| Dense Depth Posterior (DDP) From Single Image and Sparse Range |
| DuLa-Net: A Dual-Projection Network for Estimating Room Layouts From a Single RGB Panorama |
| Veritatem Dies Aperit - Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach |
| Segmentation-Driven 6D Object Pose Estimation |
| Exploiting Temporal Context for 3D Human Pose Estimation in the Wild |
| What Do Single-View 3D Reconstruction Networks Learn? |
| UniformFace: Learning Deep Equidistributed Representation for Face Recognition |
| Semantic Graph Convolutional Networks for 3D Human Pose Regression |
| Mask-Guided Portrait Editing With Conditional GANs |
| Group Sampling for Scale Invariant Face Detection |
| Joint Representation and Estimator Learning for Facial Action Unit Intensity Estimation |
| Semantic Alignment: Finding Semantically Consistent Ground-Truth for Facial Landmark Detection |
| LAEO-Net: Revisiting People Looking at Each Other in Videos |
| Robust Facial Landmark Detection via Occlusion-Adaptive Deep Networks |
| Learning Individual Styles of Conversational Gesture |
| Face Anti-Spoofing: Model Matters, So Does Data |
| Fast Human Pose Estimation |
| Decorrelated Adversarial Learning for Age-Invariant Face Recognition |
| Cross-Task Weakly Supervised Learning From Instructional Videos |
| D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation |
| Progressive Teacher-Student Learning for Early Action Prediction |
| Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning |
| MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation |
| Transferable Interactiveness Knowledge for Human-Object Interaction Detection |
| Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition |
| Multi-Granularity Generator for Temporal Action Proposal |
| Deep Rigid Instance Scene Flow |
| See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks |
| Patch-Based Discriminative Feature Learning for Unsupervised Person Re-Identification |
| SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking |
| Spatial Fusion GAN for Image Synthesis |
| Text Guided Person Image Synthesis |
| STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing |
| Towards Instance-Level Image-To-Image Translation |
| Dense Intrinsic Appearance Flow for Human Pose Transfer |
| Depth-Aware Video Frame Interpolation |
| Sliced Wasserstein Generative Models |
| Deep Flow-Guided Video Inpainting |
| Video Generation From Single Semantic Label Map |
| Polarimetric Camera Calibration Using an LCD Monitor |
| Fully Automatic Video Colorization With Self-Regularization and Diversity |
| Zoom to Learn, Learn to Zoom |
| Single Image Reflection Removal Beyond Linearity |
| Learning to Separate Multiple Illuminants in a Single Image |
| Shape Unicode: A Unified Shape Representation |
| Robust Video Stabilization by Optimization in CNN Weight Space |
| Learning Linear Transformations for Fast Image and Video Style Transfer |
| Local Detection of Stereo Occlusion Boundaries |
| Bi-Directional Cascade Network for Perceptual Edge Detection |
| Single Image Deraining: A Comprehensive Benchmark Analysis |
| Dynamic Scene Deblurring With Parameter Selective Sharing and Nested Skip Connections |
| Events-To-Video: Bringing Modern Computer Vision to Event Cameras |
| Feedback Network for Image Super-Resolution |
| Semi-Supervised Transfer Learning for Image Rain Removal |
| EventNet: Asynchronous Recursive Event Processing |
| Recurrent Back-Projection Network for Video Super-Resolution |
| Cascaded Partial Decoder for Fast and Accurate Salient Object Detection |
| A Simple Pooling-Based Design for Real-Time Salient Object Detection |
| Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection |
| Progressive Image Deraining Networks: A Better and Simpler Baseline |
| GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud |
| Attentive Relational Networks for Mapping Images to Scene Graphs |
| Relational Knowledge Distillation |
| Compressing Convolutional Neural Networks via Factorized Convolutional Filters |
| On the Intrinsic Dimensionality of Image Representations |
| Part-Regularized Near-Duplicate Vehicle Re-Identification |
| Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics |
| Classification-Reconstruction Learning for Open-Set Recognition |
| Emotion-Aware Human Attention Prediction |
| Residual Regression With Semantic Prior for Crowd Counting |
| Context-Reinforced Semantic Segmentation |
| Adversarial Structure Matching for Structured Prediction Tasks |
| Deep Spectral Clustering Using Dual Autoencoder Network |
| Deep Asymmetric Metric Learning via Rich Relationship Mining |
| Did It Change? Learning to Detect Point-Of-Interest Changes for Proactive Map Updates |
| Associatively Segmenting Instances and Semantics in Point Clouds |
| Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation |
| Scene Categorization From Contours: Medial Axis Based Salience Measures |
| Unsupervised Image Captioning |
| Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables |
| Cross-Modal Relationship Inference for Grounding Referring Expressions |
| What’s to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions |
| Iterative Alignment Network for Continuous Sign Language Recognition |
| Neural Sequential Phrase Grounding (SeqGROUND) |
| CLEVR-Ref+: Diagnosing Visual Reasoning With Referring Expressions |
| Describing Like Humans: On Diversity in Image Captioning |
| MSCap: Multi-Style Image Captioning With Unpaired Stylized Text |
| CRAVES: Controlling Robotic Arm With a Vision-Based Economic System |
| Networks for Joint Affine and Non-Parametric Image Registration |
| Learning Shape-Aware Embedding for Scene Text Detection |
| Learning to Film From Professional Human Motion Videos |
| Pay Attention! - Robustifying a Deep Visuomotor Policy Through Task-Focused Visual Attention |
| Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence |
| Learning Video Representations From Correspondence Proposals |
| SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks |
| Sphere Generative Adversarial Network Based on Geometric Moment Matching |
| Adversarial Attacks Beyond the Image Space |
| Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks |
| Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses |
| A General and Adaptive Robust Loss Function |
| Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration |
| Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss |
| Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection |
| Unsupervised Learning of Dense Shape Correspondence |
| Unsupervised Visual Domain Adaptation: A Deep Max-Margin Gaussian Process Approach |
| Balanced Self-Paced Learning for Generative Adversarial Clustering Network |
| A Style-Based Generator Architecture for Generative Adversarial Networks |
| Parallel Optimal Transport GAN |
| 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans |
| Causes and Corrections for Bimodal Multi-Path Scanning With Structured Light |
| TextureNet: Consistent Local Parametrizations for Learning From High-Resolution Signals on Meshes |
| PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image |
| Occupancy Networks: Learning 3D Reconstruction in Function Space |
| 3D Shape Reconstruction From Images in the Frequency Domain |
| SiCloPe: Silhouette-Based Clothed People |
| Detailed Human Shape Estimation From a Single Image by Hierarchical Mesh Deformation |
| Convolutional Mesh Regression for Single-Image Human Shape Reconstruction |
| H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions |
| Learning the Depths of Moving People by Watching Frozen People |
| Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion |
| A Skeleton-Bridged Deep Learning Approach for Generating Meshes of Complex Topologies From Single RGB Images |
| Learning Structure-And-Motion-Aware Rolling Shutter Correction |
| PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation |
| SelFlow: Self-Supervised Learning of Optical Flow |
| Taking a Deeper Look at the Inverse Compositional Algorithm |
| Deeper and Wider Siamese Networks for Real-Time Visual Tracking |
| Self-Supervised Adaptation of High-Fidelity Face Models for Monocular Performance Tracking |
| Diverse Generation for Multi-Agent Sports Games |
| Efficient Online Multi-Person 2D Pose Tracking With Recurrent Spatio-Temporal Affinity Fields |
| GFrames: Gradient-Based Local Reference Frame for 3D Shape Matching |
| Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking |
| Graph Convolutional Tracking |
| ATOM: Accurate Tracking by Overlap Maximization |
| Visual Tracking via Adaptive Spatially-Regularized Correlation Filters |
| Deep Tree Learning for Zero-Shot Face Anti-Spoofing |
| ArcFace: Additive Angular Margin Loss for Deep Face Recognition |
| Learning Joint Gait Representation via Quintuplet Loss Minimization |
| Gait Recognition via Disentangled Representation Learning |
| Reversible GANs for Memory-Efficient Image-To-Image Translation |
| Sensitive-Sample Fingerprinting of Deep Neural Networks |
| Soft Labels for Ordinal Regression |
| Local to Global Learning: Gradually Adding Classes for Training Deep Neural Networks |
| What Does It Mean to Learn in Deep Networks? And, How Does One Detect Adversarial Attacks? |
| Handwriting Recognition in Low-Resource Scripts Using Adversarial Learning |
| Adversarial Defense Through Network Profiling Based Path Extraction |
| RENAS: Reinforced Evolutionary Neural Architecture Search |
| Co-Occurrence Neural Network |
| SpotTune: Transfer Learning Through Adaptive Fine-Tuning |
| Signal-To-Noise Ratio: A Robust Distance Metric for Deep Metric Learning |
| Detection Based Defense Against Adversarial Examples From the Steganalysis Point of View |
| HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs |
| Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects |
| Blind Geometric Distortion Correction on Images Through Deep Learning |
| Instance-Level Meta Normalization |
| Iterative Normalization: Beyond Standardization Towards Efficient Whitening |
| On Learning Density Aware Embeddings |
| Contrastive Adaptation Network for Unsupervised Domain Adaptation |
| LP-3DCNN: Unveiling Local Phase in 3D Convolutional Neural Networks |
| Attribute-Driven Feature Disentangling and Temporal Aggregation for Video Person Re-Identification |
| Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? |
| Distilling Object Detectors With Fine-Grained Feature Imitation |
| Centripetal SGD for Pruning Very Deep Convolutional Networks With Complicated Structure |
| Knockoff Nets: Stealing Functionality of Black-Box Models |
| Deep Embedding Learning With Discriminative Sampling Policy |
| Hybrid Task Cascade for Instance Segmentation |
| Multi-Task Self-Supervised Object Detection via Recycling of Bounding Box Annotations |
| ClusterNet: Deep Hierarchical Cluster Network With Rigorously Rotation-Invariant Representation for Point Cloud Analysis |
| Learning to Learn Relation for Important People Detection in Still Images |
| Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition |
| Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning |
| Domain-Symmetric Networks for Adversarial Domain Adaptation |
| End-To-End Supervised Product Quantization for Image Search and Retrieval |
| Learning to Learn From Noisy Labeled Data |
| DSFD: Dual Shot Face Detector |
| Label Propagation for Deep Semi-Supervised Learning |
| Deep Global Generalized Gaussian Networks |
| Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-Based Image Retrieval |
| Context-Aware Crowd Counting |
| Detect-To-Retrieve: Efficient Regional Aggregation for Image Search |
| Towards Accurate One-Stage Object Detection With AP-Loss |
| On Exploring Undetermined Relationships for Visual Relationship Detection |
| Learning Without Memorizing |
| Dynamic Recursive Neural Network |
| Destruction and Construction Learning for Fine-Grained Image Recognition |
| Distraction-Aware Shadow Detection |
| Multi-Label Image Recognition With Graph Convolutional Networks |
| High-Level Semantic Feature Detection: A New Perspective for Pedestrian Detection |
| RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection |
| Ranked List Loss for Deep Metric Learning |
| CANet: Class-Agnostic Segmentation Networks With Iterative Refinement and Attentive Few-Shot Learning |
| Precise Detection in Densely Packed Scenes |
| KE-GAN: Knowledge Embedded Generative Adversarial Networks for Semi-Supervised Scene Parsing |
| Fast User-Guided Video Object Segmentation by Interaction-And-Propagation Networks |
| Fast Interactive Object Annotation With Curve-GCN |
| FickleNet: Weakly and Semi-Supervised Semantic Image Segmentation Using Stochastic Inference |
| RVOS: End-To-End Recurrent Network for Video Object Segmentation |
| DeepFlux for Skeletons in the Wild |
| Interactive Image Segmentation via Backpropagating Refinement Scheme |
| Scene Parsing via Integrated Classification Model and Variance-Based Regularization |
| RAVEN: A Dataset for Relational and Analogical Visual REasoNing |
| Surface Reconstruction From Normals: A Robust DGP-Based Discontinuity Preservation Approach |
| DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images |
| Jumping Manifolds: Geometry Aware Dense Non-Rigid Structure From Motion |
| LVIS: A Dataset for Large Vocabulary Instance Segmentation |
| Fast Object Class Labelling via Speech |
| LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking |
| Creative Flow+ Dataset |
| Weakly Supervised Open-Set Domain Adaptation by Dual-Domain Collaboration |
| A Neurobiological Evaluation Metric for Neural Network Model Search |
| Iterative Projection and Matching: Finding Structure-Preserving Representatives and Its Application to Computer Vision |
| Efficient Multi-Domain Learning by Covariance Normalization |
| Predicting Visible Image Differences Under Varying Display Brightness and Viewing Distance |
| A Bayesian Perspective on the Deep Image Prior |
| ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving |
| Compressing Unknown Images With Product Quantizer for Efficient Zero-Shot Classification |
| Self-Supervised Convolutional Subspace Clustering Network |
| Multi-Scale Geometric Consistency Guided Multi-View Stereo |
| Privacy Preserving Image-Based Localization |
| SimulCap: Single-View Human Performance Capture With Cloth Simulation |
| Hierarchical Deep Stereo Matching on High-Resolution Images |
| Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference |
| Synthesizing 3D Shapes From Silhouette Image Collections Using Multi-Projection Generative Adversarial Networks |
| The Perfect Match: 3D Point Cloud Matching With Smoothed Densities |
| Recurrent Neural Network for (Un-)Supervised Learning of Monocular Video Visual Odometry and Depth |
| PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing |
| Scan2Mesh: From Unstructured Range Scans to 3D Meshes |
| Unsupervised Domain Adaptation for ToF Data Denoising With Adversarial Learning |
| Learning Independent Object Motion From Unlabelled Stereoscopic Videos |
| Learning Single-Image Depth From Videos Using Quality Assessment Networks |
| Learning 3D Human Dynamics From Video |
| Lending Orientation to Neural Networks for Cross-View Geo-Localization |
| Visual Localization by Learning Objects-Of-Interest Dense Match Regression |
| Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction |
| Face Parsing With RoI Tanh-Warping |
| Multi-Person Articulated Tracking With Spatial and Temporal Embeddings |
| Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information |
| A Compact Embedding for Facial Expression Similarity |
| Deep High-Resolution Representation Learning for Human Pose Estimation |
| Feature Transfer Learning for Face Recognition With Under-Represented Data |
| Unsupervised 3D Pose Estimation With Geometric Self-Supervision |
| Peeking Into the Future: Predicting Future Person Activities and Locations in Videos |
| Re-Identification With Consistent Attentive Siamese Networks |
| On the Continuity of Rotation Representations in Neural Networks |
| Iterative Residual Refinement for Joint Optical Flow and Occlusion Estimation |
| Inverse Discriminative Networks for Handwritten Signature Verification |
| Led3D: A Lightweight and Efficient Deep Approach to Recognizing Low-Quality 3D Faces |
| ROI Pooled Correlation Filters for Visual Tracking |
| Deep Video Inpainting |
| DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis |
| Non-Adversarial Image Synthesis With Generative Latent Nearest Neighbors |
| Mixture Density Generative Adversarial Networks |
| SketchGAN: Joint Sketch Completion and Recognition With Generative Adversarial Network |
| Foreground-Aware Image Inpainting |
| Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-To-Image Translation |
| Structure-Preserving Stereoscopic View Synthesis With Multi-Scale Adversarial Correlation Matching |
| DynTypo: Example-Based Dynamic Text Effects Transfer |
| Arbitrary Style Transfer With Style-Attentional Networks |
| Typography With Decor: Intelligent Text Style Transfer |
| RL-GAN-Net: A Reinforcement Learning Agent Controlled GAN Network for Real-Time Point Cloud Shape Completion |
| Photo Wake-Up: 3D Character Animation From a Single Photo |
| DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality |
| Iterative Residual CNNs for Burst Photography Applications |
| Learning Implicit Fields for Generative Shape Modeling |
| Reliable and Efficient Image Cropping: A Grid Anchor Based Approach |
| Patch-Based Progressive 3D Point Set Upsampling |
| An Iterative and Cooperative Top-Down and Bottom-Up Inference Network for Salient Object Detection |
| Deep Stacked Hierarchical Multi-Patch Network for Image Deblurring |
| Turn a Silicon Camera Into an InGaAs Camera |
| Low-Rank Tensor Completion With a New Tensor Nuclear Norm Induced by Invertible Linear Transforms |
| Joint Representative Selection and Feature Learning: A Semi-Supervised Approach |
| The Domain Transform Solver |
| CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection |
| Phase-Only Image Based Kernel Estimation for Single Image Blind Deblurring |
| Hierarchical Discrete Distribution Decomposition for Match Density Estimation |
| FOCNet: A Fractional Optimal Control Network for Image Denoising |
| Orthogonal Decomposition Network for Pixel-Wise Binary Classification |
| Multi-Source Weak Supervision for Saliency Detection |
| ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples |
| Combinatorial Persistency Criteria for Multicut and Max-Cut |
| S4Net: Single Stage Salient-Instance Segmentation |
| A Decomposition Algorithm for the Sparse Generalized Eigenvalue Problem |
| Polynomial Representation for Persistence Diagram |
| Crowd Counting and Density Estimation by Trellis Encoder-Decoder Networks |
| Cross-Atlas Convolution for Parameterization Invariant Learning on Textured Mesh Surface |
| Deep Surface Normal Estimation With Hierarchical RGB-D Fusion |
| Knowledge-Embedded Routing Network for Scene Graph Generation |
| An End-To-End Network for Panoptic Segmentation |
| Fast and Flexible Indoor Scene Synthesis via Deep Convolutional Generative Models |
| Marginalized Latent Semantic Encoder for Zero-Shot Learning |
| Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation |
| Unsupervised Embedding Learning via Invariant and Spreading Instance Feature |
| AOGNets: Compositional Grammatical Architectures for Deep Learning |
| A Robust Local Spectral Descriptor for Matching Non-Rigid Shapes With Incompatible Shape Structures |
| Context and Attribute Grounded Dense Captioning |
| Spot and Learn: A Maximum-Entropy Patch Sampler for Few-Shot Image Classification |
| Interpreting CNNs via Decision Trees |
| Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning |
| Deep Modular Co-Attention Networks for Visual Question Answering |
| Synthesizing Environment-Aware Activities via Activity Sketches |
| Self-Critical N-Step Training for Image Captioning |
| Multi-Target Embodied Question Answering |
| Visual Question Answering as Reading Comprehension |
| StoryGAN: A Sequential Conditional GAN for Story Visualization |
| Noise-Aware Unsupervised Deep Lidar-Stereo Fusion |
| Versatile Multiple Choice Learning and Its Application to Vision Computing |
| EV-Gait: Event-Based Robust Gait Recognition Using Dynamic Vision Sensors |
| ToothNet: Automatic Tooth Instance Segmentation and Identification From Cone Beam CT Images |
| Modularized Textual Grounding for Counterfactual Resilience |
| L3-Net: Towards Learning Based LiDAR Localization for Autonomous Driving |
| Panoptic Feature Pyramid Networks |
| Mask Scoring R-CNN |
| Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection |
| Cross-Modality Personalization for Retrieval |
| Composing Text and Image for Image Retrieval - An Empirical Odyssey |
| Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation |
| Adaptive NMS: Refining Pedestrian Detection in a Crowd |
| Point In, Box Out: Beyond Counting Persons in Crowds |
| Locating Objects Without Bounding Boxes |
| FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery |
| Mutual Learning of Complementary Networks via Residual Correction for Improving Semi-Supervised Classification |
| Sampling Techniques for Large-Scale Object Detection From Sparsely Annotated Objects |
| Curls & Whey: Boosting Black-Box Adversarial Attacks |
| Barrage of Random Transforms for Adversarially Robust Defense |
| Aggregation Cross-Entropy for Sequence Recognition |
| LaSO: Label-Set Operations Networks for Multi-Label Few-Shot Learning |
| Few-Shot Learning With Localization in Realistic Settings |
| AdaGraph: Unifying Predictive and Continuous Domain Adaptation Through Graphs |
| Grounded Video Description |
| Streamlined Dense Video Captioning |
| Adversarial Inference for Multi-Sentence Video Description |
| Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations |
| Learning to Compose Dynamic Tree Structures for Visual Contexts |
| Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation |
| Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering |
| Cycle-Consistency for Robust Visual Question Answering |
| Embodied Question Answering in Photorealistic Environments With Point Cloud Perception |
| Reasoning Visual Dialogs With Structural and Partial Observations |
| Recursive Visual Attention in Visual Dialog |
| Two Body Problem: Collaborative Visual Task Completion |
| GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering |
| Text2Scene: Generating Compositional Scenes From Textual Descriptions |
| From Recognition to Cognition: Visual Commonsense Reasoning |
| The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation |
| Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation |
| Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning |
| High Flux Passive Imaging With Single-Photon Sensors |
| Photon-Flooded Single-Photon 3D Cameras |
| Acoustic Non-Line-Of-Sight Imaging |
| Steady-State Non-Line-Of-Sight Imaging |
| A Theory of Fermat Paths for Non-Line-Of-Sight Shape Reconstruction |
| End-To-End Projector Photometric Compensation |
| Bringing a Blurry Frame Alive at High Frame-Rate With an Event Camera |
| Bringing Alive Blurred Moments |
| Learning to Synthesize Motion Blur |
| Underexposed Photo Enhancement Using Deep Illumination Estimation |
| Blind Visual Motif Removal From a Single Image |
| Non-Local Meets Global: An Integrated Paradigm for Hyperspectral Denoising |
| Neural Rerendering in the Wild |
| GeoNet: Deep Geodesic Networks for Point Cloud Analysis |
| MeshAdv: Adversarial Meshes for Visual Recognition |
| Fast Spatially-Varying Indoor Lighting Estimation |
| Neural Illumination: Lighting Prediction for Indoor Environments |
| Deep Sky Modeling for Single Image Outdoor Lighting Estimation |
| Bidirectional Learning for Domain Adaptation of Semantic Segmentation |
| Enhanced Bayesian Compression via Deep Reinforcement Learning |
| Strong-Weak Distribution Alignment for Adaptive Object Detection |
| MFAS: Multimodal Fusion Architecture Search |
| Disentangling Adversarial Robustness and Generalization |
| ShieldNets: Defending Against Adversarial Attacks Using Probabilistic Adversarial Robustness |
| Deeply-Supervised Knowledge Synergy |
| Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration |
| Probabilistic End-To-End Noise Correction for Learning With Noisy Labels |
| Attention-Guided Unified Network for Panoptic Segmentation |
| NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection |
| OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks |
| Semantically Aligned Bias Reducing Zero Shot Learning |
| Feature Space Perturbations Yield More Transferable Adversarial Examples |
| IGE-Net: Inverse Graphics Energy Networks for Human Pose Estimation and Single-View Reconstruction |
| Accelerating Convolutional Neural Networks via Activation Map Compression |
| Knowledge Distillation via Instance Relationship Graph |
| PPGNet: Learning Point-Pair Graph for Line Segment Detection |
| Building Detail-Sensitive Semantic Segmentation Networks With Polynomial Pooling |
| Variational Bayesian Dropout With a Hierarchical Prior |
| AANet: Attribute Attention Network for Person Re-Identifications |
| Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction |
| A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks |
| PointNetLK: Robust & Efficient Point Cloud Registration Using PointNet |
| Few-Shot Adaptive Faster R-CNN |
| VRSTC: Occlusion-Free Video Person Re-Identification |
| Compact Feature Learning for Multi-Domain Image Classification |
| Adaptive Transfer Network for Cross-Domain Person Re-Identification |
| Large-Scale Few-Shot Learning: Knowledge Transfer With Class Hierarchy |
| Moving Object Detection Under Discontinuous Change in Illumination Using Tensor Low-Rank and Invariant Sparse Decomposition |
| Pedestrian Detection With Autoregressive Network Phases |
| All You Need Is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification |
| Stochastic Class-Based Hard Example Mining for Deep Metric Learning |
| Revisiting Local Descriptor Based Image-To-Class Measure for Few-Shot Learning |
| Towards Robust Curve Text Detection With Conditional Spatial Expansion |
| Revisiting Perspective Information for Efficient Crowd Counting |
| Towards Universal Object Detection by Domain Attention |
| Ensemble Deep Manifold Similarity Learning Using Hard Proxies |
| Quantization Networks |
| RES-PCA: A Scalable Approach to Recovering Low-Rank Matrices |
| Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks |
| Efficient Featurized Image Pyramid Network for Single Shot Detector |
| Multi-Task Multi-Sensor Fusion for 3D Object Detection |
| Domain-Specific Batch Normalization for Unsupervised Domain Adaptation |
| Grid R-CNN |
| MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition |
| Mapping, Localization and Path Planning for Image-Based Navigation Using Visual Features and Map |
| Triply Supervised Decoder Networks for Joint Detection and Segmentation |
| Leveraging the Invariant Side of Generative Zero-Shot Learning |
| Exploring the Bounds of the Utility of Context for Object Detection |
| A-CNN: Annularly Convolutional Neural Networks on Point Clouds |
| DARNet: Deep Active Ray Network for Building Segmentation |
| Point Cloud Oversegmentation With Graph-Structured Deep Metric Learning |
| Graphonomy: Universal Human Parsing via Graph Transfer Learning |
| Fitting Multiple Heterogeneous Models by Multi-Class Cascaded T-Linkage |
| A Late Fusion CNN for Digital Matting |
| BASNet: Boundary-Aware Salient Object Detection |
| ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation |
| Object Instance Annotation With Deep Extreme Level Set Evolution |
| Leveraging Crowdsourced GPS Data for Road Extraction From Aerial Imagery |
| Adaptive Pyramid Context Network for Semantic Segmentation |
| Isospectralization, or How to Hear Shape, Style, and Correspondence |
| Speech2Face: Learning the Face Behind a Voice |
| Joint Manifold Diffusion for Combining Predictions on Decoupled Observations |
| Audio Visual Scene-Aware Dialog |
| Learning to Minify Photometric Stereo |
| Reflective and Fluorescent Separation Under Narrow-Band Illumination |
| Depth From a Polarisation + RGB Stereo Pair |
| Rethinking the Evaluation of Video Summaries |
| What Object Should I Use? - Task Driven Object Detection |
| Triangulation Learning Network: From Monocular to Stereo 3D Object Detection |
| Connecting the Dots: Learning Representations for Active Monocular Depth Estimation |
| Learning Non-Volumetric Depth Fusion Using Successive Reprojections |
| Stereo R-CNN Based 3D Object Detection for Autonomous Driving |
| Hybrid Scene Compression for Visual Localization |
| MMFace: A Multi-Metric Regression Network for Unconstrained Face Reconstruction |
| 3D Motion Decomposition for RGBD Future Dynamic Scene Synthesis |
| Single Image Depth Estimation Trained via Depth From Defocus Cues |
| RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion |
| Neural Scene Decomposition for Multi-Person Motion Capture |
| Efficient Decision-Based Black-Box Adversarial Attacks on Face Recognition |
| FA-RPN: Floating Region Proposals for Face Detection |
| Bayesian Hierarchical Dynamic Model for Human Action Recognition |
| Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation |
| 3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training |
| Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision |
| PoseFix: Model-Agnostic General Human Pose Refinement Network |
| RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation |
| Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views |
| Face-Focused Cross-Stream Network for Deception Detection in Videos |
| Unequal-Training for Deep Face Recognition With Long-Tailed Noisy Data |
| T-Net: Parametrizing Fully Convolutional Nets With a Single High-Order Tensor |
| Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss |
| Object-Centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video |
| DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition |
| The Pros and Cons: Rank-Aware Temporal Attention for Skill Determination in Long Videos |
| Collaborative Spatiotemporal Feature Learning for Video Action Recognition |
| MARS: Motion-Augmented RGB Stream for Action Recognition |
| Convolutional Relational Machine for Group Activity Recognition |
| Video Summarization by Learning From Unpaired Data |
| Skeleton-Based Action Recognition With Directed Graph Neural Networks |
| PA3D: Pose-Action 3D Machine for Video Recognition |
| Deep Dual Relation Modeling for Egocentric Interaction Recognition |
| MOTS: Multi-Object Tracking and Segmentation |
| Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking |
| PointFlowNet: Learning Representations for Rigid Motion Estimation From Point Clouds |
| Listen to the Image |
| Image Super-Resolution by Neural Texture Transfer |
| Conditional Adversarial Generative Flow for Controllable Image Synthesis |
| How to Make a Pizza: Learning a Compositional Layer-Based GAN Model |
| TransGaGa: Geometry-Aware Unsupervised Image-To-Image Translation |
| Depth-Attentional Features for Single-Image Rain Removal |
| Hyperspectral Image Reconstruction Using a Deep Spatial-Spectral Prior |
| LiFF: Light Field Features in Scale and Depth |
| Deep Exemplar-Based Video Colorization |
| On Finding Gray Pixels |
| UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos |
| Learning Transformation Synchronization |
| D2-Net: A Trainable CNN for Joint Description and Detection of Local Features |
| Recurrent Neural Networks With Intra-Frame Iterations for Video Deblurring |
| Learning to Extract Flawless Slow Motion From Blurry Videos |
| Natural and Realistic Single Image Super-Resolution With Explicit Natural Manifold Discrimination |
| RF-Net: An End-To-End Image Matching Network Based on Receptive Field |
| Fast Single Image Reflection Suppression via Convex Optimization |
| A Mutual Learning Method for Salient Object Detection With Intertwined Multi-Supervision |
| Enhanced Pix2pix Dehazing Network |
| Assessing Personally Perceived Image Quality via Image Features and Collaborative Filtering |
| Single Image Reflection Removal Exploiting Misaligned Training Data and Network Enhancements |
| Exploring Context and Visual Pattern of Relationship for Scene Graph Generation |
| Learning From Synthetic Data for Crowd Counting in the Wild |
| A Local Block Coordinate Descent Algorithm for the CSC Model |
| Not Using the Car to See the Sidewalk – Quantifying and Controlling the Effects of Context in Classification and Segmentation |
| Discovering Fair Representations in the Data Domain |
| Actor-Critic Instance Segmentation |
| Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders |
| Semantic Projection Network for Zero- and Few-Label Semantic Segmentation |
| GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation |
| Seamless Scene Segmentation |
| Unsupervised Image Matching and Object Discovery as Optimization |
| Wide-Area Crowd Counting via Ground-Plane Density Maps and Multi-View Fusion CNNs |
| Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions |
| Towards VQA Models That Can Read |
| Object-Aware Aggregation With Bidirectional Temporal Graph for Video Captioning |
| Progressive Attention Memory Network for Movie Story Question Answering |
| Memory-Attended Recurrent Network for Video Captioning |
| Visual Query Answering by Entity-Attribute Graph Matching and Reasoning |
| Look Back and Predict Forward in Image Captioning |
| Explainable and Explicit Visual Reasoning Over Scene Graphs |
| Transfer Learning via Unsupervised Task Discovery for Visual Question Answering |
| Intention Oriented Image Captions With Guiding Objects |
| Uncertainty Guided Multi-Scale Residual Learning-Using a Cycle Spinning CNN for Single Image De-Raining |
| Toward Realistic Image Compositing With Adversarial Learning |
| Cross-Classification Clustering: An Efficient Multi-Object Tracking Technique for 3-D Instance Segmentation in Connectomics |
| Deep ChArUco: Dark ChArUco Marker Pose Estimation |
| Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving |
| Rules of the Road: Predicting Driving Behavior With a Convolutional Model of Semantic Interactions |
| Metric Learning for Image Registration |
| LO-Net: Deep Real-Time Lidar Odometry |
| TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions |
| World From Blur |
| Topology Reconstruction of Tree-Like Structure in Images via Structural Similarity Measure and Dominant Set Clustering |
| Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training |
| Holistic and Comprehensive Annotation of Clinically Significant Findings on Diverse CT Images: Learning From Radiology Reports and Label Ontology |
| Robust Histopathology Image Analysis: To Label or to Synthesize? |
| Data Augmentation Using Learned Transformations for One-Shot Medical Image Segmentation |
| Shifting More Attention to Video Salient Object Detection |
| Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration |
| Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry |
| Image Generation From Layout |
| Multimodal Explanations by Predicting Counterfactuality in Videos |
| Learning to Explain With Complemental Examples |
| HAQ: Hardware-Aware Automated Quantization With Mixed Precision |
| Content Authentication for Neural Imaging Pipelines: End-To-End Optimization of Photo Provenance in Complex Distribution Channels |
| Inverse Procedural Modeling of Knitwear |
| Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video |
| DeepMapping: Unsupervised Map Estimation From Multiple Point Clouds |
| End-To-End Interpretable Neural Motion Planner |
| Divergence Triangle for Joint Training of Generator Model, Energy-Based Model, and Inferential Model |
| Image Deformation Meta-Networks for One-Shot Learning |
| Online High Rank Matrix Completion |
| Multispectral Imaging for Fine-Grained Recognition of Powders on Complex Backgrounds |
| ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging |
| Robust Subspace Clustering With Independent and Piecewise Identically Distributed Noise Modeling |
| What Correspondences Reveal About Unknown Camera and Motion Models? |
| Self-Calibrating Deep Photometric Stereo Networks |
| Argoverse: 3D Tracking and Forecasting With Rich Maps |
| Side Window Filtering |
| Defense Against Adversarial Images Using Web-Scale Nearest-Neighbor Search |
| Incremental Object Learning From Contiguous Views |
| IP102: A Large-Scale Benchmark Dataset for Insect Pest Recognition |
| CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification |
| Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence |
| UPSNet: A Unified Panoptic Segmentation Network |
| JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds With Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields |
| Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth |
| DeepCO3: Deep Instance Co-Segmentation by Co-Peak Search and Co-Saliency Detection |
| Improving Semantic Segmentation via Video Propagation and Label Relaxation |
| Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video |
| Shape2Motion: Joint Analysis of Motion Parts and Attributes From 3D Shapes |
| Semantic Correlation Promoted Shape-Variant Context for Segmentation |
| Relation-Shape Convolutional Neural Network for Point Cloud Analysis |
| Enhancing Diversity of Defocus Blur Detectors via Cross-Ensemble Network |
| BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames |
| Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images |
| Efficient Parameter-Free Clustering Using First Neighbor Relations |
| Learning Personalized Modular Network Guided by Structured Knowledge |
| A Generative Appearance Model for End-To-End Video Object Segmentation |
| A Flexible Convolutional Solver for Fast Style Transfers |
| Cross Domain Model Compression by Structurally Weight Sharing |
| TraVeLGAN: Image-To-Image Translation by Transformation Vector Learning |
| Deep Robust Subjective Visual Property Prediction in Crowdsourcing |
| Transferable AutoML by Model Sharing Over Grouped Datasets |
| Learning Not to Learn: Training Deep Neural Networks With Biased Data |
| IRLAS: Inverse Reinforcement Learning for Architecture Search |
| Learning for Single-Shot Confidence Calibration in Deep Neural Networks Through Stochastic Inferences |
| Attention-Based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions |
| Fully Learnable Group Convolution for Acceleration of Deep Neural Networks |
| EIGEN: Ecologically-Inspired GENetic Approach for Neural Network Structure Searching From Scratch |
| Deep Incremental Hashing Network for Efficient Image Retrieval |
| Robustness via Curvature Regularization, and Vice Versa |
| SparseFool: A Few Pixels Make a Big Difference |
| Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks |
| Structured Pruning of Neural Networks With Budget-Aware Regularization |
| MBS: Macroblock Scaling for CNN Model Reduction |
| Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells |
| Generating 3D Adversarial Point Clouds |
| Partial Order Pruning: For Best Speed/Accuracy Trade-Off in Neural Architecture Search |
| Memory in Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity From Spatiotemporal Dynamics |
| Variational Information Distillation for Knowledge Transfer |
| You Look Twice: GaterNet for Dynamic Filter Selection in CNNs |
| SpherePHD: Applying CNNs on a Spherical PolyHeDron Representation of 360° Images |
| ESPNetv2: A Light-Weight, Power Efficient, and General Purpose Convolutional Neural Network |
| Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors |
| Exploiting Edge Features for Graph Neural Networks |
| Propagation Mechanism for Deep and Wide Neural Networks |
| Catastrophic Child’s Play: Easy to Perform, Hard to Defend Adversarial Attacks |
| Embedding Complementary Deep Networks for Image Classification |
| Deep Multimodal Clustering for Unsupervised Audiovisual Learning |
| Dense Classification and Implanting for Few-Shot Learning |
| Class-Balanced Loss Based on Effective Number of Samples |
| Discovering Visual Patterns in Art Collections With Spatially-Consistent Feature Learning |
| Min-Max Statistical Alignment for Transfer Learning |
| Spatial-Aware Graph Relation Network for Large-Scale Object Detection |
| Deformable ConvNets V2: More Deformable, Better Results |
| Interaction-And-Aggregation Network for Person Re-Identification |
| Rare Event Detection Using Disentangled Representation Learning |
| Shape Robust Text Detection With Progressive Scale Expansion Network |
| Dual Encoding for Zero-Example Video Retrieval |
| MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors |
| Character Region Awareness for Text Detection |
| Effective Aesthetics Prediction With Multi-Level Spatially Pooled Features |
| Attentive Region Embedding Network for Zero-Shot Learning |
| Explicit Spatial Encoding for Deep Local Descriptors |
| Panoptic Segmentation |
| You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection |
| Explore-Exploit Graph Traversal for Image Retrieval |
| Dissimilarity Coefficient Based Weakly Supervised Object Detection |
| Kernel Transformer Networks for Compact Spherical Convolution |
| Object Detection With Location-Aware Deformable Convolution and Backward Attention Filtering |
| Variational Prototyping-Encoder: One-Shot Learning With Prototypical Images |
| Unsupervised Domain Adaptation Using Feature-Whitening and Consensus Loss |
| FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation |
| PartNet: A Recursive Part Decomposition Network for Fine-Grained and Hierarchical Shape Segmentation |
| Learning Multi-Class Segmentations From Single-Class Datasets |
| Convolutional Recurrent Network for Road Boundary Extraction |
| DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation |
| A Cross-Season Correspondence Dataset for Robust Semantic Segmentation |
| ManTra-Net: Manipulation Tracing Network for Detection and Localization of Image Forgeries With Anomalous Features |
| On Zero-Shot Recognition of Generic Objects |
| Explicit Bias Discovery in Visual Question Answering Models |
| REPAIR: Removing Representation Bias by Dataset Resampling |
| Label Efficient Semi-Supervised Learning via Graph Filtering |
| MVTec AD – A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection |
| ABC: A Big CAD Model Dataset for Geometric Deep Learning |
| Tightness-Aware Evaluation Protocol for Scene Text Detection |
| PointConv: Deep Convolutional Networks on 3D Point Clouds |
| Octree Guided CNN With Spherical Kernels for 3D Point Clouds |
| VITAMIN-E: VIsual Tracking and MappINg With Extremely Dense Feature Points |
| Conditional Single-View Shape Generation for Multi-View Stereo Reconstruction |
| Learning to Adapt for Stereo |
| 3D Appearance Super-Resolution With Deep Learning |
| Radial Distortion Triangulation |
| Robust Point Cloud Based Reconstruction of Large-Scale Outdoor Scenes |
| Minimal Solvers for Mini-Loop Closures in 3D Multi-Scan Alignment |
| Volumetric Capture of Humans With a Single RGBD Camera via Semi-Parametric Learning |
| Joint Face Detection and Facial Motion Retargeting for Multiple Faces |
| Monocular Depth Estimation Using Relative Depth Maps |
| Unsupervised Primitive Discovery for Improved 3D Generative Modeling |
| Learning to Explore Intrinsic Saliency for Stereoscopic Video |
| Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on N-Spheres |
| Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation |
| Learning View Priors for Single-View 3D Reconstruction |
| Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation |
| Learning Monocular Depth Estimation Infusing Traditional Stereo Knowledge |
| SIGNet: Semantic Instance Aided Unsupervised 3D Geometry Perception |
| 3D Guided Fine-Grained Face Manipulation |
| Neuro-Inspired Eye Tracking With Eye Movement Dynamics |
| Facial Emotion Distribution Learning by Exploiting Low-Rank Label Correlations Locally |
| Unsupervised Face Normalization With Extreme Pose and Expression in the Wild |