| Semantic Component Decomposition for Face Attribute Manipulation |
| R3 Adversarial Network for Cross Model Face Recognition |
| Disentangling Latent Hands for Image Synthesis and Pose Estimation |
| Generating Multiple Hypotheses for 3D Human Pose Estimation With Mixture Density Network |
| CrossInfoNet: Multi-Task Information Sharing Based Hand Pose Estimation |
| P2SGrad: Refined Gradients for Optimizing Deep Face Models |
| Action Recognition From Single Timestamp Supervision in Untrimmed Videos |
| Time-Conditioned Action Anticipation in One Shot |
| Dance With Flow: Two-In-One Stream Action Detection |
| Representation Flow for Action Recognition |
| LSTA: Long Short-Term Attention for Egocentric Action Recognition |
| Learning Actor Relation Graphs for Group Activity Recognition |
| A Structured Model for Action Detection |
| Out-Of-Distribution Detection for Generalized Zero-Shot Action Recognition |
| Object Discovery in Videos as Foreground Motion Clustering |
| Towards Natural and Accurate Future Motion Prediction of Humans and Animals |
| Automatic Face Aging in Videos via Deep Reinforcement Learning |
| Multi-Adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection |
| A Content Transformation Block for Image Style Transfer |
| BeautyGlow: On-Demand Makeup Transfer Framework With Reversible Generative Network |
| Style Transfer by Relaxed Optimal Transport and Self-Similarity |
| Inserting Videos Into Videos |
| Learning Image and Video Compression Through Spatial-Temporal Energy Compaction |
| Event-Based High Dynamic Range Image and Very High Frame Rate Video Generation Using Conditional Generative Adversarial Networks |
| Enhancing TripleGAN for Semi-Supervised Conditional Instance Synthesis and Classification |
| Capture, Learning, and Synthesis of 3D Speaking Styles |
| Nesti-Net: Normal Estimation for Unstructured 3D Point Clouds Using Convolutional Neural Networks |
| Ray-Space Projection Model for Light Field Camera |
| Deep Geometric Prior for Surface Reconstruction |
| Analysis of Feature Visibility in Non-Line-Of-Sight Measurements |
| Hyperspectral Imaging With Random Printed Mask |
| All-Weather Deep Outdoor Lighting Estimation |
| A Variational EM Framework With Adaptive Edge Selection for Blind Motion Deblurring |
| Viewport Proposal CNN for 360° Video Quality Assessment |
| Beyond Gradient Descent for Regularized Segmentation Losses |
| MAGSAC: Marginalizing Sample Consensus |
| Understanding and Visualizing Deep Visual Saliency Models |
| Divergence Prior and Vessel-Tree Reconstruction |
| Unsupervised Domain-Specific Deblurring via Disentangled Representations |
| Douglas-Rachford Networks: Learning Both the Image Prior and Data Fidelity Terms for Blind Image Deconvolution |
| Speed Invariant Time Surface for Learning to Detect Corner Points With Event-Based Cameras |
| Training Deep Learning Based Image Denoisers From Undersampled Measurements Without Ground Truth and Without Image Prior |
| A Variational Pan-Sharpening With Local Gradient Constraints |
| F-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning |
| Sliced Wasserstein Discrepancy for Unsupervised Domain Adaptation |
| Graph Attention Convolution for Point Cloud Semantic Segmentation |
| Normalized Diversification |
| Learning to Localize Through Compressed Binary Maps |
| A Parametric Top-View Representation of Complex Road Scenes |
| Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction |
| Superquadrics Revisited: Learning 3D Shape Parsing Beyond Cuboids |
| Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network |
| Self-Supervised Representation Learning by Rotation Feature Decoupling |
| Weakly Supervised Deep Image Hashing Through Tag Embeddings |
| Improved Road Connectivity by Joint Learning of Orientation and Segmentation |
| Deep Supervised Cross-Modal Retrieval |
| A Theoretically Sound Upper Bound on the Triplet Loss for Improving the Efficiency of Deep Distance Metric Learning |
| Data Representation and Learning With Graph Diffusion-Embedding Networks |
| Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph |
| Image-Question-Answer Synergistic Network for Visual Dialog |
| Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses |
| Inverse Cooking: Recipe Generation From Food Images |
| Adversarial Semantic Alignment for Improved Image Captions |
| Answer Them All! Toward Universal Visual Question Answering Models |
| Unsupervised Multi-Modal Neural Machine Translation |
| Multi-Task Learning of Hierarchical Vision-Language Representation |
| Cross-Modal Self-Attention Network for Referring Image Segmentation |
| DuDoNet: Dual Domain Network for CT Metal Artifact Reduction |
| Fast Spatio-Temporal Residual Network for Video Super-Resolution |
| Complete the Look: Scene-Based Complementary Product Recommendation |
| Selective Sensor Fusion for Neural Visual-Inertial Odometry |
| Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes |
| Learning Binary Code for Personalized Fashion Recommendation |
| Attention Based Glaucoma Detection: A Large-Scale Database and CNN Model |
| Privacy Protection in Street-View Panoramas Using Depth and Multi-View Imagery |
| Grounding Human-To-Vehicle Advice for Self-Driving Vehicles |
| Multi-Step Prediction of Occupancy Grid Maps With Recurrent Neural Networks |
| Connecting Touch and Vision via Cross-Modal Prediction |
| X2CT-GAN: Reconstructing CT From Biplanar X-Rays With Generative Adversarial Networks |
| Practical Full Resolution Learned Lossless Image Compression |
| Image-To-Image Translation via Group-Wise Deep Whitening-And-Coloring Transformation |
| Max-Sliced Wasserstein Distance and Its Use for GANs |
| Meta-Learning With Differentiable Convex Optimization |
| RePr: Improved Training of Convolutional Filters |
| Tangent-Normal Adversarial Regularization for Semi-Supervised Learning |
| Auto-Encoding Scene Graphs for Image Captioning |
| Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech |
| Attention Branch Network: Learning of Attention Mechanism for Visual Explanation |
| Cascaded Projection: End-To-End Network Compression and Acceleration |
| DeepCaps: Going Deeper With Capsule Networks |
| FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search |
| APDrawingGAN: Generating Artistic Portrait Drawings From Face Photos With Hierarchical GANs |
| Constrained Generative Adversarial Networks for Interactive Image Generation |
| WarpGAN: Automatic Caricature Generation |
| Explainability Methods for Graph Convolutional Neural Networks |
| A Generative Adversarial Density Estimator |
| SoDeep: A Sorting Deep Net to Learn Ranking Loss Surrogates |
| High-Quality Face Capture Using Anatomical Muscles |
| FML: Face Model Learning From Videos |
| AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations |
| 3D Hand Shape and Pose Estimation From a Single RGB Image |
| 3D Hand Shape and Pose From Images in the Wild |
| Self-Supervised 3D Hand Pose Estimation Through Training by Fitting |
| CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark |
| Towards Social Artificial Intelligence: Nonverbal Social Signal Prediction in a Triadic Interaction |
| HoloPose: Holistic 3D Human Reconstruction In-The-Wild |
| Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation |
| In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations |
| Slim DensePose: Thrifty Learning From Sparse Annotations and Motion Cues |
| Self-Supervised Representation Learning From Videos for Facial Action Unit Detection |
| Combining 3D Morphable Models: A Large Scale Face-And-Head Model |
| Boosting Local Shape Matching for Dense 3D Face Correspondence |
| Unsupervised Part-Based Disentangling of Object Shape and Appearance |
| Monocular Total Capture: Posing Face, Body, and Hands in the Wild |
| Expressive Body Capture: 3D Hands, Face, and Body From a Single Image |
| Neural RGB→D Sensing: Depth and Uncertainty From a Video Camera |
| DAVANet: Stereo Deblurring With View Aggregation |
| DVC: An End-To-End Deep Video Compression Framework |
| SOSNet: Second Order Similarity Regularization for Local Descriptor Learning |
| “Double-DIP”: Unsupervised Image Decomposition via Coupled Deep-Image-Priors |
| Unprocessing Images for Learned Raw Denoising |
| Residual Networks for Light Field Image Super-Resolution |
| Modulating Image Restoration With Continual Levels via Adaptive Feature Modification Layers |
| Second-Order Attention Network for Single Image Super-Resolution |
| Devil Is in the Edges: Learning Semantic Boundaries From Noisy Annotations |
| Path-Invariant Map Networks |
| FilterReg: Robust and Efficient Probabilistic Point-Set Registration Using Gaussian Filter and Twist Parameterization |
| Probabilistic Permutation Synchronization Using the Riemannian Structure of the Birkhoff Polytope |
| Lifting Vectorial Variational Problems: A Natural Formulation Based on Geometric Measure Theory and Discrete Exterior Calculus |
| A Sufficient Condition for Convergences of Adam and RMSProp |
| Guaranteed Matrix Completion Under Multiple Linear Transformations |
| MAP Inference via Block-Coordinate Frank-Wolfe Algorithm |
| A Convex Relaxation for Multi-Graph Matching |
| Pixel-Adaptive Convolutional Neural Networks |
| Single-Frame Regularization for Temporally Stable CNNs |
| An End-To-End Network for Generating Social Relationship Graphs |
| Meta-Learning Convolutional Neural Architectures for Multi-Target Concrete Defect Classification With the COncrete DEfect BRidge IMage Dataset |
| ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model |
| SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization |
| Defending Against Adversarial Attacks by Randomized Diversification |
| Rob-GAN: Generator, Discriminator, and Adversarial Attacker |
| Learning From Noisy Labels by Regularized Estimation of Annotator Confusion |
| Task-Free Continual Learning |
| Importance Estimation for Neural Network Pruning |
| Detecting Overfitting of Deep Generative Networks via Latent Recovery |
| Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks |
| Characterizing and Avoiding Negative Transfer |
| Building Efficient Deep Neural Networks With Unitary Group Convolutions |
| Semi-Supervised Learning With Graph Learning-Convolutional Networks |
| Learning to Remember: A Synaptic Plasticity Driven Framework for Continual Learning |
| AIRD: Adversarial Learning Framework for Image Repurposing Detection |
| A Kernelized Manifold Mapping to Diminish the Effect of Adversarial Perturbations |
| Trust Region Based Adversarial Attack on Neural Networks |
| PEPSI: Fast Image Inpainting With Parallel Decoding Network |
| Model-Blind Video Denoising via Frame-To-Frame Training |
| End-To-End Efficient Representation Learning via Cascading Combinatorial Optimization |
| Sim-Real Joint Reinforcement Transfer for 3D Indoor Navigation |
| ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation |
| Regularizing Activation Distribution for Training Binarized Deep Networks |
| Robustness Verification of Classification Deep Neural Networks via Linear Programming |
| Additive Adversarial Learning for Unbiased Authentication |
| Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network Using Truncated Gaussian Approximation |
| Adversarial Defense by Stratified Convolutional Sparse Coding |
| Exploring Object Relation in Mean Teacher for Cross-Domain Detection |
| Hierarchical Disentanglement of Discriminative Latent Features for Zero-Shot Learning |
| R2GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network |
| Rethinking Knowledge Graph Propagation for Zero-Shot Learning |
| Learning to Learn Image Classifiers With Visual Analogy |
| Where’s Wally Now? Deep Generative and Discriminative Embeddings for Novelty Detection |
| Weakly Supervised Image Classification Through Noise Regularization |
| Data-Driven Neuron Allocation for Scale Aggregation Networks |
| Graphical Contrastive Losses for Scene Graph Parsing |
| Deep Transfer Learning for Multiple Class Novelty Detection |
| QATM: Quality-Aware Template Matching for Deep Learning |
| Retrieval-Augmented Convolutional Neural Networks Against Adversarial Examples |
| Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images |
| FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network |
| Weakly Supervised Video Moment Retrieval From Text Queries |
| Content-Aware Multi-Level Guidance for Interactive Instance Segmentation |
| Greedy Structure Learning of Hierarchical Compositional Models |
| Interactive Full Image Segmentation by Considering All Regions Jointly |
| Learning Active Contour Models for Medical Image Segmentation |
| Customizable Architecture Search for Semantic Segmentation |
| Local Features and Visual Words Emerge in Activations |
| Hyperspectral Image Super-Resolution With Optimized RGB Guidance |
| Adaptive Confidence Smoothing for Generalized Zero-Shot Learning |
| PMS-Net: Robust Haze Removal Based on Patch Map for Single Images |
| Deep Spherical Quantization for Image Search |
| Large-Scale Interactive Object Segmentation With Human Annotators |
| A Poisson-Gaussian Denoising Dataset With Real Fluorescence Microscopy Images |
| Task Agnostic Meta-Learning for Few-Shot Learning |
| Progressive Ensemble Networks for Zero-Shot Recognition |
| Direct Object Recognition Without Line-Of-Sight Using Optical Coherence |
| Atlas of Digital Pathology: A Generalized Hierarchical Histological Tissue Type-Annotated Database for Deep Learning |
| Perturbation Analysis of the 8-Point Algorithm: A Case Study for Wide FoV Cameras |
| Robustness of 3D Deep Learning in an Adversarial Setting |
| SceneCode: Monocular Dense Semantic Reconstruction Using Learned Encoded Scene Representations |
| StereoDRNet: Dilated Residual StereoNet |
| The Alignment of the Spheres: Globally-Optimal Spherical Mixture Alignment for Camera Pose Estimation |
| Learning Joint Reconstruction of Hands and Manipulated Objects |
| Deep Single Image Camera Calibration With Radial Distortion |
| CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth |
| Translate-to-Recognize Networks for RGB-D Scene Recognition |
| Re-Identification Supervised Texture Generation |
| Action4D: Online Action Recognition in the Crowd and Clutter |
| Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction |
| Attribute-Aware Face Aging With Wavelet-Based Generative Adversarial Networks |
| Noise-Tolerant Paradigm for Training Face Recognition CNNs |
| Low-Rank Laplacian-Uniform Mixed Model for Robust Face Recognition |
| Generalizing Eye Tracking With Bayesian Adversarial Learning |
| Local Relationship Learning With Person-Specific Shape Regularization for Facial Action Unit Detection |
| Point-To-Pose Voting Based Hand Pose Estimation Using Residual Permutation Equivariant Layer |
| Improving Few-Shot User-Specific Gaze Adaptation via Gaze Redirection Synthesis |
| AdaptiveFace: Adaptive Margin and Sampling for Face Recognition |
| Disentangled Representation Learning for 3D Face Shape |
| LBS Autoencoder: Self-Supervised Fitting of Articulated Meshes to Point Clouds |
| PifPaf: Composite Fields for Human Pose Estimation |
| TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection |
| Learning Regularity in Skeleton Trajectories for Anomaly Detection in Videos |
| Local Temporal Bilinear Pooling for Fine-Grained Action Parsing |
| Improving Action Localization by Progressive Cross-Stream Cooperation |
| Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition |
| A Neural Network Based on SPD Manifold Learning for Skeleton-Based Hand Gesture Recognition |
| Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition |
| Learning Spatio-Temporal Representation With Local and Global Diffusion |
| Unsupervised Learning of Action Classes With Continuous Temporal Embedding |
| Double Nuclear Norm Based Low Rank Representation on Grassmann Manifolds for Clustering |
| SR-LSTM: State Refinement for LSTM Towards Pedestrian Trajectory Prediction |
| Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes |
| An Efficient Schmidt-EKF for 3D Visual-Inertial SLAM |
| A Neural Temporal Model for Human Motion Prediction |
| Multi-Agent Tensor Fusion for Contextual Trajectory Prediction |
| Coordinate-Based Texture Inpainting for Pose-Guided Human Image Generation |
| On Stabilizing Generative Adversarial Training With Noise |
| Self-Supervised GANs via Auxiliary Rotation Loss |
| Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture |
| Object-Driven Text-To-Image Synthesis via Adversarial Training |
| Zoom-In-To-Check: Boosting Video Interpolation via Instance-Level Discrimination |
| Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions |
| Spectral Reconstruction From Dispersive Blur: A Novel Light Efficient Spectral Imager |
| Quasi-Unsupervised Color Constancy |
| Deep Defocus Map Estimation Using Domain Adaptation |
| Using Unknown Occluders to Recover Hidden Scenes |
| Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation |
| Learning Parallax Attention for Stereo Image Super-Resolution |
| Knowing When to Stop: Evaluation and Verification of Conformity to Output-Size Specifications |
| Spatial Attentive Single-Image Deraining With a High Quality Real Rain Dataset |
| Focus Is All You Need: Loss Functions for Event-Based Vision |
| Scalable Convolutional Neural Network for Image Compressed Sensing |
| Event Cameras, Contrast Maximization and Reward Functions: An Analysis |
| Convolutional Neural Networks Can Be Deceived by Visual Illusions |
| PDE Acceleration for Active Contours |
| Dichromatic Model Based Temporal Color Constancy for AC Light Sources |
| Semantic Attribute Matching Networks |
| Skin-Based Identification From Multispectral Image Data Using CNNs |
| Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks |
| Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments |
| PIEs: Pose Invariant Embeddings |
| Representation Similarity Analysis for Efficient Task Taxonomy & Transfer Learning |
| Object Counting and Instance Segmentation With Image-Level Supervision |
| Variational Autoencoders Pursue PCA Directions (by Accident) |
| A Relation-Augmented Fully Convolutional Network for Semantic Segmentation in Aerial Scenes |
| Temporal Transformer Networks: Joint Learning of Invariant and Discriminative Time Warping |
| PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval |
| Depth Coefficients for Depth Completion |
| Diversify and Match: A Domain Adaptive Representation Learning Paradigm for Object Detection |
| Good News, Everyone! Context Driven Entity-Aware Captioning for News Images |
| Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding |
| Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning |
| Pointing Novel Objects in Image Captioning |
| Informative Object Annotations: Tell Me Something I Don’t Know |
| Engaging Image Captioning via Personality |
| Vision-Based Navigation With Language-Based Assistance via Imitation Learning With Indirect Intervention |
| TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments |
| A Simple Baseline for Audio-Visual Scene-Aware Dialog |
| End-To-End Learned Random Walker for Seeded Image Segmentation |
| Efficient Neural Network Compression |
| Cascaded Generative and Discriminative Learning for Microcalcification Detection in Breast Mammograms |
| C3AE: Exploring the Limits of Compact Model for Age Estimation |
| Adaptive Weighting Multi-Field-Of-View CNN for Semantic Segmentation in Pathology |
| In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images |
| Context-Aware Visual Compatibility Prediction |
| Sim-To-Real via Sim-To-Sim: Data-Efficient Robotic Grasping via Randomized-To-Canonical Adaptation Networks |
| Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation |
| Context-Aware Spatio-Recurrent Curvilinear Structure Segmentation |
| An Alternative Deep Feature Approach to Line Level Keyword Spotting |
| Dynamics Are Important for the Recognition of Equine Pain in Video |
| LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving |
| Machine Vision Guided 3D Medical Image Compression for Efficient Transmission and Accurate Segmentation in the Clouds |
| PointPillars: Fast Encoders for Object Detection From Point Clouds |
| Motion Estimation of Non-Holonomic Ground Vehicles From a Single Feature Correspondence Measured Over N Views |
| From Coarse to Fine: Robust Hierarchical Localization at Large Scale |
| Large Scale High-Resolution Land Cover Mapping With Multi-Resolution Data |
| Leveraging Heterogeneous Auxiliary Tasks to Assist Crowd Counting |