Timezone: America/Los_Angeles

SAT 28 SEP
11 p.m.
(ends 9:00 AM)

SUN 29 SEP
midnight
Workshop:
(ends 4:00 AM)
Workshop:
(ends 4:00 AM)
Workshop:
(ends 4:00 AM)
Workshop:
(ends 4:00 AM)
Workshop:
(ends 4:00 AM)
Tutorial:
(ends 4:00 AM)
1:30 a.m.
Break:
(ends 2:00 AM)
4 a.m.
Break:
(ends 5:00 AM)
5 a.m.
Workshop:
(ends 9:00 AM)
Tutorial:
(ends 9:00 AM)
6:30 a.m.
Break:
(ends 7:00 AM)
11 p.m.
(ends 9:00 AM)

MON 30 SEP
midnight
Workshop:
(ends 4:00 AM)
Workshop:
(ends 4:00 AM)
Workshop:
(ends 4:00 AM)
Workshop:
(ends 4:00 AM)
Workshop:
(ends 4:00 AM)
Workshop:
(ends 4:00 AM)
1:30 a.m.
Break:
(ends 2:00 AM)
4 a.m.
Break:
(ends 5:00 AM)
5 a.m.
Workshop:
(ends 9:00 AM)
Workshop:
(ends 9:00 AM)
Workshop:
(ends 9:00 AM)
Workshop:
(ends 9:00 AM)
Workshop:
(ends 9:00 AM)
Workshop:
(ends 9:00 AM)
Workshop:
(ends 9:00 AM)
Workshop:
(ends 9:00 AM)
Workshop:
(ends 9:00 AM)
6:30 a.m.
Break:
(ends 7:00 AM)
10 p.m.
(ends 9:30 AM)
11 p.m.

TUE 1 OCT
midnight
Orals 12:00-1:20
[12:00] Towards Scene Graph Anticipation
[12:10] OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
[12:20] PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
[12:30] Bi-directional Contextual Attention for 3D Dense Captioning
[12:40] OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
[12:50] ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
[1:00] A Fair Ranking and New Model for Panoptic Scene Graph Generation
[1:10] Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
(ends 1:30 AM)
Orals 12:00-1:20
[12:00] Making Large Language Models Better Planners with Reasoning-Decision Alignment
[12:10] MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
[12:20] M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
[12:30] H-V2X: A Large Scale Highway Dataset for BEV Perception
[12:40] Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
[12:50] DriveLM: Driving with Graph Visual Question Answering
[1:00] RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
[1:10] Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
(ends 1:30 AM)
Orals 12:00-1:20
[12:00] Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
[12:10] Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
[12:20] SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
[12:30] Photon Inhibition for Energy-Efficient Single-Photon Imaging
[12:40] Minimalist Vision with Freeform Pixels
[12:50] Flying with Photons: Rendering Novel Views of Propagating Light
[1:00] A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
[1:10] GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
(ends 1:30 AM)
1:30 a.m.
Posters 1:30-3:30
(ends 3:30 AM)
Break:
(ends 2:00 AM)
3 a.m.
Mentorship:
(ends 5:00 AM)
3:30 a.m.
Lunch:
(ends 4:30 AM)
4:30 a.m.
Orals 4:30-6:20
[4:30] EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
[4:40] TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
[4:50] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
[5:00] FlashTex: Fast Relightable Mesh Texturing with LightControlNet
[5:10] TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
[5:20] LLMGA: Multimodal Large Language Model based Generation Assistant
[5:30] Accelerating Image Generation with Sub-path Linear Approximation Model
[5:40] SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
[5:50] Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
[6:00] Zero-Shot Detection of AI-Generated Images
[6:10] Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
(ends 6:30 AM)
Orals 4:30-6:20
[4:30] Efficient Bias Mitigation Without Privileged Information
[4:40] Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
[4:50] MobileNetV4: Universal Models for the Mobile Ecosystem
[5:00] Momentum Auxiliary Network for Supervised Local Learning
[5:10] From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
[5:20] Dataset Enhancement with Instance-Level Augmentations
[5:30] Adaptive Parametric Activation
[5:40] Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
[5:50] Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation
[6:00] CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
[6:10] On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
(ends 6:30 AM)
Orals 4:30-6:20
[4:30] Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition
[4:40] COMO: Compact Mapping and Odometry
[4:50] Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss
[5:00] ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation
[5:10] SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
[5:20] Six-Point Method for Multi-Camera Systems with Reduced Solution Space
[5:30] Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
[5:40] Grounding Image Matching in 3D with MASt3R
[5:50] ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images
[6:00] Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection
[6:10] Camera Calibration using a Collimator System
(ends 6:30 AM)
6:30 a.m.
Keynote:
Lourdes Agapito · Vittorio Ferrari
(ends 7:30 AM)
7:30 a.m.
Break:
(ends 8:00 AM)
Posters 7:30-9:30
(ends 9:30 AM)
9:30 a.m.
Reception:
(ends 10:30 AM)
11 p.m.
(ends 9:30 AM)

WED 2 OCT
midnight
Orals 12:00-1:20
[12:00] PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
[12:10] UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
[12:20] Towards Model-Agnostic Dataset Condensation by Heterogeneous Models
[12:30] Parrot Captions Teach CLIP to Spot Text
[12:40] Towards Open-ended Visual Quality Comparison
[12:50] VETRA: A Dataset for Vehicle Tracking in Aerial Imagery - New Challenges for Multi-Object Tracking
[1:00] Insect Identification in the Wild: The AMI Dataset
[1:10] MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description
(ends 1:30 AM)
Orals 12:00-1:20
[12:00] PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
[12:10] Self-Supervised Video Desmoking for Laparoscopic Surgery
[12:20] CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
[12:30] Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction
[12:40] Adaptive Correspondence Scoring for Unsupervised Medical Image Registration
[12:50] Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View
[1:00] SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images
[1:10] Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
(ends 1:30 AM)
Orals 12:00-1:20
[12:00] HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
[12:10] PointLLM: Empowering Large Language Models to Understand Point Clouds
[12:20] RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation
[12:30] DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
[12:40] KeypointDETR: An End-to-End 3D Keypoint Detector
[12:50] Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
[1:00] RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
[1:10] Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration
(ends 1:30 AM)
1:30 a.m.
Break:
(ends 2:00 AM)
Posters 1:30-3:30