Predicting vectorized high-definition (HD) maps online is useful for autonomous driving, as it provides detailed geometric and semantic information about the surrounding road environment. In this paper, we introduce Mask2Map, a novel end-to-end online HD map construction method. Our approach first identifies semantic components within a scene represented in the bird's eye view (BEV) domain and then generates a precise vectorized map topology based on this information. Mask2Map comprises two main components: an Instance-level Mask Prediction Network (IMPNet) and a Mask-Driven Map Prediction Network (MMPNet). IMPNet generates mask-aware queries capable of producing BEV segmentation masks, while MMPNet constructs vectorized map components by leveraging the semantic and geometric information provided by the mask-aware queries. To enhance HD map predictions, we design two modules for MMPNet that build on the outputs of IMPNet. The Positional Feature Generator produces instance-level positional features from the spatial context of each instance's semantic components. The Geometric Feature Extractor extracts point-level geometric features using sparse key points pooled from the segmentation masks. Furthermore, we present a denoising training strategy that enforces inter-network consistency to boost map construction performance. Our evaluation on the nuScenes and Argoverse2 benchmarks demonstrates that Mask2Map improves over previous state-of-the-art methods by 10.1 mAP and 4.1 mAP, respectively. The code will be available soon.
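The abstract does not specify how key points are pooled from the segmentation masks, so the following is only an illustrative sketch of the general idea behind point-level feature extraction, not the Mask2Map implementation. It assumes a binary BEV mask and a dense BEV feature map stored as nested Python lists, and it picks key points by even spacing over the foreground cells in row-major order. The function names `pool_key_points` and `gather_point_features`, the data layout, and the sampling rule are all hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]                 # binary BEV mask, mask[y][x] in {0, 1}
FeatureMap = List[List[List[float]]]   # features[y][x] is a C-dimensional vector
Point = Tuple[int, int]                # (x, y) cell coordinates


def pool_key_points(mask: Mask, num_points: int) -> List[Point]:
    """Select up to num_points foreground cells, evenly spaced in row-major order.

    Assumption: even spacing stands in for whatever pooling the paper uses.
    """
    foreground = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not foreground or num_points <= 0:
        return []
    if len(foreground) <= num_points:
        return foreground
    if num_points == 1:
        return [foreground[0]]
    # Evenly spaced indices covering the first and last foreground cells.
    step = (len(foreground) - 1) / (num_points - 1)
    return [foreground[round(i * step)] for i in range(num_points)]


def gather_point_features(features: FeatureMap, points: List[Point]) -> List[List[float]]:
    """Look up the BEV feature vector at each selected key point."""
    return [features[y][x] for x, y in points]
```

In this sketch, each instance mask yields a small, fixed-size set of point features regardless of how many cells the mask covers, which is the property that makes sparse key points cheaper than processing the full mask.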