Point-supervised temporal action localization pursues high-accuracy action detection under low-cost data annotation. Despite recent advances, a significant challenge remains: sparse labeling of individual frames leads to semantic ambiguity in determining action boundaries due to the lack of continuity in the highly sparse point-supervision scheme. We propose a Stepwise Multi-grained Boundary Detector(SMBD), which is comprised of a Background Anchor Generator(BAG) and a Dual Boundary Detector(DBD) to provide fine-grained supervision. Specifically, for each epoch in the training process, BAG computes the optimal background snippet between each pair of adjacent action labels, which we term Background Anchor. Subsequently, DBD leverages the background anchor and the action labels to locate the action boundaries from the perspectives of detecting action changes and scene changes. Then, the corresponding labels can be assigned to each side of the boundaries, with the boundaries continuously updated throughout the training process. Consequently, the proposed SMBD could ensure that more snippets contribute to the training process. Extensive experiments on the THUMOS’14, GTEA and BEOID datasets demonstrate that the proposed method outperforms existing state-of-the-art methods.
Live content is unavailable. Log in and register to view live content