Recent advancements in deep learning have shown impressive results in image and video denoising, leveraging extensive pairs of noisy and noise-free data for supervision. However, the difficulty of acquiring paired videos of dynamic scenes hampers the practical deployment of deep video denoising techniques; this obstacle is far less pronounced in image denoising, where paired data is more readily available. In this paper, we propose a novel unsupervised video denoising framework, named ``\textbf{T}emporal \textbf{A}s a \textbf{P}lugin'' (TAP), which integrates tunable temporal modules into a pre-trained image denoiser. Through this plug-and-play design, our TAP model can harness temporal information across noisy frames, complementing its spatial denoising capability. Furthermore, we introduce a progressive fine-tuning strategy that refines each temporal module using generated \textit{pseudo-clean} video frames, progressively enhancing the network's denoising performance. Compared to other unsupervised video denoising methods, our framework demonstrates superior performance on both sRGB and raw video denoising datasets.
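To make the architectural idea concrete, the following is a minimal PyTorch sketch of inserting small trainable temporal modules between the stages of a frozen, pre-trained image denoiser. It is an illustration under our own assumptions, not the authors' implementation: the class names, the 3D-convolution temporal fusion, the residual injection at the center frame, and the requirement that each stage preserve channel count and resolution are all hypothetical choices.

```python
import torch
import torch.nn as nn


class TemporalModule(nn.Module):
    """Trainable plug-in that fuses features from neighboring frames (assumed design)."""

    def __init__(self, channels, num_frames=3):
        super().__init__()
        # Collapse the temporal axis with a 3D convolution; output covers the center frame.
        self.fuse = nn.Conv3d(channels, channels,
                              kernel_size=(num_frames, 3, 3), padding=(0, 1, 1))

    def forward(self, feats):              # feats: (B, T, C, H, W)
        x = feats.permute(0, 2, 1, 3, 4)   # -> (B, C, T, H, W)
        return self.fuse(x).squeeze(2)     # -> (B, C, H, W), fused center-frame features


class PluggedVideoDenoiser(nn.Module):
    """Frozen image-denoiser stages interleaved with tunable temporal modules."""

    def __init__(self, image_stages, channels):
        super().__init__()
        self.stages = nn.ModuleList(image_stages)   # pre-trained spatial stages, kept frozen
        for p in self.stages.parameters():
            p.requires_grad_(False)
        # Only these temporal modules are trained (e.g., on pseudo-clean targets).
        self.temporal = nn.ModuleList([TemporalModule(channels) for _ in image_stages])

    def forward(self, frames):             # frames: (B, T, C_in, H, W)
        b, t, c, h, w = frames.shape
        feats = frames.reshape(b * t, c, h, w)
        for stage, temp in zip(self.stages, self.temporal):
            feats = stage(feats)                         # per-frame spatial processing
            f = feats.reshape(b, t, -1, h, w)
            center = f[:, t // 2] + temp(f)              # inject temporal info residually
            f = torch.cat([f[:, : t // 2],
                           center.unsqueeze(1),
                           f[:, t // 2 + 1:]], dim=1)
            feats = f.reshape(b * t, -1, h, w)
        return feats.reshape(b, t, -1, h, w)[:, t // 2]  # denoised center frame


# Toy usage with dummy stages standing in for a pre-trained image denoiser.
stages = [nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.ReLU()) for _ in range(2)]
model = PluggedVideoDenoiser(stages, channels=3)
out = model(torch.randn(1, 3, 3, 64, 64))                # (B, T, C, H, W) -> (1, 3, 64, 64)
```

In the progressive fine-tuning described above, such temporal modules would be refined one at a time, with the current model's outputs serving as pseudo-clean supervision for the next round; that training loop is omitted here.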