Abstract:
Video frames captured by rolling shutter (RS) cameras during fast camera motion often exhibit RS distortion and blur simultaneously. Such RS frames can be modeled as a row-wise combination of global shutter (GS) frames within the exposure period. Consequently, recovering high-frame-rate sharp GS frames from an RS blur image must jointly address RS correction, deblurring, and frame interpolation. A naive approach is to decompose the whole process into separate tasks and cascade existing methods; however, this accumulates errors and produces noticeable artifacts. Event cameras offer many advantages, \eg, high temporal resolution, making them promising for this problem. To this end, we propose the \textbf{first} approach, named \textbf{UniINR}, to recover arbitrary-frame-rate sharp GS frames from an RS blur image and paired event data. Our key idea is a unified spatial-temporal implicit neural representation (INR) that directly maps position and time coordinates to RGB values, thereby addressing the interlocking degradations in the image restoration process. Specifically, we introduce a spatial-temporal implicit encoding (STE) that converts an RS blur image and events into a spatial-temporal representation (STR). To query a specific sharp frame (GS or RS), we embed the exposure time into the STR and decode the embedded features pixel by pixel to recover a sharp frame. Our method is lightweight, with only \textbf{$0.379$M} parameters, and enjoys high inference efficiency, taking $2.83\,ms$ per frame for $31\times$ frame interpolation of an RS blur frame. Extensive experiments show that our method significantly outperforms prior methods.
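To make the time-conditioned, pixel-wise query concrete, below is a minimal sketch of such an INR decoding step in PyTorch. The module names, feature dimensions, and time-embedding scheme are illustrative assumptions for exposition, not the authors' actual UniINR implementation:

```python
# Hypothetical sketch of querying a sharp frame from a spatial-temporal
# representation (STR): embed a scalar exposure time, broadcast it over the
# feature map, and decode each pixel independently to RGB.
import torch
import torch.nn as nn

class INRDecoder(nn.Module):
    """Maps per-pixel STR features plus a query time to RGB values."""
    def __init__(self, feat_dim=64, time_dim=16, hidden=64):
        super().__init__()
        # Embeds the scalar query timestamp into a feature vector.
        self.time_mlp = nn.Sequential(
            nn.Linear(1, time_dim), nn.ReLU(), nn.Linear(time_dim, time_dim)
        )
        # Shared MLP applied to every pixel (position is implicit in the STR).
        self.pixel_mlp = nn.Sequential(
            nn.Linear(feat_dim + time_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, str_feat, t):
        # str_feat: (B, C, H, W) spatial-temporal representation from the STE
        # t:        (B,) normalized query timestamp in [0, 1]
        B, C, H, W = str_feat.shape
        t_emb = self.time_mlp(t.view(B, 1))                    # (B, time_dim)
        t_emb = t_emb[:, :, None, None].expand(-1, -1, H, W)   # broadcast over pixels
        x = torch.cat([str_feat, t_emb], dim=1)                # (B, C+time_dim, H, W)
        x = x.permute(0, 2, 3, 1)                              # pixel-wise features last
        rgb = self.pixel_mlp(x)                                # decode each pixel to RGB
        return rgb.permute(0, 3, 1, 2)                         # (B, 3, H, W) sharp frame
```

Under this sketch, querying the same STR at many timestamps within the exposure window yields arbitrary-frame-rate GS frames, since only the lightweight time embedding and pixel-wise decoding are repeated per frame.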