Abstract:
Action detection in video streams requires accurately identifying the action category at the current moment of an online video stream, given only the historical and current spatiotemporal information observed up to that point. Existing methods mainly rely on supervised learning with purpose-designed networks that operate on frame-level information. Such methods are overly sensitive to single-frame information and lack temporal consistency, which limits their detection accuracy. To address these issues, this paper proposes an instance-guided method for action detection in video streams. Building on frame-level detection, the method augments instance information, applies an instance graph reasoning strategy to generate guidance, and then integrates temporal features to improve detection performance. The proposed algorithm is validated on publicly available video datasets, and the experimental results demonstrate both its effectiveness and its high detection efficiency.