Abstract:
Action detection in video streams requires accurately identifying the action category at the current moment within an online video stream, given only the historical and current spatiotemporal information observed up to that point. Existing methods primarily design networks trained with frame-level supervision, which makes them overly sensitive to single-frame information and leaves their predictions without temporal consistency, resulting in insufficient detection accuracy. To address these issues, we propose an instance-guided video stream action detection method. Building upon frame-level detection, we augment the instance information and introduce an instance graph reasoning strategy to generate guidance. Temporal features are then integrated to enhance detection performance. The proposed algorithm is validated on publicly available video datasets, and experimental results demonstrate the effectiveness of the method and its high detection efficiency.