Temporal Segment Networks for Action Recognition in Videos
by Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool
2017
Abstract
Deep convolutional networks have achieved great success for image
recognition. However, for action recognition in videos, their advantage over
traditional methods is not so evident. We present a general and flexible
video-level framework for learning action models in videos. This method, called
temporal segment network (TSN), aims to model long-range temporal structures
with a new segment-based sampling and aggregation module. This unique design
enables our TSN to efficiently learn action models from whole action videos.
The learned models can be easily adapted for action recognition in
both trimmed and untrimmed videos with simple average pooling and multi-scale
temporal window integration, respectively. We also study a series of good
practices for instantiating the TSN framework given limited training
samples. Our approach obtains state-of-the-art performance on four
challenging action recognition benchmarks: HMDB51 (71.0%), UCF101 (94.9%),
THUMOS14 (80.1%), and ActivityNet v1.2 (89.6%). Using the proposed RGB
difference for motion models, our method can still achieve competitive accuracy
on UCF101 (91.0%) while running at 340 FPS. Furthermore, based on the temporal
segment networks, we won the video classification track at the ActivityNet
challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and
the proposed good practices.
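
To make the segment-based design above concrete, the following minimal Python/NumPy sketch illustrates the two ideas the abstract names: sparse segment-based snippet sampling and average-pooling aggregation (segmental consensus), along with the RGB difference used as a lightweight motion cue. The function names and the random stand-in for ConvNet scores are assumptions for illustration, not the authors' released code.

import numpy as np

def sample_snippets(num_frames, k, train=True):
    # Divide the video into k equal-duration segments and draw one snippet
    # index per segment: random within the segment during training, the
    # segment center during testing.
    edges = np.linspace(0, num_frames, k + 1, dtype=int)
    if train:
        return [int(np.random.randint(lo, max(lo + 1, hi)))
                for lo, hi in zip(edges[:-1], edges[1:])]
    return [int((lo + hi) // 2) for lo, hi in zip(edges[:-1], edges[1:])]

def rgb_difference(frames):
    # Lightweight motion representation: differences between consecutive
    # RGB frames, used in place of optical flow.
    return frames[1:] - frames[:-1]

def segmental_consensus(snippet_scores):
    # Aggregate k snippet-level class scores into one video-level score;
    # the default consensus is simple average pooling.
    return snippet_scores.mean(axis=0)

# Hypothetical usage: a ConvNet (not shown) would map each sampled snippet
# to class scores; random scores stand in here.
k, num_classes = 3, 101
indices = sample_snippets(num_frames=300, k=k, train=True)
snippet_scores = np.random.randn(k, num_classes)
video_scores = segmental_consensus(snippet_scores)

For untrimmed videos, the abstract indicates that snippet-level scores are instead integrated over multi-scale temporal windows rather than a single global average.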
Archived Files and Locations
application/pdf, 2.5 MB
arxiv.org (repository); web.archive.org (webarchive)
arXiv: 1705.02953v1