Learning Representational Invariances for Data-Efficient Action Recognition
release_2kz2f6yc3jb43gpdfg6ao7jo6a

by Yuliang Zou, Jinwoo Choi, Qitong Wang, Jia-Bin Huang

Released as an article.

2022  

Abstract

Data augmentation is a ubiquitous technique for improving image classification when labeled data is scarce. Constraining the model predictions to be invariant to diverse data augmentations effectively injects the desired representational invariances to the model (e.g., invariance to photometric variations) and helps improve accuracy. Compared to image data, the appearance variations in videos are far more complex due to the additional temporal dimension. Yet, data augmentation methods for videos remain under-explored. This paper investigates various data augmentation strategies that capture different video invariances, including photometric, geometric, temporal, and actor/scene augmentations. When integrated with existing semi-supervised learning frameworks, we show that our data augmentation strategy leads to promising performance on the Kinetics-100/400, Mini-Something-v2, UCF-101, and HMDB-51 datasets in the low-label regime. We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
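
The core idea in the abstract, constraining model predictions to stay invariant across augmented views of the same video, is the consistency-regularization recipe used by semi-supervised frameworks such as FixMatch. The following is a minimal PyTorch-style sketch of that idea with a temporal-crop augmentation; the function names, the (C, T, H, W) tensor layout, and the confidence threshold are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn.functional as F

def temporal_crop(clip, num_frames):
    # clip: (C, T, H, W); sample a random contiguous window of num_frames frames.
    t = clip.shape[1]
    start = torch.randint(0, t - num_frames + 1, (1,)).item()
    return clip[:, start:start + num_frames]

def consistency_loss(model, clip, num_frames=16, threshold=0.95):
    # Two views of the same unlabeled clip; the photometric, geometric, and
    # actor/scene transforms studied in the paper would be composed on top
    # of the strong view.
    weak = temporal_crop(clip, num_frames)
    strong = temporal_crop(clip, num_frames)
    with torch.no_grad():
        probs = F.softmax(model(weak.unsqueeze(0)), dim=-1)
        conf, pseudo = probs.max(dim=-1)          # pseudo-label from the weak view
    logits = model(strong.unsqueeze(0))
    mask = (conf >= threshold).float()            # keep only confident pseudo-labels
    return (F.cross_entropy(logits, pseudo, reduction="none") * mask).mean()

In practice this unlabeled-clip loss is added to the standard cross-entropy loss on the labeled subset, which is how consistency-based semi-supervised methods operate in the low-label regime the abstract describes.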

Archived Files and Locations

application/pdf  1.5 MB
file_nruovhcp2fd7vgw64qfsqzckbi
arxiv.org (repository)
web.archive.org (webarchive)
Type      article
Stage     submitted
Date      2022-02-14
Version   v2
Language  en
arXiv     2103.16565v2
Work Entity
Access all versions, variants, and formats of this work (e.g., pre-prints).
Catalog Record
Revision: 143f72d4-421c-463e-94be-2db6ec789c8a
API URL: JSON