Fida Mohammad Thoker
Universiteit van Amsterdam
EDL P16-25 P2: Deep Learning for Video Analysis and Surveillance
Data-efficient Action Recognition
Deep learning architectures rely heavily on the ability to collect and
annotate large-scale action video datasets. Collecting and annotating such
datasets is costly, and for some actions it is not practically
feasible to collect many examples because the events involved are rare
(e.g., suspicious activities at an airport). During my Ph.D. research, I aim to
take a step towards data-efficient action recognition by developing new
methods for transfer learning, knowledge transfer, and self-supervised learning,
among others, involving multiple video
modalities such as RGB data, depth maps, and 3D skeleton sequences. So far I have
addressed this problem using the following technique:
– Cross-modal feature supervision: Extract knowledge from pre-trained, off-the-shelf RGB models to train new non-RGB action models on unlabeled paired modalities, so that only a limited number of labeled examples is required for new non-RGB actions.
Feature-Supervised Action Modality Transfer.
Fida Mohammad Thoker, Cees Snoek. Conference proceedings of IEEE International Conference on Pattern Recognition (ICPR), 2020. Open access-gold.
Skeleton-Contrastive 3D Action Representation Learning.
Fida Mohammad Thoker, Hazel Doughty, Cees G.M. Snoek. Conference proceedings of the ACM Multimedia 2021. Open access-green.
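
The cross-modal feature supervision idea described above can be illustrated with a minimal toy sketch. It is not the method from the listed publications: the linear "teacher" and "student" maps, the synthetic paired data, the mean-squared-error feature loss, and all names and sizes are assumptions chosen only for illustration. A frozen RGB teacher produces features for unlabeled paired samples, and a skeleton student is trained to reproduce those features without any action labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (assumptions, not taken from the publications).
n_pairs, latent_dim, rgb_dim, skel_dim, feat_dim = 256, 4, 32, 16, 8

# Unlabeled paired modalities: both views are rendered from the same
# underlying latent "action" signal, so each row is a matching pair.
latent = rng.normal(size=(n_pairs, latent_dim))
rgb = latent @ rng.normal(size=(latent_dim, rgb_dim))
skeleton = latent @ rng.normal(size=(latent_dim, skel_dim))

# Stand-in for a pre-trained, off-the-shelf RGB model: a fixed linear map.
# It is never updated during student training.
W_teacher = rng.normal(size=(rgb_dim, feat_dim)) / np.sqrt(rgb_dim)
teacher_feats = rgb @ W_teacher  # supervision targets; no labels involved

# Student for the non-RGB modality, trained from scratch.
W_student = np.zeros((skel_dim, feat_dim))


def feature_loss(weights):
    """Mean squared error between student and teacher features."""
    return float(np.mean((skeleton @ weights - teacher_feats) ** 2))


initial_loss = feature_loss(W_student)

# Plain gradient descent on the feature-matching loss.
learning_rate = 0.05
for _ in range(1000):
    residual = skeleton @ W_student - teacher_feats
    grad = 2.0 * skeleton.T @ residual / residual.size
    W_student -= learning_rate * grad

final_loss = feature_loss(W_student)
print(f"feature loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

In this sketch the student would afterwards be fine-tuned on the few labeled non-RGB examples that are available; that step is omitted here.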