Fida Mohammad Thoker

Universiteit van Amsterdam

EDL P16-25 P2: Deep Learning for Video Analysis and Surveillance

Research assignment:
Data-efficient Action Recognition

Modern deep learning architectures rely heavily on large-scale annotated action video datasets. Collecting and annotating such datasets is costly, and for some actions it is not practically feasible to gather many examples because the underlying events are rare (e.g., suspicious activities at an airport). During my Ph.D. research, I aim to take a step towards data-efficient action recognition by developing new methods for transfer learning, knowledge transfer, and self-supervised learning across multiple video modalities, such as RGB frames, depth maps, and 3D skeleton sequences. So far, I have addressed this problem with the following technique:
Cross-modal feature supervision: distill knowledge from pre-trained, off-the-shelf RGB models to train new non-RGB action models using unlabeled paired modalities, so that only a limited number of labeled examples is needed for new non-RGB actions.
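The idea above can be sketched with a toy example: a frozen "teacher" extracts features from RGB inputs, and a "student" model for the paired skeleton modality is trained to match those features, without any action labels. This is a minimal illustration with linear models and random data, not the architecture from the papers; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: paired but unlabeled RGB and skeleton descriptors.
n, d_rgb, d_skel, d_feat = 64, 32, 16, 8
rgb = rng.normal(size=(n, d_rgb))    # RGB clip descriptors
skel = rng.normal(size=(n, d_skel))  # paired skeleton descriptors

# Frozen "teacher": stands in for a pre-trained RGB model (here a fixed linear map).
W_teacher = rng.normal(size=(d_rgb, d_feat))
teacher_feat = rgb @ W_teacher       # supervision signal; no labels required

# Trainable "student": the new non-RGB (skeleton) model, initialized at zero.
W_student = np.zeros((d_skel, d_feat))

lr = 0.01
for _ in range(200):
    student_feat = skel @ W_student
    diff = student_feat - teacher_feat
    loss = (diff ** 2).mean()                    # feature-matching (MSE) loss
    grad = 2.0 * skel.T @ diff / (n * d_feat)    # gradient of the MSE w.r.t. W_student
    W_student -= lr * grad
```

After this cross-modal pre-training, the student's features already align with the RGB teacher's, so fine-tuning on the target non-RGB task can start from them and needs far fewer labeled examples.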

Feature-Supervised Action Modality Transfer.
Fida Mohammad Thoker, Cees G.M. Snoek. In Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), 2020. Open access (gold).

Skeleton-Contrastive 3D Action Representation Learning.
Fida Mohammad Thoker, Hazel Doughty, Cees G.M. Snoek. In Proceedings of ACM Multimedia (ACM MM), 2021. Open access (green).

Personal information: