Fida Mohammad Thoker

Universiteit van Amsterdam
VIS Lab


EDL P16-25 P2: Deep Learning for Video Analysis and Surveillance


Research assignment:
Data-efficient Action Recognition

Modern deep learning architectures rely heavily on the ability to collect and annotate large-scale action video datasets. Collecting and annotating such datasets is costly, and for some actions it is not practically feasible to gather many examples because the events involved are rare (e.g., suspicious activities at an airport). During my Ph.D. research, I aim to take a step towards data-efficient action recognition by developing new methods for transfer learning, knowledge transfer, and self-supervised learning that involve multiple video modalities such as RGB data, depth maps, and 3D skeleton sequences. So far I have addressed this problem using the following technique:
Cross-modal feature supervision: Extract knowledge from pre-trained, off-the-shelf RGB models to train new non-RGB action models using unlabeled paired modalities, thus requiring only a limited number of labeled examples for new non-RGB actions.
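The idea above can be sketched as a feature-matching objective: a frozen, pretrained RGB "teacher" produces features on unlabeled paired clips, and a non-RGB "student" (e.g., a skeleton model) is trained to reproduce those features, so no action labels are needed. The sketch below is a minimal, hypothetical illustration using plain NumPy with linear models standing in for the real networks; the variable names, dimensions, and data generation are all assumptions, not the published method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: the pretrained RGB "teacher" is a frozen random
# projection; the skeleton "student" is one linear layer that we train.
D_RGB, D_SKEL, D_FEAT = 32, 16, 8
teacher_W = rng.normal(size=(D_RGB, D_FEAT))        # frozen teacher weights


def teacher_features(rgb):
    # Off-the-shelf RGB model: frozen, never updated during training.
    return rgb @ teacher_W


# Unlabeled but *paired* modalities: each clip is recorded as both RGB and
# skeleton. Pairing is simulated here via a shared latent per clip.
latent = rng.normal(size=(256, D_SKEL))
rgb_data = latent @ rng.normal(size=(D_SKEL, D_RGB))
skel_data = latent

# Feature supervision signal comes from the teacher, not from labels.
targets = teacher_features(rgb_data)

# Train the student to match the teacher's features (MSE, gradient descent).
student_W = rng.normal(size=(D_SKEL, D_FEAT)) * 0.01
lr = 0.1
for _ in range(500):
    preds = skel_data @ student_W
    grad = 2 * skel_data.T @ (preds - targets) / len(skel_data)
    student_W -= lr * grad

mse = float(np.mean((skel_data @ student_W - targets) ** 2))
print(f"feature-matching MSE: {mse:.6f}")
```

After this label-free pretraining, the student would be fine-tuned on the few labeled non-RGB examples that are available; in a real system both models would be deep video networks rather than linear maps.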

Publications
 
Feature-Supervised Action Modality Transfer.
Fida Mohammad Thoker, Cees G.M. Snoek. In Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), 2020. Open access (gold).
https://drive.google.com/open?id=1VfyMizsZFwcqTknPHym9dshyyzsUs0w0

Skeleton-Contrastive 3D Action Representation Learning.
Fida Mohammad Thoker, Hazel Doughty, Cees G.M. Snoek. In Proceedings of ACM Multimedia (ACM MM), 2021. Open access (green).
https://drive.google.com/open?id=1eOAETs-ob86uNWP2_O11mtl4ENyzCh23
https://dl.acm.org/doi/pdf/10.1145/3474085.3475307

Personal information

Homepage
LinkedIn
ResearchGate
GoogleScholar