Delft University of Technology
EDL P16-24 P3: DL for Video Analysis and Surveillance (VAS)
Deep Learning for long term video understanding
The PhD project “Deep Learning for long term video understanding” addresses the challenge of automatically analysing the content of videos involving human interactions, in order to extract relevant information and describe what the videos contain. Since in recent years Deep Learning has proved its capability to extract discriminative visual features and use them for a variety of tasks, from image classification and image captioning to semantic segmentation, such a data-driven approach is employed for videos too. The main difference between image and video analysis is that videos vary in both the space and the time dimensions. The presence of a third dimension makes the computations harder; however, in order to model motion, time needs to be taken into consideration.
Because of the intrinsic complexity of the task, early research on video understanding focused on fine-grained action detection and classification. In this approach, the Deep Learning models used for classification receive as input short video clips, a few seconds long, each containing a single action. This project, in line with current studies on video understanding, addresses a more semantically complex task: the analysis of videos of long duration, in order to predict what happens in the videos or what will happen in the future. In this case, modelling and storing long-term information is necessary to detect the causal relationships between consecutive actions and use them to make predictions.
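The clip-based pipeline described above can be illustrated with a minimal sketch: a long video is split into short, fixed-length clips, each clip is classified independently, and the per-clip action labels are aggregated into a video-level prediction. The function names, clip length, and majority-vote aggregation below are illustrative assumptions, not the project's actual method (which instead models long-term temporal dependencies).

```python
from collections import Counter

def sample_clips(num_frames, clip_len=16, stride=8):
    """Split a long video of num_frames frames into overlapping short
    clips, returned as (start, end) frame-index pairs. clip_len and
    stride are illustrative values, not the project's settings."""
    clips = []
    start = 0
    while start + clip_len <= num_frames:
        clips.append((start, start + clip_len))
        start += stride
    return clips

def aggregate_predictions(per_clip_labels):
    """Combine per-clip action predictions into one video-level label
    by majority vote, a simple aggregation baseline that ignores the
    temporal order (and hence causal structure) of the actions."""
    return Counter(per_clip_labels).most_common(1)[0][0]

# A 300-frame video (about 10 s at 30 fps) split into 16-frame clips:
clips = sample_clips(300, clip_len=16, stride=8)
video_label = aggregate_predictions(["run", "run", "jump", "run"])
```

Note that majority voting discards the ordering of the clips entirely, which is precisely why it cannot capture the causal relationships between consecutive actions that long-term video understanding requires.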
Long-term Behaviour Recognition in Videos with Actor-focused Region Attention.
Luca Ballan, Ombretta Strafforello, Klamer Schutte.
In: Proceedings of VISAPP 2021 (open access, gold).