Video Understanding
← Back to Computer Vision
Processing and understanding video data. Requires temporal modeling in addition to spatial understanding. Tasks: action recognition, video captioning, temporal localization, video summarization.
Related
- CNN Architectures (spatial features)
- Transformers (temporal attention)