Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words
2011-07-07 17:56
Juan Carlos Niebles [1,2], Hongcheng Wang [1], Li Fei-Fei [1]
[1] University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
[2] Universidad del Norte, Barranquilla, Colombia
Summary
Automatically classifying or localizing different actions in video sequences is useful for a variety of tasks, such as video surveillance, object-level video summarization, video indexing, and digital library organization. However, robust action recognition remains challenging for computers due to cluttered backgrounds, camera motion, occlusion, and geometric and photometric variations of objects.

We present a novel unsupervised method for learning human action categories. A video sequence is represented as a collection of spatial-temporal words by extracting space-time interest points. The algorithm automatically learns the probability distributions of the spatial-temporal words and the intermediate topics corresponding to human action categories using a probabilistic Latent Semantic Analysis (pLSA) model. The learned model is then used to categorize and localize human actions in a novel video by maximizing the posterior of the action category (topic) distributions. The contributions of this work are as follows:
- Unsupervised learning of actions using a 'video words' representation: we deploy a pLSA model with a 'bag of video words' representation for video analysis;
- Multiple action localization and categorization: our approach not only classifies different actions, but also localizes them simultaneously in a novel and complex video sequence.
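The 'bag of video words' representation described above can be sketched as a plain k-means quantizer over cuboid descriptors: cluster the training descriptors into a codebook of 'video words', then represent each video as a histogram of nearest-codeword assignments. This is an illustrative sketch, not the paper's exact implementation; the function names, descriptor shapes, and the value of k are my own assumptions.

```python
import numpy as np

def kmeans_codebook(descriptors, k=200, n_iter=20, seed=0):
    """Plain Lloyd's k-means over cuboid descriptors; returns the codebook.

    descriptors: float array of shape (n_descriptors, dim).
    """
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(n_iter):
        # assign each descriptor to its nearest center (squared Euclidean)
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned descriptors
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bag_of_words(descriptors, centers):
    """Histogram of nearest-codeword counts: one video's 'document'."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    return np.bincount(labels, minlength=len(centers))
```

Each video's histogram then becomes one column of the word-document count matrix that the pLSA model is fit on.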
Our Algorithm
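The core learning step can be sketched with a small NumPy implementation of pLSA fit by EM, followed by "fold-in" of a novel video for categorization. This is a minimal sketch under my own assumptions (function names, random initialization, iteration counts, and the smoothing constant are illustrative, not taken from the paper): fit P(w|z) and P(z|d) on a videos-by-words count matrix, then estimate P(z|d_new) for a novel video with P(w|z) held fixed, and label it with the most probable topic.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a (videos x words) count matrix.

    counts[d, w] = occurrences of spatial-temporal word w in video d.
    Returns P(w|z), shape (n_topics, n_words), and P(z|d), shape
    (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | d, w) ∝ P(z|d) P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

def classify(word_counts, p_w_z, n_iter=50):
    """Fold in a novel video: keep P(w|z) fixed, estimate P(z|d_new),
    and return the most probable topic (action category) index."""
    n_topics = p_w_z.shape[0]
    p_z = np.full(n_topics, 1.0 / n_topics)  # uniform initialization
    for _ in range(n_iter):
        joint = p_z[:, None] * p_w_z                       # (topics, words)
        post = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)
        p_z = (word_counts[None, :] * post).sum(axis=1)
        p_z /= p_z.sum() + 1e-12
    return int(np.argmax(p_z))
```

During training the topics are discovered without action labels; labels are only needed afterwards to name the learned topics.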
Resources
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, accepted for oral presentation at the British Machine Vision Conference (BMVC), Edinburgh, 2006.
Full Text: PDF
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories, in Video Proceedings, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, 2006.
Full Text: PDF (One Page)
Video Demo: AVI
There is also a poster about this work, presented at IMA Workshop: Visual Learning and Recognition, Minneapolis, 2006.
Selected References
Piotr Dollár, Vincent Rabaud, Garrison Cottrell and Serge Belongie. Behavior Recognition via Sparse Spatio-Temporal Features. VS-PETS 2005, Beijing, China.
Author site: http://vision.ucsd.edu/~pdollar/research/research.html
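The space-time interest points are detected with the response function from this Dollár et al. reference: spatial smoothing with a 2D Gaussian, followed by a temporal quadrature pair of 1D Gabor filters, with interest points taken at local maxima of the response. The sketch below is a hedged transcription of that formula, not the authors' code; the scale parameters and boundary handling are my own illustrative choices.

```python
import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter

def cuboid_response(video, sigma=2.0, tau=1.5):
    """Response R = (V*g*h_ev)^2 + (V*g*h_od)^2 of the periodic-motion
    detector: g is a 2D spatial Gaussian, (h_ev, h_od) a temporal 1D
    Gabor quadrature pair.  video: float array of shape (T, H, W).
    """
    # spatial smoothing, frame by frame (no smoothing along time)
    smoothed = gaussian_filter(video, sigma=(0, sigma, sigma))
    # temporal Gabor pair; omega is tied to tau (omega = 4/tau)
    t = np.arange(-int(4 * tau), int(4 * tau) + 1)
    omega = 4.0 / tau
    env = np.exp(-t**2 / tau**2)
    h_ev = np.cos(2 * np.pi * omega * t) * env
    h_od = np.sin(2 * np.pi * omega * t) * env
    r_ev = convolve1d(smoothed, h_ev, axis=0, mode='nearest')
    r_od = convolve1d(smoothed, h_od, axis=0, mode='nearest')
    return r_ev**2 + r_od**2
```

A cuboid of smoothed gradients (or pixel values) is then extracted around each response maximum and vectorized into the descriptor that gets quantized into a video word.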