Towards Automatic Discovery of Human Movemes

Fanti, 2008

Category: ML

Overall Rating

3.9/5 (27/35 pts)

While the paper's specific technical implementations for feature extraction and probabilistic inference are largely superseded, it offers a unique conceptual perspective by explicitly framing the discovery of motion primitives as a joint segmentation and clustering problem, drawing a direct analogy to techniques like Chinese word segmentation.
This structured probabilistic approach, if adapted using modern dense motion features and differentiable programming frameworks, presents an unconventional path to learning interpretable motion units that differs significantly from prevalent end-to-end deep learning methods.

The most promising and unconventional research direction lies in the explicit analogy (and the future work hinted at in Section 5.5) between segmenting video sequences into movemes and segmenting unsegmented Chinese text into words.
The starting point would be the explicit probabilistic framework for joint sequence segmentation and word discovery (e.g., using dynamic programming to optimize the likelihood over possible segmentations and P-HMM parameters, as hinted by the GPS99 citation for text).
This structured probabilistic model could then be implemented within a modern differentiable programming framework (like PyTorch or TensorFlow).
This approach is unconventional today because it re-emphasizes explicit probabilistic modeling and dynamic programming over purely data-driven end-to-end deep sequence models for this specific task.
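
To make the word-segmentation analogy concrete, here is a minimal sketch of the dynamic program over segmentations. It is not the paper's method: the sequence is a string of symbols standing in for quantized motion features, and the per-segment score is a hypothetical lookup table (LEXICON, toy_log_prob) standing in for a per-moveme model likelihood such as a P-HMM. The function name segment and the max_len parameter are likewise assumptions made for illustration.

```python
import math

def segment(seq, log_prob, max_len):
    """Find the segmentation of seq that maximizes the summed segment log-probabilities.

    best[end] holds the best score for the prefix seq[:end];
    back[end] holds the start index of the last segment in that best solution.
    """
    n = len(seq)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Try every allowed length for the segment that ends at position `end`.
        for start in range(max(0, end - max_len), end):
            score = best[start] + log_prob(seq[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the segments in order.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(seq[start:end])
        end = start
    segments.reverse()
    return best[n], segments

# Hypothetical toy "moveme lexicon": segment -> probability (illustrative values only).
LEXICON = {"ab": 0.4, "bc": 0.3, "c": 0.2, "a": 0.1}

def toy_log_prob(seg):
    """Stand-in for a per-segment model likelihood; unknown segments are impossible."""
    p = LEXICON.get(seg)
    return math.log(p) if p is not None else -math.inf

score, parts = segment("abc", toy_log_prob, max_len=2)
print(parts, score)
```

In this toy case the split "ab" + "c" beats "a" + "bc" because 0.4 * 0.2 > 0.1 * 0.3. Replacing the max over start positions with a log-sum-exp would give the total likelihood over all segmentations, which is the quantity one would presumably differentiate in a PyTorch or TensorFlow implementation.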

The paper's core assumptions and input modalities are fundamentally misaligned with the modern computer vision landscape.
The multi-stage pipeline (feature detection -> probabilistic labeling/detection -> heuristic segmentation -> SLDS encoding -> clustering) is fragile.
Loopy Belief Propagation (LBP) has no general convergence guarantee on graphs with cycles, and the paper itself reports convergence issues in practice and the need for random restarts in EM.
Modern approaches to human motion analysis have surpassed this work in almost every aspect.

Act