Vision for Social Robots: Human Perception and Pose Estimation


2018

Category: Computer Vision

Overall Rating

2.7/5 (19/35 pts)

Score Breakdown

  • Latent Novelty Potential: 6/10
  • Cross Disciplinary Applicability: 6/10
  • Technical Timeliness: 4/10
  • Obscurity Advantage: 3/5

Synthesized Summary

  • This paper highlights specific, under-discussed perceptual challenges (like how inherent facial structure biases distance estimates or what minimal depth cues are sufficient for 3D pose) and empirically validates characteristics of human action data.

  • While the methods themselves are largely obsolete, these specific identified problems and empirical findings could serve as inspiration and validation targets for developing novel, more robust implicit or self-supervised learning objectives in modern AI systems aiming for nuanced human perception.
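To make the facial-structure bias concrete, here is a minimal sketch. It assumes a simple pinhole camera and estimates distance from apparent face width alone (Z = f·W/w). The focal length, the 0.15 m "average" face width, and the individual face widths are hypothetical values chosen for illustration, not figures from the paper. The function names are also hypothetical, and the paper's own method is not reproduced here.

```python
def project_face_width_px(true_width_m: float, distance_m: float, focal_px: float) -> float:
    """Pinhole projection: apparent width in pixels, w = f * W_true / Z."""
    return focal_px * true_width_m / distance_m


def estimate_distance_m(width_px: float, focal_px: float, assumed_width_m: float) -> float:
    """Invert the pinhole model with an *assumed* face width: Z_hat = f * W_assumed / w."""
    if width_px <= 0:
        raise ValueError("width_px must be positive")
    return focal_px * assumed_width_m / width_px


if __name__ == "__main__":
    focal_px = 1000.0        # hypothetical focal length in pixels
    assumed_width_m = 0.15   # hypothetical population-average face width
    true_distance_m = 2.0
    # Hypothetical individual face widths: narrower, equal to, and wider than assumed.
    for true_width_m in (0.13, 0.15, 0.17):
        w = project_face_width_px(true_width_m, true_distance_m, focal_px)
        z_hat = estimate_distance_m(w, focal_px, assumed_width_m)
        print(f"face {true_width_m:.2f} m wide -> estimated {z_hat:.2f} m (true {true_distance_m} m)")
```

With these numbers, a face 0.13 m wide is placed at about 2.31 m instead of 2.0 m. In general the estimate is off by the factor W_assumed / W_true, regardless of the true distance, which is why individual facial structure cannot be averaged away by this cue alone.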

Optimist's View

  • The central observation that individual physiognomy significantly biases distance estimation... is a profound insight.

  • Modern implicit neural representations... could involve training an implicit model... to learn a disentangled latent space representing both 3D facial identity and camera parameters...

  • The ability to discover interpretable, 3D, rotation-invariant components of motion (movemes) solely from static 2D images is a powerful form of geometric-aware unsupervised learning.

  • The key finding is that surprisingly little supervision – sparse relative depth orderings... – is sufficient to train a 3D pose estimator to competitive performance.
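The sparse relative-depth supervision mentioned above can be sketched as a pairwise ranking loss over annotated joint pairs. This minimal sketch assumes a common formulation from the relative-depth literature: a logistic loss for pairs labeled "i is farther/closer than j" and a squared penalty for pairs labeled "roughly equal depth". The reviewed work's exact objective may differ, and the names `relative_depth_loss` and `DepthPair` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (i, j, r): r = +1 if joint i is annotated as farther from the camera than joint j,
# r = -1 if closer, and r = 0 if judged to be at roughly the same depth.
DepthPair = Tuple[int, int, int]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relative_depth_loss(depths: Sequence[float], pairs: Sequence[DepthPair]) -> float:
    """Mean pairwise ranking loss over sparse ordinal depth annotations.

    depths: predicted depth per joint (larger means farther from the camera).
    """
    if not pairs:
        return 0.0
    total = 0.0
    for i, j, r in pairs:
        diff = depths[i] - depths[j]
        if r == 0:
            total += diff ** 2            # "same depth": penalize any gap
        else:
            total += softplus(-r * diff)  # small when sign(diff) agrees with r
    return total / len(pairs)


if __name__ == "__main__":
    pred = [1.0, 1.4, 0.9]                      # hypothetical predicted depths for 3 joints
    pairs = [(1, 0, +1), (2, 0, -1), (0, 2, 0)]  # hypothetical annotations
    print(relative_depth_loss(pred, pairs))
```

Only the sign of each depth difference is supervised, so on its own this term says nothing about metric depth; that would have to come from some other signal, such as 2D keypoint supervision.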

Skeptic's View

  • The core assumptions and methods feel dated.

  • ...its specific technical implementations proved either too brittle, too narrow in scope, or less powerful than parallel or immediately subsequent research directions.

  • Chapter 4's reliance on a predefined verb list... struggles with the long tail of real-world actions and doesn't easily integrate with modern end-to-end deep learning pipelines.

  • Attempting to directly apply these specific perceptual techniques... to embodied social robot control... would be misguided.

Final Takeaway / Relevance

Watch