
Pose Estimation

Detecting the position of human (or animal) joints in images and video to reconstruct skeletal pose.

In common use since 2014

Pose estimation is the computer vision task of detecting the positions of human (or animal) joints in images and video, producing a skeleton of keypoints that captures body posture. It powers fitness apps, sports analytics, motion capture for animation, AR effects, ergonomic analysis and, increasingly, the ControlNet-driven pipelines that condition image generation on pose.

The variants in 2026:

  • 2D pose estimation — joints in pixel coordinates; works on any single image. Mature.
  • 3D pose estimation — joints in 3D world coordinates; harder, because depth is ambiguous from a single (monocular) camera and multi-view setups need calibrated, synchronised cameras.
  • Whole-body pose — body, hands, face all together for full character capture.
  • Multi-person pose — every person in a crowd estimated simultaneously.
  • Animal pose — for veterinary, agriculture, wildlife and animation use cases.
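
To make "a skeleton of keypoints" concrete, here is a minimal sketch of how a 2D pose is often represented. It assumes the widely used COCO 17-keypoint body layout, coordinates normalised to [0, 1], and a flat [x, y, score] list per person; the names Keypoint, to_pixels and pose_from_flat are hypothetical helpers for illustration, not part of any particular library.

```python
from dataclasses import dataclass

# COCO 17-keypoint body layout (assumed here; other models use other layouts).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


@dataclass
class Keypoint:
    name: str
    x: float      # horizontal position, normalised to [0, 1] (assumption)
    y: float      # vertical position, normalised to [0, 1] (assumption)
    score: float  # model confidence for this joint, 0..1


def to_pixels(kp, width, height):
    """Convert a normalised keypoint to integer pixel coordinates."""
    return (round(kp.x * width), round(kp.y * height))


def pose_from_flat(values):
    """Build one person's pose from a flat [x0, y0, s0, x1, y1, s1, ...] list."""
    if len(values) != 3 * len(COCO_KEYPOINTS):
        raise ValueError("expected 3 values per keypoint")
    return [
        Keypoint(name, values[3 * i], values[3 * i + 1], values[3 * i + 2])
        for i, name in enumerate(COCO_KEYPOINTS)
    ]
```

A 3D pose would add a z value per joint, and a whole-body pose would extend the list with hand and face keypoints; the structure otherwise stays the same.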

The model landscape:

  • OpenPose — historic foundational model; widely used and integrated.
  • MediaPipe (Google) — fast on-device pose estimation; popular for mobile and web AR.
  • YOLOv8 / YOLO-Pose — fast, accurate, easy to deploy for production at scale.
  • MMPose — research-grade toolkit with state-of-the-art models.
  • DWPose, RTMPose — newer high-accuracy options now used in many production pipelines.
  • Move AI, Wonder Studio — commercial markerless mocap from video; turn ordinary footage into animation-ready pose data.

Production use cases that ship in 2026:

  • Fitness apps — count reps, check form, give corrections (Tonal, Mirror, Apple Fitness+).
  • Sports analytics — player tracking, motion analysis, performance metrics; widely used in MLB, NBA and soccer.
  • AR effects — body filters, virtual try-on, dance challenges on TikTok and Instagram.
  • Motion capture for animation — markerless mocap from a phone for indie game developers and animators.
  • Healthcare and rehabilitation — gait analysis, physical therapy assessment, fall detection for elderly care.
  • Workplace ergonomics — automated assessment of repetitive-strain risk in industrial environments.
  • Image generation control — ControlNet OpenPose conditioning to specify exact character poses for generated images.
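
Several of these use cases reduce to simple geometry on the keypoints. As one illustration, the rep counting mentioned for fitness apps can be sketched as a joint-angle calculation plus a two-threshold state machine. The thresholds (60 and 150 degrees) and the names joint_angle and RepCounter are assumptions chosen for a bicep-curl style movement, not values taken from any product.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, between segments b->a and b->c.

    Each point is an (x, y) tuple, e.g. shoulder, elbow, wrist.
    """
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360.0 - ang if ang > 180.0 else ang


class RepCounter:
    """Counts one rep per extended -> flexed -> extended cycle.

    Two separate thresholds give hysteresis, so jitter around a single
    cut-off angle does not produce phantom reps.
    """

    def __init__(self, flexed_below=60.0, extended_above=150.0):
        self.flexed_below = flexed_below      # assumed threshold, degrees
        self.extended_above = extended_above  # assumed threshold, degrees
        self.stage = "extended"
        self.count = 0

    def update(self, angle):
        if self.stage == "extended" and angle < self.flexed_below:
            self.stage = "flexed"
        elif self.stage == "flexed" and angle > self.extended_above:
            self.stage = "extended"
            self.count += 1
        return self.count
```

In use, each video frame would yield one elbow angle from the shoulder, elbow and wrist keypoints, and that angle would be passed to update(). Form checking and ergonomic scoring follow the same pattern with different joints and thresholds.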

The hard cases:

  • Occlusion — when one body part hides another, accuracy drops.
  • Multiple overlapping people — crowd scenes are harder than single-person.
  • Unusual poses — yoga, gymnastics, martial arts can confuse models trained on common poses.
  • Loose or heavy clothing — baggy or layered garments hide body landmarks.
  • Viewpoint extremes — overhead or extreme low angles are under-represented in training data, so accuracy drops.
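
Most pose models report a confidence score per keypoint, and a common way to cope with occlusion is to distrust low-scoring joints instead of using their coordinates. The sketch below assumes each frame is a dict mapping joint name to (x, y, score); the 0.3 threshold and the function name gate_keypoints are illustrative assumptions that would need tuning per model.

```python
def gate_keypoints(frame, previous=None, min_score=0.3):
    """Drop or back-fill keypoints the model is unsure about.

    frame:    dict of joint name -> (x, y, score) for the current frame
    previous: gated output of the prior frame (name -> (x, y) or None)
    Returns:  dict of joint name -> (x, y), or None when nothing is trusted.
    """
    gated = {}
    for name, (x, y, score) in frame.items():
        if score >= min_score:
            gated[name] = (x, y)
        elif previous is not None and previous.get(name) is not None:
            # Likely occluded: hold the last trusted position.
            gated[name] = previous[name]
        else:
            gated[name] = None
    return gated
```

Holding the last position is only reasonable for brief occlusions; a real pipeline would probably also cap how many frames a stale value may be reused.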

For a US team in 2026, pose estimation has commoditised dramatically. MediaPipe runs in real time in a browser; YOLO-Pose deploys easily on edge devices; cloud APIs cover the high-end use cases. The interesting work has moved up the stack, to what you do with the pose data once you have it, rather than into the pose estimation itself. Custom training is rarely needed: off-the-shelf pretrained pose models work well across most domains, with the hard cases listed above as the main exceptions.
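
As one example of work "up the stack", comparing a user's pose against a reference pose (for form feedback or dance scoring) can be sketched in a few lines. This assumes each pose is a dict of joint name to (x, y) that includes both shoulders and both hips; centring on the hips and scaling by torso length is one plausible normalisation, and the function names are hypothetical.

```python
import math


def _midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def normalize_pose(pose):
    """Centre the pose on the mid-hip and scale by torso length.

    This removes the effect of where the person stands in the frame
    and how far they are from the camera.
    """
    hip = _midpoint(pose["left_hip"], pose["right_hip"])
    shoulder = _midpoint(pose["left_shoulder"], pose["right_shoulder"])
    torso = math.dist(hip, shoulder)
    if torso == 0:
        raise ValueError("degenerate pose: zero torso length")
    return {
        name: ((x - hip[0]) / torso, (y - hip[1]) / torso)
        for name, (x, y) in pose.items()
    }


def pose_similarity(a, b):
    """Cosine similarity between two normalised poses (1.0 = same shape)."""
    na, nb = normalize_pose(a), normalize_pose(b)
    names = sorted(set(na) & set(nb))  # compare only joints present in both
    va = [coord for n in names for coord in na[n]]
    vb = [coord for n in names for coord in nb[n]]
    dot = sum(p * q for p, q in zip(va, vb))
    norm = math.sqrt(sum(p * p for p in va)) * math.sqrt(sum(q * q for q in vb))
    return dot / norm
```

Cosine similarity is a deliberately simple choice; per-joint weighting or comparing joint angles may suit some applications better.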
