Pose estimation generally refers to inferring the orientation and position of something. In computer vision it most often means human pose estimation: detecting the positions of human body keypoints (joints) in images or videos. But it can also refer to determining the pose of objects in 3D (e.g., estimating the 3D rotation and translation of a known object from an image) or to camera pose estimation (finding the camera's position and orientation relative to a scene).

For human pose there are two settings: 2D pose (keypoints in image coordinates) and 3D pose (keypoints in camera or world 3D coordinates). Typical approaches use CNNs that either regress keypoint coordinates directly or produce a heatmap per keypoint, where the value at each pixel indicates the likelihood of that joint being there, and then decode the heatmaps to positions.

Pose estimation has many applications: activity recognition, animation (motion capture), sports analysis, and AR (attaching digital effects to people). For objects, 6-DoF pose estimation is crucial in robotics: to pick up an object, a robot needs to know its 3D pose. So "pose estimation" should be interpreted in context: most often it means human joints, but not always.
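As a minimal sketch of the heatmap decoding step described above, the snippet below takes a hypothetical stack of per-keypoint heatmaps (shape K×H×W, as a CNN head might output) and recovers each joint's (x, y) pixel position via the argmax of its map. The array shapes and the toy input are illustrative assumptions, not any particular model's output.

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Decode per-keypoint heatmaps of shape (K, H, W) into (x, y)
    pixel coordinates by taking the argmax of each map."""
    K, H, W = heatmaps.shape
    coords = np.zeros((K, 2))
    for k in range(K):
        flat_idx = np.argmax(heatmaps[k])   # index of the hottest pixel
        y, x = divmod(flat_idx, W)          # convert flat index to row/col
        coords[k] = (x, y)
    return coords

# Toy example: two 4x4 heatmaps with a single known peak each.
hm = np.zeros((2, 4, 4))
hm[0, 1, 2] = 1.0   # keypoint 0 peaks at (x=2, y=1)
hm[1, 3, 0] = 1.0   # keypoint 1 peaks at (x=0, y=3)
print(decode_heatmaps(hm))
```

Real systems usually refine this further (e.g., sub-pixel offsets or a soft-argmax), since a plain argmax is limited to integer pixel accuracy.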