Robot Perception Skill
When to Use This Skill
- Setting up and configuring camera, LiDAR, or depth sensors
- Building RGB, depth, or point cloud processing pipelines
- Calibrating cameras (intrinsic, extrinsic, hand-eye)
- Implementing object detection, segmentation, or tracking for robots
- Fusing data from multiple sensor modalities
- Streaming sensor data with proper threading and buffering
- Synchronizing multi-sensor rigs
- Deploying perception models on robot hardware (GPU, edge)
- Debugging perception failures (latency, dropped frames, misalignment)
Sensor Landscape
Sensor Types and Characteristics
Sensor Type Output Range Rate Best For
─────────────────────────────────────────────────────────────────────────
RGB Camera (H,W,3) uint8 ∞ 30-120Hz Object detection, tracking, visual servoing
Stereo Camera (H,W,3)+(H,W,3) 0.3-20m 30-90Hz Dense depth from passive stereo
Structured Light (H,W) float + RGB 0.2-10m 30Hz Indoor manipulation, short range
ToF Depth (H,W) float + RGB 0.1-10m 30Hz Indoor, medium range
LiDAR (spinning) (N,3) or (N,4) 0.5-200m 10-20Hz Outdoor navigation, mapping
LiDAR (solid-st.) (N,3) 0.5-200m 10-30Hz Automotive, outdoor
IMU (6,) or (9,) N/A 200-1kHz Orientation, motion estimation
Force/Torque (6,) float N/A 1kHz+ Contact detection, force control
Tactile (H,W) or (N,3) Contact 30-100Hz Grasp quality, texture
Event Camera Events (x,y,t,p) ∞ μs High-speed tracking, HDR scenes
Common Sensor Hardware
Device Type SDK/Driver ROS2 Package
──────────────────────────────────────────────────────────────────────────
Intel RealSense Structured Light pyrealsense2 realsense2_camera
Stereolabs ZED Stereo + IMU pyzed zed_wrapper
Luxonis OAK-D Stereo + Neural depthai depthai_ros
FLIR/Basler Industrial RGB PySpin/pypylon spinnaker_camera_driver
Velodyne Spinning LiDAR velodyne_driver velodyne
Ouster Spinning LiDAR ouster-sdk ros2_ouster
Livox Solid-state LiDAR livox_sdk livox_ros2_driver
USB Webcam RGB OpenCV VideoCapture usb_cam / v4l2_camera
Camera Models and Calibration
Pinhole Camera Model
3D World Point (X, Y, Z)
|
[R | t] — Extrinsic (world → camera)
|
Camera Point (Xc, Yc, Zc)
|
K — Intrinsic (camera → pixel)
|
Pixel (u, v)
K = [ fx 0 cx ] fx, fy = focal lengths (pixels)
[ 0 fy cy ] cx, cy = principal point
[ 0 0 1 ]
Projection: [u, v, 1]^T = K @ [R | t] @ [X, Y, Z, 1]^T
Intrinsic Calibration
import cv2
import numpy as np
from pathlib import Path
class IntrinsicCalibrator:
"""Camera intrinsic calibration using checkerboard pattern"""
def __init__(self, board_size=(9, 6), square_size_m=0.025):
self.board_size = board_size
self.square_size = square_size_m
# Prepare object points (3D coordinates of checkerboard corners)
self.objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
self.objp[:, :2] = np.mgrid[
0:board_size[0], 0:board_size[1]
].T.reshape(-1, 2) * square_size_m
def collect_calibration_images(self, camera, num_images=30,
min_coverage=0.6):
"""Collect calibration images with good spatial coverage.
IMPORTANT: Move the board to cover all regions of the image,
including corners and edges. Tilt the board at various angles.
Bad coverage = bad calibration, especially at image edges.
"""
obj_points = []
img_points = []
coverage_map = np.zeros((4, 4), dtype=int) # Track board positions
while len(obj_points) < num_images:
frame = camera.capture()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(
gray, self.board_size,
cv2.CALIB_CB_ADAPTIVE_THRESH |
cv2.CALIB_CB_NORMALIZE_IMAGE |
cv2.CALIB_CB_FAST_CHECK
)
if found:
# Sub-pixel refinement — critical for accuracy
criteria = (cv2.TERM_CRITERIA_EPS +
cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
corners = cv2.cornerSubPix(
gray, corners, (11, 11), (-1, -1), criteria)
# Track coverage
center = corners.mean(axis=0).flatten()
grid_x = int(center[0] / gray.shape[1] * 4)
grid_y = int(center[1] / gray.shape[0] * 4)
grid_x = min(grid_x, 3)
grid_y = min(grid_y, 3)
coverage_map[grid_y, grid_x] += 1
obj_points.append(self.objp)
img_points.append(corners)
coverage = (coverage_map > 0).sum() / coverage_map.size
if coverage < min_coverage:
print(f"WARNING: Only {coverage:.0%} coverage. "
f"Move board to uncovered regions.")
return obj_points, img_points, gray.shape[::-1]
def calibrate(self, obj_points, img_points, image_size):
"""Run calibration and return camera matrix + distortion coeffs"""
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
obj_points, img_points, image_size, None, None)
if ret > 1.0:
print(f"WARNING: High reprojection error ({ret:.3f} px). "
f"Check image quality and board detection.")
# Compute per-image reprojection errors
errors = []
for i in range(len(obj_points)):
projected, _ = cv2.projectPoints(
obj_points[i], rvecs[i], tvecs[i], K, dist)
err = cv2.norm(img_points[i], projected, cv2.NORM_L2)
err /= len(projected)
errors.append(err)
print(f"Calibration complete:")
print(f" RMS reprojection error: {ret:.4f} px")
print(f" Per-image errors: mean={np.mean(errors):.4f}, "
f"max={np.max(errors):.4f}")
print(f" Focal length: fx={K[0,0]:.1f}, fy={K[1,1]:.1f}")
print(f" Principal point: cx={K[0,2]:.1f}, cy={K[1,2]:.1f}")
return CalibrationResult(
camera_matrix=K, dist_coeffs=dist,
rms_error=ret, image_size=image_size)
def save(self, result, path):
"""Save calibration to YAML (OpenCV-compatible format)"""
fs = cv2.FileStorage(str(path), cv2.FILE_STORAGE_WRITE)
fs.write("camera_matrix", result.camera_matrix)
fs.write("dist_coeffs", result.dist_coeffs)
fs.write("image_width", result.image_size[0])
fs.write("image_height", result.image_size[1])
fs.write("rms_error", result.rms_error)
fs.release()
@staticmethod
def load(path):
"""Load calibration from YAML"""
fs = cv2.FileStorage(str(path), cv2.FILE_STORAGE_READ)
K = fs.getNode("camera_matrix").mat()
dist = fs.getNode("dist_coeffs").mat()
w = int(fs.getNode("image_width").real())
h = int(fs.getNode("image_height").real())
fs.release()
return CalibrationResult(
camera_matrix=K, dist_coeffs=dist,
image_size=(w, h), rms_error=0.0)
Extrinsic Calibration (Camera-to-Camera, Camera-to-LiDAR)
class ExtrinsicCalibrator:
"""Compute transform between two sensors using shared targets"""
def calibrate_stereo(self, calib_left, calib_right,