We explicitly condition behavior-cloning policies on camera geometry by encoding each pixel as a Plücker ray.
We explicitly condition imitation learning policies on camera geometry using per-pixel Plücker ray embeddings. Given camera intrinsics and extrinsics, each pixel is mapped to a 6D ray representation that is concatenated with image features.
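For reference, the per-pixel mapping can be written out as below. This is a sketch consistent with the snippet that follows, not notation taken from the paper: the symbols are assumptions introduced here, with K the intrinsics, R and o the rotation and translation (camera center) of the cam2world matrix, and (u, v) integer pixel indices whose centers sit at half-pixel offsets.

```latex
\[
\tilde{\mathbf{d}} = R\,K^{-1}
\begin{bmatrix} u + \tfrac{1}{2} \\ v + \tfrac{1}{2} \\ 1 \end{bmatrix},
\qquad
\mathbf{d} = \frac{\tilde{\mathbf{d}}}{\lVert \tilde{\mathbf{d}} \rVert},
\qquad
\mathbf{m} = \mathbf{o} \times \mathbf{d},
\qquad
\mathbf{r} = (\mathbf{d}, \mathbf{m}) \in \mathbb{R}^{6}
\]
```

By construction, d has unit norm and d is orthogonal to m. The moment m is also unchanged if o is replaced by any other point o + t d on the same ray, since (o + t d) × d = o × d.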
To add camera conditioning to your policy, you can use the following minimal snippet to compute a Plücker raymap from the camera intrinsics and extrinsics. It assumes the OpenCV convention, i.e., the image origin is at the top-left and +z points forward.
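import torch


def get_plucker_raymap(K, c2w, height, width):
    """Return a (height, width, 6) Plücker raymap.

    K: intrinsics (3, 3); c2w: cam2world (4, 4); height, width: ints.
    """
    # Pixel-center coordinates: uu runs along the width, vv along the height.
    vv, uu = torch.meshgrid(
        torch.arange(height, device=K.device, dtype=K.dtype) + 0.5,
        torch.arange(width, device=K.device, dtype=K.dtype) + 0.5,
        indexing="ij",
    )
    rays = torch.stack([uu, vv, torch.ones_like(uu)], dim=-1)
    # Back-project through K^-1, rotate into the world frame, and normalize.
    d_world = torch.nn.functional.normalize(
        (rays @ torch.linalg.inv(K).T) @ c2w[:3, :3].T,
        dim=-1,
        eps=1e-9,
    )
    # Camera center, expanded to (height, width, 3) so the shapes match explicitly.
    o = c2w[:3, 3].expand_as(d_world)
    # Moment m = o x d; each pixel's ray is the 6-vector (d, m).
    m = torch.cross(o, d_world, dim=-1)
    return torch.cat([d_world, m], dim=-1)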
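As an illustration of the concatenation step, here is a minimal sketch of one way to append the raymap to a batch of images as extra channels. The helper names `plucker_channels` and `concat_raymap`, the batched shapes, and the example intrinsics are assumptions made for this example, and channel-wise concatenation at the input is only one possible choice; the same pattern would apply to an intermediate feature map at its own resolution with suitably rescaled intrinsics.

```python
import torch
import torch.nn.functional as F


def plucker_channels(K, c2w, height, width):
    """Hypothetical batched variant: K (B,3,3), c2w (B,4,4) -> (B,6,H,W)."""
    vv, uu = torch.meshgrid(
        torch.arange(height, device=K.device, dtype=K.dtype) + 0.5,
        torch.arange(width, device=K.device, dtype=K.dtype) + 0.5,
        indexing="ij",
    )
    pix = torch.stack([uu, vv, torch.ones_like(uu)], dim=-1)  # (H, W, 3)
    # Back-project each pixel with K^-1, then rotate into the world frame.
    d_cam = torch.einsum("bij,hwj->bhwi", torch.linalg.inv(K), pix)
    d_world = torch.einsum("bij,bhwj->bhwi", c2w[:, :3, :3], d_cam)
    d_world = F.normalize(d_world, dim=-1)
    # Camera centers, expanded to (B, H, W, 3) for the cross product.
    o = c2w[:, :3, 3].view(-1, 1, 1, 3).expand_as(d_world)
    m = torch.linalg.cross(o, d_world, dim=-1)
    # Channels-first layout so it can be concatenated with images.
    return torch.cat([d_world, m], dim=-1).permute(0, 3, 1, 2)


def concat_raymap(images, K, c2w):
    """images (B,C,H,W) -> (B,C+6,H,W) with Plücker channels appended."""
    _, _, h, w = images.shape
    raymap = plucker_channels(K, c2w, h, w).to(images.dtype)
    return torch.cat([images, raymap], dim=1)


# Example with made-up intrinsics and poses (illustrative values only).
B, H, W = 2, 48, 64
K = torch.tensor([[60.0, 0.0, 32.0], [0.0, 60.0, 24.0], [0.0, 0.0, 1.0]]).repeat(B, 1, 1)
c2w = torch.eye(4).repeat(B, 1, 1)
c2w[1, :3, 3] = torch.tensor([0.1, -0.2, 0.5])  # second camera is translated
images = torch.rand(B, 3, H, W)
x = concat_raymap(images, K, c2w)
print(x.shape)  # torch.Size([2, 9, 48, 64])
```

With this layout, the first layer of the image encoder would need to accept C + 6 input channels instead of C.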
@inproceedings{jiang2026knowyourcamera,
  title     = {Do You Know Where Your Camera Is? {V}iew-Invariant Policy Learning with Camera Conditioning},
  author    = {Tianchong Jiang and Jingtian Ji and Xiangshan Tan and Jiading Fang and Anand Bhattad and Vitor Guizilini and Matthew R. Walter},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2026},
}