By Griffin Hurt <griffhurt@pitt.edu> with resources from the Surreality Lab
This project provides an easy way to capture rectified stereo videos with intrinsic and extrinsic camera data using the Apple Vision Pro. While the Vision Pro supports stereo video capture via spatial video, camera parameters are not saved, which makes projection into a point cloud difficult. This application exports camera parameters as a .json file to make reconstruction more straightforward (assuming a high-fidelity stereo matching pipeline is available).
- Make sure you have the newest Xcode 26 beta installed on your Mac and the visionOS 26 beta installed on your Vision Pro.
- Replace the `Enterprise.license` dummy file with your license.
- Build the application and run it on the Apple Vision Pro.
- Start the camera preview using the button at the bottom of the screen. This opens the immersive space, which is required for main camera access.
- Start capturing video using the red button.
- End the capture by pressing the same button (now showing a stop icon).
- Captures are saved to the application's folder in the "Files" application. Video files are saved in side-by-side format to the "Video Captures" folder, and data files are saved as JSON to the "Video Data" folder.
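Stereo matchers generally take separate left and right images, so each side-by-side frame has to be split in two before processing. The sketch below computes the crop rectangle for each half. It assumes the left camera occupies the left half of the frame, which is the usual side-by-side convention but is not stated above, so verify it against one of your captures. `Rect` and `side_by_side_halves` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Pixel crop rectangle: top-left corner (x, y) plus width and height."""
    x: int
    y: int
    width: int
    height: int


def side_by_side_halves(frame_width: int, frame_height: int) -> tuple[Rect, Rect]:
    """Return (left_eye, right_eye) crop rectangles for a side-by-side frame.

    Assumption: the left camera image occupies the left half of the frame.
    """
    if frame_width % 2 != 0:
        raise ValueError("side-by-side frame width should be even")
    half = frame_width // 2
    left = Rect(0, 0, half, frame_height)
    right = Rect(half, 0, half, frame_height)
    return left, right


if __name__ == "__main__":
    # Hypothetical frame size for illustration; read the real size from your video.
    left, right = side_by_side_halves(3840, 1080)
    print(left, right)
```

With a NumPy image `frame` (for example, one returned by OpenCV's `cv2.VideoCapture.read()`), each eye is `frame[r.y:r.y + r.height, r.x:r.x + r.width]`.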
Below is an example of the format for video data files:
```json
{
  "right_extrinsics": {
    "m00": 1.0000001,
    "m01": -4.034555e-10,
    ...
    "m33": 1
  },
  "left_intrinsics": {
    "m00": 496.95917,
    ...
    "m22": 1
  },
  "right_intrinsics": {
    ...
  },
  "left_extrinsics": {
    ...
  }
}
```

The Vision Pro cameras are displaced along the x-axis, so the stereo baseline can be computed as `abs(data["left_extrinsics"]["m03"] - data["right_extrinsics"]["m03"])`. For a more robust estimate, compute the Euclidean distance between the two cameras' translation vectors (the 4th column of each extrinsics matrix).
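Both baseline calculations can be sketched in a few lines of Python. This assumes the keys follow an `m<row><col>` layout, so `m03`, `m13`, and `m23` form the 4th (translation) column, consistent with the description of the extrinsics. The helper names are hypothetical, and the baseline comes out in whatever units the translations are stored in.

```python
import json
import math


def translation(extrinsics: dict) -> tuple[float, float, float]:
    """Translation vector of a 4x4 extrinsics matrix.

    Assumption: keys follow m<row><col>, so m03, m13, m23 form the 4th column.
    """
    return (extrinsics["m03"], extrinsics["m13"], extrinsics["m23"])


def baseline_x(data: dict) -> float:
    """Baseline as the absolute difference of the x-translations (m03)."""
    return abs(data["left_extrinsics"]["m03"] - data["right_extrinsics"]["m03"])


def baseline_euclidean(data: dict) -> float:
    """Baseline as the Euclidean distance between the two translation vectors."""
    return math.dist(translation(data["left_extrinsics"]),
                     translation(data["right_extrinsics"]))


def load_capture_data(path: str) -> dict:
    """Load one JSON file from the "Video Data" folder."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    import sys
    data = load_capture_data(sys.argv[1])  # path to a capture's JSON data file
    print(f"x-offset baseline:  {baseline_x(data):.6f}")
    print(f"Euclidean baseline: {baseline_euclidean(data):.6f}")
```

The two values agree when the cameras are offset only along x. If the extrinsics map world coordinates into camera coordinates, the camera centers are `-R^T t` rather than `t`; the two distances coincide when both cameras share the same rotation `R`.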
Of the approaches we have tried, NVIDIA's FoundationStereo model produces the most accurate point clouds from Vision Pro captures. With an inference batch size of 4, each frame takes roughly 1 second to process on an RTX A5000. Once the script for generating point clouds from these captures is cleaned up, I can add it to the repository.
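The last step, turning a disparity map into a point cloud, uses the standard pinhole relations for a rectified pair: `Z = fx * B / d`, `X = (u - cx) * Z / fx`, `Y = (v - cy) * Z / fy`. The sketch below assumes that (a) the stereo model outputs a left-view disparity map in pixels, (b) the intrinsics use the same `m<row><col>` layout, so `m00`/`m11` are the focal lengths and `m02`/`m12` the principal point, and (c) `baseline` is computed from the extrinsics as described above. It is not the repository's script; `disparity_to_points` is a hypothetical helper, and a NumPy version would vectorize the loops for full-resolution frames.

```python
from typing import Sequence

Point = tuple[float, float, float]


def disparity_to_points(
    disparity: Sequence[Sequence[float]],
    intrinsics: dict,
    baseline: float,
    min_disparity: float = 1e-6,
) -> list[Point]:
    """Back-project a left-view disparity map (in pixels) into 3D points.

    Assumptions: rectified images and m<row><col> intrinsics keys
    (fx = m00, fy = m11, cx = m02, cy = m12). Output coordinates share
    the units of `baseline`.
    """
    fx, fy = intrinsics["m00"], intrinsics["m11"]
    cx, cy = intrinsics["m02"], intrinsics["m12"]
    points: list[Point] = []
    for v, row in enumerate(disparity):      # v = pixel row
        for u, d in enumerate(row):          # u = pixel column
            if d <= min_disparity:
                continue  # depth is undefined for zero or invalid disparity
            z = fx * baseline / d
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Hypothetical intrinsics and disparities for illustration only.
    k = {"m00": 500.0, "m11": 500.0, "m02": 1.0, "m12": 1.0}
    print(disparity_to_points([[10.0, 0.0], [5.0, 20.0]], k, baseline=0.1))
```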
