Key Metrics
- <1% · Trajectory drift on standard benchmarks
- ~0.5% · Translation error on typical multi-cam runs
- ~10% · CPU/GPU footprint on Jetson Orin NX
Overview & live demo
NVIDIA cuVSLAM is a CUDA-accelerated SLAM library tuned for real-time Visual Inertial Odometry (VIO) on Jetson. Paired with a RealSense RGB-D stereo camera, it delivers production-grade pose estimation. VIO is the primary path; cuVSLAM also supports pure stereo VO and multi-camera odometry.
The video, recorded at the RealSense robotics lab, compares cuVSLAM VIO pose estimates from a RealSense D455 and a Jetson Orin against an OptiTrack ground-truth system, demonstrating pose drift below 1%.
Source. The commands and example scripts referenced here are mirrored from the official nvidia-isaac/cuVSLAM · examples/realsense/README.md. Bug reports against the underlying examples should go to that repository; this paper is a RealSense-native walkthrough.
In a controlled evaluation against motion-capture ground truth, cuVSLAM running on a single RealSense D455 stereo stream held absolute trajectory error to 6.8 cm (ATE RMSE) across the entire run, while frame-to-frame translation error stayed at 5.8 mm (RPE-T RMSE): centimeter-level accuracy globally, millimeter-level consistency locally. Scale was recovered to within 1.6% of the true value, and the system estimated a pose for every frame: 12,453 of 12,453, with zero drops.
The takeaway: production-grade VSLAM robustness — accurate, stable, and uninterrupted — on commodity RealSense stereo hardware.
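To make the two error figures above concrete, here is a minimal sketch of how ATE RMSE and a translation-only RPE RMSE can be computed. It is not the evaluation tooling behind the reported numbers: it assumes the estimated and ground-truth trajectories are already time-matched and aligned in one frame, it ignores rotation, and the function names and toy data are made up for illustration.

```python
import math


def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of per-frame position differences.

    Assumes est and gt are equal-length lists of (x, y, z) positions that
    are already time-matched and expressed in the same frame.
    """
    if not est or len(est) != len(gt):
        raise ValueError("trajectories must be non-empty and equal length")
    squared = [sum((e - g) ** 2 for e, g in zip(pe, pg))
               for pe, pg in zip(est, gt)]
    return math.sqrt(sum(squared) / len(squared))


def rpe_t_rmse(est, gt, delta=1):
    """Relative translation error: RMSE of the difference between estimated
    and true displacements over `delta` frames (positions only)."""
    if len(est) != len(gt) or len(est) <= delta:
        raise ValueError("need equal-length trajectories longer than delta")
    squared = []
    for i in range(len(est) - delta):
        step_est = [b - a for a, b in zip(est[i], est[i + delta])]
        step_gt = [b - a for a, b in zip(gt[i], gt[i + delta])]
        squared.append(sum((x - y) ** 2 for x, y in zip(step_est, step_gt)))
    return math.sqrt(sum(squared) / len(squared))


# Toy data (made up): the estimate slides sideways by 0.1 m per frame.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.2, 0.0), (3.0, 0.3, 0.0)]
ate = ate_rmse(est, gt)
rpe = rpe_t_rmse(est, gt)
print(f"ATE RMSE = {ate:.3f} m, RPE-T RMSE = {rpe:.3f} m")
```

On this toy data the per-step error is constant at 0.1 m while the global error accumulates to roughly 0.19 m, which is the same pattern as the reported results: a small local error and a larger, but still bounded, global one.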
Access paths to cuVSLAM
cuVSLAM is delivered as one library with two access paths: the C++ runtime wrapped by Isaac ROS (the supported, recommended path for production) and a Python wrapper for prototyping. Choose before you start.
One library. Two access paths. Recommended path is C++ on ROS 2.
cuVSLAM is a CUDA-accelerated visual-SLAM library written in C++. For real-world robots, NVIDIA recommends running it through the Isaac ROS Visual SLAM node, which is the supported, production-grade path. A Python wrapper, PyCuVSLAM, is also available for quick prototyping and benchmarking; this tutorial uses it for its sample commands because every mode is documented end-to-end as a Python example in the cuVSLAM repo.
RECOMMENDED • C++ + ROS 2
Production deployments. Real-time guarantees on Jetson. Integration with Nav2, Nvblox, Isaac Manipulator. The supported runtime for shipping products.
ros2 launch isaac_ros_visual_slam …
ALSO AVAILABLE • PyCuVSLAM
Python wrapper around the same library. Useful for prototyping, benchmarking, and demos. The four mode examples below show the Python invocations.
python3 run_vio.py
Camera streams, IMU calibration, and sync settings are identical between both paths — work you do in either transfers directly to the other.
Mode 1 — Stereo Visual Odometry
Pure stereo VO from one D455 or D436 — no IMU, no calibration tuning. The fastest health-check that the stack runs end-to-end on your robot. Trajectory renders live in rerun.
Stereo VO
One D455 / D436 · stereo only · no IMU
Use this mode when you want to confirm the pipeline is wired up correctly, or when your application doesn’t need inertial fusion. It’s the entry point for everything else — once stereo VO is producing a clean trajectory, you can layer on IMU, multi-camera, or RGBD.
Figure 1 — rerun visualization, stereo VO. Source: nvidia-isaac/cuVSLAM.
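As a rough illustration of what "stereo only, no IMU" means on the camera side, the sketch below opens just the left and right infrared imagers of a RealSense device with pyrealsense2 and switches off the IR projector. It is not the cuVSLAM example script and does not call PyCuVSLAM. The 640x360 at 30 fps profile, the helper names, and the choice to disable the emitter are assumptions made for this sketch.

```python
def stereo_stream_specs(width=640, height=360, fps=30):
    """Return (imager_index, width, height, fps) for the left and right IR imagers.

    The resolution and frame rate are illustrative defaults, not values
    taken from the cuVSLAM examples.
    """
    return [(1, width, height, fps), (2, width, height, fps)]


def start_stereo_pipeline(width=640, height=360, fps=30):
    """Start a RealSense pipeline that streams only the two IR imagers."""
    import pyrealsense2 as rs  # imported lazily so the helpers load without the SDK

    config = rs.config()
    for index, w, h, f in stereo_stream_specs(width, height, fps):
        config.enable_stream(rs.stream.infrared, index, w, h, rs.format.y8, f)

    pipeline = rs.pipeline()
    profile = pipeline.start(config)

    # The projected dot pattern shows up in the IR images, so this sketch
    # turns the emitter off when the device exposes that option.
    depth_sensor = profile.get_device().first_depth_sensor()
    if depth_sensor.supports(rs.option.emitter_enabled):
        depth_sensor.set_option(rs.option.emitter_enabled, 0)
    return pipeline


def grab_stereo_pair(pipeline):
    """Block until a frameset arrives; return (left, right, timestamp_ms)."""
    frames = pipeline.wait_for_frames()
    left = frames.get_infrared_frame(1)
    right = frames.get_infrared_frame(2)
    return left, right, frames.get_timestamp()
```

With a camera attached, `pipeline = start_stereo_pipeline()` followed by `grab_stereo_pair(pipeline)` yields one synchronized left/right pair; call `pipeline.stop()` when finished.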
Mode 2 — Stereo Inertial Odometry (VIO)
Stereo + IMU fusion. The headline mode, and the one most humanoid, AMR, and drone applications converge on. Drift stays below 1% over long runs.
Stereo VIO
One D455 or D436 · stereo + on-camera IMU
VIO is the mode this paper is anchored to. Fusing the on-camera IMU with the stereo stream improves accuracy over stereo-only VO.
Figure 2 — rerun visualization, stereo VIO with gravity vector. Source: nvidia-isaac/cuVSLAM.
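The gravity vector in the figure hints at one thing an IMU contributes: an absolute sense of "down" that cameras alone do not provide. The sketch below is a generic illustration, not cuVSLAM's estimator. It assumes the accelerometer reports specific force in m/s², that the device is at rest, and that averaging a few samples is enough; the function name and sample values are hypothetical.

```python
import math


def gravity_direction(accel_samples):
    """Estimate the gravity direction from accelerometer samples taken at rest.

    At rest an accelerometer measures the reaction to gravity (specific
    force), which points up, so gravity is the negated, normalized mean.
    Returns (unit_gravity_vector, measured_magnitude).
    """
    if not accel_samples:
        raise ValueError("need at least one sample")
    n = len(accel_samples)
    mean = [sum(sample[i] for sample in accel_samples) / n for i in range(3)]
    magnitude = math.sqrt(sum(c * c for c in mean))
    if magnitude == 0.0:
        raise ValueError("mean acceleration is zero; cannot find a direction")
    return [-c / magnitude for c in mean], magnitude


# Made-up samples from a sensor whose y axis points toward the floor.
samples = [(0.0, -9.8, 0.1), (0.0, -9.8, -0.1)]
g_unit, g_mag = gravity_direction(samples)
print(g_unit, g_mag)
```

A measured magnitude far from about 9.8 m/s² while the device is stationary is a simple hint that the accelerometer needs calibration before it is fused with vision.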
Production VIO on Jetson. The README flags real-time VIO via the Python API on Jetson as unreliable. For shipped products use the C++ Isaac ROS Visual SLAM node — same camera, same IMU, same calibration, supported runtime.
Mode 3 — Multi-camera Odometry
Two or three hardware-synced cameras feeding cuVSLAM jointly. Robust through occlusion, low-texture scenes, and aggressive motion. This is the mode that survives bipedal head rotation.
Multi-camera
2–3 hardware-synced D455 / D436 · YAML-defined extrinsics
Multi-camera is the upgrade path when single-camera VIO doesn’t survive your motion profile. It also unlocks better coverage during turns and provides redundancy if one camera loses features.
Figure 3 — multi-camera odometry, three-camera rig. Source: nvidia-isaac/cuVSLAM.
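Multi-camera tracking depends on knowing where each camera sits on the rig, which is what the extrinsics describe. The sketch below shows the underlying idea with plain 4x4 homogeneous transforms: each camera gets a rig-from-camera transform, and a point seen by any camera can be expressed in the shared rig frame. The rig layout, mounting values, and function names are hypothetical, camera optical-axis conventions are ignored, and this is not the YAML schema used by the cuVSLAM examples.

```python
import math


def make_extrinsic(yaw_deg, translation):
    """Build a 4x4 rig-from-camera transform: a rotation about the rig's
    z axis by yaw_deg, followed by a translation in meters."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(transform, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = point
    return tuple(
        transform[r][0] * x + transform[r][1] * y + transform[r][2] * z + transform[r][3]
        for r in range(3)
    )


# Hypothetical three-camera rig: one facing forward, two facing sideways.
RIG = {
    "front": make_extrinsic(0.0, (0.10, 0.0, 0.0)),
    "left": make_extrinsic(90.0, (0.0, 0.08, 0.0)),
    "right": make_extrinsic(-90.0, (0.0, -0.08, 0.0)),
}

# A point 1 m along each camera's own x axis, expressed in the rig frame.
points_in_rig = {name: transform_point(T, (1.0, 0.0, 0.0)) for name, T in RIG.items()}
print(points_in_rig)
```

An error in any one of these transforms places that camera's observations in the wrong spot relative to the others, which is why the extrinsics need to be measured or calibrated rather than guessed.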
References
cuVSLAM is the front-end. The full perception stack composes it with mapping, occupancy, and navigation packages.
- Primary technical reference — official nvidia-isaac/cuVSLAM RealSense examples README
- Isaac ROS Visual SLAM RealSense tutorial (C++ ROS 2 production path)
- Perception Studio team page — for partnership questions, design-partner enquiries, and integration support.
You did it. You now have a verified PyCuVSLAM pipeline on RealSense across four modes. Found a discrepancy with the NVIDIA repo? File an issue at nvidia-isaac/cuVSLAM/issues or ping the Perception Studio Discord.

