Visual-Inertial Odometry (VIO)

RealSense RGB-D Stereo Cameras + NVIDIA cuVSLAM · multiple modes, one stack

Production-grade VIO for humanoids, AMRs, and drones, built on RealSense RGB-D stereo cameras and the GPU-accelerated NVIDIA cuVSLAM library. The same stack also delivers pure stereo VO and hardware-synced multi-camera odometry. It is available as a C++ runtime (recommended for production, wrapped by the Isaac ROS Visual SLAM node) and as a Python API for prototyping.

Key Metrics

<1% · Trajectory drift on standard benchmarks

~0.5% · Translation error on typical multi-cam runs

~10% · CPU/GPU footprint on Jetson Orin NX
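The drift and translation-error percentages above are expressed relative to distance traveled. The page does not state the exact convention used. The sketch below assumes the simplest one: end-point position error divided by ground-truth path length. KITTI-style metrics instead average the error over fixed-length sub-segments. The helper names `path_length` and `drift_percent` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def path_length(points: Sequence[Point]) -> float:
    """Sum of straight-line distances between consecutive positions (metres)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def drift_percent(gt: Sequence[Point], est: Sequence[Point]) -> float:
    """End-point drift as a percentage of ground-truth distance traveled.

    Assumes gt and est are time-associated and expressed in the same frame
    (a simplification; benchmark tools usually align trajectories first).
    """
    if len(gt) != len(est) or len(gt) < 2:
        raise ValueError("need two equal-length trajectories with >= 2 poses")
    final_error = math.dist(gt[-1], est[-1])
    return 100.0 * final_error / path_length(gt)


# Synthetic example: a 100 m straight run that ends 0.5 m off to the side.
gt = [(float(i), 0.0, 0.0) for i in range(101)]
est = gt[:-1] + [(100.0, 0.5, 0.0)]
print(f"drift: {drift_percent(gt, est):.2f}%")
```

Under this convention, 0.5 m of end-point error over 100 m of travel reads as 0.5% drift.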

Overview & live demo

NVIDIA cuVSLAM is a CUDA-accelerated SLAM library tuned for real-time Visual-Inertial Odometry (VIO) on Jetson. Paired with a RealSense RGB-D stereo camera, it delivers production-grade pose estimation. VIO is the primary path, and cuVSLAM also supports pure stereo VO and multi-camera odometry.

The video, recorded at the RealSense robotics lab, compares cuVSLAM VIO pose estimates from a RealSense D455 on a Jetson Orin against an OptiTrack motion-capture ground-truth system, with pose drift staying below 1%.

Source. The commands and example scripts referenced here are mirrored from the official nvidia-isaac/cuVSLAM repository (examples/realsense/README.md). Bug reports against the underlying examples should go to that repository; this paper is a RealSense-native walkthrough.

In a controlled evaluation against motion-capture ground truth, cuVSLAM running on a single RealSense D455 stereo stream held absolute trajectory error to 6.8 cm (ATE RMSE) across the entire run, while frame-to-frame translation error stayed at 5.8 mm (RPE-T RMSE). That is centimeter-level accuracy globally and millimeter-level consistency locally. Scale was recovered to within 1.6% of true, and the system estimated a pose for every frame: 12,453 of 12,453, with zero drops.
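ATE RMSE and RPE-T RMSE are standard trajectory-evaluation metrics, defined as in the TUM RGB-D benchmark tools. The page does not publish its evaluation script, so the sketch below is an assumption rather than the procedure behind the numbers above. It makes the following choices:

- It aligns the estimate to ground truth with a similarity (Umeyama) transform, which is also what makes a scale error measurable. If a rigid-only alignment was used, fix s = 1.
- It computes RPE on consecutive 4x4 poses.

All function names are hypothetical and the data is synthetic.

```python
import numpy as np


def umeyama_alignment(est: np.ndarray, gt: np.ndarray):
    """Similarity (s, R, t) minimising ||gt - (s * R @ est + t)|| (Umeyama method).

    est, gt: (N, 3) arrays of time-associated positions.
    """
    n = est.shape[0]
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / n                          # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                         # forbid reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / ((e ** 2).sum() / n)
    t = mu_g - s * R @ mu_e
    return s, R, t


def ate_rmse(est: np.ndarray, gt: np.ndarray):
    """ATE RMSE (m) after similarity alignment, and the estimate's scale vs truth."""
    s, R, t = umeyama_alignment(est, gt)
    aligned = s * est @ R.T + t
    err = np.linalg.norm(gt - aligned, axis=1)
    return float(np.sqrt(np.mean(err ** 2))), float(1.0 / s)


def rpe_trans_rmse(est_poses, gt_poses, delta: int = 1) -> float:
    """RPE translational RMSE (m) over pose pairs (i, i + delta); poses are 4x4."""
    errs = []
    for i in range(len(gt_poses) - delta):
        gt_rel = np.linalg.inv(gt_poses[i]) @ gt_poses[i + delta]
        est_rel = np.linalg.inv(est_poses[i]) @ est_poses[i + delta]
        err = np.linalg.inv(gt_rel) @ est_rel   # residual relative motion
        errs.append(np.linalg.norm(err[:3, 3]))
    return float(np.sqrt(np.mean(np.square(errs))))


def pose(R: np.ndarray, p) -> np.ndarray:
    """Pack a rotation and a position into a 4x4 homogeneous pose."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, p
    return T


# Synthetic check: the estimate is the truth rotated 30 deg, shifted, and 2% too large.
theta = np.linspace(0.0, 4.0 * np.pi, 200)
gt_xyz = np.stack([np.cos(theta), np.sin(theta), 0.1 * theta], axis=1)
yaw = np.deg2rad(30.0)
Rz = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
               [np.sin(yaw), np.cos(yaw), 0.0],
               [0.0, 0.0, 1.0]])
est_xyz = 1.02 * gt_xyz @ Rz.T + np.array([0.5, -0.2, 0.1])

ate, scale = ate_rmse(est_xyz, gt_xyz)
print(f"ATE RMSE {ate:.6f} m, scale error {abs(scale - 1.0):.1%}")

# RPE is unaffected by a global rigid transform applied to every pose.
gt_poses = [pose(np.eye(3), p) for p in gt_xyz]
est_poses = [pose(Rz, [0.5, -0.2, 0.1]) @ P for P in gt_poses]
print(f"RPE-T RMSE {rpe_trans_rmse(est_poses, gt_poses):.6f} m")
```

Similarity alignment absorbs the scale factor into s, which is why the synthetic run shows near-zero ATE but a 2% scale error. With rigid-only alignment, the same scale mismatch would instead show up inside ATE.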

The takeaway: production-grade VSLAM robustness — accurate, stable, and uninterrupted — on commodity RealSense stereo hardware.

Access paths to cuVSLAM

cuVSLAM is delivered as one library with two access paths: the C++ runtime wrapped by Isaac ROS (the supported, recommended path for production) and a Python wrapper for prototyping. Choose before you start.

One library. Two access paths. The recommended path is C++ on ROS 2. cuVSLAM is a CUDA-accelerated visual-SLAM library written in C++. For real-world robots, NVIDIA recommends running it through the Isaac ROS Visual SLAM node, which is the supported, production-grade path. A Python wrapper, PyCuVSLAM, is also available for quick prototyping and benchmarking. This tutorial uses it for its sample commands because every mode is documented end-to-end as a Python example in the cuVSLAM repo.

RECOMMENDED • C++ + ROS 2

Production deployments. Real-time guarantees on Jetson. Integration with Nav2, Nvblox, Isaac Manipulator. The supported runtime for shipping products.

ros2 launch isaac_ros_visual_slam …

ALSO AVAILABLE • PyCuVSLAM

Python wrapper around the same library. Useful for prototyping, benchmarking, and demos. The four mode examples below show the Python invocations.

python3 run_vio.py

Camera streams, IMU calibration, and sync settings are identical between both paths — work you do in either transfers directly to the other.

Mode 1 — Stereo Visual Odometry

Pure stereo VO from one D455 or D436: no IMU, no calibration tuning. It is the fastest way to check that the stack runs end-to-end on your robot. The trajectory renders live in rerun.

01

Stereo VO

One D455 / D436 · stereo only · no IMU

recommended starting point

Use this mode when you want to confirm the pipeline is wired up correctly, or when your application doesn’t need inertial fusion. It’s the entry point for everything else — once stereo VO is producing a clean trajectory, you can layer on IMU, multi-camera, or RGBD.

Figure 1 — rerun visualization, stereo VO. Source: nvidia-isaac/cuVSLAM.
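Stereo VO recovers metric scale without an IMU because the stereo baseline fixes it. In a rectified pair, a feature with disparity d pixels lies at depth Z = f·B/d, where f is the focal length in pixels and B is the baseline in metres. A first-order error estimate, dZ ≈ Z²/(f·B)·dd, shows why depth precision degrades with range.

The sketch below assumes a rectified pair. It uses 0.095 m as the D455's nominal baseline and a hypothetical 640 px focal length; use your unit's calibrated values. The helper names are illustrative only.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (m) of a point seen in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return focal_px * baseline_m / disparity_px


def depth_uncertainty(focal_px: float, baseline_m: float, depth_m: float,
                      disparity_err_px: float = 0.1) -> float:
    """First-order depth error for a disparity error: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


f_px, baseline = 640.0, 0.095  # hypothetical focal length; nominal D455 baseline
for d in (60.8, 10.0):
    z = depth_from_disparity(f_px, baseline, d)
    dz = depth_uncertainty(f_px, baseline, z)
    print(f"disparity {d:5.1f} px -> depth {z:.2f} m (+/- {dz:.3f} m)")
```

Depth error grows with the square of range. Nearby, well-textured features therefore constrain stereo VO far more tightly than distant ones.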

Mode 2 — Stereo Inertial Odometry (VIO)

Stereo + IMU fusion. This is the headline mode and the one most humanoid, AMR, and drone applications converge on, with sub-1% drift held over long runs.

02

Stereo VIO

One D455 or D436 · stereo + on-camera IMU

primary path for production robotics

VIO is the mode this paper is anchored to. Fusing the on-camera IMU with the stereo stream improves accuracy over stereo-only VO and makes the gravity direction observable.

Figure 2 — rerun visualization, stereo VIO with gravity vector. Source: nvidia-isaac/cuVSLAM.
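The gravity vector in the visualization is what the IMU adds: a drift-free reference for "down". A common, generic way to get an initial estimate is to average accelerometer samples while the camera is stationary. This page does not describe how cuVSLAM initializes gravity internally, so treat the sketch below as an illustration only.

The z-up axis convention and the `estimate_gravity` name are assumptions. RealSense IMU frames are not necessarily z-up, so check your device's axes before reading roll and pitch off it.

```python
import math
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


def estimate_gravity(accel_samples: Iterable[Vec3]):
    """Average static accelerometer samples (m/s^2, IMU frame).

    At rest an accelerometer measures the reaction to gravity, so the mean
    reading points *up*; the gravity direction is its negation.
    Returns (unit gravity vector, measured magnitude, roll_rad, pitch_rad).
    """
    samples = list(accel_samples)
    if not samples:
        raise ValueError("need at least one stationary sample")
    n = len(samples)
    ax = sum(s[0] for s in samples) / n
    ay = sum(s[1] for s in samples) / n
    az = sum(s[2] for s in samples) / n
    mag = math.sqrt(ax * ax + ay * ay + az * az)
    gravity_dir = (-ax / mag, -ay / mag, -az / mag)
    # Roll/pitch assume the IMU z axis points up when the device is level.
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return gravity_dir, mag, roll, pitch


# Two noisy readings from a level, stationary device.
g_dir, g_mag, roll, pitch = estimate_gravity([(0.01, -0.02, 9.80), (-0.01, 0.02, 9.82)])
print(f"gravity dir {g_dir}, |g| = {g_mag:.3f} m/s^2, roll {roll:.3f}, pitch {pitch:.3f}")
```

A measured magnitude far from about 9.81 m/s² suggests the device was moving during capture or that the IMU needs calibration.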

Production VIO on Jetson. The README flags real-time VIO via the Python API on Jetson as unreliable. For shipped products use the C++ Isaac ROS Visual SLAM node — same camera, same IMU, same calibration, supported runtime.

Mode 3 — Multi-camera Odometry

Two or three hardware-synced cameras feeding cuVSLAM jointly. Robust through occlusion, low-texture scenes, and aggressive motion. This is the mode that survives bipedal head rotation.

03

Multi-camera

2–3 hardware-synced D455 / D436 · YAML-defined extrinsics

production-grade robustness

Multi-camera is the upgrade path when single-camera VIO doesn’t survive your motion profile. It also unlocks better coverage during turns and provides redundancy if one camera loses features.
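Joint multi-camera tracking assumes the cameras expose together. One quick sanity check is to compare per-frame timestamps across cameras. The sketch below assumes several things:

- Timestamps come from a common clock domain.
- Frames are already paired by index.
- The 100 µs tolerance is a hypothetical value; derive yours from your frame period.

It does not use any RealSense or cuVSLAM API.

```python
from typing import Sequence


def max_sync_skew_us(timestamps_ns: Sequence[Sequence[int]]) -> float:
    """Worst-case spread (microseconds) of frame timestamps across cameras.

    timestamps_ns[c][k] is the capture time of frame k from camera c; frames
    are assumed to already be paired by index and share one clock domain.
    """
    if len(timestamps_ns) < 2:
        raise ValueError("need at least two cameras")
    n_frames = min(len(ts) for ts in timestamps_ns)
    worst = 0
    for k in range(n_frames):
        frame_times = [ts[k] for ts in timestamps_ns]
        worst = max(worst, max(frame_times) - min(frame_times))
    return worst / 1000.0


# Hypothetical ~30 fps timestamps (ns) from two cameras.
cam_a = [0, 33_333_000, 66_666_000]
cam_b = [40_000, 33_350_000, 66_700_000]
skew = max_sync_skew_us([cam_a, cam_b])
SKEW_LIMIT_US = 100.0  # hypothetical tolerance
print(f"max skew {skew:.1f} us")
if skew > SKEW_LIMIT_US:
    print("cameras do not look hardware-synced")
```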

Figure 3 — multi-camera odometry, three-camera rig. Source: nvidia-isaac/cuVSLAM.
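The YAML-defined extrinsics describe where each camera sits on the rig. Fusing observations from several cameras comes down to chaining 4x4 rigid transforms. The YAML schema is not reproduced on this page, so the sketch below hard-codes hypothetical values instead of parsing a file.

It also simplifies the geometry: it uses yaw about the rig's up (z) axis rather than full optical-frame rotations. `make_transform` and `invert_transform` are illustrative helpers, not part of cuVSLAM.

```python
import numpy as np


def make_transform(yaw_deg: float, translation_m) -> np.ndarray:
    """4x4 rigid transform: rotation about z by yaw_deg, then translation."""
    c, s = np.cos(np.deg2rad(yaw_deg)), np.sin(np.deg2rad(yaw_deg))
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation_m
    return T


def invert_transform(T: np.ndarray) -> np.ndarray:
    """Closed-form inverse of a rigid transform: [R^T, -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


# Hypothetical rig: camera 0 at the rig origin, camera 1 facing backwards 0.2 m behind it.
T_rig_cam0 = make_transform(0.0, [0.0, 0.0, 0.0])
T_rig_cam1 = make_transform(180.0, [-0.2, 0.0, 0.0])

# A point 1 m along camera 1's x axis, in homogeneous coordinates.
p_cam1 = np.array([1.0, 0.0, 0.0, 1.0])
p_rig = T_rig_cam1 @ p_cam1                                  # same point, rig frame
p_cam0 = invert_transform(T_rig_cam0) @ p_rig                # same point, camera 0 frame
T_cam0_cam1 = invert_transform(T_rig_cam0) @ T_rig_cam1      # camera-to-camera extrinsic
print(np.round(p_rig[:3], 3))
```

Expressing every camera relative to one shared rig frame means each camera needs only one calibrated transform. Any camera-to-camera extrinsic can then be derived by composition, as `T_cam0_cam1` shows.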

References

cuVSLAM is the front-end. The full perception stack composes it with mapping, occupancy, and navigation packages.

You did it. You now have a verified PyCuVSLAM pipeline on RealSense across four modes. Found a discrepancy with the NVIDIA repo? File an issue at nvidia-isaac/cuVSLAM/issues or ping the Perception Studio Discord.
