Key Metrics
53 ms
End-to-end latency measured at the host
0%
Host AI compute — everything runs on-camera
D500+
Live on D555 today; D500 series next
Overview & live demo
People detection runs inside the camera: inference executes on the D555 SoC, and results stream out on a dedicated native ROS 2 topic or through the RealSense SDK. The demo footage for this feature was recorded on a D555 under ambient indoor lighting, with people walking through the field of view and detection running continuously.
Why this matters now. People detection is one of the first on-device perception models in the Perception Studio catalog, and the proof point that the D500 series can host meaningful, real-time AI perception features.
Why on-camera inference
Running detection on the camera changes three things customers care about: latency, bandwidth, and integration cost.
Host-side inference (the traditional approach)
- Consumes host CPU/GPU capacity you need for other AI applications
- Adds host-side preprocessing and copy overhead
- Bandwidth = full-resolution RGB at 30 fps
- Latency = camera + bus + queueing + inference + return
On-camera inference (this feature)
- Zero CPU/GPU cost on the host for inference
- Per-frame output is a small JSON payload, not a video stream
- 53 ms end-to-end latency, measured at the host
- RGB stream can be subscribed independently, or not at all
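To put the bandwidth difference in numbers, here is a back-of-envelope comparison of a raw RGB stream against the detection topic. The frame size (1280x720, 3 bytes per pixel, uncompressed) and the ~300-byte message size are illustrative assumptions, not measured figures; only the 30 fps and 15 fps rates come from this page.

```python
# Back-of-envelope bandwidth comparison: raw RGB stream vs. JSON detections.
# ASSUMPTIONS (hypothetical, for illustration only):
#   * "full resolution" taken as 1280x720, 3 bytes/pixel, no compression
#   * one detection message is ~300 bytes of JSON

RGB_WIDTH, RGB_HEIGHT, BYTES_PER_PIXEL = 1280, 720, 3  # assumed RGB8 frame
RGB_FPS = 30                                           # host-side comparison rate
DETECTION_MSG_BYTES = 300                              # assumed JSON size
DETECTION_FPS = 15                                     # documented output cadence


def mbit_per_s(bytes_per_frame: int, fps: int) -> float:
    """Convert a per-frame size and a frame rate into megabits per second."""
    return bytes_per_frame * fps * 8 / 1_000_000


rgb_mbps = mbit_per_s(RGB_WIDTH * RGB_HEIGHT * BYTES_PER_PIXEL, RGB_FPS)
det_mbps = mbit_per_s(DETECTION_MSG_BYTES, DETECTION_FPS)

print(f"raw RGB:    {rgb_mbps:8.2f} Mbit/s")
print(f"detections: {det_mbps:8.3f} Mbit/s")
print(f"ratio:      {rgb_mbps / det_mbps:,.0f}x")
```

Under these assumptions the raw stream works out to roughly 664 Mbit/s against 0.036 Mbit/s for the detection topic; real numbers depend on the resolution, encoding and payload size you actually use.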
For service robots, cobots, humanoids and AMRs that already run planning, navigation and other AI inference workloads on a Jetson, this is the difference between “we’d add people detection if we had host headroom” and “we have people detection today, on the camera.”
How it works — at a glance
A compact on-device AI model runs on the D555 SoC. Each frame produces a list of detections (class, confidence, bounding box and distance), published independently of the RGB stream.
Model
A compact on-device detector tuned specifically for people in the current release. Inference runs entirely on the camera; nothing leaves the device until the result is published. Additional classes and custom-trained variants are on the roadmap.
Output cadence
Detections are published at 15 fps. When no people are present, an empty result message is still emitted so consumers always have a current frame to reason about — no ambiguity between “nothing detected” and “no message received.”
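Because an empty message is emitted even when nobody is in view, a consumer can treat silence as a fault (link, subscription or device) rather than as “no people.” A minimal staleness watchdog might look like this sketch; the three-missed-frames threshold (200 ms at 15 fps) and the class and method names are hypothetical choices, not part of the API.

```python
import time
from typing import Optional

DETECTION_FPS = 15.0
# Hypothetical policy: treat the stream as stale after ~3 missed frame periods.
STALE_AFTER_S = 3 / DETECTION_FPS


class DetectionWatchdog:
    """Tracks when the last detection message (empty or not) arrived."""

    def __init__(self, stale_after_s: float = STALE_AFTER_S) -> None:
        self.stale_after_s = stale_after_s
        self._last_rx: Optional[float] = None

    def on_message(self, now: Optional[float] = None) -> None:
        """Call for every received message, including empty results."""
        self._last_rx = time.monotonic() if now is None else now

    def is_fresh(self, now: Optional[float] = None) -> bool:
        """True if a message arrived within the staleness window."""
        if self._last_rx is None:
            return False  # nothing received yet: no current frame to reason about
        now = time.monotonic() if now is None else now
        return (now - self._last_rx) <= self.stale_after_s


wd = DetectionWatchdog()
wd.on_message(now=10.0)       # a message (possibly empty) arrived at t = 10.0 s
print(wd.is_fresh(now=10.1))  # True: within 200 ms
print(wd.is_fresh(now=10.5))  # False: 500 ms of silence
```

Passing `now` explicitly keeps the logic testable; in a live subscriber you would call `on_message()` from the callback and `is_fresh()` from your control loop.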
API at a glance
Detection results are published on a dedicated ROS 2 DDS topic or through the RealSense SDK. Subscribe to enable, unsubscribe to disable.
The interface
| Field | Value |
|---|---|
| Topic name | /realsense/D555_{serial}_ObjectDetection |
| Message type | std_msgs/msg/String (JSON payload) |
| QoS reliability | BEST_EFFORT — matches image topics, minimises latency |
| Lifecycle | Subscribe to enable; unsubscribe to disable |
| Frame rate | 15 fps |
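A minimal ROS 2 Python (rclpy) subscriber for the topic above might look like the following sketch. It assumes a ROS 2 distribution that provides `rclpy.qos.ReliabilityPolicy` (Foxy or later), uses a hypothetical serial number, and assumes `frame_id` increments by one per published message, which is how it estimates messages dropped under BEST_EFFORT delivery. The rclpy imports live inside `main()` so the helper functions work without ROS 2 installed.

```python
import json
from typing import Optional

EXAMPLE_SERIAL = "123456789012"  # hypothetical; use your camera's serial


def detection_topic(serial: str) -> str:
    """Build the topic name from the documented pattern."""
    return f"/realsense/D555_{serial}_ObjectDetection"


def count_dropped(prev_frame_id: Optional[int], frame_id: int) -> int:
    """Messages missed between two receptions.

    ASSUMPTION (hypothetical): frame_id increases by 1 per published message.
    """
    if prev_frame_id is None or frame_id <= prev_frame_id:
        return 0
    return frame_id - prev_frame_id - 1


def main(serial: str = EXAMPLE_SERIAL) -> None:
    """Subscribe (enabling detection), log each frame, unsubscribe on exit."""
    import rclpy  # lazy imports: helpers above stay usable without ROS 2
    from rclpy.qos import QoSProfile, ReliabilityPolicy
    from std_msgs.msg import String

    rclpy.init()
    node = rclpy.create_node("people_detection_listener")
    # Request BEST_EFFORT to match the publisher's documented QoS.
    qos = QoSProfile(depth=5, reliability=ReliabilityPolicy.BEST_EFFORT)
    last = {"frame_id": None}

    def on_msg(msg: String) -> None:
        payload = json.loads(msg.data)  # std_msgs/String carries the JSON text
        dropped = count_dropped(last["frame_id"], payload["frame_id"])
        last["frame_id"] = payload["frame_id"]
        node.get_logger().info(
            f"frame {payload['frame_id']}: "
            f"{payload['number_of_detections']} detection(s), {dropped} dropped"
        )

    sub = node.create_subscription(String, detection_topic(serial), on_msg, qos)
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    finally:
        node.destroy_subscription(sub)  # unsubscribing disables detection
        node.destroy_node()
        if rclpy.ok():
            rclpy.shutdown()
```

On a host with ROS 2 sourced and the camera reachable, call `main("<your serial>")`. The subscriber requests BEST_EFFORT because, in DDS, a RELIABLE reader is incompatible with a BEST_EFFORT writer and would receive nothing.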
Per-frame payload
The JSON message conforms to a versioned schema. The payload is deliberately compact: everything you typically need to drive downstream behavior, nothing more.
```json
{
  "schema_version": 1,
  "message_type": "object_detection",
  "timestamp_us": 123456789,
  "frame_id": 42,
  "enabled": true,
  "has_result": true,
  "number_of_detections": 2,
  "detections": [
    { "class_id": 0, "confidence": 85, "x1": 100, "y1": 50,
      "x2": 300, "y2": 250, "distance_mm": 1500 },
    { "class_id": 0, "confidence": 72, "x1": 400, "y1": 100,
      "x2": 550, "y2": 300, "distance_mm": 2200 }
  ]
}
```
Each detection carries the class id (0 = person in this release), a confidence score 0–100, the bounding box in pixels, and the measured distance to the person in millimetres. The has_result flag distinguishes “model ran, nothing found” from “model didn’t run on this frame.”
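Together with `enabled` and `number_of_detections`, the `has_result` flag lets a consumer separate four situations. The sketch below assumes `enabled: false` means detection is switched off; that reading of the field, and the `FrameState` naming, are assumptions for illustration.

```python
import json
from enum import Enum


class FrameState(Enum):
    DISABLED = "disabled"    # detection not enabled (assumed meaning of enabled=false)
    NO_RESULT = "no_result"  # model did not run on this frame
    EMPTY = "empty"          # model ran, no people found
    PEOPLE = "people"        # model ran, one or more people found


def classify(raw: str) -> FrameState:
    """Map one JSON payload to a frame state, checking flags in priority order."""
    msg = json.loads(raw)
    if not msg.get("enabled", False):
        return FrameState.DISABLED
    if not msg.get("has_result", False):
        return FrameState.NO_RESULT
    if msg.get("number_of_detections", 0) == 0:
        return FrameState.EMPTY
    return FrameState.PEOPLE
```

Checking `has_result` before the detection count is what keeps “model didn’t run” from being misread as “nobody there.”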
Note on the distance field. In the current release, the distance value is calculated on the host: it is visible in RealSense Viewer but not yet delivered in the API payload. The next release moves the distance calculation onto the device so that it is available end-to-end through the DDS topic. Until then, treat distance_mm as a placeholder when consuming the topic programmatically.
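A small parser that keeps only person detections above a confidence threshold, normalises confidence to 0.0 to 1.0, and ignores `distance_mm` unless the caller opts in (per the note above) could look like this. The threshold default of 50 and all names here are hypothetical; only the field names come from the payload schema.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple

PERSON_CLASS_ID = 0   # 0 = person in this release
SUPPORTED_SCHEMA = 1


@dataclass
class Detection:
    class_id: int
    confidence: float               # normalised to 0.0..1.0
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    distance_mm: Optional[int]      # None unless the caller trusts the field


def parse_detections(raw: str, min_confidence: int = 50,
                     trust_distance: bool = False) -> List[Detection]:
    """Parse one payload into person detections at or above min_confidence (0-100)."""
    msg = json.loads(raw)
    if msg.get("schema_version") != SUPPORTED_SCHEMA:
        raise ValueError(f"unsupported schema_version: {msg.get('schema_version')}")
    out: List[Detection] = []
    for d in msg.get("detections", []):
        if d["class_id"] != PERSON_CLASS_ID or d["confidence"] < min_confidence:
            continue
        out.append(Detection(
            class_id=d["class_id"],
            confidence=d["confidence"] / 100.0,
            box=(d["x1"], d["y1"], d["x2"], d["y2"]),
            # Placeholder in the current release: drop it unless explicitly trusted.
            distance_mm=d.get("distance_mm") if trust_distance else None,
        ))
    return out
```

Once distance moves on-device, callers can pass `trust_distance=True` rather than changing the parser.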
Hardware & integration
On-camera detection runs today on the RealSense D555, with support extending to the rest of the D500 series. Host requirements stay minimal — the host does no inference work.
| Component | Requirement |
|---|---|
| Camera (today) | RealSense D555 (preview available now) |
| Camera (next) | Rolling out across the rest of the D500 series |
| Host OS | Linux / Windows |
| Host compute | Any DDS-capable host, no GPU or AI accelerator needed |
| Network | Standard RealSense PoE connectivity; detection payload is small JSON, negligible bandwidth |
| Other streams | Color, Depth, IR streams continue independently; detection topic has its own lifecycle |
Next steps
On-camera detection composes naturally with the rest of the Perception Studio stack.
- Visual-Inertial Odometry (VIO): another Perception Studio feature you can compose with on-camera detection, pairing localization with close-range manipulation depth on the same robot.
- Improved Close Range Detection: minimum detection range reduced to as little as 10 cm on your existing RealSense cameras.

