On-Camera People Detection

RealSense D500 Series – People Detection

Real-time people detection running entirely on the RealSense D555 camera SoC. Zero inference cost on the host. Dedicated ROS 2 DDS topic. 53 ms end-to-end latency. Available today on D555 and rolling out to the rest of the D500 series.

Key Metrics

  • 53 ms: end-to-end latency, measured at the host
  • 0%: host AI compute; everything runs on-camera
  • D500+: live on D555 today; the rest of the D500 series next
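One way to check the latency figure on your own setup is to compare each message's timestamp_us field with the host clock at arrival. The sketch below assumes timestamp_us uses the same clock domain as the host's wall clock (microseconds since the Unix epoch). This page does not specify the clock domain, so verify it before trusting the numbers. The helper names are hypothetical.

```python
import json
import statistics
import time


def latency_ms(raw_message: str, host_receive_us: int) -> float:
    """Latency of one detection message in milliseconds.

    ASSUMPTION: timestamp_us shares a clock domain with host_receive_us
    (both microseconds since the Unix epoch). Not confirmed by the docs.
    """
    payload = json.loads(raw_message)
    return (host_receive_us - payload["timestamp_us"]) / 1000.0


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Median and worst-case latency over a batch of samples."""
    return {
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


def now_us() -> int:
    """Host wall-clock time in integer microseconds (avoids float rounding)."""
    return time.time_ns() // 1000
```

In a subscription callback you would record latency_ms(msg.data, now_us()) for each message and call summarize() over a window of samples.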

Overview & live demo

People detection runs inside the camera: inference executes on the D555 SoC, and results stream out on a dedicated native ROS 2 topic or through the RealSense SDK. The demo footage for this feature was recorded on a D555 under ambient indoor lighting, with people walking through the field of view and detection running continuously.

Why this matters now. People detection is one of the first on-device perception models in the Perception Studio catalog, and the proof point that the D500 series can host meaningful, real-time AI perception features.

Why on-camera inference

Running detection on the camera changes three things customers care about: latency, bandwidth, and integration cost.

Host-side inference (the traditional approach)

  • Burns host CPU/GPU you need for additional AI applications
  • Adds host-side preprocessing and copy overhead
  • Bandwidth = full-resolution RGB at 30 fps
  • Latency = camera + bus + queueing + inference + return

On-camera inference (this feature)

  • Zero CPU/GPU cost on the host for inference
  • Per-frame output is a small JSON payload, not a video stream
  • 53 ms end-to-end latency, measured at the host
  • RGB stream can be subscribed independently, or not at all
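To put the bandwidth difference in numbers, the sketch below compares an uncompressed RGB stream with the detection topic. The 1280 x 720 RGB8 profile is an assumption chosen for illustration (the page says only "full-resolution RGB at 30 fps"), and the payload size is taken from the two-person sample message in the per-frame payload section, serialized compactly and without DDS/RTPS overhead.

```python
import json

# ASSUMPTION: 1280 x 720 RGB8 at 30 fps stands in for "full-resolution RGB".
WIDTH, HEIGHT, BYTES_PER_PIXEL, RGB_FPS = 1280, 720, 3, 30
DETECTION_FPS = 15  # detection topic rate from the interface table

# Sample message from the per-frame payload section (two detections).
SAMPLE_PAYLOAD = {
    "schema_version": 1,
    "message_type": "object_detection",
    "timestamp_us": 123456789,
    "frame_id": 42,
    "enabled": True,
    "has_result": True,
    "number_of_detections": 2,
    "detections": [
        {"class_id": 0, "confidence": 85, "x1": 100, "y1": 50,
         "x2": 300, "y2": 250, "distance_mm": 1500},
        {"class_id": 1, "confidence": 72, "x1": 400, "y1": 100,
         "x2": 550, "y2": 300, "distance_mm": 2200},
    ],
}


def rgb_bytes_per_second() -> int:
    """Uncompressed RGB bandwidth a host-side detector would need."""
    return WIDTH * HEIGHT * BYTES_PER_PIXEL * RGB_FPS


def detection_bytes_per_second(payload: dict) -> int:
    """Approximate detection-topic bandwidth (JSON body only, no DDS/RTPS headers)."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return len(body) * DETECTION_FPS
```

Under these assumptions the RGB stream needs about 83 MB/s (82,944,000 bytes per second), while the sample message at 15 fps comes to roughly 5 KB/s, a gap of about four orders of magnitude.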

For service robots, cobots, humanoids, and AMRs that already run planning, navigation, and other AI inference workloads on a Jetson, this is the difference between “we’d add people detection if we had host headroom” and “we have people detection today, on the camera.”

How it works — at a glance

A compact on-device AI model runs on the D555 SoC. Each frame produces a list of detections (class, confidence, bounding box, and distance), published independently of the RGB stream.

Model
A compact on-device detector tuned specifically for people in the current release. Inference runs entirely on the camera; nothing leaves the device until the result is published. Additional classes and custom-trained variants are on the roadmap.

Output cadence
Detections are published at 15 fps. When no people are present, an empty result message is still emitted so consumers always have a current frame to reason about — no ambiguity between “nothing detected” and “no message received.”
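Because an empty result is still published for every frame, a consumer can treat silence itself as a fault condition. The sketch below classifies the current state from the most recent message and the host's arrival time, also honouring the has_result flag described in the per-frame payload section. The names (LastMessage, people_state) and the three-frame timeout are hypothetical choices, not part of the camera's API.

```python
from dataclasses import dataclass
from typing import Optional

DETECTION_FPS = 15
FRAME_PERIOD_S = 1.0 / DETECTION_FPS  # about 66.7 ms between messages
# Hypothetical threshold: BEST_EFFORT delivery can drop the odd message,
# so tolerate a couple of missing frames before declaring the stream stale.
STALE_AFTER_S = 3 * FRAME_PERIOD_S


@dataclass
class LastMessage:
    received_at_s: float  # host monotonic time (e.g. time.monotonic()) at arrival
    has_result: bool
    number_of_detections: int


def people_state(last: Optional[LastMessage], now_s: float) -> str:
    """Classify the scene as 'stale', 'unknown', 'people', or 'no_people'."""
    if last is None or now_s - last.received_at_s > STALE_AFTER_S:
        return "stale"  # no recent message: stalled stream or nobody subscribed
    if not last.has_result:
        return "unknown"  # message arrived but the model did not run on this frame
    if last.number_of_detections > 0:
        return "people"
    return "no_people"  # fresh, empty result: the field of view is clear
```

A safety or behaviour layer can then react differently to "no_people" (safe to proceed) and "stale" (sensor problem), which is exactly the distinction the always-on empty message is meant to preserve.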

API at a glance

Detection results are published on a dedicated ROS 2 DDS topic or through the RealSense SDK. Subscribe to enable detection; unsubscribe to disable it.

The interface

Field             Value
Topic name        /realsense/D555_{serial}_ObjectDetection
Message type      std_msgs/msg/String (JSON payload)
QoS reliability   BEST_EFFORT (matches image topics, minimises latency)
Lifecycle         Subscribe to enable; unsubscribe to disable
Frame rate        15 fps
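The interface above maps onto a standard rclpy subscriber. The sketch below is a minimal example, not an official client: the node name, the queue depth of 10, and the serial you pass in are placeholders. It requests BEST_EFFORT reliability on purpose, because in DDS a subscriber requesting RELIABLE is incompatible with a BEST_EFFORT publisher and would receive nothing. rclpy is imported inside the function so the topic helper works without a ROS 2 installation.

```python
import json

# Topic pattern from the interface table; {serial} is the camera's serial number.
TOPIC_TEMPLATE = "/realsense/D555_{serial}_ObjectDetection"


def detection_topic(serial: str) -> str:
    """Build the detection topic name for a given camera serial."""
    return TOPIC_TEMPLATE.format(serial=serial)


def run_subscriber(serial: str) -> None:
    """Subscribe to the detection topic and log each frame (requires ROS 2)."""
    # Imported lazily so detection_topic() works without a ROS 2 install.
    import rclpy
    from rclpy.node import Node
    from rclpy.qos import QoSProfile, ReliabilityPolicy
    from std_msgs.msg import String

    rclpy.init()
    node = Node("people_detection_listener")  # hypothetical node name

    # BEST_EFFORT to match the publisher; RELIABLE would not match it.
    qos = QoSProfile(depth=10, reliability=ReliabilityPolicy.BEST_EFFORT)

    def on_message(msg: String) -> None:
        payload = json.loads(msg.data)  # std_msgs/String carries the JSON in .data
        node.get_logger().info(
            f"frame {payload['frame_id']}: "
            f"{payload['number_of_detections']} detection(s)"
        )

    # Per the lifecycle row, creating this subscription enables detection.
    node.create_subscription(String, detection_topic(serial), on_message, qos)
    try:
        rclpy.spin(node)
    finally:
        # Destroying the node drops the subscription, which disables detection.
        node.destroy_node()
        rclpy.shutdown()
```

rclpy's predefined qos_profile_sensor_data also uses BEST_EFFORT reliability and would work in place of the hand-built profile.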

Per-frame payload

The JSON message conforms to a versioned schema. The fields are deliberately small — everything you typically need to drive a downstream behavior, nothing more.

```json
{
  "schema_version": 1,
  "message_type": "object_detection",
  "timestamp_us": 123456789,
  "frame_id": 42,
  "enabled": true,
  "has_result": true,
  "number_of_detections": 2,
  "detections": [
    { "class_id": 0, "confidence": 85, "x1": 100, "y1": 50,
      "x2": 300, "y2": 250, "distance_mm": 1500 },
    { "class_id": 1, "confidence": 72, "x1": 400, "y1": 100,
      "x2": 550, "y2": 300, "distance_mm": 2200 }
  ]
}
```

Each detection carries the class id (0 = person in this release), a confidence score 0–100, the bounding box in pixels, and the measured distance to the person in millimetres. The has_result flag distinguishes “model ran, nothing found” from “model didn’t run on this frame.”
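A consumer typically turns each message into typed records, rejects unknown schema versions, and keeps has_result separate from an empty detection list. The sketch below does this with the standard library only. It assumes (x1, y1) is the top-left and (x2, y2) the bottom-right corner of the box, which the field names suggest but this page does not state; the 50% confidence threshold and all function names are hypothetical.

```python
import json
from dataclasses import dataclass

SUPPORTED_SCHEMA_VERSION = 1
PERSON_CLASS_ID = 0  # "0 = person in this release"
FIELDS = ("class_id", "confidence", "x1", "y1", "x2", "y2", "distance_mm")


@dataclass(frozen=True)
class Detection:
    class_id: int
    confidence: int  # 0-100
    x1: int  # ASSUMPTION: (x1, y1) is the top-left corner in pixels
    y1: int
    x2: int  # ASSUMPTION: (x2, y2) is the bottom-right corner in pixels
    y2: int
    distance_mm: int  # placeholder in the current release (see the distance note)

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1


def parse_detections(raw: str) -> tuple[bool, list[Detection]]:
    """Return (has_result, detections) for one JSON message."""
    payload = json.loads(raw)
    version = payload.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    if not payload.get("has_result", False):
        # Model did not run on this frame: not the same as "no people".
        return False, []
    # Pick known keys explicitly so extra fields in later schemas are ignored.
    detections = [
        Detection(**{key: item[key] for key in FIELDS})
        for item in payload.get("detections", [])
    ]
    return True, detections


def people(detections: list[Detection], min_confidence: int = 50) -> list[Detection]:
    """Keep person detections at or above a (hypothetical) confidence threshold."""
    return [
        d for d in detections
        if d.class_id == PERSON_CLASS_ID and d.confidence >= min_confidence
    ]
```

Raising on an unexpected schema_version is a deliberately conservative choice; a consumer that prefers to keep running could log and skip the message instead.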

Note on the distance field. In the current release, the distance value is calculated on the host — visible in RealSense Viewer, but not yet delivered through the API payload. The next release moves the distance calculation on-device so it’s available end-to-end through the DDS topic. Until then, treat distance_mm as a placeholder when consuming the topic programmatically.
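Until distance_mm is populated on-device, one possible workaround is to estimate distance on the host from a depth frame. The sketch below takes the median of valid depth pixels inside a bounding box. It assumes the depth image is already aligned to the color image the boxes refer to, holds depth in millimetres, uses 0 for invalid pixels, and treats the box as half-open ([x1, x2) by [y1, y2)); check each of these against your stream configuration. The function name is hypothetical.

```python
import statistics
from typing import Optional, Sequence


def box_distance_mm(
    depth_mm: Sequence[Sequence[int]], x1: int, y1: int, x2: int, y2: int
) -> Optional[float]:
    """Median depth inside [x1, x2) x [y1, y2), ignoring zero (invalid) pixels.

    ASSUMPTIONS: depth is aligned to the color frame and expressed in mm.
    """
    height = len(depth_mm)
    width = len(depth_mm[0]) if height else 0
    # Clamp the box to the image so partially visible people still work.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    valid = [
        depth_mm[y][x]
        for y in range(y1, y2)
        for x in range(x1, x2)
        if depth_mm[y][x] > 0
    ]
    if not valid:
        return None  # no usable depth inside the box
    return float(statistics.median(valid))
```

The median is used because a person's box also contains background pixels; it is only a reasonable estimate when the person covers most of the box.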

Hardware & integration

On-camera detection runs today on the RealSense D555, with support extending to the rest of the D500 series. Host requirements stay minimal — the host does no inference work.

Component         Requirement
Camera (today)    RealSense D555, preview live
Camera (next)     Rolling out across the rest of the D500 series
Host OS           Linux / Windows
Host compute      Any DDS-capable host; no GPU or AI accelerator needed
Network           Standard RealSense PoE connectivity; the detection payload is small JSON with negligible bandwidth
Other streams     Color, depth, and IR streams continue independently; the detection topic has its own lifecycle

Next steps

On-camera detection composes naturally with the rest of the Perception Studio stack.

  • Visual-Inertial Odometry (VIO): another Perception Studio feature you can compose with on-camera detection, pairing localization with close-range manipulation depth on the same robot.
  • Improved Close-Range Detection: a reduced minimum detection range, down to 10 cm, on your existing RealSense cameras.