We Give Robots Vision
The open-source vision framework for edge devices. Runs DeepStream pipelines, YOLO detection, and world models on NVIDIA Jetson, Intel NPU, and Hailo hardware, with detection at up to 60 FPS.
Core Architecture
DeepStream Pipeline
Hardware-accelerated processing via GStreamer/DeepStream. Runs TensorRT YOLO engines directly on the GPU and passes frames through appsink for zero-copy manipulation in Python.
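A minimal sketch of how such a pipeline could be assembled, assuming a local H.264 MP4 input and an `nvinfer` configuration file that points at the TensorRT engine. The function names `build_pipeline` and `run`, the file names, and the element layout are illustrative assumptions, not the framework's actual code. For simplicity the sketch converts frames to system-memory RGBA before the appsink, so it is not itself zero-copy.

```python
def build_pipeline(video_path: str, infer_config: str,
                   width: int = 1280, height: int = 720) -> str:
    """Return a gst-launch style description that ends in an appsink named 'sink'.

    Hypothetical helper: the element chain is one plausible DeepStream layout.
    """
    # Decode branch: feeds the first request pad of the stream muxer.
    source = (
        f'filesrc location="{video_path}" ! qtdemux ! h264parse '
        "! nvv4l2decoder ! mux.sink_0"
    )
    # Inference branch: batch, run the TensorRT engine, hand frames to Python.
    inference = (
        f"nvstreammux name=mux batch-size=1 width={width} height={height} "
        f'! nvinfer config-file-path="{infer_config}" '
        "! nvvideoconvert ! video/x-raw,format=RGBA "
        "! appsink name=sink emit-signals=true max-buffers=1 drop=true"
    )
    return f"{source} {inference}"


def run(description: str) -> None:
    """Start the pipeline and print basic metadata for each frame."""
    # Imported lazily so build_pipeline() works on machines without GStreamer.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import GLib, Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(description)
    sink = pipeline.get_by_name("sink")

    def on_new_sample(appsink):
        # Pull the newest sample; older ones are dropped by max-buffers=1.
        sample = appsink.emit("pull-sample")
        buffer = sample.get_buffer()
        print(f"frame pts={buffer.pts} size={buffer.get_size()} bytes")
        return Gst.FlowReturn.OK

    sink.connect("new-sample", on_new_sample)
    pipeline.set_state(Gst.State.PLAYING)
    try:
        GLib.MainLoop().run()
    finally:
        pipeline.set_state(Gst.State.NULL)
```

On a device with DeepStream installed, `run(build_pipeline("clip.mp4", "yolo_infer.txt"))` would start the stream; both file names are placeholders.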
Multi-Modal Inference
FaceMesh (up to 3 faces), hand tracking (8 defined gestures), and full-body pose estimation run simultaneously alongside primary object detection.
ROS2 Native
Publishes telemetry across 10 specialized topics, including /vision/detections, /vision/depth, and /vision/pose, using a MultiThreadedExecutor.
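The publishing pattern can be sketched with `rclpy` as follows. This is an illustration under assumptions, not the framework's node layout: it uses one node per topic, `std_msgs/String` placeholder payloads, a 30 Hz timer, and a hypothetical helper `node_name_for`. Only the three topic names given above are used.

```python
TOPICS = ("/vision/detections", "/vision/depth", "/vision/pose")


def node_name_for(topic: str) -> str:
    """Derive a valid node name from a topic (hypothetical naming scheme)."""
    return topic.strip("/").replace("/", "_") + "_publisher"


def main() -> None:
    # Imported lazily so the helpers above work without a ROS 2 installation.
    import rclpy
    from rclpy.executors import MultiThreadedExecutor
    from std_msgs.msg import String

    rclpy.init()
    executor = MultiThreadedExecutor(num_threads=len(TOPICS))
    nodes = []

    for topic in TOPICS:
        node = rclpy.create_node(node_name_for(topic))
        publisher = node.create_publisher(String, topic, 10)  # queue depth 10

        def publish(pub=publisher, name=topic):
            # Placeholder payload; a real node would publish typed messages.
            msg = String()
            msg.data = f"placeholder payload for {name}"
            pub.publish(msg)

        # Assumed 30 Hz publish rate. Timers on different nodes sit in
        # different callback groups, so the executor may run them in parallel.
        node.create_timer(1.0 / 30.0, publish)
        executor.add_node(node)
        nodes.append(node)

    try:
        executor.spin()
    finally:
        for node in nodes:
            node.destroy_node()
        rclpy.shutdown()
```

Calling `main()` in a shell with ROS 2 sourced starts the publishers; `ros2 topic echo /vision/pose` would then show the placeholder messages.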
World Models
Predictive intelligence with LeWM (15M params) running at 200 Hz, and V-JEPA 2 for spatiotemporal awareness.
Performance Benchmarks
Tested on NVIDIA Jetson Orin Nano (8GB) in MAXN mode.
| Configuration | Models Active | Frame Rate | Latency |
|---|---|---|---|
| Detection Only (INT8) | YOLOv10n TensorRT | 60 FPS | 16 ms |
| Minimal Pipeline | Detection + Depth + Tracking | 35-40 FPS | 28 ms |
| Full Pipeline (v3.0.1) | Detection + Face + Gesture + Pose | 25-30 FPS | 38 ms |
| World Model Planning | LeWM 15M (Inference Only) | 200 Hz | 5 ms |
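To relate the two right-hand columns, the sketch below converts each rate to a per-frame time budget (1000 / rate, in milliseconds) and checks whether the reported latency fits inside it. It assumes frames are processed one at a time with no pipelining, which the table does not state; the helper names are illustrative.

```python
def frame_period_ms(rate_hz: float) -> float:
    """Time available per frame, in milliseconds, at the given rate."""
    return 1000.0 / rate_hz


def fits_budget(rate_hz: float, latency_ms: float) -> bool:
    """True if one frame can finish before the next arrives (no pipelining)."""
    return latency_ms <= frame_period_ms(rate_hz)


# (configuration, low rate, high rate, reported latency in ms) from the table.
ROWS = [
    ("Detection Only (INT8)", 60, 60, 16),
    ("Minimal Pipeline", 35, 40, 28),
    ("Full Pipeline (v3.0.1)", 25, 30, 38),
    ("World Model Planning", 200, 200, 5),
]

for name, low, high, latency in ROWS:
    print(
        f"{name}: budget {frame_period_ms(high):.1f}-{frame_period_ms(low):.1f} ms, "
        f"latency {latency} ms, "
        f"fits at {low} Hz: {fits_budget(low, latency)}, "
        f"fits at {high} Hz: {fits_budget(high, latency)}"
    )
```

Under that assumption, each reported latency fits the budget at the lower end of its rate range (for example, 28 ms against a 28.6 ms period at 35 FPS) but not at the upper end of the two ranged rows.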