Orbit – Diagnose why your robot policy fails and what data to collect next
rahillasne
Hi HN, I'm a college sophomore studying CS/Linguistics. I've been building a robot (Handybot) that cleans car interiors using learned policies, and I kept running into the same problem: the policy works great in my training setup but fails in real deployments, and I had no way to figure out why. ORBIT is the tool I built to solve this.

It logs deployment episodes, detects failures with heuristic detectors (gripper drops, stalls, timeouts), then does something I haven't seen elsewhere: it computes SigLIP vision embeddings for both deployment and training data, measures the distribution gap with FAISS nearest-neighbor search, clusters failure modes with HDBSCAN, and generates ranked prescriptions for which specific demonstrations to collect next.

The core insight: most deployment failures happen because the robot encounters visual conditions it wasn't trained on (different lighting, object positions, backgrounds). If you can measure that gap precisely, you can close it with targeted data collection instead of blindly collecting hundreds more demos.

Built for Hugging Face's LeRobot framework. Live demo (no install): https://huggingface.co/spaces/Drahils/orbit-demo

Limitations I'm upfront about: it only works for camera-equipped manipulation arms, it can't catch policy-architecture bugs or timing issues, and the embedding-gap analysis only surfaces visual distribution shifts. That probably covers ~40-50% of real deployment failures, but it's the chunk nobody has tooling for.

Would love feedback, especially from anyone deploying learned robot policies in production.
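To make the heuristic detectors concrete, here's a rough sketch of what gripper-drop, stall, and timeout checks could look like on logged episode signals. All the signal names, thresholds, and exact logic here are my illustrative assumptions, not necessarily ORBIT's actual implementation:

```python
import numpy as np

def detect_stall(joint_positions, window=20, eps=1e-3):
    """Flag a stall if every joint moved less than eps over the last `window` steps."""
    if len(joint_positions) < window:
        return False
    recent = np.asarray(joint_positions[-window:])
    return bool((recent.max(axis=0) - recent.min(axis=0) < eps).all())

def detect_timeout(num_steps, max_steps=600):
    """Flag episodes that hit the step cap without terminating."""
    return num_steps >= max_steps

def detect_gripper_drop(gripper_width, closed_thresh=0.01, reopen_thresh=0.03):
    """Flag a drop: gripper was closed (holding something) at step t,
    then wide open at t+1 without a release command (hypothetical signals)."""
    w = np.asarray(gripper_width)
    was_closed = w < closed_thresh
    return bool(((w[1:] > reopen_thresh) & was_closed[:-1]).any())
```

Each detector is cheap enough to run over every logged episode after the fact, which is all you need for offline failure triage.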
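And a minimal sketch of the embedding-gap idea: score each deployment frame by its distance to the nearest training-set embedding, so frames far from everything seen in training surface as out-of-distribution. ORBIT uses SigLIP embeddings and FAISS for the search; this uses random vectors and plain NumPy cosine distance just to show the mechanism:

```python
import numpy as np

def nn_gap(train_emb, deploy_emb):
    """Cosine distance from each deployment embedding to its nearest
    training embedding (FAISS would do this search at scale)."""
    t = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    d = deploy_emb / np.linalg.norm(deploy_emb, axis=1, keepdims=True)
    sims = d @ t.T                 # (n_deploy, n_train) cosine similarities
    return 1.0 - sims.max(axis=1)  # distance to the closest training frame

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 8))                              # stand-in for training embeddings
in_dist = train[:10] + rng.normal(scale=0.01, size=(10, 8))    # frames like training data
ood = rng.normal(loc=5.0, size=(5, 8))                         # frames unlike anything in training
gaps = nn_gap(train, np.vstack([in_dist, ood]))                # ood frames get much larger gaps
```

Thresholding or ranking these per-frame gaps is what lets you say "collect more demos under conditions like these frames" instead of collecting blindly.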