01 Pipeline
Seven stages, end-to-end, in under a second.
| # | Stage | Role | Lib |
|---|---|---|---|
| 01 | mic | USB → 16 kHz mono, VAD trims silence | sounddevice |
| 02 | whisper | On-device STT (base.en), sub-second | openai-whisper |
| 03 | intent | Fast parser, LLM fallback → typed atomic Plan | qwen · pydantic |
| 04 | camera | Overhead + oblique side cam, 1920×1080 | opencv-python |
| 05 | sam3 | Remote masks + centroids | socket.io |
| 06 | homography | 4 anchors → robot base frame | numpy · cv2 |
| 07 | urscript | Atomic action → URScript over TCP:30002 | ur_rtde |
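Stage 06 can be sketched in a few lines of numpy. This is a minimal illustration, not the project's code: the function names and anchor values are hypothetical, and it assumes the four anchors lie on the table plane so that a single planar homography is enough. `cv2.getPerspectiveTransform` would solve the same four-point problem.

```python
import numpy as np

# Hypothetical calibration anchors: pixel (u, v) in the overhead frame
# and the matching robot-base (x, y) in metres.
ANCHOR_PIXELS = [(200, 100), (1700, 120), (1680, 980), (220, 960)]
ANCHOR_ROBOT_XY = [(-0.30, -0.60), (0.30, -0.60), (0.30, -0.25), (-0.30, -0.25)]


def fit_homography(pixels, robot_xy):
    """Solve the 3x3 homography H (with H[2, 2] fixed to 1) from 4 point pairs."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(pixels, robot_xy):
        # x = (h0*u + h1*v + h2) / (h6*u + h7*v + 1), rearranged to be linear in h
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        # y = (h3*u + h4*v + h5) / (h6*u + h7*v + 1)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def pixel_to_robot(H, u, v):
    """Map one pixel (e.g. a mask centroid) to robot-base XY."""
    x, y, w = H @ np.array([u, v, 1.0])
    return float(x / w), float(y / w)
```

A centroid from stage 05 would go through `pixel_to_robot` before any motion is planned; Z is not recovered here and has to come from elsewhere, such as the height measurement.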
02 Atomic actions
Three tiers. The LLM works in tier 1, the primitives in the table below, most of the time.
| Primitive | What it does |
|---|---|
| scan_scene() | Capture frame, run SAM3, refresh scene state. |
| pick(obj, modifier?) | Move above → descend → grasp → ascend. Modifier disambiguates ("leftmost"). |
| place_at(zone) | Release at a named zone (config or runtime). |
| place_on(support) | Stack on top, using the support's cached height. |
| place_at_original_of(obj) | Return to the XY where the named object was picked. |
| assign_zone(name, where) | Declare a temporary zone — free_spot, corner. |
| measure_height(obj) | Pure sonar measurement, cached. |
| measure_all_heights(set) | Eager bulk measure before motion. |
| home() | Joint-space move to safe pose. |
Tier 2 (rare): raw motion — move_above, descend_to, ascend_to, grip, release, orient_to, gripper_set. Used for calibration and error recovery only.
Tier 3 (meta): dry_run(), abort(), await_user(prompt). Plan validation, e-stop, operator-in-the-loop.
The LLM never produces coordinates and never produces motor commands. It names objects, zones and intentions; the runtime resolves geometry, measures depth and drives motors.
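To illustrate that split, here is a rough sketch of how a runtime could turn a name plus modifier into geometry. `SceneObject`, `MODIFIERS` and `resolve` are hypothetical names, and treating "leftmost" as the smallest robot-frame X is an assumption; only the "leftmost" modifier itself comes from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    obj_id: str   # stable ID from the last scan
    label: str    # what the LLM calls it, e.g. "cube"
    x: float      # robot-frame centroid, metres
    y: float


# Hypothetical modifier table: each entry picks one object from the candidates.
MODIFIERS = {
    "leftmost": lambda objs: min(objs, key=lambda o: o.x),
    "rightmost": lambda objs: max(objs, key=lambda o: o.x),
}


def resolve(scene, label, modifier=None):
    """Return the single SceneObject that a (label, modifier) pair refers to."""
    candidates = [o for o in scene if o.label == label]
    if not candidates:
        raise LookupError(f"unknown object: {label!r}")
    if modifier is None:
        if len(candidates) > 1:
            raise LookupError(f"ambiguous object {label!r}: add a modifier")
        return candidates[0]
    if modifier not in MODIFIERS:
        raise LookupError(f"unknown modifier: {modifier!r}")
    return MODIFIERS[modifier](candidates)
```

The LLM would only ever emit something like `pick("cube", "leftmost")`; the coordinates stay on the runtime side of `resolve`.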
03 Schema
Every plan is validated before any motion. First, a Pydantic discriminated union over the atomic action types checks each step; then a cross-plan validator rejects inconsistent grip state, missing object heights, unknown objects or zones, and over-long plans. Invalid plans go back to the LLM as structured errors, which it corrects on the next turn.
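A cut-down sketch of that two-step validation, assuming Pydantic v2. It covers three of the actions and leaves out the height check; the model names, the `action` discriminator field, `KNOWN_OBJECTS`, `KNOWN_ZONES`, `MAX_STEPS` and `validate_plan` are all illustrative assumptions rather than the project's real schema.

```python
from typing import Annotated, Literal, Optional, Union

from pydantic import BaseModel, Field, ValidationError, model_validator

KNOWN_OBJECTS = {"red_cube", "blue_cube"}  # hypothetical scene contents
KNOWN_ZONES = {"bin_a", "bin_b"}           # hypothetical zone names
MAX_STEPS = 12                             # hypothetical plan-length cap


class Pick(BaseModel):
    action: Literal["pick"]
    obj: str
    modifier: Optional[str] = None


class PlaceAt(BaseModel):
    action: Literal["place_at"]
    zone: str


class Home(BaseModel):
    action: Literal["home"]


# Discriminated union: the "action" tag selects which model validates a step.
Step = Annotated[Union[Pick, PlaceAt, Home], Field(discriminator="action")]


class Plan(BaseModel):
    steps: list[Step]

    @model_validator(mode="after")
    def check_cross_step_consistency(self):
        if len(self.steps) > MAX_STEPS:
            raise ValueError(f"plan too long: {len(self.steps)} > {MAX_STEPS}")
        holding = False
        for i, step in enumerate(self.steps):
            if isinstance(step, Pick):
                if step.obj not in KNOWN_OBJECTS:
                    raise ValueError(f"step {i}: unknown object {step.obj!r}")
                if holding:
                    raise ValueError(f"step {i}: pick while already holding")
                holding = True
            elif isinstance(step, PlaceAt):
                if step.zone not in KNOWN_ZONES:
                    raise ValueError(f"step {i}: unknown zone {step.zone!r}")
                if not holding:
                    raise ValueError(f"step {i}: place_at with an empty gripper")
                holding = False
        return self


def validate_plan(raw):
    """Return (plan, []) on success or (None, structured_errors) on failure."""
    try:
        return Plan.model_validate(raw), []
    except ValidationError as exc:
        return None, [{"loc": list(e["loc"]), "msg": e["msg"]} for e in exc.errors()]
```

The error list is plain JSON-compatible data, so it can be pasted into the next LLM turn as is.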
04 Runtime state
One state object, mutated by the executor. The LLM never sees it directly.
| Field | Role |
|---|---|
| held_object | What's in the gripper now. |
| scene | Last scan, keyed by stable IDs. |
| height_cache | Sonar-measured heights, per session. |
| zones | Config DROP_ZONES + runtime-declared. |
| original_xy | Snapshot at pick() — makes swap trivial. |
| plan_log | JSONL trace, one line per executed action. |
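The table maps naturally onto a single dataclass. The sketch below is one possible shape, not the actual implementation: field types, the `on_*` helper names and the use of an in-memory stream for `plan_log` are assumptions. It shows why snapshotting XY at `pick()` makes a swap easy.

```python
import io
import json
from dataclasses import dataclass, field
from typing import Dict, Optional, TextIO, Tuple

XY = Tuple[float, float]


@dataclass
class RuntimeState:
    held_object: Optional[str] = None
    scene: Dict[str, XY] = field(default_factory=dict)          # stable ID -> XY
    height_cache: Dict[str, float] = field(default_factory=dict)
    zones: Dict[str, XY] = field(default_factory=dict)
    original_xy: Dict[str, XY] = field(default_factory=dict)
    plan_log: TextIO = field(default_factory=io.StringIO)       # a file in practice


def log_action(state, action, **args):
    # One JSON object per line (JSONL).
    state.plan_log.write(json.dumps({"action": action, **args}) + "\n")


def on_pick(state, obj_id):
    state.original_xy[obj_id] = state.scene[obj_id]  # snapshot before moving
    state.held_object = obj_id
    log_action(state, "pick", obj=obj_id)


def on_place_at(state, zone):
    state.scene[state.held_object] = state.zones[zone]
    log_action(state, "place_at", zone=zone)
    state.held_object = None


def on_place_at_original_of(state, other_id):
    target = state.original_xy[other_id]
    state.scene[state.held_object] = target
    log_action(state, "place_at_original_of", obj=other_id)
    state.held_object = None
    return target
```

With this shape, moving B to where A started is pick A, place it at a free spot, pick B, then `place_at_original_of(A)`: no coordinates appear in the plan.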
05 Profiles
Same atomic schema, different end-effector executor. Built around the UR5e; the same plans run on the lab's other arms unchanged.
| Profile | Arm | End-effector | Payload | Reach | Role |
|---|---|---|---|---|---|
| ur5e_vacuum | UR5e | Robotiq EPick vacuum | 5 kg | 850 mm | primary · demo |
| ur3e_2finger | UR3e | Robotiq 2-finger | 3 kg | 500 mm | also runs |
| ur10e_3finger | UR10e | Robotiq 3-finger | 12.5 kg | 1300 mm | also runs |
Profiles live in robopick/profiles.py. Activating one overrides config — robot IP, end-effector, workspace bounds, homography — without touching another profile's calibration.
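As a rough picture of what activation could look like, the sketch below keeps each profile's settings in its own frozen record and copies them over a base config. Payload and reach come from the table above; the IP addresses (documentation range), calibration file paths, `BASE_CONFIG` keys and the `activate` function are hypothetical, and using reach as the workspace bound is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    arm: str
    end_effector: str
    payload_kg: float
    reach_mm: int
    robot_ip: str          # hypothetical address
    homography_file: str   # hypothetical per-profile calibration path


PROFILES = {
    "ur5e_vacuum": Profile("UR5e", "Robotiq EPick vacuum", 5.0, 850,
                           "192.0.2.10", "calib/ur5e_vacuum.npy"),
    "ur3e_2finger": Profile("UR3e", "Robotiq 2-finger", 3.0, 500,
                            "192.0.2.11", "calib/ur3e_2finger.npy"),
    "ur10e_3finger": Profile("UR10e", "Robotiq 3-finger", 12.5, 1300,
                             "192.0.2.12", "calib/ur10e_3finger.npy"),
}

# Hypothetical defaults shared by every profile.
BASE_CONFIG = {"max_plan_steps": 12, "robot_ip": None, "end_effector": None}


def activate(name, base=BASE_CONFIG):
    """Return a new config with the named profile's settings layered on top."""
    profile = PROFILES[name]
    config = dict(base)  # copy, so the base and other profiles stay untouched
    config.update(
        profile=name,
        robot_ip=profile.robot_ip,
        end_effector=profile.end_effector,
        workspace_radius_mm=profile.reach_mm,  # assumption: reach as the bound
        homography_file=profile.homography_file,
    )
    return config
```

Because each profile points at its own calibration file and `activate` returns a fresh dict, switching arms never overwrites another profile's homography.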