Autonomy Labs · Rover Software Stack v0.2

Autonomy Labs · Rover Program

Runtime Architecture · v0.2

Three-tier runtime
Cognition → Planning → Reflex
Public blueprint

Doc ARCH-01 · Status Public · v0.2 · Author R. Kumar · Updated 2026-05-23 · Peoria, IL

v0.2 · what changed

MPPI is now a first-class layer between cognition and reflex, not an implementation detail buried inside the planner. That is the seam where the productivity-vs-safety tradeoff actually gets adjudicated.

VLM → VLA: the cognition layer now emits actions and subgoals, not just descriptions, in line with the current frontier (RT-2, OpenVLA, Pi-Zero).

The semantic map and behavioral rules are shown explicitly as cost-function inputs to MPPI, the architectural seam where the differentiation from the training walk pays off.

The v0.1 four-tier framing (L1–L4 + reflection) collapses to three cleaner runtime tiers plus a separate offline learning loop on its own temporal layer.

Why now

Three convergences make this the moment

We publish this stack openly because the moat is not the architecture — it is the execution. The pieces required to build a vision-first autonomous rover have arrived in the same eighteen months, and the team that integrates them first wins on velocity, not secrecy.

01 · Edge VLAs Vision-language-action models — RT-2, OpenVLA, Pi-Zero, Gemini Robotics-ER — reason spatially and emit subgoals in real time on a $249 Jetson.

02 · Commodity Perception Four $30 USB cameras now deliver the visual fidelity a VLA needs. One LiDAR puck costs more than the entire perception stack.

03 · AI-Paired Development A small team paired with capable coding agents can iterate the full stack daily — closing the loop between field run, log review, and next-day improvement.

System diagram · runtime

Runtime stack · cognition → planning → reflex · semantic map + behavioral rules as cost-function inputs

Tier-by-tier

COMMAND / DATA — flows down

ESCALATION / FAULT — flows up

REFLECTION — offline cycle

Cognition · VLA · photons → semantic subgoal

1–5 Hz · event-driven

Substrate

Edge VLA on Jetson Orin · vision-language-action class (RT-2, OpenVLA, Gemini Robotics-ER)

VLA · multi-view · subgoal-emit

Primary Loop

Scene comprehension · task decomposition · subgoal selection · success/failure detection · operator dialogue. Outputs a semantic subgoal (where, why, with what constraint) — never raw actuator commands.

▼

Cognition → Planning · contract subgoal(target_region, constraints, success_test)
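The contract above can be pictured as a small immutable record. This is a minimal sketch, not the published interface: the `Subgoal` class name, the field types, the string form of `success_test`, and the example value `at_kitchen_door` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Subgoal:
    """Hypothetical shape of the Cognition -> Planning contract."""
    target_region: str                  # name of a region in the semantic map
    constraints: Tuple[str, ...] = ()   # e.g. ("avoid_furniture",)
    success_test: str = ""              # label for the check cognition runs on arrival (assumed)

    def __post_init__(self):
        # The contract carries semantics only; an empty target is meaningless.
        if not self.target_region:
            raise ValueError("target_region must name a region")


# Illustrative values; "at_kitchen_door" is a made-up success-test label.
goal = Subgoal("kitchen_entry", ("avoid_furniture",), "at_kitchen_door")
```

Freezing the record keeps the planner from mutating a subgoal mid-execution; a changed goal would arrive as a new message from cognition.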

▲

Operator → Cognition voice / glasses narration / button

Planning · MPPI · sample, simulate, score

10–20 Hz

Substrate

SBC (Jetson Orin or NUC) · sampling-based MPC

MPPI · cost-fn · N-rollout

Primary Loop

Sample N candidate trajectories · simulate each forward · score against a cost function that combines progress (productivity), geometric cost (semantic-map constraints), and rule violation (behavioral rules). Emit the cost-weighted blend of the samples as motor commands.
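The sample, simulate, score loop can be sketched in a few dozen lines. This is a toy under stated assumptions, not the rover's planner: the unicycle motion model, the single circular obstacle standing in for the geometric and rule terms, and every constant (`sigma`, `temperature`, the 100.0 hit penalty) are placeholders. The weighting step is the standard MPPI softmin over rollout costs.

```python
import math
import random


def rollout(state, controls, dt=0.1):
    """Simulate a unicycle model forward; returns the visited (x, y) points."""
    x, y, heading = state
    path = []
    for v, omega in controls:
        heading += omega * dt
        x += v * math.cos(heading) * dt
        y += v * math.sin(heading) * dt
        path.append((x, y))
    return path


def trajectory_cost(path, goal, obstacle, radius):
    """Progress term plus a stand-in penalty for entering an inflated obstacle."""
    gx, gy = goal
    progress = math.hypot(path[-1][0] - gx, path[-1][1] - gy)
    ox, oy = obstacle
    hits = sum(1 for px, py in path if math.hypot(px - ox, py - oy) < radius)
    return progress + 100.0 * hits


def mppi_step(state, nominal, goal, obstacle, radius,
              n_samples=200, sigma=0.5, temperature=1.0, rng=None):
    """One planning tick: sample, simulate, score, and blend."""
    rng = rng or random.Random(0)
    samples, costs = [], []
    for _ in range(n_samples):
        # Perturb only the turn rate; forward speed stays at the nominal value.
        controls = [(v, w + rng.gauss(0.0, sigma)) for v, w in nominal]
        samples.append(controls)
        costs.append(trajectory_cost(rollout(state, controls), goal, obstacle, radius))
    best = min(costs)
    # Softmin weights: low-cost rollouts dominate the blend.
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    blended = []
    for t in range(len(nominal)):
        w_avg = sum(wt * s[t][1] for wt, s in zip(weights, samples)) / total
        blended.append((nominal[t][0], w_avg))
    return blended


nominal = [(0.5, 0.0)] * 20   # drive straight at 0.5 m/s for 2 s
plan = mppi_step((0.0, 0.0, 0.0), nominal, goal=(3.0, 0.0),
                 obstacle=(0.6, 0.0), radius=0.3)
```

In a receding-horizon loop only the first control of `plan` would be sent to the reflex tier, and the remainder would seed the next tick's nominal sequence.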

Cost input · ←

Semantic Map

Yard knowledge · built during training walk

Named regions, boundary polygons, categorical hazards, distance buffers. Compiled down from the semantic memory at training time. Read every planning tick to evaluate trajectory cost. See TWP-01 for how this gets populated.
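A geometric cost built from boundary polygons and distance buffers might look like the sketch below. The function names, the flat 1000.0 out-of-bounds cost, the linear ramp inside a buffer, and the sample yard are assumptions; the source does not specify how the map is encoded or scored.

```python
import math


def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that a horizontal ray from the point crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def geometric_cost(point, allowed_region, hazards):
    """Hypothetical geometric term: large cost outside the region,
    ramped cost inside a hazard's distance buffer, zero elsewhere."""
    if not point_in_polygon(point, allowed_region):
        return 1000.0
    cost = 0.0
    for (hx, hy), buffer_m in hazards:
        d = math.hypot(point[0] - hx, point[1] - hy)
        if d < buffer_m:
            cost += 100.0 * (1.0 - d / buffer_m)
    return cost


# Made-up map: a 10 m square yard with one hazard and a 2 m buffer.
yard = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
ravine = ((5.0, 5.0), 2.0)
```

Because this lookup runs for every point of every rollout on every planning tick, a real map would likely be precompiled into a grid or distance field rather than evaluated polygon by polygon.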

Cost input · →

Behavioral Rules

Learned overnight · plain English

Causal rules with their full explanation chain ("don't get within five feet of ravines — depth + irrecoverability = damage"). Stored in language. Read by MPPI as rule-violation penalty terms in the cost function. Updated by the offline reflection cycle below.
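How a plain-English rule becomes a penalty term is not specified here. One possible shape, offered only as an assumption, pairs the stored text with a machine-checkable form that the planner can evaluate per rollout. The `BehavioralRule` fields, the `rule_penalty` helper, and the weight are hypothetical.

```python
from dataclasses import dataclass

FEET_TO_M = 0.3048


@dataclass(frozen=True)
class BehavioralRule:
    """Hypothetical pairing of a plain-English rule with a checkable form."""
    text: str              # the rule as stored, explanation chain included
    hazard_category: str   # semantic-map category the rule applies to
    min_distance_m: float  # keep-out distance taken from the rule text
    weight: float          # penalty per violating trajectory point (assumed)


def rule_penalty(rule, distances_by_category):
    """Sum the penalty over trajectory points that come too close.

    distances_by_category maps a hazard category to the distance in metres
    from each trajectory point to the nearest hazard of that category.
    """
    distances = distances_by_category.get(rule.hazard_category, [])
    violations = sum(1 for d in distances if d < rule.min_distance_m)
    return rule.weight * violations


ravine_rule = BehavioralRule(
    text="don't get within five feet of ravines: depth + irrecoverability = damage",
    hazard_category="ravine",
    min_distance_m=5 * FEET_TO_M,   # five feet is 1.524 m
    weight=50.0,
)
# Three trajectory points at 3.0 m, 1.4 m, and 1.0 m from the nearest ravine.
penalty = rule_penalty(ravine_rule, {"ravine": [3.0, 1.4, 1.0]})
```

Keeping the original sentence alongside the compiled parameters preserves the explanation chain for the next reflection pass.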

▼

Planning → Reflex · contract motor_cmd(left_pwm, right_pwm, brake)

▲

Planning → Cognition · escalations stuck · all-trajectories-infeasible · subgoal-unreachable
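The three escalation signals could be raised from the planner's own per-tick bookkeeping. The sketch below is an assumption about how that might work: representing an infeasible rollout as an infinite cost, the tick and progress thresholds, and the `check_escalation` name are all illustrative. Detecting `subgoal-unreachable` would need map-level reasoning and is left out.

```python
import enum
import math


class Escalation(enum.Enum):
    STUCK = "stuck"
    ALL_TRAJECTORIES_INFEASIBLE = "all-trajectories-infeasible"
    SUBGOAL_UNREACHABLE = "subgoal-unreachable"


def check_escalation(costs, progress_m, ticks_without_progress,
                     stuck_ticks=50, min_progress_m=0.05):
    """Return an Escalation, or None when planning can continue.

    An infeasible rollout is represented by an infinite cost (assumption).
    """
    if costs and all(math.isinf(c) for c in costs):
        return Escalation.ALL_TRAJECTORIES_INFEASIBLE
    if progress_m < min_progress_m and ticks_without_progress >= stuck_ticks:
        return Escalation.STUCK
    return None
```

Returning a typed value rather than a free-form string keeps the upward channel as narrow as the downward contracts.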

Reflex · MCU · safety floor · spinal cord

100+ Hz · hard real-time

Substrate

Dedicated safety MCU (Arduino-class) · deterministic loop, no OS

bumper · cliff-IR · ultrasonic · IMU-tilt · e-stop

Primary Loop

Poll sensors at > 100 Hz · check thresholds · clamp motor PWM on any safety interlock fire. Never negotiates. A bumper trip or cliff-IR fire immediately zeroes the actuators regardless of what the planner just commanded.
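The override logic reduces to a pure function from (planner command, sensor frame) to the command that actually reaches the motors. The sketch below is written in Python for readability only; firmware on the MCU would express the same logic in C or C++. The threshold values, the signed-PWM convention, and the `SensorFrame` fields are assumptions, not published values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotorCmd:
    left_pwm: int    # signed: negative means reverse (assumed convention)
    right_pwm: int
    brake: bool


@dataclass(frozen=True)
class SensorFrame:
    bumper: bool
    cliff_ir: bool
    ultrasonic_m: float
    tilt_deg: float
    e_stop: bool


# Placeholder thresholds for illustration.
MIN_RANGE_M = 0.15
MAX_TILT_DEG = 25.0
PWM_LIMIT = 255


def _clamp(pwm):
    """Limit a PWM value to the allowed signed range."""
    return max(-PWM_LIMIT, min(PWM_LIMIT, pwm))


def reflex_filter(cmd, sensors):
    """Pass the planner's command through unless any interlock has fired."""
    tripped = (
        sensors.bumper
        or sensors.cliff_ir
        or sensors.e_stop
        or sensors.ultrasonic_m < MIN_RANGE_M
        or abs(sensors.tilt_deg) > MAX_TILT_DEG
    )
    if tripped:
        # Zero the actuators regardless of what the planner commanded.
        return MotorCmd(0, 0, True)
    return MotorCmd(_clamp(cmd.left_pwm), _clamp(cmd.right_pwm), cmd.brake)
```

The function holds no state and never inspects the planner's intent, which is what makes the tier easy to test exhaustively and hard to argue with.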

▼

Reflex → motors · contract PWM (with override authority)

▲

Reflex → ALL · interrupt E-STOP · halts entire stack

Meta · offline temporal layer

Offline · T+24h · learning loop

Daily Reflection Cycle

Where the rover gets smarter — runs on a different temporal layer than the diagram above

Logs · three-tier telemetry

→

LLM Reflection · natural-language reasoning

→

New Rules · injected to behavioral ruleset

Operational logs from cognition, planning, and reflex are synthesized overnight by the on-board LLM. The model reflects on what worked, what failed, and what surprised it — and emits new behavioral rules in natural language that become inputs to the MPPI cost function on the next operating day. No model retraining. No gradient updates. Cross-session learning via reasoning alone. This is the central thesis Autonomy Labs is built around — and the one we most want the field to push back on. A companion diagram covering this temporal layer (plus the training walk feedback into the semantic map) is forthcoming. The training walk itself is detailed in TWP-01.
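The data flow of the cycle (logs in, rules out, no weights touched) can be sketched with the model treated as an opaque text-in, text-out callable. Everything here is an assumption for illustration: the prompt wording, the one-rule-per-line reply format, the exact-match deduplication, and the `stand_in_llm` stub that replaces a real model.

```python
from typing import Callable, Iterable, List


def build_reflection_prompt(log_lines: Iterable[str], current_rules: List[str]) -> str:
    """Assemble the overnight prompt from the day's telemetry and existing rules."""
    return "\n".join([
        "Existing rules:",
        *[f"- {r}" for r in current_rules],
        "Today's logs:",
        *log_lines,
        "Propose new behavioral rules, one per line, each with its reason.",
    ])


def reflect(log_lines, current_rules, llm: Callable[[str], str]) -> List[str]:
    """Return the ruleset for the next operating day.

    `llm` is any text-in, text-out callable; no weights are updated here.
    """
    reply = llm(build_reflection_prompt(log_lines, current_rules))
    proposed = [line.lstrip("- ").strip() for line in reply.splitlines()]
    merged = list(current_rules)
    for rule in proposed:
        if rule and rule not in merged:   # skip blanks and exact duplicates
            merged.append(rule)
    return merged


def stand_in_llm(prompt: str) -> str:
    """Stub that returns a canned reply in place of a real model."""
    return "- route around the hallway bin until visually confirmed clear\n\n- yield to people"


rules = reflect(
    ["planning: obstacle observed in hallway at 2.3 m"],
    ["yield to people"],
    stand_in_llm,
)
```

Injecting the model as a callable keeps the merge logic testable without a model on hand, and makes a human review gate between proposal and injection a one-line change.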

Validation plan in progress. Before any on-yard trials, the cognition tier gets staged validation: an on-device latency + decision-quality baseline on the Jetson Orin (Qwen2.5-VL class model, real yard photos), then closed-loop software-in-the-loop in NVIDIA Isaac Sim with domain randomization, paired with a real-image holdout to guard against the sim-to-real appearance gap. Companion deep-dive page (VLDN-01 · "proving the stack") forthcoming once the on-device baseline is in.

Trace · Kitchen Errand

Worked example

"Go to the kitchen, pick up the plate, return."

A single mission decomposed across the three runtime tiers · trash bin appears mid-hallway

Operator

Voice command at the loading station.

Cognition

Decompose mission → [traverse hallway → enter kitchen → wait for button-press → return via reverse path]. Emit first subgoal to Planning: target_region=kitchen_entry, constraints=[avoid_furniture].

Planning

Sample N candidate trajectories toward kitchen. Score each against progress + semantic-map (hallway corridor polygon) + rules ("yield to people"). Mid-traverse: a new obstacle (trash bin) appears at 2.3 m → instantly reflected in the cost. Next tick, sampled trajectories that pass within the bin's inflation radius score badly; a 5° offset spline wins. Rejoin to subgoal line at 3.1 m.

Reflex

Continuously polling. Nothing triggered. Quietly heroic.

Cognition

Multi-view success detection on "plate placed on back" → button confirms → emit return mission.

Reflection

Overnight: "Trash bin observed in hallway at 2.3 m on three consecutive runs. Propose rule: route around bin location by default until visually confirmed clear."