# LLM Command Node
Natural language command parsing via LLM APIs (OpenAI, Claude, Ollama) with dashboard control.
## Quick Start
1. **Copy `.env.example` to `.env`** and add your API key:
```bash
cp .env.example .env
```
2. **Start development mode:**
```bash
mecha10 dev
```
3. **Open the dashboard** at `http://localhost:3000/dashboard/robot-control`
4. **Send commands** via the AI Command Control panel!
## Overview
The LLM Command node lets users control the robot with natural language. Each command is sent to a large language model (LLM), which parses it into a structured action that the node then routes to the appropriate topic.
## Features
- **Multi-Provider Support**: OpenAI, Claude (Anthropic), and local Ollama
- **Command Parsing**: Converts natural language into structured robot actions
- **Action Routing**: Publishes to appropriate topics (motor commands, navigation goals, behaviors)
- **Vision Queries**: Uses object detection data to answer "what do you see?" questions
- **Behavior Interruption**: Automatically pauses autonomous behaviors when user commands are issued
- **Auto-Resume**: Configurable automatic resumption of behaviors after timeout
- **Dashboard Integration**: Real-time command input and response display
- **Error Handling**: Clear error messages and timeout handling
## Configuration
The node is configured via `configs/*/llm-command.toml` (or through `mecha10.json`):
```toml
# LLM Provider Configuration
provider = "openai" # Options: "openai", "claude", "local"
model = "gpt-4o-mini"
temperature = 0.7
max_tokens = 500
vision_enabled = false
# Topic Configuration
[topics]
command_in = "/ai/command"
response_out = "/ai/response"
camera_in = "/robot/sensors/camera/rgb"
nav_goal_out = "/nav/goal"
motor_cmd_out = "/motor/cmd_vel"
behavior_out = "/behavior/execute"
# Behavior Interrupt Configuration
[behavior_interrupt]
enabled = true
mode = "interrupt_with_auto_resume" # Options: "disabled", "interrupt_only", "interrupt_with_auto_resume"
timeout_secs = 30 # Auto-resume timeout (for interrupt_with_auto_resume mode)
await_completion = false
control_topic = "/behavior/control"
```
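If you want to load this file yourself (for example in a custom node), a minimal sketch using the `serde` and `toml` crates might look like the following. The struct names and the config path are illustrative only and simply mirror the example above; the node's actual config types may differ:
```rust
use serde::Deserialize;

// Hypothetical structs mirroring llm-command.toml; not the node's real types.
#[derive(Debug, Deserialize)]
struct LlmCommandConfig {
    provider: String,
    model: String,
    temperature: f32,
    max_tokens: u32,
    vision_enabled: bool,
    topics: Topics,
    behavior_interrupt: BehaviorInterrupt,
}

#[derive(Debug, Deserialize)]
struct Topics {
    command_in: String,
    response_out: String,
    camera_in: String,
    nav_goal_out: String,
    motor_cmd_out: String,
    behavior_out: String,
}

#[derive(Debug, Deserialize)]
struct BehaviorInterrupt {
    enabled: bool,
    mode: String,
    timeout_secs: u64,
    await_completion: bool,
    control_topic: String,
}

fn main() -> anyhow::Result<()> {
    // Example path; substitute your config directory.
    let raw = std::fs::read_to_string("configs/default/llm-command.toml")?;
    let cfg: LlmCommandConfig = toml::from_str(&raw)?;
    println!("Loaded config for provider {}: {:?}", cfg.provider, cfg.topics);
    Ok(())
}
```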
### Behavior Interrupt Configuration
When the LLM parses a motor or navigation command, the node can automatically interrupt autonomous behaviors:
- **`enabled`**: Enable/disable behavior interruption (default: `true`)
- **`mode`**: Interrupt mode (one of the following):
- `"disabled"`: Never interrupt behavior tree
- `"interrupt_only"`: Interrupt but don't auto-resume (manual resume required)
- `"interrupt_with_auto_resume"`: Interrupt and automatically resume after timeout
- **`timeout_secs`**: Seconds before auto-resume (default: `30`)
- **`await_completion`**: Wait for command completion before resuming (not yet implemented)
- **`control_topic`**: Topic for behavior control commands (default: `"/behavior/control"`)
### Environment Variables
**Recommended: Use `.env` file in your project root**
Copy `.env.example` to `.env` and add your API keys:
```bash
# Copy the example file
cp .env.example .env
# Edit .env and add your API key
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```
The `.env` file is automatically loaded by `mecha10 dev` and passed to all nodes.
**Alternative: Set environment variables directly**
```bash
# For OpenAI
export OPENAI_API_KEY="sk-..."
# For Claude
export ANTHROPIC_API_KEY="sk-ant-..."
# For local Ollama (no key needed)
# Ensure Ollama is running on localhost:11434
```
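For reference, a node could map the configured provider to the matching environment variable roughly as sketched below. This is an illustration only; the actual credential lookup presumably happens inside `mecha10-ai-llm`:
```rust
use std::env;

// Sketch: provider names match the `provider` options in llm-command.toml.
fn api_key_for(provider: &str) -> Option<String> {
    match provider {
        "openai" => env::var("OPENAI_API_KEY").ok(),
        "claude" => env::var("ANTHROPIC_API_KEY").ok(),
        // Local Ollama on localhost:11434 needs no key.
        "local" => None,
        _ => None,
    }
}

fn main() {
    match api_key_for("openai") {
        Some(_) => println!("OPENAI_API_KEY is set"),
        None => eprintln!("OPENAI_API_KEY is missing; add it to .env"),
    }
}
```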
## Topics
### Input Topics
- **`/ai/command`** (`CommandMessage`): Natural language command from user
```json
{
"text": "move forward",
"timestamp": 1234567890,
"user_id": "optional_user_id"
}
```
### Output Topics
- **`/ai/response`** (`ResponseMessage`): LLM response with action feedback
```json
{
"text": "Moving the robot forward",
"timestamp": 1234567890,
"action_taken": true,
"error": null
}
```
- **`/motor/cmd_vel`** (`MotorCommand`): Motor velocity commands
```json
{
"linear": 0.5,
"angular": 0.0,
"timestamp": 1234567890
}
```
- **`/nav/goal`** (`NavigationGoal`): Navigation waypoint goals
```json
{
"x": 5.0,
"y": 3.0,
"theta": 0.0,
"timestamp": 1234567890
}
```
- **`/behavior/execute`** (`BehaviorCommand`): Behavior execution commands
```json
{
"name": "follow_person",
"params": null,
"timestamp": 1234567890
}
```
## Command Examples
### Motor Commands
- `"move forward"` → `{"action": "motor", "linear": 0.5, "angular": 0.0}`
- `"turn left"` → `{"action": "motor", "linear": 0.0, "angular": 0.5}`
- `"stop"` → `{"action": "motor", "linear": 0.0, "angular": 0.0}`
### Navigation Commands
- `"go to x:5 y:3"` → `{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}`
- `"move to the door"` → Extracts coordinates and navigates
### Behavior Commands
- `"follow that person"` → `{"action": "behavior", "name": "follow_person"}`
- `"patrol the area"` → `{"action": "behavior", "name": "patrol"}`
### Vision Queries
The node subscribes to `/vision/detections` from the object-detector node and uses this data to answer vision questions:
- `"what do you see?"` → "I see a person (95% confidence) and a car (87% confidence)"
- `"is there a person in front of me?"` → "Yes, I detect 1 person with 95% confidence"
- `"how many cars?"` → "I see 2 cars: car (87% confidence) and car (82% confidence)"
- `"describe what's visible"` → Natural language description based on detections
**How it works:**
1. Object detector node continuously publishes detections to `/vision/detections`
2. LLM command node stores the latest detections
3. When a vision query is detected, detections are formatted as text context
4. LLM analyzes the detections and provides a natural language response
**Benefits over vision APIs:**
- ✅ **Much cheaper** - No image tokens, just structured detection data
- ✅ **Faster** - No need to encode/send images
- ✅ **More accurate** - Uses specialized YOLO model for detection
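A minimal sketch of step 3 above, turning stored detections into a text context for the LLM prompt. The `Detection` type here is illustrative; the actual message published on `/vision/detections` by the object-detector node may differ:
```rust
// Illustrative detection type, not the real /vision/detections message.
struct Detection {
    label: String,
    confidence: f32,
}

/// Format the latest detections as plain text to prepend to a vision query.
fn detections_to_context(detections: &[Detection]) -> String {
    if detections.is_empty() {
        return "No objects are currently detected.".to_string();
    }
    let items: Vec<String> = detections
        .iter()
        .map(|d| format!("{} ({:.0}% confidence)", d.label, d.confidence * 100.0))
        .collect();
    format!("Current detections: {}", items.join(", "))
}

fn main() {
    let latest = vec![
        Detection { label: "person".into(), confidence: 0.95 },
        Detection { label: "car".into(), confidence: 0.87 },
    ];
    // This context plus the user's question ("what do you see?") forms the LLM prompt.
    println!("{}", detections_to_context(&latest));
}
```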
## Behavior Interruption
The LLM command node intelligently manages the interaction between user commands and autonomous behaviors:
### How It Works
1. **Automatic Interruption**: When the LLM parses a motor or navigation command, it interrupts the behavior tree
2. **User Priority**: Direct user commands always take priority over autonomous behaviors
3. **Auto-Resume**: After a timeout (configurable), the behavior tree automatically resumes
4. **Manual Resume**: Users can manually re-enable behaviors via the dashboard
### Interrupt Modes
**Disabled** (`mode = "disabled"`)
- Behavior tree is never interrupted by LLM commands
- User commands may be overridden by autonomous behaviors
- Use when you want autonomous behaviors to have priority
**Interrupt Only** (`mode = "interrupt_only"`)
- Behavior tree is paused when motor/navigation commands are issued
- **No automatic resumption** - requires manual re-enable from dashboard
- Use when you want explicit control over behavior resumption
**Interrupt with Auto-Resume** (`mode = "interrupt_with_auto_resume"`)
- Behavior tree is paused when motor/navigation commands are issued
- **Automatically resumes** after `timeout_secs` (default: 30s)
- Use for seamless switching between manual and autonomous control
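The three modes can be sketched as a small decision around the parsed command, with auto-resume handled by a timer. This is an illustration of the logic only, not the node's actual implementation; real publishing goes through the framework's topic API:
```rust
use std::time::Duration;

// Illustrative enum mirroring the `mode` options in the config.
#[derive(Clone, Copy, PartialEq)]
enum InterruptMode {
    Disabled,
    InterruptOnly,
    InterruptWithAutoResume,
}

// Placeholder for publishing a BehaviorControl message on /behavior/control.
async fn publish_behavior_control(action: &str) {
    println!(r#"publish /behavior/control: {{"action": "{action}", "source": "llm-command"}}"#);
}

/// Called after the LLM produces a motor or navigation action.
async fn interrupt_behaviors(mode: InterruptMode, timeout_secs: u64) {
    if mode == InterruptMode::Disabled {
        return; // Never touch the behavior tree.
    }
    publish_behavior_control("interrupt").await;
    if mode == InterruptMode::InterruptWithAutoResume {
        // Resume automatically once the timeout elapses.
        tokio::spawn(async move {
            tokio::time::sleep(Duration::from_secs(timeout_secs)).await;
            publish_behavior_control("resume").await;
        });
    }
}

#[tokio::main]
async fn main() {
    interrupt_behaviors(InterruptMode::InterruptWithAutoResume, 30).await;
    // Keep this demo alive long enough for the auto-resume task to fire.
    tokio::time::sleep(Duration::from_secs(31)).await;
}
```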
### Example Scenario
```
1. Robot is running "patrol" behavior (autonomous)
2. User says: "stop" via LLM command
→ Behavior tree is interrupted
→ Motor command published: {linear: 0.0, angular: 0.0}
3. Robot stops and remains idle
4. After 30 seconds (timeout):
→ Behavior tree automatically resumes
→ Robot continues patrolling
```
### Control Messages
The system coordinates behavior control via `BehaviorControl` messages:
```json
{
"action": "interrupt",
"source": "llm-command",
"duration_secs": 30,
"timestamp": 1234567890
}
```
Actions:
- **`interrupt`**: Pause behavior tree (from LLM command)
- **`resume`**: Resume behavior tree (manual or auto)
- **`enable`**: Enable behavior tree (from dashboard)
- **`disable`**: Disable behavior tree (from dashboard)
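For reference, the message above maps to a struct along these lines (an illustrative mirror; the canonical definition lives in the framework's message types):
```rust
use serde::{Deserialize, Serialize};

// Illustrative mirror of the BehaviorControl JSON shown above.
#[derive(Serialize, Deserialize, Debug)]
struct BehaviorControl {
    action: String,             // "interrupt" | "resume" | "enable" | "disable"
    source: String,             // e.g. "llm-command" or "dashboard"
    duration_secs: Option<u64>, // only meaningful for "interrupt"
    timestamp: u64,
}

fn main() -> anyhow::Result<()> {
    let msg: BehaviorControl = serde_json::from_str(
        r#"{"action": "interrupt", "source": "llm-command", "duration_secs": 30, "timestamp": 1234567890}"#,
    )?;
    println!("{msg:?}");
    Ok(())
}
```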
## System Prompt
The default system prompt guides the LLM to parse commands into structured JSON actions:
```
You are a helpful robot assistant. Parse user commands and respond with structured actions.
For navigation commands (e.g., "go to the door", "move to coordinates"), extract the goal and respond with JSON:
{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}
For motor commands (e.g., "move forward", "turn left", "stop"), respond with JSON:
{"action": "motor", "linear": 0.5, "angular": 0.0}
For behavior commands (e.g., "follow that person", "patrol the area"), respond with JSON:
{"action": "behavior", "name": "follow_person"}
For vision queries (e.g., "what do you see?"), describe what's visible in the camera feed.
For general questions, respond conversationally.
```
You can customize this prompt in the configuration.
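On the node side, a reply in one of these JSON shapes can be parsed into a tagged enum; anything that is not valid JSON is treated as a conversational response. A sketch (the type names are illustrative, not the node's actual types):
```rust
use serde::Deserialize;

// Illustrative enum matching the JSON shapes in the system prompt.
#[derive(Debug, Deserialize)]
#[serde(tag = "action", rename_all = "lowercase")]
enum ParsedAction {
    Navigate { goal: Goal },
    Motor { linear: f64, angular: f64 },
    Behavior { name: String },
}

#[derive(Debug, Deserialize)]
struct Goal {
    x: f64,
    y: f64,
    theta: f64,
}

/// Try to parse the LLM reply as a structured action; `None` means the reply
/// should be forwarded to /ai/response as a conversational answer.
fn parse_reply(reply: &str) -> Option<ParsedAction> {
    serde_json::from_str(reply.trim()).ok()
}

fn main() {
    let reply = r#"{"action": "motor", "linear": 0.5, "angular": 0.0}"#;
    match parse_reply(reply) {
        Some(action) => println!("structured action: {action:?}"),
        None => println!("conversational reply: {reply}"),
    }
}
```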
## Dashboard Integration
The dashboard provides a user-friendly interface for:
1. **Command Input**: Text field for natural language commands
2. **Command History**: Shows past commands with status indicators
3. **Response Display**: Shows LLM responses and action feedback
4. **Status Badges**: Connection status and processing indicators
Access the dashboard at `http://localhost:3000/dashboard/robot-control`
## Architecture
```
     ┌─────────────────┐
     │  Dashboard UI   │
     │ (Command Input) │
     └────────┬────────┘
              │ publishes
              ▼
         /ai/command
              │
              ▼
    ┌──────────────────┐
    │ LLM Command Node │
    │                  │
    │  ┌────────────┐  │
    │  │  LlmNode   │  │
    │  │ (mecha10-  │  │
    │  │  ai-llm)   │  │
    │  └────────────┘  │
    │         │        │
    │    Parse JSON    │
    │         │        │
    └─────────┼────────┘
              │
       ┌──────┴──────┬──────────────┬───────────────┐
       ▼             ▼              ▼               ▼
/motor/cmd_vel   /nav/goal  /behavior/execute  /ai/response
       │             │              │               │
       ▼             ▼              ▼               ▼
  ┌────────┐   ┌──────────┐  ┌─────────────┐   ┌─────────┐
  │ Motor  │   │Navigation│  │  Behavior   │   │Dashboard│
  │ Driver │   │  Stack   │  │  Executor   │   │   UI    │
  └────────┘   └──────────┘  └─────────────┘   └─────────┘
```
## Dependencies
- **mecha10-core**: Framework core (Context, Topic, Message)
- **mecha10-ai-llm**: LLM integration library (providers, LlmNode)
- **tokio**: Async runtime
- **serde/serde_json**: Serialization
- **anyhow**: Error handling
- **reqwest**: HTTP client (for API calls)
## Running
The node is launched automatically by `mecha10 dev` when included in `mecha10.json`.
To run manually:
```bash
cargo run -p mecha10-nodes-llm-command
```
## Testing
Test the node with simulation:
1. Start control plane and simulation:
```bash
docker compose up -d
mecha10 dev
```
2. Send a test command via dashboard or Redis CLI:
```bash
redis-cli PUBLISH "/ai/command" '{"text":"move forward","timestamp":1234567890}'
```
3. Subscribe to the response and motor command topics (in separate terminals):
```bash
redis-cli SUBSCRIBE "/ai/response"
redis-cli SUBSCRIBE "/motor/cmd_vel"
```
## Limitations
- **Detection-based vision only**: Vision queries use object-detector output; direct camera frame (image) analysis is pending
- **No conversation context**: Each command is processed independently
- **API rate limits**: Subject to provider rate limits (OpenAI, Claude)
- **Network latency**: Response time depends on LLM API latency
## Future Enhancements
- [ ] Vision queries on raw camera frames (camera feed integration)
- [ ] Conversation context (multi-turn dialogue)
- [ ] Voice input integration
- [ ] Command validation and safety checks
- [ ] Multi-language support
- [ ] Offline fallback mode
## License
MIT