# LLM Command Node

Natural language command parsing via LLM APIs (OpenAI, Claude, Ollama) with dashboard control.

## Quick Start

1. **Copy `.env.example` to `.env`** and add your API key:
   ```bash
   cp .env.example .env
   # Edit .env and set: OPENAI_API_KEY=sk-...
   ```

2. **Start development mode:**
   ```bash
   mecha10 dev
   ```

3. **Open the dashboard** at `http://localhost:3000/dashboard/robot-control`

4. **Send commands** via the AI Command Control panel!

## Overview

The LLM Command node allows users to control robots using natural language commands. It leverages large language models (LLMs) to parse commands and convert them into structured actions.

## Features

- **Multi-Provider Support**: OpenAI, Claude (Anthropic), and local Ollama
- **Command Parsing**: Converts natural language into structured robot actions
- **Action Routing**: Publishes to appropriate topics (motor commands, navigation goals, behaviors)
- **Vision Queries**: Uses object detection data to answer "what do you see?" questions
- **Behavior Interruption**: Automatically pauses autonomous behaviors when user commands are issued
- **Auto-Resume**: Configurable automatic resumption of behaviors after timeout
- **Dashboard Integration**: Real-time command input and response display
- **Error Handling**: Clear error messages and timeout handling

## Configuration

The node is configured via `configs/*/llm-command.toml` (or through `mecha10.json`):

```toml
# LLM Provider Configuration
provider = "openai"  # Options: "openai", "claude", "local"
model = "gpt-4o-mini"
temperature = 0.7
max_tokens = 500
vision_enabled = false

# Topic Configuration
[topics]
command_in = "/ai/command"
response_out = "/ai/response"
camera_in = "/robot/sensors/camera/rgb"
nav_goal_out = "/nav/goal"
motor_cmd_out = "/motor/cmd_vel"
behavior_out = "/behavior/execute"

# Behavior Interrupt Configuration
[behavior_interrupt]
enabled = true
mode = "interrupt_with_auto_resume"  # Options: "disabled", "interrupt_only", "interrupt_with_auto_resume"
timeout_secs = 30  # Auto-resume timeout (for interrupt_with_auto_resume mode)
await_completion = false
control_topic = "/behavior/control"
```

### Behavior Interrupt Configuration

When the LLM issues motor or navigation commands, it can automatically interrupt autonomous behaviors:

- **`enabled`**: Enable/disable behavior interruption (default: `true`)
- **`mode`**: Interrupt behavior (options below):
  - `"disabled"`: Never interrupt behavior tree
  - `"interrupt_only"`: Interrupt but don't auto-resume (manual resume required)
  - `"interrupt_with_auto_resume"`: Interrupt and automatically resume after timeout
- **`timeout_secs`**: Seconds before auto-resume (default: `30`)
- **`await_completion`**: Wait for command completion before resuming (not yet implemented)
- **`control_topic`**: Topic for behavior control commands (default: `"/behavior/control"`)
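
For orientation, here is a minimal sketch of how this configuration file might map onto typed Rust structs with serde. The struct and field names are illustrative only (not the crate's actual types), and the `toml` crate is assumed for parsing:

```rust
use serde::Deserialize;

// Hypothetical structs mirroring llm-command.toml; names are illustrative only.
#[derive(Debug, Deserialize)]
struct LlmCommandConfig {
    provider: String,        // "openai", "claude", or "local"
    model: String,
    temperature: f32,
    max_tokens: u32,
    vision_enabled: bool,
    topics: Topics,
    behavior_interrupt: BehaviorInterrupt,
}

#[derive(Debug, Deserialize)]
struct Topics {
    command_in: String,
    response_out: String,
    camera_in: String,
    nav_goal_out: String,
    motor_cmd_out: String,
    behavior_out: String,
}

#[derive(Debug, Deserialize)]
struct BehaviorInterrupt {
    enabled: bool,
    mode: String,            // "disabled" | "interrupt_only" | "interrupt_with_auto_resume"
    timeout_secs: u64,
    await_completion: bool,
    control_topic: String,
}

fn load_config(path: &str) -> anyhow::Result<LlmCommandConfig> {
    // Parse the TOML file shown above into the typed config.
    let raw = std::fs::read_to_string(path)?;
    Ok(toml::from_str(&raw)?)
}
```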

### Environment Variables

**Recommended: use a `.env` file in your project root**

Copy `.env.example` to `.env` and add your API keys:

```bash
# Copy the example file
cp .env.example .env

# Edit .env and add your API key
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```

The `.env` file is automatically loaded by `mecha10 dev` and passed to all nodes.

**Alternative: Set environment variables directly**

```bash
# For OpenAI
export OPENAI_API_KEY="sk-..."

# For Claude
export ANTHROPIC_API_KEY="sk-ant-..."

# For local Ollama (no key needed)
# Ensure Ollama is running on localhost:11434
```
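
As an illustration of how provider credentials are typically resolved at startup, a key lookup might look like the sketch below. This is an assumption for illustration, not the node's actual code:

```rust
use std::env;

// Illustrative sketch: pick the API key (if any) that matches the configured provider.
// Ollama ("local") needs no key; the real node's logic may differ.
fn resolve_api_key(provider: &str) -> anyhow::Result<Option<String>> {
    match provider {
        "openai" => Ok(Some(env::var("OPENAI_API_KEY")?)),
        "claude" => Ok(Some(env::var("ANTHROPIC_API_KEY")?)),
        "local" => Ok(None), // Ollama on localhost:11434, no key required
        other => anyhow::bail!("unknown LLM provider: {other}"),
    }
}
```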

## Topics

### Input Topics

- **`/ai/command`** (`CommandMessage`): Natural language command from user
  ```json
  {
    "text": "move forward",
    "timestamp": 1234567890,
    "user_id": "optional_user_id"
  }
  ```

### Output Topics

- **`/ai/response`** (`ResponseMessage`): LLM response with action feedback
  ```json
  {
    "text": "Moving the robot forward",
    "timestamp": 1234567890,
    "action_taken": true,
    "error": null
  }
  ```

- **`/motor/cmd_vel`** (`MotorCommand`): Motor velocity commands
  ```json
  {
    "linear": 0.5,
    "angular": 0.0,
    "timestamp": 1234567890
  }
  ```

- **`/nav/goal`** (`NavigationGoal`): Navigation waypoint goals
  ```json
  {
    "x": 5.0,
    "y": 3.0,
    "theta": 0.0,
    "timestamp": 1234567890
  }
  ```

- **`/behavior/execute`** (`BehaviorCommand`): Behavior execution commands
  ```json
  {
    "name": "follow_person",
    "params": null,
    "timestamp": 1234567890
  }
  ```
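
For orientation, the JSON payloads above could be modeled with serde structs roughly like the following. Field names follow the examples; the structs are illustrative, not the crate's exported types:

```rust
use serde::{Deserialize, Serialize};

// Illustrative message shapes matching the JSON examples above.
#[derive(Debug, Serialize, Deserialize)]
struct CommandMessage {
    text: String,
    timestamp: u64,
    #[serde(skip_serializing_if = "Option::is_none")]
    user_id: Option<String>,
}

#[derive(Debug, Serialize, Deserialize)]
struct ResponseMessage {
    text: String,
    timestamp: u64,
    action_taken: bool,
    error: Option<String>,
}

#[derive(Debug, Serialize, Deserialize)]
struct MotorCommand {
    linear: f64,
    angular: f64,
    timestamp: u64,
}

#[derive(Debug, Serialize, Deserialize)]
struct NavigationGoal {
    x: f64,
    y: f64,
    theta: f64,
    timestamp: u64,
}
```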

## Command Examples

### Motor Commands
- `"move forward"``{"action": "motor", "linear": 0.5, "angular": 0.0}`
- `"turn left"``{"action": "motor", "linear": 0.0, "angular": 0.5}`
- `"stop"``{"action": "motor", "linear": 0.0, "angular": 0.0}`

### Navigation Commands
- `"go to x:5 y:3"``{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}`
- `"move to the door"` → Extracts coordinates and navigates

### Behavior Commands
- `"follow that person"``{"action": "behavior", "name": "follow_person"}`
- `"patrol the area"``{"action": "behavior", "name": "patrol"}`

### Vision Queries
The node subscribes to `/vision/detections` from the object-detector node and uses this data to answer vision questions:

- `"what do you see?"` → "I see a person (95% confidence) and a car (87% confidence)"
- `"is there a person in front of me?"` → "Yes, I detect 1 person with 95% confidence"
- `"how many cars?"` → "I see 2 cars: car (87% confidence) and car (82% confidence)"
- `"describe what's visible"` → Natural language description based on detections

**How it works:**
1. Object detector node continuously publishes detections to `/vision/detections`
2. LLM command node stores the latest detections
3. When a vision query is detected, detections are formatted as text context
4. LLM analyzes the detections and provides a natural language response
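
A minimal sketch of step 3, formatting the stored detections into a text context for the prompt. The `Detection` shape here is an assumption based on the examples above, not the actual `/vision/detections` schema:

```rust
// Illustrative detection shape; the real /vision/detections schema may differ.
struct Detection {
    label: String,
    confidence: f32, // 0.0..=1.0
}

// Step 3: turn the latest detections into a short text context for the LLM prompt.
fn detections_to_context(detections: &[Detection]) -> String {
    if detections.is_empty() {
        return "No objects are currently detected.".to_string();
    }
    let items: Vec<String> = detections
        .iter()
        .map(|d| format!("{} ({:.0}% confidence)", d.label, d.confidence * 100.0))
        .collect();
    format!("Currently detected objects: {}.", items.join(", "))
}
```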

**Benefits over vision APIs:**
- **Much cheaper** - No image tokens, just structured detection data
- **Faster** - No need to encode/send images
- **More accurate** - Uses a specialized YOLO model for detection

## Behavior Interruption

The LLM command node intelligently manages the interaction between user commands and autonomous behaviors:

### How It Works

1. **Automatic Interruption**: When the LLM parses a motor or navigation command, it interrupts the behavior tree
2. **User Priority**: Direct user commands always take priority over autonomous behaviors
3. **Auto-Resume**: After a timeout (configurable), the behavior tree automatically resumes
4. **Manual Resume**: Users can manually re-enable behaviors via the dashboard

### Interrupt Modes

**Disabled** (`mode = "disabled"`)
- Behavior tree is never interrupted by LLM commands
- User commands may be overridden by autonomous behaviors
- Use when you want autonomous behaviors to have priority

**Interrupt Only** (`mode = "interrupt_only"`)
- Behavior tree is paused when motor/navigation commands are issued
- **No automatic resumption** - requires manual re-enable from dashboard
- Use when you want explicit control over behavior resumption

**Interrupt with Auto-Resume** (`mode = "interrupt_with_auto_resume"`)
- Behavior tree is paused when motor/navigation commands are issued
- **Automatically resumes** after `timeout_secs` (default: 30s)
- Use for seamless switching between manual and autonomous control
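
A sketch of how the auto-resume timer might be implemented with tokio. The channel here is a stand-in for publishing to `/behavior/control`; the actual node uses the mecha10-core publishing API, which is not shown:

```rust
use std::time::Duration;
use tokio::sync::mpsc;

// Illustrative sketch of "interrupt_with_auto_resume": send an "interrupt" control
// message, then schedule a "resume" after `timeout_secs`. A real implementation
// would also reset or cancel this timer when new user commands arrive.
async fn interrupt_with_auto_resume(
    control_tx: mpsc::Sender<String>, // stand-in for publishing to /behavior/control
    timeout_secs: u64,
) {
    let _ = control_tx.send("interrupt".to_string()).await;
    tokio::spawn(async move {
        tokio::time::sleep(Duration::from_secs(timeout_secs)).await;
        // After the timeout, ask the behavior tree to resume.
        let _ = control_tx.send("resume".to_string()).await;
    });
}
```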

### Example Scenario

```
1. Robot is running "patrol" behavior (autonomous)
2. User says: "stop" via LLM command
   → Behavior tree is interrupted
   → Motor command published: {linear: 0.0, angular: 0.0}
3. Robot stops and remains idle
4. After 30 seconds (timeout):
   → Behavior tree automatically resumes
   → Robot continues patrolling
```

### Control Messages

The system uses enhanced `BehaviorControl` messages:

```json
{
  "action": "interrupt",
  "source": "llm-command",
  "duration_secs": 30,
  "timestamp": 1234567890
}
```

Actions:
- **`interrupt`**: Pause behavior tree (from LLM command)
- **`resume`**: Resume behavior tree (manual or auto)
- **`enable`**: Enable behavior tree (from dashboard)
- **`disable`**: Disable behavior tree (from dashboard)
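
Such a message could be modeled roughly as follows. The names are illustrative only; the real `BehaviorControl` type lives in the framework's message definitions:

```rust
use serde::{Deserialize, Serialize};

// Illustrative model of the BehaviorControl message shown above.
#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
enum ControlAction {
    Interrupt,
    Resume,
    Enable,
    Disable,
}

#[derive(Debug, Serialize, Deserialize)]
struct BehaviorControl {
    action: ControlAction,
    source: String,             // e.g. "llm-command" or "dashboard"
    duration_secs: Option<u64>, // only meaningful for "interrupt"
    timestamp: u64,
}
```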

## System Prompt

The default system prompt guides the LLM to parse commands into structured JSON actions:

```
You are a helpful robot assistant. Parse user commands and respond with structured actions.

For navigation commands (e.g., "go to the door", "move to coordinates"), extract the goal and respond with JSON:
{"action": "navigate", "goal": {"x": 5.0, "y": 3.0, "theta": 0.0}}

For motor commands (e.g., "move forward", "turn left", "stop"), respond with JSON:
{"action": "motor", "linear": 0.5, "angular": 0.0}

For behavior commands (e.g., "follow that person", "patrol the area"), respond with JSON:
{"action": "behavior", "name": "follow_person"}

For vision queries (e.g., "what do you see?"), describe what's visible in the camera feed.

For general questions, respond conversationally.
```

You can customize this prompt in the configuration.

## Dashboard Integration

The dashboard provides a user-friendly interface for:

1. **Command Input**: Text field for natural language commands
2. **Command History**: Shows past commands with status indicators
3. **Response Display**: Shows LLM responses and action feedback
4. **Status Badges**: Connection status and processing indicators

Access the dashboard at `http://localhost:3000/dashboard/robot-control`

## Architecture

```
┌─────────────────┐
│   Dashboard UI  │
│ (Command Input) │
└────────┬────────┘
         │ publishes
    /ai/command
         ▼
┌──────────────────┐
│   LLM Command    │
│      Node        │
│                  │
│  ┌────────────┐  │
│  │ LlmNode    │  │
│  │ (mecha10-  │  │
│  │  ai-llm)   │  │
│  └────────────┘  │
│         │        │
│    Parse JSON    │
│         │        │
└─────────┼────────┘
    ┌─────┴─────┬──────────────┬───────────────┐
    ▼           ▼              ▼               ▼
/motor/cmd_vel /nav/goal  /behavior/execute /ai/response
    │           │              │               │
    ▼           ▼              ▼               ▼
┌────────┐ ┌──────────┐ ┌─────────────┐ ┌─────────┐
│ Motor  │ │Navigation│ │  Behavior   │ │Dashboard│
│ Driver │ │  Stack   │ │  Executor   │ │   UI    │
└────────┘ └──────────┘ └─────────────┘ └─────────┘
```

## Dependencies

- **mecha10-core**: Framework core (Context, Topic, Message)
- **mecha10-ai-llm**: LLM integration library (providers, LlmNode)
- **tokio**: Async runtime
- **serde/serde_json**: Serialization
- **anyhow**: Error handling
- **reqwest**: HTTP client (for API calls)

## Running

The node is launched automatically by `mecha10 dev` when included in `mecha10.json`.

To run manually:
```bash
cargo run -p mecha10-nodes-llm-command
```

## Testing

Test the node with simulation:

1. Start control plane and simulation:
   ```bash
   docker compose up -d
   mecha10 dev
   ```

2. Send a test command via dashboard or Redis CLI:
   ```bash
   redis-cli PUBLISH "/ai/command" '{"text":"move forward","timestamp":1234567890}'
   ```

3. Subscribe to response topic:
   ```bash
   redis-cli SUBSCRIBE "/ai/response"
   redis-cli SUBSCRIBE "/motor/cmd_vel"
   ```

## Limitations

- **Image-based vision not yet supported**: Vision queries rely on `/vision/detections` data; direct camera frame analysis is pending
- **No conversation context**: Each command is processed independently
- **API rate limits**: Subject to provider rate limits (OpenAI, Claude)
- **Network latency**: Response time depends on LLM API latency

## Future Enhancements

- [ ] Image-based vision queries (integrate camera feed)
- [ ] Conversation context (multi-turn dialogue)
- [ ] Voice input integration
- [ ] Command validation and safety checks
- [ ] Multi-language support
- [ ] Offline fallback mode

## License

MIT