langsmith-rust 0.1.3

# LangSmith Rust - Comprehensive Documentation

## Purpose

This crate provides a Rust implementation for manual tracing to LangSmith, enabling observability for AI agent systems built with Rust. It is designed to integrate seamlessly with LangGraph-style architectures where nodes execute functions and state flows through a graph.

## Core Concepts

### Tracing

Tracing is the process of recording execution metadata (inputs, outputs, timing, errors) and sending it to LangSmith for visualization and analysis. Each execution unit (node, function, operation) becomes a "run" in LangSmith.

### Runs

A "run" represents a single execution unit:
- **Root Run**: The top-level execution (e.g., entire graph execution)
- **Child Run**: A nested execution (e.g., LLM call within a chain)
- **Run Hierarchy**: Parent-child relationships form a tree structure

### Context Propagation

Context (`trace_id`, `parent_run_id`, `dotted_order`) flows from parent to child, maintaining the execution hierarchy. This allows LangSmith to visualize the complete execution tree.

## Architecture Overview

The crate follows a modular architecture with clear separation of concerns:

1. **Configuration Layer** (`config/`): Loads settings from environment variables
2. **Client Layer** (`client/`): Handles HTTP communication with the LangSmith API
3. **Model Layer** (`models/`): Defines data structures (Run, Message, Metrics)
4. **Tracing Layer** (`tracing/`): Core tracing logic and context management
5. **Strategy Layer** (`strategies/`): Pluggable strategies for tracing and serialization
6. **Factory Layer** (`factories/`): Factory methods for creating tracers
7. **Observability Layer** (`observability/`): Observer pattern for node observation
8. **Utility Layer** (`utils/`): Helper functions for serialization and validation

## Key Components

### Tracer

The `Tracer` struct is the main interface for tracing. It wraps a `Run` and provides methods to:
- Create child tracers
- Post runs to LangSmith
- Update runs with outputs
- Handle errors

**Creation**:
```rust
let mut tracer = Tracer::new("node_name", RunType::Llm, json!({"input": "..."}));
```

**Usage**:
```rust
tracer.post().await?;  // Send initial run
// ... execute function ...
tracer.end(json!({"output": "..."}));  // Mark as finished
tracer.patch().await?;  // Send updates
```

### trace_node Helper

The `trace_node` function automatically wraps a function with tracing:

```rust
let result = trace_node(
    "function_name",
    RunType::Llm,
    input_data,
    |input| async move {
        // Your function logic
        process(input).await
    }
).await?;
```

This automatically:
1. Creates a tracer
2. Posts the run (with inputs)
3. Executes the function
4. Updates the run (with outputs or error)
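
The four steps above can be sketched without a network or an async runtime. The following is a synchronous, in-memory stand-in, not the crate's implementation: `RunRecord` and `traced` are hypothetical names, and strings stand in for JSON inputs and outputs.

```rust
use std::time::SystemTime;

/// Hypothetical in-memory stand-in for a LangSmith run (not the crate's `Run`).
#[derive(Debug)]
struct RunRecord {
    name: String,
    inputs: String,
    outputs: Option<String>,
    error: Option<String>,
    start_time: SystemTime,
    end_time: Option<SystemTime>,
}

/// Wraps `f` the way `trace_node` is described: record the inputs, run the
/// function, then record either the output or the error.
fn traced<F>(name: &str, input: &str, f: F) -> (Result<String, String>, RunRecord)
where
    F: FnOnce(&str) -> Result<String, String>,
{
    // Steps 1-2: create the record with its inputs (the real crate would POST here).
    let mut run = RunRecord {
        name: name.to_string(),
        inputs: input.to_string(),
        outputs: None,
        error: None,
        start_time: SystemTime::now(),
        end_time: None,
    };

    // Step 3: execute the wrapped function.
    let result = f(input);

    // Step 4: store the output or the error (the real crate would PATCH here).
    match &result {
        Ok(out) => run.outputs = Some(out.clone()),
        Err(e) => run.error = Some(e.clone()),
    }
    run.end_time = Some(SystemTime::now());

    (result, run)
}
```

Calling `traced("upper", "hi", |s| Ok(s.to_uppercase()))` returns the function's result unchanged alongside a record whose `outputs` is set and whose `error` is `None`; a failing closure fills `error` instead.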

### TracerFactory

Factory for creating tracers with different configurations:

```rust
// Create root tracer
let root = TracerFactory::create_root("Root", RunType::Chain, json!({}));

// Create with thread context
let tracer = TracerFactory::create_with_thread(
    "Node",
    RunType::Llm,
    json!({}),
    "thread-123".to_string()
);

// Create for graph node with parent context
let tracer = TracerFactory::create_for_node(
    "Node",
    RunType::Llm,
    json!({}),
    Some(&parent_context)
);
```

## Data Flow

### Initialization

1. Application calls `langsmith_rust::init()`
2. Loads the `.env` file (if one exists)
3. Reads environment variables:
   - `LANGSMITH_TRACING` (enables/disables tracing)
   - `LANGSMITH_ENDPOINT` (API endpoint)
   - `LANGSMITH_API_KEY` (authentication)
   - `LANGSMITH_PROJECT` (project name)
   - `LANGSMITH_TENANT_ID` (optional workspace ID)
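
As a rough illustration of step 3, the sketch below reads the same variables with the standard library only. The `Config` struct, its field names, and both defaults (tracing off when unset, the public LangSmith endpoint as fallback) are assumptions made for this example and may not match what `langsmith_rust::init()` does.

```rust
use std::env;

/// Hypothetical config struct; field names mirror the variables listed above.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    tracing_enabled: bool,
    endpoint: String,
    api_key: Option<String>,
    project: Option<String>,
    tenant_id: Option<String>,
}

impl Config {
    /// Builds a config from any key lookup, so tests need not touch the
    /// process environment.
    fn from_lookup<F>(get: F) -> Config
    where
        F: Fn(&str) -> Option<String>,
    {
        Config {
            // Assumption: only "true" (case-insensitive) enables tracing.
            tracing_enabled: get("LANGSMITH_TRACING")
                .map(|v| v.eq_ignore_ascii_case("true"))
                .unwrap_or(false),
            // Assumption: fall back to the public LangSmith endpoint when unset.
            endpoint: get("LANGSMITH_ENDPOINT")
                .unwrap_or_else(|| "https://api.smith.langchain.com".to_string()),
            api_key: get("LANGSMITH_API_KEY"),
            project: get("LANGSMITH_PROJECT"),
            tenant_id: get("LANGSMITH_TENANT_ID"),
        }
    }

    /// Reads the real process environment.
    fn from_env() -> Config {
        Config::from_lookup(|key| env::var(key).ok())
    }
}
```

Taking the lookup as a closure keeps the parsing rules testable in isolation; `Config::from_env()` is the thin wrapper an application would call.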

### Tracing Execution

1. **Create Tracer**: User creates a `Tracer` with name, type, and inputs
2. **Post Run**: Tracer posts initial run to LangSmith (`POST /runs`)
   - Generates UUID for `run_id`
   - Sets `trace_id` (same as `run_id` for root)
   - Generates `dotted_order` for ordering
   - Sets start_time
3. **Execute Function**: User's function executes
4. **Update Run**: Tracer updates run with outputs (`PATCH /runs/{run_id}`)
   - Sets outputs
   - Sets end_time
   - Sets error (if any)

### Hierarchical Tracing

When creating child runs:
1. Child inherits `trace_id` from parent
2. Child sets `parent_run_id` to parent's `run_id`
3. Child generates `dotted_order` by appending to parent's `dotted_order`
4. This creates a tree structure in LangSmith
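
The inheritance rules above can be written down in a few lines. This is a sketch under stated assumptions: the struct below is hypothetical (the crate's `TraceContext` may carry different fields), IDs are plain strings instead of UUIDs, and each `dotted_order` segment is assumed to be a timestamp, the letter `Z`, and the run ID, with segments joined by dots.

```rust
/// Hypothetical context carried from parent to child.
#[derive(Debug, Clone, PartialEq)]
struct TraceContext {
    trace_id: String,
    run_id: String,
    parent_run_id: Option<String>,
    dotted_order: String,
}

impl TraceContext {
    /// Root run: `trace_id` equals the run's own ID and there is no parent.
    fn root(run_id: &str, timestamp: &str) -> Self {
        TraceContext {
            trace_id: run_id.to_string(),
            run_id: run_id.to_string(),
            parent_run_id: None,
            dotted_order: format!("{}Z{}", timestamp, run_id),
        }
    }

    /// Child run: inherits `trace_id`, points at the parent's `run_id`,
    /// and appends its own segment to the parent's `dotted_order`.
    fn child(&self, run_id: &str, timestamp: &str) -> Self {
        TraceContext {
            trace_id: self.trace_id.clone(),
            run_id: run_id.to_string(),
            parent_run_id: Some(self.run_id.clone()),
            dotted_order: format!("{}.{}Z{}", self.dotted_order, timestamp, run_id),
        }
    }
}
```

Because every child's `dotted_order` starts with its parent's, sorting runs by that string groups each subtree together in execution order.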

## Run Types

Different execution types map to different `RunType` values:

- `Chain`: Orchestrator/coordinator nodes
- `Llm`: LLM API calls
- `Tool`: Tool/function executions
- `Retriever`: Retrieval operations
- `Embedding`: Embedding generation
- `Prompt`: Prompt execution
- `Runnable`: Generic runnable
- `Custom(String)`: Custom types
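
For orientation, the variants above could be modeled as follows. This is a sketch, not the definition in `models/run.rs`: the `as_str` method and the lowercase wire names are assumptions for illustration.

```rust
/// Sketch of the variants listed above; the crate's definition may differ.
#[derive(Debug, Clone, PartialEq)]
enum RunType {
    Chain,
    Llm,
    Tool,
    Retriever,
    Embedding,
    Prompt,
    Runnable,
    Custom(String),
}

impl RunType {
    /// Assumed wire format: lowercase variant name, with custom strings
    /// passed through unchanged.
    fn as_str(&self) -> &str {
        match self {
            RunType::Chain => "chain",
            RunType::Llm => "llm",
            RunType::Tool => "tool",
            RunType::Retriever => "retriever",
            RunType::Embedding => "embedding",
            RunType::Prompt => "prompt",
            RunType::Runnable => "runnable",
            RunType::Custom(name) => name.as_str(),
        }
    }
}
```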

## Message Types

The crate supports LangChain-compatible message types:

- `AIMessage`: AI responses (may include tool_calls)
- `ToolMessage`: Tool execution results
- `HumanMessage`: Human/user inputs
- `SystemMessage`: System prompts

These types ensure compatibility with LangChain's data model.
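
One way to picture these four kinds is as a single enum. The shape below is hypothetical (the crate may use separate structs, and real tool calls carry more than a name); the type tags are the ones LangChain uses when serializing messages.

```rust
/// Hypothetical shapes for the four message kinds.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Human { content: String },
    System { content: String },
    /// Tool calls are reduced to tool names to keep the sketch small.
    Ai { content: String, tool_calls: Vec<String> },
    Tool { content: String, tool_call_id: String },
}

impl Message {
    /// Type tag as LangChain serializes it.
    fn type_tag(&self) -> &'static str {
        match self {
            Message::Human { .. } => "human",
            Message::System { .. } => "system",
            Message::Ai { .. } => "ai",
            Message::Tool { .. } => "tool",
        }
    }
}
```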

## Serialization

All data sent to LangSmith must be JSON objects. The crate automatically:
- Wraps primitive types (`String`, `i32`, `bool`) in objects: `{"input": value}`
- Preserves objects as-is
- Ensures `inputs` and `outputs` are always objects
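
The wrapping rule is small enough to sketch directly. To stay dependency-free, the example uses a minimal `Value` enum as a stand-in for `serde_json::Value`; `ensure_object` is a hypothetical name, and wrapping arrays as well as primitives is an assumption, since only objects are stated to pass through unchanged.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for `serde_json::Value`, so the sketch has no dependencies.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Returns objects unchanged and wraps anything else as `{key: value}`.
fn ensure_object(value: Value, key: &str) -> Value {
    match value {
        // Already an object: preserve as-is.
        Value::Object(_) => value,
        // Anything else is nested under the given key.
        other => {
            let mut map = BTreeMap::new();
            map.insert(key.to_string(), other);
            Value::Object(map)
        }
    }
}
```

With this rule, `ensure_object(Value::String("hi".into()), "input")` yields the equivalent of `{"input": "hi"}`, while an existing object comes back untouched.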

## Error Handling

Tracing errors are **non-fatal**:
- Errors are logged to stderr
- Application execution continues normally
- Tracing failures don't break your code

This ensures tracing is truly non-intrusive.
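
A minimal sketch of this log-and-continue behavior, assuming nothing about the crate's internals: `ignore_tracing_error` is a hypothetical helper that turns a failed tracing step into a stderr message and `None`, so the caller never has to propagate the failure.

```rust
use std::fmt::Display;

/// Logs a tracing failure to stderr and swallows it, so the caller keeps going.
fn ignore_tracing_error<T, E: Display>(step: &str, result: Result<T, E>) -> Option<T> {
    match result {
        Ok(value) => Some(value),
        Err(err) => {
            // Report the failure, but do not return it to the application.
            eprintln!("langsmith tracing: {} failed: {}", step, err);
            None
        }
    }
}
```

The application's own result is computed independently of the returned `Option`, which is what keeps a LangSmith outage from becoming an application error.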

## Thread Safety

- `Config`: Thread-safe singleton
- `Tracer`: Not thread-safe (use within single async task)
- `LangSmithClient`: Thread-safe (uses Arc internally)
- `Observer`: Thread-safe (uses `Arc<dyn Observer>`)
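
The practical consequence is to share the client and keep per-run state local to each worker. The sketch below illustrates that split with OS threads and the standard library only; `SharedClient` is a hypothetical stand-in that records run names instead of making HTTP calls, and in the real crate the workers would be async tasks.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a shareable client; it only records run names.
#[derive(Default)]
struct SharedClient {
    sent: Mutex<Vec<String>>,
}

impl SharedClient {
    fn send(&self, run_name: &str) {
        self.sent.lock().unwrap().push(run_name.to_string());
    }
}

/// Spawns one worker per name. Each worker owns its own run state and
/// shares only the client.
fn run_workers(client: &Arc<SharedClient>, names: &[&str]) {
    let handles: Vec<_> = names
        .iter()
        .map(|name| {
            let client = Arc::clone(client);
            // Per-run state is created for, and moved into, a single worker.
            let run_name = format!("node-{}", name);
            thread::spawn(move || client.send(&run_name))
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```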

## Performance

- **Non-blocking**: All HTTP calls are async
- **Lazy initialization**: Config loaded only when needed
- **Efficient serialization**: Uses serde_json
- **Minimal overhead**: <1ms per node

## Design Patterns Used

1. **Strategy Pattern**: Pluggable tracing and serialization strategies
2. **Factory Pattern**: Centralized tracer creation
3. **Observer Pattern**: Observable nodes for additional observability
4. **Singleton Pattern**: Global configuration access

## Integration Points

### With LangGraph-style Systems

1. Wrap node execution with `trace_node`
2. Use `TracerFactory` to create tracers with context
3. Propagate `TraceContext` through graph execution
4. Use appropriate `RunType` for each node type

### With Async Runtimes

Works with any async runtime (Tokio, async-std, etc.) via async/await.

### With Serialization

Inputs and outputs must implement the `Serialize` trait. This works with:
- `serde_json::Value`
- Custom structs implementing `Serialize`
- Primitives (automatically wrapped)

## Configuration

All configuration is done via environment variables:

```bash
LANGSMITH_TRACING=true              # Enable/disable tracing
LANGSMITH_ENDPOINT=https://...     # API endpoint
LANGSMITH_API_KEY=sk-...           # API key (required)
LANGSMITH_PROJECT=my-project       # Project name
LANGSMITH_TENANT_ID=...            # Optional workspace ID
```

## API Endpoints

The crate interacts with the LangSmith API through two endpoints:

- `POST /runs` - Create a new run
- `PATCH /runs/{run_id}` - Update an existing run

## Run Data Structure

A run contains:
- `id`: Unique identifier (UUID)
- `name`: Human-readable name
- `run_type`: Type of run (Llm, Tool, etc.)
- `inputs`: Input data (JSON object)
- `outputs`: Output data (JSON object, optional)
- `start_time`: When execution started
- `end_time`: When execution ended (optional)
- `trace_id`: Root trace identifier
- `parent_run_id`: Parent run identifier (optional)
- `dotted_order`: Ordering string for hierarchy
- `thread_id`: Conversation/thread identifier
- `session_name`: Session/project name
- `error`: Error message (optional)
- `tags`: Tags for filtering
- `extra`: Additional metadata
- Metrics: `prompt_tokens`, `completion_tokens`, `total_tokens`, costs

## Usage Patterns

### Pattern 1: Simple Function Tracing

```rust
let result = trace_node("my_function", RunType::Runnable, input, my_function).await?;
```

### Pattern 2: Manual Tracing

```rust
let mut tracer = Tracer::new("node", RunType::Llm, json!({"input": "..."}));
tracer.post().await?;
let output = execute_function().await?;
tracer.end(json!({"output": output}));
tracer.patch().await?;
```

### Pattern 3: Hierarchical Tracing

```rust
let mut parent = Tracer::new("parent", RunType::Chain, json!({}));
parent.post().await?;

let mut child = parent.create_child("child", RunType::Llm, json!({}));
child.post().await?;
// ... execute ...
child.end(json!({}));
child.patch().await?;

parent.end(json!({}));
parent.patch().await?;
```

### Pattern 4: Graph Node Integration

```rust
pub async fn execute_node(node: &Node, state: State) -> Result<State> {
    trace_node(
        &node.name,
        node.run_type.clone(),
        json!(state),
        |state| async move {
            node.process(state).await
        }
    ).await
}
```

## Extension Points

### Adding Custom Run Types

Extend the `RunType` enum in `models/run.rs`. For one-off types, the existing `Custom(String)` variant avoids changing the enum.

### Adding Custom Strategies

Implement the `TracingStrategy` or `SerializationStrategy` trait.

### Adding Custom Observers

Implement the `Observer` trait and attach the observer to an `ObservableNodeWrapper`.

## Testing

The crate includes comprehensive tests:
- Unit tests for each module
- Integration tests for end-to-end flows
- Tests can run with tracing disabled (set `LANGSMITH_TRACING=false`)

## Limitations

1. No batching yet (future feature)
2. No retry logic (future feature)
3. Requires async runtime
4. Inputs/outputs must be serializable

## Future Enhancements

- Batch tracing for better performance
- Retry logic for failed requests
- Metrics aggregation
- Custom transport layers
- Synchronous tracing without runtime

## See Also

- `ARCHITECTURE.md` - Detailed architecture documentation
- `INTEGRATION.md` - Integration guide for LangGraph
- `READING_GUIDE.md` - Guide to understanding the codebase
- Examples in `examples/` directory