simplify_baml 0.2.0

Simplified BAML runtime for structured LLM outputs using native Rust types with macros
# Streaming Support

This document explains how to stream structured LLM output by parsing partial JSON as it arrives.

## The Problem

When an LLM response is streamed, the shape of the parsed object changes as each chunk arrives:

```javascript
// Chunk 1: {"name": "John"         → UI shows: Name only
// Chunk 2: {"name": "John", "age": 30  → UI shows: Name + Age (layout shift!)
```

## The Solution: Schema-Aware Streaming

Because the IR schema is known up front, every update can carry the full set of fields, with `null` for values that have not arrived yet:

```javascript
// Initial: {"value": {"name": null, "age": null}, "state": "pending"}
// Chunk 1: {"value": {"name": "John", "age": null}, "state": "partial"}
// Final:   {"value": {"name": "John", "age": 30}, "state": "complete"}
```

**Benefits:** No layout shifts, easy loading states, simpler UI code.
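
The idea can be sketched without the crate at all. The snippet below is a minimal illustration, not the library's implementation: `Value`, `skeleton`, and `merge` are hypothetical names, and the schema is reduced to a flat list of field names. It shows that pre-filling every field with `Null` and only ever overwriting existing slots keeps the key set fixed.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed value; not the crate's `BamlValue`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Str(String),
    Int(i64),
}

/// Build a skeleton with every schema field present and set to `Null`.
fn skeleton(fields: &[&str]) -> BTreeMap<String, Value> {
    fields.iter().map(|f| (f.to_string(), Value::Null)).collect()
}

/// Copy known fields from a partial parse into the skeleton.
/// Keys that are not in the schema are ignored, so the key set never changes.
fn merge(target: &mut BTreeMap<String, Value>, partial: &[(&str, Value)]) {
    for (key, value) in partial {
        if let Some(slot) = target.get_mut(*key) {
            *slot = value.clone();
        }
    }
}
```

Calling `skeleton(&["name", "age"])` and then `merge` with only `name` leaves `age` present as `Null`, which is what lets a UI render both fields from the first update.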

## API

### Create Skeleton

```rust
use simplify_baml::*;

let target_type = FieldType::Class("Person".to_string());
let mut streaming = StreamingBamlValue::from_ir_skeleton(&ir, &target_type);
// Initial: all fields null, state "pending"
```

### Update as Data Arrives

```rust
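// Assumes `stream` yields `Result<String, _>` chunks and that a `StreamExt`
// trait providing `next()` (for example from the `futures` crate) is in scope.
let mut accumulated = String::new();

while let Some(chunk) = stream.next().await {
    accumulated.push_str(&chunk?);

    // Parse everything received so far and update `streaming` in place.
    update_streaming_response(
        &mut streaming,
        &ir,
        &accumulated,
        &target_type,
        false, // not final
    )?;

    send_to_ui(&streaming); // always the full structure
}

// After the stream ends, pass `true` to mark the value complete.
update_streaming_response(&mut streaming, &ir, &accumulated, &target_type, true)?;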
```

### UI Rendering (React Example)

```javascript
function PersonCard({ streaming }) {
  return (
    <div>
      <Field value={streaming.value.name} loading={streaming.value.name === null} />
      <Field value={streaming.value.age} loading={streaming.value.age === null} />
      <SubmitButton disabled={streaming.state !== "complete"} />
    </div>
  );
}
```

## Three Streaming Approaches

### 1. Accumulate Then Parse (Simplest)

```rust
let full_response = accumulate_all_chunks(&mut stream).await;
let result = parse_llm_response_with_ir(&ir, &full_response, &target_type)?;
```

**Use for:** CLI tools, small responses.

### 2. Partial Parsing

```rust
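let mut accumulated = String::new();

while let Some(chunk) = stream.next().await {
    accumulated.push_str(&chunk?);

    // `None` means the buffer is still too incomplete to parse.
    if let Some(partial) = try_parse_partial_response(&ir, &accumulated, &target_type)? {
        println!("Got: {:?}", partial); // structure may grow between updates
    }
}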
```

**Use for:** Backend processing, logs.

### 3. Schema-Aware (Recommended)

```rust
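let mut streaming = StreamingBamlValue::from_ir_skeleton(&ir, &target_type);
let mut accumulated = String::new();

while let Some(chunk) = stream.next().await {
    accumulated.push_str(&chunk?);
    update_streaming_response(&mut streaming, &ir, &accumulated, &target_type, false)?;
    send_to_ui(&streaming); // always a consistent structure
}

// Mark complete once the stream ends.
update_streaming_response(&mut streaming, &ir, &accumulated, &target_type, true)?;
send_to_ui(&streaming);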
```

**Use for:** Web UIs, mobile apps, dashboards.

## Key Types

### `StreamingBamlValue`

```rust
pub struct StreamingBamlValue {
    pub value: BamlValue,
    pub completion_state: CompletionState,  // Pending | Partial | Complete
}
```

Serializes to the following (note that `completion_state` appears under the key `state`):
```json
{
  "value": { "name": "John", "age": null },
  "state": "partial"
}
```
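
To make the three states concrete, here is a small self-contained sketch of how they might advance during a stream. The transition rule is an assumption made for illustration, not taken from the crate: the enum below is a hypothetical mirror of `CompletionState`, and `label` and `next_state` are invented helpers.

```rust
/// Hypothetical mirror of the three states named above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CompletionState {
    Pending,
    Partial,
    Complete,
}

impl CompletionState {
    /// Lowercase label, matching the "state" strings in the JSON above.
    fn label(self) -> &'static str {
        match self {
            CompletionState::Pending => "pending",
            CompletionState::Partial => "partial",
            CompletionState::Complete => "complete",
        }
    }
}

/// Assumed rule: the final flag always wins; otherwise any parsed data
/// moves the state to `Partial`, and no data leaves it unchanged.
fn next_state(current: CompletionState, parsed_any: bool, is_final: bool) -> CompletionState {
    if is_final {
        CompletionState::Complete
    } else if parsed_any {
        CompletionState::Partial
    } else {
        current
    }
}
```

Under this rule a value stays `Pending` until the first chunk parses, and only the final update (the call that passes `true`) can produce `Complete`.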

## Partial JSON Parser

The parser handles incomplete JSON from streaming:

```rust
let partial = r#"{"name": "John", "age": 30"#;  // Missing }
let json = try_parse_partial_json(partial)?;     // Auto-closes!
```

**Features:**
- Auto-closes incomplete objects/arrays
- Handles incomplete strings
- Extracts JSON from markdown code blocks
- Returns `None` if too incomplete
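
The first two features can be illustrated with a generic auto-closing pass. This is a sketch of the general technique, not the code in `partial_parser.rs`; `close_partial_json` is a hypothetical name, and the function deliberately ignores harder cases such as a dangling key (`{"a":`), a trailing comma, or a half-written `\u` escape.

```rust
/// Append whatever closers are needed to balance a truncated JSON prefix.
/// Sketch only: dangling keys, trailing commas, and partial escapes are not repaired.
fn close_partial_json(input: &str) -> String {
    let mut closers: Vec<char> = Vec::new(); // pending '}' / ']' in opening order
    let mut in_string = false;
    let mut escaped = false;

    for ch in input.chars() {
        if in_string {
            if escaped {
                escaped = false; // this character is consumed by the escape
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue; // braces and brackets inside strings do not count
        }
        match ch {
            '"' => in_string = true,
            '{' => closers.push('}'),
            '[' => closers.push(']'),
            '}' | ']' => {
                closers.pop();
            }
            _ => {}
        }
    }

    let mut out = String::from(input);
    if in_string {
        if escaped {
            out.pop(); // drop a trailing lone backslash so it cannot escape the quote
        }
        out.push('"'); // terminate the unfinished string
    }
    // Close containers innermost first.
    while let Some(closer) = closers.pop() {
        out.push(closer);
    }
    out
}
```

The returned string can then be handed to an ordinary JSON parser; if that parse still fails, the caller would treat the input as too incomplete and wait for the next chunk.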

## Examples

```bash
cargo run --example streaming_with_schema_structure  # Recommended
cargo run --example streaming_with_partial_parsing   # Basic
cargo run --example standalone_functions             # Manual control
```

## Known Limitations

### Deeply Nested Partial JSON with Multiple Arrays

The partial JSON parser may fail on incomplete input that combines deep nesting (3+ levels) with more than one unclosed nested array.

**Works:**
```rust
// 4 levels of nested objects
r#"{"a": {"b": {"c": {"d": {"value": 42"#  // ✓ Parses correctly

// Objects inside arrays
r#"{"data": {"items": [{"name": "first"}, {"name": "second"#  // ✓ Parses correctly
```

**May fail:**
```rust
// Deeply nested with incomplete nested array inside object inside array
r#"{"data": {"items": [{"nested": {"arr": [1, 2, 3"#  // ✗ Too complex
```

**Workaround:** The parser relies on brace/bracket counting, which handles most streaming scenarios. If parsing fails on a deeply nested structure, keep accumulating chunks and retry once more of the response has arrived.
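
One way to apply this workaround is to measure how many containers are still open in the buffer and skip the parse attempt while that number is high. The sketch below is illustrative only: `unclosed_depth` and `should_attempt_parse` are hypothetical helpers, not part of the crate, and the threshold is arbitrary and would need tuning against real payloads.

```rust
/// Count containers (`{` or `[`) still open at the end of `input`,
/// ignoring any braces or brackets that appear inside string literals.
fn unclosed_depth(input: &str) -> usize {
    let mut depth: usize = 0;
    let mut in_string = false;
    let mut escaped = false;

    for ch in input.chars() {
        if in_string {
            match (escaped, ch) {
                (true, _) => escaped = false,
                (false, '\\') => escaped = true,
                (false, '"') => in_string = false,
                _ => {}
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' | '[' => depth += 1,
            '}' | ']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    depth
}

/// Hypothetical policy: only attempt a partial parse when the buffer
/// has at most `max_depth` containers left unclosed.
fn should_attempt_parse(input: &str, max_depth: usize) -> bool {
    unclosed_depth(input) <= max_depth
}
```

In a streaming loop, a caller could check `should_attempt_parse(&accumulated, max_depth)` before invoking the partial parser and simply continue to the next chunk when it returns `false`.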

## Code Size

- `streaming_value.rs`: ~330 lines
- `partial_parser.rs`: ~310 lines
- Total: ~640 lines (vs 5000+ in full BAML)