datalab-cli 0.1.0

# Creating and Running Workflows

Learn how to create multi-step document processing pipelines using workflows.

---

## Prerequisites

- [Datalab CLI installed](../getting-started/installation.md)
- [API key configured](../getting-started/configuration.md)

---

## What Are Workflows?

Workflows allow you to chain multiple processing steps together:

```mermaid
flowchart LR
    A[Input Document] --> B[Step 1: Convert]
    B --> C[Step 2: Extract]
    C --> D[Step 3: Transform]
    D --> E[Output]
```

Benefits:

- **Reusable**: Define once, run many times
- **Consistent**: Same processing for every document
- **Efficient**: One execution runs every step, with no manual hand-off between commands

---

## Discovering Step Types

List available step types:

```bash
datalab workflows step-types
```

Output:
```json
{
  "step_types": [
    {
      "type": "convert",
      "description": "Convert document to structured format",
      "config_schema": {...}
    },
    {
      "type": "extract",
      "description": "Extract structured data using schema",
      "config_schema": {...}
    },
    {
      "type": "segment",
      "description": "Segment document into sections",
      "config_schema": {...}
    }
  ]
}
```
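
When scripting, it can help to reduce this output to the parts you need. The following is a minimal sketch, assuming `jq` is installed and that the command prints JSON shaped like the sample above. The helper names `step_type_names` and `step_type_schema` are hypothetical and not part of the CLI.

```bash
# Hypothetical helper: read step-types JSON on stdin, print one type name per line.
# Assumes the top-level "step_types" array shown in the sample output.
step_type_names() {
    jq -r '.step_types[].type'
}

# Hypothetical helper: print the config_schema of a single step type.
step_type_schema() {
    jq --arg t "$1" '.step_types[] | select(.type == $t) | .config_schema'
}

# Usage (assumes the CLI writes JSON to stdout):
#   datalab workflows step-types | step_type_names
#   datalab workflows step-types | step_type_schema extract
```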

---

## Creating a Workflow

### Step 1: Define Workflow Steps

Create `workflow.json`:

```json
{
  "steps": [
    {
      "type": "convert",
      "config": {
        "output_format": "markdown",
        "mode": "balanced"
      }
    },
    {
      "type": "extract",
      "config": {
        "schema": {
          "fields": [
            {"name": "invoice_number", "type": "string"},
            {"name": "total", "type": "number"},
            {"name": "date", "type": "string"}
          ]
        }
      }
    }
  ]
}
```
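
Before creating the workflow, you may want to confirm that the file is well-formed. This sketch assumes `jq` is installed and checks only the structure used in the examples on this page (a non-empty `steps` array whose entries each have a `type` and a `config`). The function `check_workflow_file` is hypothetical, and the server may enforce additional rules.

```bash
# Hypothetical pre-flight check for a steps file; not part of the datalab CLI.
# Succeeds only if the file is valid JSON, has a non-empty "steps" array,
# and every step has a string "type" and an object "config".
check_workflow_file() {
    local file="$1"
    jq -e '
        (.steps | type == "array" and length > 0)
        and all(.steps[]; (.type | type == "string") and (.config | type == "object"))
    ' "$file" > /dev/null
}

# Usage:
#   check_workflow_file workflow.json && echo "workflow.json looks OK"
```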

### Step 2: Create the Workflow

```bash
datalab workflows create --name "invoice-processor" --steps workflow.json
```

Output:
```json
{
  "workflow_id": "wf_abc123def456",
  "name": "invoice-processor",
  "created_at": "2024-01-15T10:30:00Z"
}
```

Save the `workflow_id`. You need it to execute, inspect, or delete the workflow.
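
In a script, one way to keep the ID is to capture it in a variable. This is a sketch, assuming `jq` is installed and that `datalab workflows create` prints the JSON shown above on stdout. The wrapper name `create_workflow` is hypothetical.

```bash
# Hypothetical wrapper: create a workflow and print only its ID.
# Assumes the create command prints JSON containing "workflow_id".
create_workflow() {
    local name="$1" steps_file="$2"
    local response workflow_id
    response=$(datalab workflows create --name "$name" --steps "$steps_file") || return 1
    workflow_id=$(jq -r '.workflow_id // empty' <<<"$response")
    if [ -z "$workflow_id" ]; then
        echo "No workflow_id in response: $response" >&2
        return 1
    fi
    printf '%s\n' "$workflow_id"
}

# Usage:
#   workflow_id=$(create_workflow "invoice-processor" workflow.json)
#   echo "$workflow_id" > .workflow_id   # keep it for later runs
```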

---

## Running a Workflow

### Step 1: Create Input Configuration

Create `input.json`:

```json
{
  "file_url": "https://example.com/invoice.pdf"
}
```

Or use a file ID from a previous upload:

```json
{
  "file_id": "file_xyz789"
}
```
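
If you generate input files from a script, letting `jq` build the JSON avoids quoting mistakes. The sketch below assumes `jq` is installed and uses only the two input keys shown above. The helper `make_input` is hypothetical.

```bash
# Hypothetical helper: print an input document for a URL or an uploaded file ID.
# jq handles quoting, so unusual characters in the value still yield valid JSON.
make_input() {
    local kind="$1" value="$2"
    case "$kind" in
        url)  jq -n --arg v "$value" '{file_url: $v}' ;;
        file) jq -n --arg v "$value" '{file_id: $v}' ;;
        *)    echo "usage: make_input url|file VALUE" >&2; return 2 ;;
    esac
}

# Usage:
#   make_input url "https://example.com/invoice.pdf" > input.json
#   make_input file "file_xyz789" > input.json
```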

### Step 2: Execute the Workflow

```bash
datalab workflows execute wf_abc123def456 --input input.json
```

Output:
```json
{
  "execution_id": "exec_ghi789jkl012",
  "workflow_id": "wf_abc123def456",
  "status": "running",
  "started_at": "2024-01-15T10:35:00Z"
}
```

### Step 3: Check Execution Status

```bash
datalab workflows execution exec_ghi789jkl012
```

Output (running):
```json
{
  "execution_id": "exec_ghi789jkl012",
  "status": "running",
  "current_step": 1,
  "total_steps": 2
}
```

Output (completed):
```json
{
  "execution_id": "exec_ghi789jkl012",
  "status": "completed",
  "completed_at": "2024-01-15T10:35:30Z",
  "results": {
    "step_0": {
      "type": "convert",
      "output": {...}
    },
    "step_1": {
      "type": "extract",
      "output": {
        "invoice_number": "INV-2024-001",
        "total": 1250.00,
        "date": "2024-01-15"
      }
    }
  }
}
```
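
Rather than re-running the status command by hand, you can poll until the execution finishes. This is a minimal sketch, assuming `jq` is installed and that `status` takes the values shown on this page (`running`, `completed`, `failed`). The function `wait_for_execution`, its timeout, and its exit codes are hypothetical choices, not CLI behavior.

```bash
# Hypothetical helper: poll an execution until it completes or fails,
# then print the final status JSON.
# Exit codes: 0 = completed, 1 = failed, 2 = CLI call failed,
# 124 = timeout (seconds, default 300) elapsed first.
wait_for_execution() {
    local exec_id="$1" timeout="${2:-300}" interval="${3:-2}"
    local waited=0 response status
    while :; do
        response=$(datalab workflows execution "$exec_id") || return 2
        status=$(jq -r '.status' <<<"$response")
        case "$status" in
            completed) printf '%s\n' "$response"; return 0 ;;
            failed)    printf '%s\n' "$response"; return 1 ;;
        esac
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out waiting for $exec_id (last status: $status)" >&2
            return 124
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Usage:
#   wait_for_execution exec_ghi789jkl012 120 > result.json
```

The timeout guards against a loop that never ends if an execution stays in `running`.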

---

## Workflow Examples

### Invoice Processing Pipeline

```json
{
  "steps": [
    {
      "type": "convert",
      "config": {
        "output_format": "json",
        "mode": "accurate"
      }
    },
    {
      "type": "extract",
      "config": {
        "schema": {
          "fields": [
            {"name": "vendor_name", "type": "string"},
            {"name": "invoice_number", "type": "string"},
            {"name": "invoice_date", "type": "string"},
            {"name": "due_date", "type": "string"},
            {
              "name": "line_items",
              "type": "array",
              "items": {
                "type": "object",
                "fields": [
                  {"name": "description", "type": "string"},
                  {"name": "quantity", "type": "number"},
                  {"name": "unit_price", "type": "number"},
                  {"name": "amount", "type": "number"}
                ]
              }
            },
            {"name": "subtotal", "type": "number"},
            {"name": "tax", "type": "number"},
            {"name": "total", "type": "number"}
          ]
        },
        "include_scores": true
      }
    }
  ]
}
```

### Document Classification Pipeline

```json
{
  "steps": [
    {
      "type": "segment",
      "config": {
        "schema": {
          "segments": ["invoice", "receipt", "contract", "letter", "form"]
        }
      }
    }
  ]
}
```

### Contract Analysis Pipeline

```json
{
  "steps": [
    {
      "type": "convert",
      "config": {
        "output_format": "markdown",
        "mode": "accurate",
        "extras": ["extract_links"]
      }
    },
    {
      "type": "extract",
      "config": {
        "schema": {
          "fields": [
            {"name": "contract_type", "type": "string"},
            {"name": "parties", "type": "array"},
            {"name": "effective_date", "type": "string"},
            {"name": "termination_date", "type": "string"},
            {"name": "key_terms", "type": "array"},
            {"name": "governing_law", "type": "string"}
          ]
        }
      }
    }
  ]
}
```

---

## Managing Workflows

### List All Workflows

```bash
datalab workflows list
```

### Get Workflow Details

```bash
datalab workflows get wf_abc123def456
```

### Delete a Workflow

```bash
datalab workflows delete wf_abc123def456
```

---

## Using File Uploads with Workflows

Workflow input refers to a document by `file_url` or `file_id`, so upload local files first:

```bash
# 1. Upload the file
datalab files upload invoice.pdf
# Returns: file_id: "file_xyz789"

# 2. Create input with file_id
echo '{"file_id": "file_xyz789"}' > input.json

# 3. Execute workflow
datalab workflows execute wf_abc123def456 --input input.json
```
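
The three steps above can be chained without copying the ID by hand. This sketch assumes `jq` is installed and that `datalab files upload` prints JSON containing `file_id`. The function `run_on_local_file` is hypothetical.

```bash
# Hypothetical helper: upload a local file, run a workflow on it,
# and print the resulting execution ID.
run_on_local_file() {
    local workflow_id="$1" path="$2"
    local file_id input exec_id

    # Assumes the upload command prints JSON with a "file_id" key.
    file_id=$(datalab files upload "$path" | jq -r '.file_id // empty')
    [ -n "$file_id" ] || { echo "Upload failed for $path" >&2; return 1; }

    # Write the input document to a temp file and remove it afterwards.
    input=$(mktemp)
    jq -n --arg id "$file_id" '{file_id: $id}' > "$input"
    exec_id=$(datalab workflows execute "$workflow_id" --input "$input" \
        | jq -r '.execution_id // empty')
    rm -f "$input"

    [ -n "$exec_id" ] || { echo "Execute failed for $path" >&2; return 1; }
    printf '%s\n' "$exec_id"
}

# Usage:
#   exec_id=$(run_on_local_file wf_abc123def456 invoice.pdf)
```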

---

## Batch Processing with Workflows

Process multiple documents with the same workflow:

```bash
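#!/bin/bash
set -euo pipefail             # stop on the first CLI or jq error

workflow_id="wf_abc123def456"
input_file=$(mktemp)          # per-run temp file instead of a fixed /tmp path
trap 'rm -f "$input_file"' EXIT
mkdir -p results              # the redirect below fails if this directory is missing
shopt -s nullglob             # skip the loop when no PDFs match

for file in documents/*.pdf; do
    echo "Processing $file..."

    # Upload file
    upload_result=$(datalab files upload "$file")
    file_id=$(echo "$upload_result" | jq -r '.file_id')

    # Create input config (jq handles the JSON quoting)
    jq -n --arg id "$file_id" '{file_id: $id}' > "$input_file"

    # Execute workflow
    exec_result=$(datalab workflows execute "$workflow_id" --input "$input_file")
    exec_id=$(echo "$exec_result" | jq -r '.execution_id')

    echo "Execution started: $exec_id"

    # Wait for completion (simple polling); keep the last response as the result
    while true; do
        result=$(datalab workflows execution "$exec_id")
        status=$(echo "$result" | jq -r '.status')
        if [ "$status" = "completed" ] || [ "$status" = "failed" ]; then
            break
        fi
        sleep 2
    done

    # Save results, and report failures instead of passing over them silently
    echo "$result" > "results/$(basename "$file" .pdf).json"
    if [ "$status" = "failed" ]; then
        echo "Execution $exec_id failed for $file" >&2
    fi
done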
```

---

## Error Handling

### Execution Failed

Check the execution status for error details:

```bash
datalab workflows execution exec_ghi789jkl012
```

```json
{
  "execution_id": "exec_ghi789jkl012",
  "status": "failed",
  "error": {
    "step": 1,
    "message": "Schema validation failed",
    "code": "INVALID_SCHEMA"
  }
}
```
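
In logs, a one-line summary of the failure is often easier to scan than the full JSON. This sketch assumes `jq` is installed and that failed executions carry the `error` object shown above. The helper `execution_error_summary` is hypothetical.

```bash
# Hypothetical helper: read execution JSON on stdin and print a one-line
# summary of the failure, or nothing if there is no "error" object.
execution_error_summary() {
    jq -r 'select(.error != null)
           | "step \(.error.step) failed [\(.error.code)]: \(.error.message)"'
}

# Usage:
#   datalab workflows execution exec_ghi789jkl012 | execution_error_summary
```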

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `INVALID_SCHEMA` | Schema syntax error | Validate the schema JSON |
| `FILE_NOT_FOUND` | Input file missing | Check the `file_id` or `file_url` |
| `STEP_FAILED` | Processing error | Check step configuration |
| `TIMEOUT` | Step took too long | Simplify step or split workflow |
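
A script can turn these codes into actionable messages. The sketch below simply encodes the table as a `case` statement. The function `error_hint` is hypothetical, and the list covers only the four codes documented here.

```bash
# Hypothetical helper: map an error code from the table above to its suggested fix.
error_hint() {
    case "$1" in
        INVALID_SCHEMA) echo "Validate the schema JSON" ;;
        FILE_NOT_FOUND) echo "Check the file_id or file_url" ;;
        STEP_FAILED)    echo "Check the step configuration" ;;
        TIMEOUT)        echo "Simplify the step or split the workflow" ;;
        *)              echo "Unknown error code: $1" ;;
    esac
}

# Usage:
#   error_hint TIMEOUT
```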

---

## Best Practices

### Keep Workflows Focused

Create separate workflows for different purposes:

- `invoice-extractor` - Extract invoice data
- `contract-analyzer` - Analyze contracts
- `document-classifier` - Classify documents

### Use Descriptive Names

```bash
# Good
datalab workflows create --name "invoice-data-extraction-v2" --steps workflow.json

# Less descriptive
datalab workflows create --name "wf1" --steps workflow.json
```

### Version Your Workflows

Include a version in the name, or follow another consistent naming convention:

```bash
datalab workflows create --name "invoice-processor-v1" --steps workflow-v1.json
datalab workflows create --name "invoice-processor-v2" --steps workflow-v2.json
```

### Test Before Production

Test workflows with sample documents before processing large batches.
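
One way to do this is to compare a step's output against a hand-checked expected file. This is a sketch, assuming `jq` is installed and that results are keyed `step_N` with an `output` object, as in the completed-execution sample on this page. The function `output_matches` and the file `expected.json` are hypothetical.

```bash
# Hypothetical regression check: compare one step's output in an execution
# result (read from stdin) against an expected JSON file.
# jq compares objects by value, so key order does not matter.
output_matches() {
    local step_key="$1" expected_file="$2"
    jq -e --arg k "$step_key" --slurpfile exp "$expected_file" \
        '.results[$k].output == $exp[0]' > /dev/null
}

# Usage:
#   datalab workflows execution exec_ghi789jkl012 \
#       | output_matches step_1 expected.json && echo "sample OK"
```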

---

## Next Steps

- [workflows command reference](../commands/workflows.md)
- [files command reference](../commands/files.md)
- [Rate Limits](../concepts/rate-limits.md)