# Creating and Running Workflows
Learn how to create multi-step document processing pipelines using workflows.

---
## Prerequisites
- [Datalab CLI installed](../getting-started/installation.md)
- [API key configured](../getting-started/configuration.md)
---
## What Are Workflows?
Workflows allow you to chain multiple processing steps together:
```mermaid
flowchart LR
    A[Input Document] --> B[Step 1: Convert]
    B --> C[Step 2: Extract]
    C --> D[Step 3: Transform]
    D --> E[Output]
```
Benefits:
- **Reusable**: Define once, run many times
- **Consistent**: Same processing for every document
- **Efficient**: Optimized execution
---
## Discovering Step Types
List available step types:
```bash
datalab workflows step-types
```
Output:
```json
{
  "step_types": [
    {
      "type": "convert",
      "description": "Convert document to structured format",
      "config_schema": {...}
    },
    {
      "type": "extract",
      "description": "Extract structured data using schema",
      "config_schema": {...}
    },
    {
      "type": "segment",
      "description": "Segment document into sections",
      "config_schema": {...}
    }
  ]
}
```
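In scripts it can help to pull just the type names out of that output. A minimal sketch, assuming the command prints JSON shaped like the sample above on stdout and that `jq` is installed; `list_step_types` is a hypothetical helper name, not part of the CLI:

```bash
# Hypothetical helper: print one step type name per line.
# Assumes `datalab workflows step-types` emits JSON with a top-level
# "step_types" array, as in the sample output above.
list_step_types() {
  datalab workflows step-types | jq -r '.step_types[].type'
}
```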
---
## Creating a Workflow
### Step 1: Define Workflow Steps
Create `workflow.json`:
```json
{
  "steps": [
    {
      "type": "convert",
      "config": {
        "output_format": "markdown",
        "mode": "balanced"
      }
    },
    {
      "type": "extract",
      "config": {
        "schema": {
          "fields": [
            {"name": "invoice_number", "type": "string"},
            {"name": "total", "type": "number"},
            {"name": "date", "type": "string"}
          ]
        }
      }
    }
  ]
}
```
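A quick structural check can catch typos in the steps file before it is sent to the service. The sketch below is a local check only, assuming `jq` is installed; `validate_steps_file` is a hypothetical helper, and it does not validate each step's `config` against the real `config_schema`:

```bash
# Hypothetical pre-flight check for a steps file such as workflow.json.
# Succeeds only if the file is valid JSON, has a non-empty "steps" array,
# and every step carries a string "type".
validate_steps_file() {
  jq -e '
    if (.steps | type) == "array" and (.steps | length) > 0
    then all(.steps[]; (.type | type) == "string")
    else false end
  ' "$1" > /dev/null
}
```

For example, `validate_steps_file workflow.json && echo "looks OK"` prints the message only when the file passes the check.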
### Step 2: Create the Workflow
```bash
datalab workflows create --name "invoice-processor" --steps workflow.json
```
Output:
```json
{
  "workflow_id": "wf_abc123def456",
  "name": "invoice-processor",
  "created_at": "2024-01-15T10:30:00Z"
}
```
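To reuse the ID in a script, the JSON output can be parsed rather than copied by hand. A sketch, assuming the create command prints the JSON above on stdout and that `jq` is installed; `create_workflow` is a hypothetical wrapper, not a CLI feature:

```bash
# Hypothetical wrapper: create a workflow and print only its ID.
# Assumes the create command prints JSON containing "workflow_id".
create_workflow() {
  local name="$1" steps_file="$2" id
  id=$(datalab workflows create --name "$name" --steps "$steps_file" \
    | jq -r '.workflow_id // empty')
  # Treat a missing ID as a failure instead of passing "null" downstream.
  [ -n "$id" ] || { echo "no workflow_id in CLI output" >&2; return 1; }
  printf '%s\n' "$id"
}

# Example use:
#   workflow_id=$(create_workflow "invoice-processor" workflow.json)
```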
Save the `workflow_id`; you need it to execute, inspect, or delete the workflow.

---
## Running a Workflow
### Step 1: Create Input Configuration
Create `input.json`:
```json
{
  "file_url": "https://example.com/invoice.pdf"
}
```
Or use a file ID from a previous upload:
```json
{
  "file_id": "file_xyz789"
}
```
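Both input shapes can be generated from a single value. The helper below is a sketch under the assumption that anything starting with `http://` or `https://` is a URL and everything else is a file ID; `make_input` is a hypothetical name:

```bash
# Hypothetical helper: emit an input document for `workflows execute`.
# Values starting with http:// or https:// become "file_url"; anything
# else is treated as a file ID. Quotes and backslashes are rejected so
# the simple printf template stays valid for ordinary URLs and IDs.
make_input() {
  local value="$1" key
  case "$value" in
    *\"*|*\\*) echo "unsupported character in: $value" >&2; return 1 ;;
    http://*|https://*) key="file_url" ;;
    *) key="file_id" ;;
  esac
  printf '{"%s": "%s"}\n' "$key" "$value"
}
```

For example, `make_input "https://example.com/invoice.pdf" > input.json` writes the first form shown above, and `make_input file_xyz789 > input.json` writes the second.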
### Step 2: Execute the Workflow
```bash
datalab workflows execute wf_abc123def456 --input input.json
```
Output:
```json
{
  "execution_id": "exec_ghi789jkl012",
  "workflow_id": "wf_abc123def456",
  "status": "running",
  "started_at": "2024-01-15T10:35:00Z"
}
```
### Step 3: Check Execution Status
```bash
datalab workflows execution exec_ghi789jkl012
```
Output (running):
```json
{
  "execution_id": "exec_ghi789jkl012",
  "status": "running",
  "current_step": 1,
  "total_steps": 2
}
```
Output (completed):
```json
{
  "execution_id": "exec_ghi789jkl012",
  "status": "completed",
  "completed_at": "2024-01-15T10:35:30Z",
  "results": {
    "step_0": {
      "type": "convert",
      "output": {...}
    },
    "step_1": {
      "type": "extract",
      "output": {
        "invoice_number": "INV-2024-001",
        "total": 1250.00,
        "date": "2024-01-15"
      }
    }
  }
}
```
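Rather than re-running the status command by hand, a script can poll until the execution reaches a final state. A sketch, assuming `completed` and `failed` are the only final statuses (the only ones this guide shows) and that `jq` is installed; `wait_for_execution`, its defaults, and its exit codes are illustrative choices, not CLI behavior:

```bash
# Hypothetical helper: poll an execution until it completes or fails,
# giving up after a timeout (seconds).
# Exit status: 0 completed, 1 failed, 2 timed out.
wait_for_execution() {
  local exec_id="$1" timeout="${2:-300}" interval="${3:-2}"
  local waited=0 status=""
  while [ "$waited" -le "$timeout" ]; do
    status=$(datalab workflows execution "$exec_id" | jq -r '.status')
    case "$status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for $exec_id (last status: $status)" >&2
  return 2
}
```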
---
## Workflow Examples
### Invoice Processing Pipeline
```json
{
  "steps": [
    {
      "type": "convert",
      "config": {
        "output_format": "json",
        "mode": "accurate"
      }
    },
    {
      "type": "extract",
      "config": {
        "schema": {
          "fields": [
            {"name": "vendor_name", "type": "string"},
            {"name": "invoice_number", "type": "string"},
            {"name": "invoice_date", "type": "string"},
            {"name": "due_date", "type": "string"},
            {
              "name": "line_items",
              "type": "array",
              "items": {
                "type": "object",
                "fields": [
                  {"name": "description", "type": "string"},
                  {"name": "quantity", "type": "number"},
                  {"name": "unit_price", "type": "number"},
                  {"name": "amount", "type": "number"}
                ]
              }
            },
            {"name": "subtotal", "type": "number"},
            {"name": "tax", "type": "number"},
            {"name": "total", "type": "number"}
          ]
        },
        "include_scores": true
      }
    }
  ]
}
```
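Extracted numbers from a pipeline like this can be sanity-checked after a run. The sketch below assumes a saved execution result in which the extract step appears as `results.step_1.output` (as in the completed-execution sample earlier) and uses the schema's field names directly; the actual output shape, especially with `include_scores` enabled, may differ. `check_line_items` is a hypothetical helper and needs `jq`:

```bash
# Hypothetical consistency check on a saved execution result.
# Assumes the extract step is step_1 and that its output uses the field
# names from the schema above (line_items[].amount, subtotal).
# Succeeds when the line item amounts add up to the subtotal (within 0.005).
check_line_items() {
  jq -e '
    .results.step_1.output as $o
    | (($o.line_items | map(.amount) | add) - $o.subtotal) as $d
    | $d < 0.005 and $d > -0.005
  ' "$1" > /dev/null
}
```

A mismatch does not say which value is wrong, only that the document deserves a manual look.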
### Document Classification Pipeline
```json
{
  "steps": [
    {
      "type": "segment",
      "config": {
        "schema": {
          "segments": ["invoice", "receipt", "contract", "letter", "form"]
        }
      }
    }
  ]
}
```
### Contract Analysis Pipeline
```json
{
  "steps": [
    {
      "type": "convert",
      "config": {
        "output_format": "markdown",
        "mode": "accurate",
        "extras": ["extract_links"]
      }
    },
    {
      "type": "extract",
      "config": {
        "schema": {
          "fields": [
            {"name": "contract_type", "type": "string"},
            {"name": "parties", "type": "array"},
            {"name": "effective_date", "type": "string"},
            {"name": "termination_date", "type": "string"},
            {"name": "key_terms", "type": "array"},
            {"name": "governing_law", "type": "string"}
          ]
        }
      }
    }
  ]
}
```
---
## Managing Workflows
### List All Workflows
```bash
datalab workflows list
```
### Get Workflow Details
```bash
datalab workflows get wf_abc123def456
```
### Delete a Workflow
```bash
datalab workflows delete wf_abc123def456
```
---
## Using File Uploads with Workflows
Workflow input takes a URL or a file ID, so upload local files first:
```bash
# 1. Upload the file
datalab files upload invoice.pdf
# Returns: file_id: "file_xyz789"
# 2. Create input with file_id
echo '{"file_id": "file_xyz789"}' > input.json
# 3. Execute workflow
datalab workflows execute wf_abc123def456 --input input.json
```
---
## Batch Processing with Workflows
Process multiple documents with the same workflow:
```bash
#!/bin/bash
workflow_id="wf_abc123def456"

# Make sure the output directory exists before writing results into it
mkdir -p results

for file in documents/*.pdf; do
  # Skip the literal pattern when no PDFs match
  [ -e "$file" ] || continue
  echo "Processing $file..."

  # Upload file
  upload_result=$(datalab files upload "$file")
  file_id=$(echo "$upload_result" | jq -r '.file_id')

  # Create input config
  echo "{\"file_id\": \"$file_id\"}" > /tmp/input.json

  # Execute workflow
  exec_result=$(datalab workflows execute "$workflow_id" --input /tmp/input.json)
  exec_id=$(echo "$exec_result" | jq -r '.execution_id')
  echo "Execution started: $exec_id"

  # Wait for completion (simple polling)
  while true; do
    status=$(datalab workflows execution "$exec_id" | jq -r '.status')
    if [ "$status" = "completed" ] || [ "$status" = "failed" ]; then
      break
    fi
    sleep 2
  done
  echo "Execution $exec_id finished with status: $status"

  # Get results (also saved for failed runs, so the error can be inspected)
  datalab workflows execution "$exec_id" > "results/$(basename "$file" .pdf).json"
done
```
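After a batch finishes, a per-status count shows at a glance whether any documents need attention. A sketch, assuming each file in `results/` holds the JSON printed by `datalab workflows execution` and that `jq` is installed; `summarize_results` is a hypothetical helper:

```bash
# Hypothetical helper: count saved results by their "status" field.
# Reads every *.json file in the given directory (default: results)
# and prints one "status: count" line per distinct status.
summarize_results() {
  local dir="${1:-results}"
  jq -s -r 'group_by(.status)[] | "\(.[0].status): \(length)"' "$dir"/*.json
}
```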
---
## Error Handling
### Execution Failed
Check the execution status for error details:
```bash
datalab workflows execution exec_ghi789jkl012
```
```json
{
  "execution_id": "exec_ghi789jkl012",
  "status": "failed",
  "error": {
    "step": 1,
    "message": "Schema validation failed",
    "code": "INVALID_SCHEMA"
  }
}
```
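In a script, the `error` object can be turned into a one-line summary plus a suggested remedy. The sketch below assumes a saved execution result with the `error` fields shown above and an installed `jq`; both function names are hypothetical, and the hints simply repeat the remedies this guide gives for its common error codes:

```bash
# Hypothetical helper: map an error code to the remedy this guide suggests.
hint_for_code() {
  case "$1" in
    INVALID_SCHEMA) echo "Validate JSON schema" ;;
    FILE_NOT_FOUND) echo "Check file_id or URL" ;;
    STEP_FAILED)    echo "Check step configuration" ;;
    TIMEOUT)        echo "Simplify step or split workflow" ;;
    *)              echo "No hint for this code" ;;
  esac
}

# Hypothetical helper: summarize a failed execution from its saved JSON.
# Prints nothing when the file has no error.code field.
report_failure() {
  local code
  code=$(jq -r '.error.code // empty' "$1")
  [ -n "$code" ] || return 0
  jq -r '"step \(.error.step): \(.error.code): \(.error.message)"' "$1"
  echo "hint: $(hint_for_code "$code")"
}
```

For example, `datalab workflows execution exec_ghi789jkl012 > failed.json` followed by `report_failure failed.json` prints the failing step, code, and message, then the matching hint.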
### Common Errors
| Error Code | Cause | Solution |
|------------|-------|----------|
| `INVALID_SCHEMA` | Schema syntax error | Validate JSON schema |
| `FILE_NOT_FOUND` | Input file missing | Check file_id or URL |
| `STEP_FAILED` | Processing error | Check step configuration |
| `TIMEOUT` | Step took too long | Simplify step or split workflow |

---
## Best Practices
### Keep Workflows Focused
Create separate workflows for different purposes:
- `invoice-extractor` - Extract invoice data
- `contract-analyzer` - Analyze contracts
- `document-classifier` - Classify documents
### Use Descriptive Names
```bash
# Good
datalab workflows create --name "invoice-data-extraction-v2" --steps workflow.json
# Less descriptive
datalab workflows create --name "wf1" --steps workflow.json
```
### Version Your Workflows
Include a version in the name, or follow another consistent naming convention:
```bash
datalab workflows create --name "invoice-processor-v1" --steps workflow-v1.json
datalab workflows create --name "invoice-processor-v2" --steps workflow-v2.json
```
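If names follow the `-vN` suffix used above, the next name can be derived rather than typed. A small sketch of that convention; `next_version_name` is a hypothetical helper and the suffix rule is an assumption drawn from these examples, not something the CLI enforces:

```bash
# Hypothetical helper: bump the trailing -vN suffix of a workflow name.
# Names without a numeric -vN suffix get "-v1" appended.
next_version_name() {
  local name="$1" base ver
  base="${name%-v*}"    # text before the last "-v"
  ver="${name##*-v}"    # text after the last "-v"
  case "$ver" in
    ''|*[!0-9]*) printf '%s-v1\n' "$name"; return 0 ;;
  esac
  if [ "$base" = "$name" ]; then
    # All digits but no "-v" separator: treat as unversioned.
    printf '%s-v1\n' "$name"
  else
    printf '%s-v%s\n' "$base" "$(expr "$ver" + 1)"
  fi
}
```

For example, `datalab workflows create --name "$(next_version_name invoice-processor-v2)" --steps workflow-v3.json` would create `invoice-processor-v3`.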
### Test Before Production
Test workflows with sample documents before processing large batches.

---
## Next Steps
- [workflows command reference](../commands/workflows.md)
- [files command reference](../commands/files.md)
- [Rate Limits](../concepts/rate-limits.md)