datalab-cli 0.1.0

A powerful CLI for converting, extracting, and processing documents using the Datalab API
# Output Formats

Understanding how the Datalab CLI outputs data, progress, and errors.

---

## Overview

The CLI splits its output across two streams:

| Stream | Content | Format |
|--------|---------|--------|
| **stdout** | Results | JSON |
| **stderr** | Progress, errors | JSON (progress, and errors when piped) or colored text (errors in a terminal) |

This design enables:

- Piping results to other tools
- Monitoring progress in real-time
- Separating data from diagnostics

---

## stdout: Result Data

All command results are output to stdout as JSON:

```bash
datalab convert document.pdf
```

```json
{
  "content": "# Document Title\n\nDocument content...",
  "metadata": {
    "pages": 5,
    "processing_time": 2.3
  }
}
```
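
A consumer can decode this payload with any JSON library. The sketch below uses Python's standard library; `parse_result` is a hypothetical helper name, and the `content` and `metadata` keys are taken from the sample above rather than from a full schema.

```python
import json


def parse_result(stdout_text: str) -> tuple:
    """Decode a result payload and return (content, metadata).

    Assumes the shape shown in the sample above; a missing
    "metadata" key falls back to an empty dict.
    """
    payload = json.loads(stdout_text)
    return payload["content"], payload.get("metadata", {})


# Sample payload copied from the example above.
sample = (
    '{"content": "# Document Title\\n\\nDocument content...", '
    '"metadata": {"pages": 5, "processing_time": 2.3}}'
)
content, metadata = parse_result(sample)
```

In a real script, `stdout_text` would be whatever the `datalab` process wrote to stdout.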

### Piping to Other Tools

```bash
# Extract content with jq
datalab convert document.pdf | jq -r '.content'

# Save to file
datalab convert document.pdf > result.json

# Pipe to another command
datalab convert document.pdf | process-markdown
```

### Using --output Flag

Write directly to a file:

```bash
datalab convert document.pdf --output result.json
```

For binary outputs (filled forms, created documents), the file is written directly:

```bash
datalab fill form.pdf --fields data.json --output filled.pdf
```

---

## stderr: Progress Events

When running interactively (TTY), progress events are streamed to stderr as JSON:

```json
{"type":"start","operation":"convert","file":"document.pdf"}
{"type":"upload","bytes_sent":1048576,"total_bytes":2097152}
{"type":"submit","request_id":"abc123"}
{"type":"poll","status":"processing","elapsed_secs":0.5}
{"type":"poll","status":"processing","elapsed_secs":1.2}
{"type":"complete","elapsed_secs":3.4}
```

### Progress Event Types

| Type | Fields | Description |
|------|--------|-------------|
| `start` | `operation`, `file` | Operation started |
| `upload` | `bytes_sent`, `total_bytes` | File upload progress |
| `submit` | `request_id` | Request submitted to API |
| `poll` | `status`, `elapsed_secs` | Polling for completion |
| `cache_hit` | `cache_key` | Result found in cache |
| `complete` | `elapsed_secs` | Operation completed |
| `error` | `code`, `message` | Error occurred |
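
The event stream can be consumed line by line. The following is a minimal sketch, assuming one JSON object per line as in the sample above; `parse_progress` and `describe` are hypothetical helper names, and only the fields listed in the table are used.

```python
import json


def parse_progress(stderr_text: str) -> list:
    """Parse newline-delimited JSON progress events, skipping other lines."""
    events = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # blank lines or plain-text diagnostics
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and "type" in event:
            events.append(event)
    return events


def describe(event: dict) -> str:
    """Render one event as a short status string."""
    kind = event["type"]
    if kind == "upload" and event.get("total_bytes"):
        pct = 100 * event["bytes_sent"] / event["total_bytes"]
        return f"upload {pct:.0f}%"
    if kind == "poll":
        return f"poll {event['status']} ({event['elapsed_secs']}s)"
    if kind == "error":
        return f"error {event['code']}: {event['message']}"
    return kind
```

Unknown event types fall through to their bare `type` name, so the sketch keeps working if the CLI adds new events.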

### TTY Detection

Progress behavior is automatic:

| Context | Progress Output |
|---------|-----------------|
| Interactive terminal | Shown on stderr |
| Piped to another command | Hidden |
| Redirected to file | Hidden |

### Controlling Progress

| Flag | Effect |
|------|--------|
| `--quiet` / `-q` | Always suppress progress |
| `--verbose` / `-v` | Always show progress |

```bash
# Suppress progress
datalab -q convert document.pdf

# Force progress even when piped
datalab -v convert document.pdf | jq '.content'
```
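
The TTY rule and the two flags combine into a small decision function. This is a sketch of the general technique, not the CLI's actual implementation: `should_show_progress` is a hypothetical name, and the precedence when both flags are passed is an assumption, since this page does not specify it.

```python
import sys


def should_show_progress(is_tty: bool, quiet: bool = False, verbose: bool = False) -> bool:
    """Decide whether to emit progress, following the tables above.

    Assumption: if both flags are given, quiet wins.
    """
    if quiet:
        return False
    if verbose:
        return True
    return is_tty


# Typical call site. Which stream the CLI inspects is an assumption here.
show = should_show_progress(sys.stderr.isatty())
```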

---

## stderr: Error Messages

### Interactive Mode (TTY)

Errors are displayed with colors and suggestions:

```
error: Missing API key
hint: Set your API key:
  export DATALAB_API_KEY="your-api-key"
help: https://www.datalab.to/app/keys
```

Color codes:

- **Red**: Error message
- **Yellow**: Hint/suggestion
- **Cyan**: Help URL

### Non-Interactive Mode (Piped)

Errors are output as JSON:

```json
{"error":"Missing API key","code":"MISSING_API_KEY"}
```
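
A calling program can pull the error code out of captured stderr. The sketch below assumes the `error` and `code` keys shown in the sample; `parse_error` is a hypothetical helper, and it scans from the last line backwards so that earlier progress events, if present, are ignored.

```python
import json


def parse_error(stderr_text: str):
    """Return (code, message) from the last JSON error line, or None."""
    for line in reversed(stderr_text.splitlines()):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. plain-text output
        if isinstance(obj, dict) and "error" in obj:
            return obj.get("code"), obj["error"]
    return None
```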

### Disabling Colors

Set the `NO_COLOR` environment variable:

```bash
export NO_COLOR=1
datalab convert document.pdf
```
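
When the CLI is launched from another program, the variable can be set for the child process only. A small sketch, where `env_without_color` is a hypothetical helper name:

```python
import os


def env_without_color(base=None) -> dict:
    """Return a copy of the environment with NO_COLOR set."""
    env = dict(os.environ if base is None else base)
    env["NO_COLOR"] = "1"
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run`, leaving the parent process's environment unchanged.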

---

## Exit Codes

| Code | Meaning |
|------|---------|
| `0` | Success |
| `1` | Error |

### Using in Scripts

```bash
if datalab convert document.pdf > result.json; then
    echo "Success"
    process_result result.json
else
    echo "Failed"
    exit 1
fi
```

### Capturing Exit Code

```bash
datalab convert document.pdf
exit_code=$?
echo "Exit code: $exit_code"
```
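
The same check can be made from a program that launches the CLI. This sketch uses Python's `subprocess` module with a stand-in child process so that it runs without the CLI installed; `run_cli` is a hypothetical helper name.

```python
import subprocess
import sys


def run_cli(cmd: list) -> tuple:
    """Run a command and return (succeeded, stdout_text)."""
    completed = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
    return completed.returncode == 0, completed.stdout


# Stand-ins for the CLI; in practice cmd would be something like
# ["datalab", "convert", "document.pdf"].
ok, _ = run_cli([sys.executable, "-c", "raise SystemExit(0)"])
failed, _ = run_cli([sys.executable, "-c", "raise SystemExit(1)"])
```

Here `ok` is `True` and `failed` is `False`, mirroring the `0` and `1` rows of the table.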

---

## Document Output Formats

The `convert` command supports multiple output formats:

### Markdown (default)

```bash
datalab convert document.pdf --output-format markdown
```

```json
{
  "content": "# Title\n\n## Section 1\n\nParagraph text...",
  "metadata": {...}
}
```

### HTML

```bash
datalab convert document.pdf --output-format html
```

```json
{
  "content": "<h1>Title</h1><h2>Section 1</h2><p>Paragraph text...</p>",
  "metadata": {...}
}
```

### JSON

```bash
datalab convert document.pdf --output-format json
```

```json
{
  "blocks": [
    {"type": "heading", "level": 1, "text": "Title"},
    {"type": "paragraph", "text": "Content..."}
  ],
  "metadata": {...}
}
```
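
Structured blocks are convenient to walk programmatically. As an illustration, the sketch below builds an indented outline from heading blocks. It assumes only the `type`, `level`, and `text` fields shown in the sample; `blocks_to_outline` is a hypothetical helper.

```python
def blocks_to_outline(payload: dict) -> list:
    """Return heading texts, indented two spaces per level below 1."""
    outline = []
    for block in payload.get("blocks", []):
        if block.get("type") == "heading":
            indent = "  " * (block.get("level", 1) - 1)
            outline.append(indent + block.get("text", ""))
    return outline
```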

### Chunks

Semantic chunks for RAG applications:

```bash
datalab convert document.pdf --output-format chunks
```

```json
{
  "chunks": [
    {
      "content": "Section content...",
      "metadata": {"page": 1, "section": "Introduction"}
    }
  ],
  "metadata": {...}
}
```
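
Before indexing, chunks are often flattened into records with a stable identifier. A minimal sketch, assuming the chunk shape shown above; `chunks_to_records` and the `source#index` id scheme are illustrative choices, not part of the CLI.

```python
def chunks_to_records(payload: dict, source: str) -> list:
    """Flatten chunks into dicts with id, text, page, and section."""
    records = []
    for index, chunk in enumerate(payload.get("chunks", [])):
        meta = chunk.get("metadata", {})
        records.append({
            "id": f"{source}#{index}",  # hypothetical id scheme
            "text": chunk["content"],
            "page": meta.get("page"),
            "section": meta.get("section"),
        })
    return records
```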

---

## Combining stdout and stderr

### Separate Handling

```bash
# Results to file, progress to terminal
datalab convert document.pdf > result.json

# Results to stdout, progress and errors discarded
datalab convert document.pdf 2>/dev/null

# Both to separate files
datalab convert document.pdf > result.json 2> progress.log
```

### Merging Streams

```bash
# Merge stderr into stdout (not recommended: diagnostics mix into the JSON results)
datalab convert document.pdf 2>&1
```
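
A program that launches the CLI can keep the two streams apart by capturing each one separately. The sketch below uses a stand-in child process that writes a result to stdout and a progress event to stderr, so it runs without the CLI installed; the `CHILD` script is purely illustrative.

```python
import subprocess
import sys

# Stand-in for the CLI: JSON result on stdout, progress event on stderr.
CHILD = (
    "import sys;"
    "sys.stdout.write('{\"content\": \"ok\"}');"
    "sys.stderr.write('{\"type\": \"complete\"}')"
)

proc = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,  # separate pipes for stdout and stderr
    text=True,
)
result_text = proc.stdout  # result data only
diagnostics = proc.stderr  # progress and errors only
```

Because each stream has its own pipe, `result_text` can be passed straight to a JSON parser without filtering out diagnostics.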

---

## Agent-Friendly Patterns

### Parse Results with jq

```bash
# Get specific fields
datalab extract invoice.pdf --schema schema.json | jq '.total'

# Format for display
datalab convert document.pdf | jq -r '.content'
```

### Monitor Progress

```bash
# Watch progress in real time: stderr goes to the pipe, results go to result.json
datalab -v convert large-document.pdf 2>&1 >result.json | while IFS= read -r line; do
    echo "$line" | jq -r '.type // empty'
done
```

### Error Handling

```bash
# Capture both streams; the exit status decides how to interpret them
if result=$(datalab convert document.pdf 2>&1); then
    echo "$result" | jq '.content'
else
    echo "Error: $result" >&2
fi
```

---

## See Also

- [Agent Integration Tutorial](../tutorials/agent-integration.md)
- [Errors Reference](../reference/errors.md)
- [Exit Codes Reference](../reference/exit-codes.md)