datalab-cli 0.1.0

A powerful CLI for converting, extracting, and processing documents using the Datalab API
# Quickstart

Get up and running with the Datalab CLI in 5 minutes.

---

## Prerequisites

- [Datalab CLI installed](installation.md)
- A Datalab API key ([get one here](https://www.datalab.to/app/keys))

---

## Step 1: Set Your API Key

Export your API key as an environment variable:

```bash
export DATALAB_API_KEY="your-api-key-here"
```

!!! tip "Make it permanent"
    Add the export line to your shell profile (`~/.bashrc`, `~/.zshrc`, etc.) so the key persists across sessions.
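
In scripts, it can help to fail early when the key is missing rather than let a later command fail. The following is a minimal sketch; the function name `require_api_key` is hypothetical and not part of the CLI.

```bash
# Hypothetical helper (not part of the Datalab CLI):
# return non-zero with a message on stderr if DATALAB_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${DATALAB_API_KEY:-}" ]; then
    echo "DATALAB_API_KEY is not set" >&2
    return 1
  fi
}
```

A script could then guard a call with `require_api_key && datalab convert document.pdf`.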

---

## Step 2: Convert Your First Document

Convert a PDF to markdown:

```bash
datalab convert document.pdf
```

The output is JSON containing the converted content:

```json
{
  "content": "# Document Title\n\nThis is the converted content...",
  "metadata": {
    "pages": 5,
    "processing_time": 2.3
  }
}
```

### Save to a File

Write output to a file:

```bash
datalab convert document.pdf --output result.json
```

Or use shell redirection:

```bash
datalab convert document.pdf > result.json
```

---

## Step 3: Try Different Output Formats

The CLI supports multiple output formats:

=== "Markdown (default)"

    ```bash
    datalab convert document.pdf --output-format markdown
    ```

=== "HTML"

    ```bash
    datalab convert document.pdf --output-format html
    ```

=== "JSON"

    ```bash
    datalab convert document.pdf --output-format json
    ```

=== "Chunks"

    ```bash
    datalab convert document.pdf --output-format chunks
    ```

---

## Step 4: Extract Structured Data

Extract specific fields using a JSON schema:

```bash
datalab extract invoice.pdf --schema '{
  "fields": [
    {"name": "invoice_number", "type": "string"},
    {"name": "total", "type": "number"},
    {"name": "date", "type": "string"}
  ]
}'
```

Output:

```json
{
  "invoice_number": "INV-2024-001",
  "total": 1250.00,
  "date": "2024-01-15"
}
```
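
Longer schemas are easier to maintain in a file than inline in the command. The sketch below reads a schema file and passes its contents to `--schema` as a string, exactly as in the inline example above. The helper name `extract_with_schema_file` is hypothetical; whether the CLI also accepts a file path directly is not covered here.

```bash
# Hypothetical helper (not part of the Datalab CLI): read the schema from a file
# and pass its contents to --schema as an inline string.
extract_with_schema_file() {
  input=$1
  schema_file=$2
  datalab extract "$input" --schema "$(cat "$schema_file")"
}
```

For example: `extract_with_schema_file invoice.pdf schema.json`.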

---

## Step 5: Fill a Form

Fill PDF forms with data:

```bash
datalab fill application.pdf \
  --fields '{"name": "John Doe", "email": "john@example.com"}' \
  --output filled.pdf
```

---

## Understanding the Output

### stdout vs stderr

- **stdout**: JSON result data (for piping)
- **stderr**: Progress events (for monitoring)

```bash
# Pipe results to jq
datalab convert document.pdf | jq '.content'

# Save results to a file; progress still appears on the terminal via stderr
datalab convert document.pdf > result.json
```
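
Because results and progress use different streams, each can be captured in its own file with standard shell redirection. The wrapper below is a small sketch of that idea; `run_split` is a hypothetical name, and it works with any command, not only `datalab`.

```bash
# Hypothetical wrapper: run a command, saving stdout and stderr to separate files.
# Usage: run_split <stdout-file> <stderr-file> <command> [args...]
run_split() {
  out_file=$1
  log_file=$2
  shift 2
  "$@" > "$out_file" 2> "$log_file"
}
```

For example, `run_split result.json progress.log datalab convert document.pdf` would leave the JSON result in `result.json` and the progress events in `progress.log`.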

### Progress Events

When running interactively, you'll see progress on stderr:

```json
{"type":"start","operation":"convert","file":"document.pdf"}
{"type":"submit","request_id":"abc123"}
{"type":"poll","status":"processing","elapsed_secs":0.5}
{"type":"complete","elapsed_secs":3.4}
```
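
Since each event is one JSON object per line, simple line-based tools can pick out a single event. The sketch below extracts the total elapsed time from the sample events shown above. It assumes the compact formatting of that sample (no spaces around the colon); a JSON parser such as `jq` would be more robust for real logs.

```bash
# Sample progress lines copied from the example above; a real run would emit
# these on stderr (for instance, captured with 2> progress.log).
progress_log='{"type":"start","operation":"convert","file":"document.pdf"}
{"type":"submit","request_id":"abc123"}
{"type":"poll","status":"processing","elapsed_secs":0.5}
{"type":"complete","elapsed_secs":3.4}'

# Keep only the "complete" event and pull out its elapsed_secs value.
elapsed=$(printf '%s\n' "$progress_log" \
  | grep '"type":"complete"' \
  | sed -E 's/.*"elapsed_secs":([0-9.]+).*/\1/')

echo "Finished in ${elapsed}s"
```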

---

## Caching

Results are cached locally to save API costs:

```bash
# First run: calls API
datalab convert document.pdf

# Second run: returns cached result instantly
datalab convert document.pdf

# Force fresh processing
datalab convert document.pdf --skip-cache
```

See cache statistics:

```bash
datalab cache stats
```

---

## Common Operations

### Convert with High Quality

Use accurate mode for complex documents:

```bash
datalab convert report.pdf --mode accurate
```

### Process Specific Pages

```bash
datalab convert book.pdf --page-range "0-10"
```

### Convert from URL

```bash
datalab convert https://example.com/document.pdf
```

### Suppress Progress Output

For scripts, use quiet mode:

```bash
datalab -q convert document.pdf
```
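
Quiet mode pairs naturally with batch scripts. The following sketch converts every PDF in a directory and writes one JSON file next to each input. The function name `convert_all` is hypothetical, and combining `-q` with `--output` in one call is an assumption based on the flags shown earlier on this page.

```bash
# Hypothetical helper: convert every PDF in a directory, one JSON file per input.
convert_all() {
  dir=$1
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # skip when the glob matches nothing
    # Report a failed file on stderr and carry on with the rest.
    datalab -q convert "$pdf" --output "${pdf%.pdf}.json" \
      || echo "failed: $pdf" >&2
  done
}
```

For example, `convert_all ./invoices` would produce `./invoices/a.json` for `./invoices/a.pdf`.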

---

## Next Steps

- [Configure environment variables](configuration.md)
- [Deep dive into document conversion](../tutorials/convert-documents.md)
- [Learn about structured extraction](../tutorials/extract-data.md)
- [Explore all commands](../commands/index.md)