# Datalab CLI
**Convert, extract, and process documents from the command line**
[](https://crates.io/crates/datalab-cli)
[](https://github.com/dipankar/datalab-cli/actions)
[](https://github.com/dipankar/datalab-cli/blob/main/LICENSE)
[](https://crates.io/crates/datalab-cli)
---
A powerful command-line interface for the [Datalab](https://www.datalab.to) document processing API. Built in Rust for speed and reliability.
## Features
- **📄 Document Conversion** — Convert PDFs, images, and documents to Markdown, HTML, JSON, or semantic chunks
- **🔍 Structured Extraction** — Extract data using JSON schemas with confidence scores
- **📝 Form Filling** — Fill PDF forms programmatically with smart field matching
- **⚡ Smart Caching** — Local file-based caching reduces API costs on repeated requests
- **🤖 Agent-Friendly** — JSON output to stdout, progress events to stderr, designed for piping
- **📊 Progress Streaming** — Real-time JSON progress events for monitoring long operations
## Installation
### From crates.io
```bash
cargo install datalab-cli
```
### From source
```bash
git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo install --path .
```
### Pre-built binaries
Download from [GitHub Releases](https://github.com/dipankar/datalab-cli/releases).
## Quick Start
**1. Get your API key** from [datalab.to/app/keys](https://www.datalab.to/app/keys)
**2. Set the environment variable**
```bash
export DATALAB_API_KEY="your-api-key"
```
**3. Convert your first document**
```bash
datalab convert document.pdf
```
That's it! The converted markdown is output as JSON to stdout.
## Usage
### Convert Documents
```bash
# Convert to markdown (default)
datalab convert document.pdf
# Convert to HTML
datalab convert document.pdf --output-format html
# High-quality mode for complex documents
datalab convert report.pdf --mode accurate
# Convert specific pages
datalab convert book.pdf --page-range "0-10"
# Save to file
datalab convert document.pdf --output result.json
```
### Extract Structured Data
```bash
# Extract with inline schema
datalab extract invoice.pdf --schema '{
"fields": [
{"name": "total", "type": "number"},
{"name": "date", "type": "string"}
]
}'
# Extract with schema file
datalab extract invoice.pdf --schema schema.json
# Include confidence scores
datalab extract invoice.pdf --schema schema.json --include-scores
```
### Fill Forms
```bash
# Fill a form
datalab fill application.pdf \
--fields '{"name": "John Doe", "email": "john@example.com"}' \
--output filled.pdf
```
### File Management
```bash
# Upload a file
datalab files upload document.pdf
# List files
datalab files list
# Download a file
datalab files download file_abc123 --output downloaded.pdf
```
### Cache Management
```bash
# View cache stats
datalab cache stats
# Clear old entries
datalab cache clear --older-than 7
```
## Output Format
All commands output JSON to **stdout** for easy piping:
```bash
# Pipe to jq
# Save to file
datalab convert document.pdf > result.json
```
Progress events stream to **stderr** as JSON:
```json
{"type":"start","operation":"convert","file":"document.pdf"}
{"type":"poll","status":"processing","elapsed_secs":1.2}
{"type":"complete","elapsed_secs":3.4}
```
Use `--quiet` to suppress progress, `--verbose` to force it.
## Environment Variables
| `DATALAB_API_KEY` | Yes | Your API key |
| `DATALAB_BASE_URL` | No | Custom API endpoint (for on-prem) |
| `NO_COLOR` | No | Disable colored output |
## Caching
Results are cached locally in `~/.cache/datalab/` to reduce API costs:
```bash
# First run: calls API
datalab convert document.pdf
# Second run: instant from cache
datalab convert document.pdf
# Bypass cache
datalab convert document.pdf --skip-cache
```
## Documentation
Full documentation is available in the [documentation](./documentation) directory. To view locally:
```bash
cd documentation
pip install -r requirements.txt
mkdocs serve
```
## Contributing
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
```bash
# Development setup
git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo build
# Run tests
cargo test
# Run lints
cargo clippy
cargo fmt --check
```
## License
MIT License - see [LICENSE](LICENSE) for details.
---