datalab-cli 0.1.0

A powerful CLI for converting, extracting, and processing documents using the Datalab API
# Rate Limits

Understanding and working with Datalab API rate limits.

---

## API Limits

The Datalab API enforces the following limits:

| Limit | Value | Scope |
|-------|-------|-------|
| Requests per minute | 400 | Per API key |
| Concurrent requests | 400 | Per API key |
| Pages in flight | 5,000 | Across all requests |
| Max file size | 200 MB | Per request |
| Max pages per request | 7,000 | Per request |

---

## Request Limits

### Requests Per Minute

Maximum 400 requests per minute per API key.

**Example**: If you send 400 requests in 30 seconds, you'll need to wait 30 seconds before sending more.
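Spacing sequential requests at least 60 / limit seconds apart keeps you under the per-minute cap. At 400 requests per minute that is 150 ms. The arithmetic is sketched below. `min_interval_ms` is a hypothetical helper, not part of the CLI.

```bash
#!/bin/bash
# Minimum spacing between request starts, in milliseconds, for a given
# requests-per-minute limit (rounded up so the limit is never exceeded).
min_interval_ms() {
    local rpm=$1
    echo $(( (60000 + rpm - 1) / rpm ))
}

min_interval_ms 400   # prints 150
min_interval_ms 300   # prints 200
```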

### Concurrent Requests

Maximum 400 concurrent (in-progress) requests.

**Impact**: If you have 400 pending requests, new requests will be queued or rejected until some complete.

---

## Page Limits

### Pages In Flight

Maximum 5,000 pages being processed across all your requests.

**Example**: If you're processing a 1,000-page document and a 2,000-page document, you have 3,000 pages in flight. You can start another request with up to 2,000 pages.
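The same budget arithmetic can be written as a small helper. It assumes the 5,000-page limit from the table above. `remaining_pages` is hypothetical and takes the page counts of the requests currently in flight.

```bash
#!/bin/bash
# Pages you can still submit, given the page counts of requests in flight.
PAGES_IN_FLIGHT_LIMIT=5000

remaining_pages() {
    local used=0 pages
    for pages in "$@"; do
        used=$(( used + pages ))
    done
    echo $(( PAGES_IN_FLIGHT_LIMIT - used ))
}

remaining_pages 1000 2000   # prints 2000
```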

### Pages Per Request

Maximum 7,000 pages in a single request.

Use `--max-pages` to limit pages:

```bash
# Process only first 100 pages
datalab convert large-book.pdf --max-pages 100
```

---

## File Size Limit

Maximum file size is 200 MB per request.

```bash
# Check file size before processing
ls -lh document.pdf
```
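A pre-flight check can reject oversized files before you upload them. This sketch assumes that 200 MB means 200 × 1024 × 1024 bytes. If the API counts decimal megabytes, pass 200000000 as the second argument instead. `within_size_limit` is a hypothetical helper.

```bash
#!/bin/bash
# Return 0 if FILE is within MAX_BYTES (default: 200 MiB), 1 otherwise.
within_size_limit() {
    local file=$1 max=${2:-$((200 * 1024 * 1024))} size
    if [ ! -f "$file" ]; then
        echo "error: $file not found" >&2
        return 1
    fi
    # wc -c is portable across GNU and BSD; $(( )) strips BSD's padding
    size=$(( $(wc -c < "$file") ))
    if (( size > max )); then
        echo "error: $file is $size bytes (limit $max)" >&2
        return 1
    fi
}

# Example: only convert files the API will accept
# within_size_limit document.pdf && datalab convert document.pdf
```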

If your file exceeds 200 MB:

1. Split it into smaller files, for example one file per range of pages
2. Compress images if possible

---

## Rate Limit Errors

When you exceed rate limits, you'll receive an error:

### Interactive (TTY)

```
error: Rate limited
hint: You've exceeded the rate limit. Wait 30 seconds before retrying.
help: https://documentation.datalab.to/docs/common/limits
```

### JSON (Piped)

```json
{
  "error": "Rate limited. Retry after 30 seconds",
  "code": "RATE_LIMITED"
}
```

### HTTP Response

- Status code: `429 Too Many Requests`
- Header: `Retry-After: 30`
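If you call the HTTP API directly (for example with `curl -D` to dump the response headers), you can read the wait time from `Retry-After`. The minimal sketch below parses headers from standard input and falls back to a default when the header is absent. It assumes the header carries a number of seconds, although HTTP also allows a date in that field. `retry_after` is a hypothetical helper.

```bash
#!/bin/bash
# Print the Retry-After value (seconds) from HTTP headers on stdin,
# or DEFAULT (first argument, default 30) if the header is missing.
retry_after() {
    local default=${1:-30} value
    # Header names are case-insensitive; strip the CR of CRLF line endings
    value=$(awk -F': *' 'tolower($1) == "retry-after" { gsub(/\r/, "", $2); print $2; exit }')
    echo "${value:-$default}"
}

printf 'HTTP/1.1 429 Too Many Requests\r\nRetry-After: 30\r\n\r\n' | retry_after
```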

---

## Handling Rate Limits

### Automatic Retry (Recommended)

The CLI handles rate limits automatically with exponential backoff. Progress events show retry status:

```json
{"type":"poll","status":"rate_limited","elapsed_secs":5.0}
{"type":"poll","status":"retrying","elapsed_secs":35.0}
```
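Exponential backoff doubles the wait after each failed attempt, up to a ceiling. The sketch below only illustrates the idea. The base delay and cap are example values, not the CLI's actual retry schedule, and `backoff_delay` is hypothetical.

```bash
#!/bin/bash
# Delay in seconds before retry ATTEMPT (1-based): BASE * 2^(ATTEMPT-1), capped at CAP.
backoff_delay() {
    local attempt=$1 base=$2 cap=$3
    local delay=$base i
    # Double until we reach the attempt number or hit the cap (avoids overflow)
    for (( i = 1; i < attempt && delay < cap; i++ )); do
        delay=$(( delay * 2 ))
    done
    if (( delay > cap )); then
        delay=$cap
    fi
    echo "$delay"
}

# Waits of 5, 10, 20, 40, then 60 seconds (80 capped to 60)
for attempt in 1 2 3 4 5; do
    echo "attempt $attempt: wait $(backoff_delay "$attempt" 5 60)s"
done
```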

### Manual Retry

For scripting, check the exit code and retry:

```bash
#!/bin/bash
max_retries=3

for ((attempt = 1; attempt <= max_retries; attempt++)); do
    if datalab convert document.pdf > result.json 2> error.log; then
        echo "Success"
        exit 0
    fi

    echo "Attempt $attempt of $max_retries failed (see error.log)" >&2
    # Wait before the next attempt, but not after the last one
    if [ "$attempt" -lt "$max_retries" ]; then
        sleep 30
    fi
done

echo "Failed after $max_retries attempts" >&2
exit 1
```
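The loop above retries on any failure, including failures that will never succeed, such as a missing file. The variant below retries only when the output carries the `RATE_LIMITED` code shown under JSON (Piped). It assumes that the error JSON appears on stdout or stderr in that form. `retry_on_rate_limit` is a hypothetical helper.

```bash
#!/bin/bash
# Run CMD, retrying only while it fails with a RATE_LIMITED error.
# Usage: retry_on_rate_limit MAX_ATTEMPTS DELAY_SECS CMD [ARGS...]
retry_on_rate_limit() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt out err_file
    err_file=$(mktemp)
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        if out=$("$@" 2>"$err_file"); then
            printf '%s\n' "$out"
            rm -f "$err_file"
            return 0
        fi
        # Stop early unless stdout or stderr reports a rate limit
        if ! { printf '%s\n' "$out"; cat "$err_file"; } | grep -q '"code": *"RATE_LIMITED"'; then
            break
        fi
        if (( attempt < max_attempts )); then
            sleep "$delay"
        fi
    done
    # Surface the last error output
    printf '%s\n' "$out" >&2
    cat "$err_file" >&2
    rm -f "$err_file"
    return 1
}

# Example: retry_on_rate_limit 3 30 datalab convert document.pdf > result.json
```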

---

## Best Practices

### Batch Processing

When processing many files, add delays between requests:

```bash
for file in *.pdf; do
    datalab convert "$file" > "${file%.pdf}.json"
    sleep 0.2  # 200 ms pause caps starts at ~300/minute (fewer, since each convert also takes time)
done
```
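A fixed `sleep` caps the rate only approximately. To enforce a hard cap however long each conversion takes, you can track recent start times in a sliding window. The sketch below has one-second resolution. `throttle` is a hypothetical helper, and the budget of 300 starts per 60 seconds is an example value that leaves headroom under the 400-per-minute limit.

```bash
#!/bin/bash
# Sliding-window throttle: block until fewer than MAX starts happened in the
# last WINDOW seconds, then record a new start. One-second resolution.
starts=()
throttle() {
    local max=$1 window=$2 now t
    while true; do
        now=$(date +%s)
        local kept=()
        for t in "${starts[@]}"; do
            # Keep only start times still inside the window
            if (( now - t < window )); then
                kept+=("$t")
            fi
        done
        starts=("${kept[@]}")
        if (( ${#starts[@]} < max )); then
            starts+=("$now")
            return 0
        fi
        sleep 1
    done
}

# Example:
# for file in *.pdf; do
#     throttle 300 60
#     datalab convert "$file" > "${file%.pdf}.json"
# done
```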

### Use Caching

With caching, repeated conversions of the same document don't call the API again:

```bash
# First run: API call
datalab convert document.pdf

# Subsequent runs: cached (doesn't count toward rate limit)
datalab convert document.pdf
```

### Use Checkpoints

Save checkpoints to avoid re-parsing:

```bash
# Parse once
datalab convert document.pdf --save-checkpoint

# Reuse parse (lower cost, doesn't re-parse)
datalab extract document.pdf --schema schema1.json --checkpoint-id ...
datalab extract document.pdf --schema schema2.json --checkpoint-id ...
```

### Limit Pages

Process only the pages you need:

```bash
# Instead of full document
datalab convert book.pdf --max-pages 10

# Or specific pages
datalab convert book.pdf --page-range "0-5,10-15"
```

### Parallelize Carefully

When running parallel requests, stay within limits:

```bash
# Good: 4 parallel jobs with delays
parallel -j4 --delay 0.5 'datalab convert {} > {.}.json' ::: *.pdf

# Risky: Too many parallel jobs
parallel -j100 'datalab convert {} > {.}.json' ::: *.pdf
```
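One way to pick a `-j` value is to divide the pages-in-flight budget by the typical document size and cap the result at the concurrency limit. The sketch below uses the standard limits from the table above and assumes each job converts one document of roughly PAGES pages. `max_jobs` is hypothetical.

```bash
#!/bin/bash
# Largest number of parallel jobs that keeps both limits satisfied,
# assuming every job processes a document of PAGES pages.
max_jobs() {
    local pages=$1
    local by_pages=$(( 5000 / pages ))   # pages-in-flight limit
    local by_requests=400                # concurrent-request limit
    echo $(( by_pages < by_requests ? by_pages : by_requests ))
}

max_jobs 100   # prints 50
max_jobs 10    # prints 400
```

For example, `parallel -j"$(max_jobs 100)" --delay 0.5 'datalab convert {} > {.}.json' ::: *.pdf`. The `--delay` still matters, because the per-minute request limit applies independently of concurrency.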

---

## Monitoring Usage

### Track Progress Events

Watch verbose output for rate-limit events:

```bash
datalab -v convert document.pdf 2>&1 | grep -i "rate"
```
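To count how often a run was rate limited, you can filter the progress events for the `rate_limited` status. This sketch assumes the events are emitted one JSON object per line, in the compact form shown under Automatic Retry. `count_rate_limited` is a hypothetical helper.

```bash
#!/bin/bash
# Count progress events whose status is "rate_limited" (one JSON object per line on stdin).
count_rate_limited() {
    # grep -c prints 0 but exits 1 when nothing matches; treat that as success
    grep -c '"status": *"rate_limited"' || true
}

# Example: datalab -v convert document.pdf 2>&1 | count_rate_limited
```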

### Check API Dashboard

View your usage in the [Datalab dashboard](https://www.datalab.to/app/usage).

---

## Enterprise Limits

Enterprise plans may have higher limits:

| Feature | Standard | Enterprise |
|---------|----------|------------|
| Requests/minute | 400 | Custom |
| Concurrent requests | 400 | Custom |
| Pages in flight | 5,000 | Custom |
| Max file size | 200 MB | Custom |
| Dedicated capacity | No | Yes |

Contact [sales@datalab.to](mailto:sales@datalab.to) for enterprise pricing.

---

## Troubleshooting

### "Rate limited" Errors

**Problem**: Too many requests in a short period.

**Solution**:

1. Add delays between requests
2. Reduce parallel processing
3. Use caching to avoid redundant calls

### "File too large" Errors

**Problem**: File exceeds 200 MB.

**Solution**:

1. Split the document into smaller files, for example by page range
2. Compress images

### "Too many pages" Errors

**Problem**: Document has more than 7,000 pages.

**Solution**:

```bash
# Process in chunks of at most 5,000 pages (also within the pages-in-flight limit)
datalab convert huge.pdf --page-range "0-4999" --output part1.json
datalab convert huge.pdf --page-range "5000-9999" --output part2.json
```
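Instead of writing the ranges by hand, you can generate them. This sketch assumes that `--page-range` takes 0-indexed, inclusive ranges, as the examples above suggest, and that you know the total page count. `page_chunks` is hypothetical.

```bash
#!/bin/bash
# Print 0-indexed inclusive ranges ("start-end") covering TOTAL pages in CHUNK-page pieces.
page_chunks() {
    local total=$1 chunk=$2 start end
    for (( start = 0; start < total; start += chunk )); do
        end=$(( start + chunk - 1 ))
        if (( end >= total )); then
            end=$(( total - 1 ))
        fi
        echo "$start-$end"
    done
}

page_chunks 12000 5000   # prints 0-4999, 5000-9999, 10000-11999 (one per line)

# Example:
# i=1
# for range in $(page_chunks 12000 5000); do
#     datalab convert huge.pdf --page-range "$range" --output "part$i.json"
#     i=$((i + 1))
# done
```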

---

## See Also

- [Caching](caching.md)
- [Checkpoints](checkpoints.md)
- [Errors Reference](../reference/errors.md)