# Checkpoints
Checkpoints allow you to reuse parsed documents across multiple operations, saving time and API costs.
---
## What Are Checkpoints?
When you process a document, the Datalab API parses and analyzes its structure. A checkpoint saves this parsed state on the server, allowing you to:
- Run multiple extractions without re-parsing
- Segment documents after conversion
- Score extraction results
```mermaid
flowchart LR
A[Document] --> B[Parse]
B --> C[Checkpoint]
C --> D[Extract 1]
C --> E[Extract 2]
C --> F[Segment]
C --> G[Score]
```
---
## Creating Checkpoints
Add `--save-checkpoint` to a `convert`, `extract`, or `segment` command:
```bash
# During conversion
datalab convert document.pdf --save-checkpoint
# During extraction
datalab extract document.pdf --schema schema.json --save-checkpoint
```
The response includes a `checkpoint_id`:
```json
{
"content": "...",
"metadata": {
"checkpoint_id": "ckpt_abc123def456"
}
}
```
---
## Using Checkpoints
Reference a checkpoint with `--checkpoint-id`:
```bash
# Extract using existing checkpoint
datalab extract document.pdf --schema schema.json --checkpoint-id ckpt_abc123def456
# Segment using existing checkpoint
datalab segment document.pdf --schema segments.json --checkpoint-id ckpt_abc123def456
```
---
## Checkpoint Workflow
### Example: Multiple Extractions
```bash
# 1. Convert and save checkpoint
result=$(datalab convert invoice.pdf --save-checkpoint)
checkpoint_id=$(echo "$result" | jq -r '.metadata.checkpoint_id')
# 2. Extract different data using the same checkpoint
datalab extract invoice.pdf --schema header.json --checkpoint-id "$checkpoint_id"
datalab extract invoice.pdf --schema line_items.json --checkpoint-id "$checkpoint_id"
datalab extract invoice.pdf --schema totals.json --checkpoint-id "$checkpoint_id"
```
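The three `extract` calls above differ only in the schema file, so they can be folded into a loop. The following is a minimal sketch that uses only the flags shown above; `extract_all` is a hypothetical helper name, not part of the CLI, and it assumes `datalab` exits with a non-zero status on failure.

```bash
# extract_all CHECKPOINT_ID DOCUMENT SCHEMA...
# Hypothetical helper: runs one `datalab extract` per schema file,
# reusing the same checkpoint for every call. Stops at the first failure
# (assumes datalab signals failure with a non-zero exit status).
extract_all() (
    checkpoint_id=$1
    document=$2
    shift 2
    for schema in "$@"; do
        datalab extract "$document" --schema "$schema" \
            --checkpoint-id "$checkpoint_id" || exit 1
    done
)
```

With the variable from step 1, the call would look like `extract_all "$checkpoint_id" invoice.pdf header.json line_items.json totals.json`.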
### Example: Convert Then Segment
```bash
# 1. Convert document
datalab convert bundle.pdf --save-checkpoint
# Returns checkpoint_id: "ckpt_abc123"
# 2. Segment using checkpoint
datalab segment bundle.pdf --schema '{"segments": ["invoice", "receipt"]}' \
--checkpoint-id ckpt_abc123
```
### Example: Extract Then Score
```bash
# 1. Extract with checkpoint
datalab extract invoice.pdf --schema schema.json --save-checkpoint
# Returns checkpoint_id: "ckpt_xyz789"
# 2. Score the extraction
datalab extract-score --checkpoint-id ckpt_xyz789
```
---
## Cost Benefits
Checkpoints reduce processing costs by avoiding redundant parsing:
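| Operation | Without Checkpoints | With Checkpoints |
|-----------|---------------------|------------------|
| Convert | Full parse | Full parse |
| Extract #1 | Full parse | Reuse parse |
| Extract #2 | Full parse | Reuse parse |
| Segment | Full parse | Reuse parse |
| **Total** | 4x parse cost | 1x parse cost |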
---
## Checkpoint Retention
Checkpoints are stored on Datalab servers with the following retention:
| Plan | Retention |
|------|-----------|
| Free | 1 hour |
| Pro | 24 hours |
| Enterprise | Configurable |
!!! warning "Checkpoint Expiration"
    Checkpoints expire after their retention period. Plan your workflow to use checkpoints within the retention window.
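
A script can track expiry on the client side by recording when a checkpoint was saved and comparing its age against the plan's retention (1 hour on Free, 24 hours on Pro, per the table above). This is a rough sketch only: `checkpoint_is_fresh` is a hypothetical helper, the server remains the authority on expiry, and treating a checkpoint as expired exactly at the boundary is an assumption. For example, `checkpoint_is_fresh "$saved_at" 24` returns success while a Pro-plan checkpoint saved at `$saved_at` (seconds since the epoch) is under 24 hours old.

```bash
# checkpoint_is_fresh SAVED_EPOCH RETENTION_HOURS [NOW_EPOCH]
# Hypothetical helper: returns 0 if a checkpoint saved at SAVED_EPOCH
# (seconds since the epoch) is younger than RETENTION_HOURS, 1 otherwise.
# NOW_EPOCH defaults to the current time.
checkpoint_is_fresh() (
    saved=$1
    retention_hours=$2
    now=${3:-$(date +%s)}
    age=$((now - saved))
    [ "$age" -ge 0 ] && [ "$age" -lt $((retention_hours * 3600)) ]
)
```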
---
## Commands Supporting Checkpoints
### Can Create Checkpoints
| Command | Flag |
|---------|------|
| `convert` | `--save-checkpoint` |
| `extract` | `--save-checkpoint` |
| `segment` | `--save-checkpoint` |
### Can Use Checkpoints
| Command | Flag |
|---------|------|
| `extract` | `--checkpoint-id <ID>` |
| `segment` | `--checkpoint-id <ID>` |
| `extract-score` | `--checkpoint-id <ID>` |
---
## Checkpoints vs. Caching
| Feature | Checkpoints | Local Cache |
|---------|-------------|-------------|
| **Location** | Datalab servers | Your machine |
| **Purpose** | Reuse parsed document | Avoid duplicate API calls |
| **Retention** | Hours (server-defined) | Until you clear it |
| **Cross-operation** | Yes | No (same operation only) |
| **Cost** | Included in API | Free (local storage) |
### When to Use Checkpoints
- Multiple extractions from the same document
- Extract → Score workflow
- Convert → Segment workflow
- Processing the same document with different schemas
### When to Use Local Cache
- Repeated identical operations
- Development and testing
- Avoiding redundant API calls
---
## Best Practices
### Save Checkpoints Proactively
If you might need to re-process a document, save a checkpoint:
```bash
# Always save a checkpoint for important documents
datalab convert document.pdf --save-checkpoint > result.json
```
### Store Checkpoint IDs
Save checkpoint IDs for later use:
```bash
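# Parse and store the checkpoint ID
datalab convert doc.pdf --save-checkpoint | jq -r '.metadata.checkpoint_id' > checkpoint.txt
# Use later
datalab extract doc.pdf --schema schema.json --checkpoint-id "$(cat checkpoint.txt)"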
```
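Storing the save time next to the ID makes it possible to tell later whether the checkpoint has probably expired. The sketch below is one way to do that; `save_checkpoint`, `load_checkpoint`, and the one-line `ID TIMESTAMP` file layout are illustrative choices, not part of the Datalab CLI.

```bash
# save_checkpoint FILE CHECKPOINT_ID
# Hypothetical helper: writes "ID SAVED_EPOCH" as a single line to FILE.
save_checkpoint() {
    printf '%s %s\n' "$2" "$(date +%s)" > "$1"
}

# load_checkpoint FILE
# Hypothetical helper: prints the stored ID, or returns 1 if FILE is
# missing or empty.
load_checkpoint() (
    [ -s "$1" ] || exit 1
    read -r id _saved < "$1"
    [ -n "$id" ] && printf '%s\n' "$id"
)
```

A later command can then pass `--checkpoint-id "$(load_checkpoint checkpoint.txt)"`, and the second field of the file is available for an age check.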
### Plan Workflows Within Retention
Ensure all checkpoint operations complete before expiration:
```bash
# Good: All operations in sequence
datalab convert doc.pdf --save-checkpoint # t=0
datalab extract doc.pdf --schema a.json --checkpoint-id ... # t=1min
datalab extract doc.pdf --schema b.json --checkpoint-id ... # t=2min
# Risk: Long delay between operations
datalab convert doc.pdf --save-checkpoint # t=0
# ... 2 hours later ...
datalab extract doc.pdf --schema a.json --checkpoint-id ... # May fail if expired
```
---
## Troubleshooting
### "Checkpoint not found" Error
The checkpoint may have expired. Create a new checkpoint:
```bash
datalab convert document.pdf --save-checkpoint
```
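In a script, the same recovery can be automated: try the stored checkpoint first and fall back to a fresh parse if the call fails. This is a sketch under two assumptions that the text above does not confirm: that `datalab` exits with a non-zero status when a checkpoint cannot be found, and that retrying on any failure is acceptable (the helper does not inspect the error message). `extract_with_fallback` is a hypothetical name.

```bash
# extract_with_fallback DOCUMENT SCHEMA CHECKPOINT_ID
# Hypothetical helper: tries the existing checkpoint; on any failure,
# re-runs the extraction from scratch and saves a new checkpoint.
# Assumes datalab exits non-zero when the checkpoint is missing or expired.
extract_with_fallback() {
    datalab extract "$1" --schema "$2" --checkpoint-id "$3" && return 0
    echo "checkpoint $3 unusable; re-parsing $1" >&2
    datalab extract "$1" --schema "$2" --save-checkpoint
}
```

The fallback response carries a new `checkpoint_id`, which should replace the stale one wherever it is stored.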
### Checkpoint Not Returned
Ensure you're using `--save-checkpoint`:
```bash
# Wrong: no checkpoint
datalab convert document.pdf
# Right: checkpoint saved
datalab convert document.pdf --save-checkpoint
```
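When the ID is pulled out with `jq -r '.metadata.checkpoint_id'`, a response without a checkpoint yields the literal string `null` rather than an error, so a missing flag can go unnoticed until a later command fails. The guard below is a small sketch that catches this early; `require_checkpoint_id` is a hypothetical helper, and a typical use would be `checkpoint_id=$(require_checkpoint_id "$checkpoint_id") || exit 1`.

```bash
# require_checkpoint_id VALUE
# Hypothetical helper: prints VALUE and returns 0 if it looks usable.
# Rejects the empty string and "null" (what `jq -r` prints for a missing
# field), writing a hint to stderr and returning 1.
require_checkpoint_id() {
    case $1 in
        ''|null)
            echo "no checkpoint_id in response; was --save-checkpoint passed?" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$1"
}
```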
---
## See Also
- [extract-score command](../commands/extract-score.md)
- [Caching](caching.md)
- [Extracting Data Tutorial](../tutorials/extract-data.md)