# Caching
The Datalab CLI includes built-in caching to reduce API costs and speed up repeated operations.
---
## How It Works
When you run a command, the CLI:
1. **Generates a cache key** from the file contents, endpoint, and parameters
2. **Checks the local cache** for a matching entry
3. If found, **returns the cached result** immediately
4. If not found, **calls the API** and caches the response
```mermaid
flowchart LR
A[Command] --> B{Cache Hit?}
B -->|Yes| C[Return Cached]
B -->|No| D[Call API]
D --> E[Cache Response]
E --> F[Return Result]
```
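The flow above is the usual get-or-compute caching pattern. The following is a minimal Python sketch of it, not the CLI's actual code. It assumes responses are stored as `responses/<key>.json` under the cache directory, matching the layout shown under Directory Structure. `cached_call` is a hypothetical helper, and `call_api` stands in for the real request.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def cached_call(cache_dir: Path, key: str, call_api: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Return the cached JSON response for `key`, calling the API only on a miss."""
    entry = cache_dir / "responses" / f"{key}.json"
    if entry.exists():
        # Cache hit: return the stored response without touching the API.
        return json.loads(entry.read_text(encoding="utf-8"))
    # Cache miss: call the API, store the response for next time, then return it.
    response = call_api()
    entry.parent.mkdir(parents=True, exist_ok=True)
    entry.write_text(json.dumps(response), encoding="utf-8")
    return response
```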
---
## Cache Location
The cache is stored in your system's cache directory:
| Platform | Location |
|----------|----------|
| Linux | `~/.cache/datalab/` |
| macOS | `~/Library/Caches/datalab/` |
| Windows | `%LOCALAPPDATA%\datalab\cache\` |
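A script may need to locate the cache without running `datalab cache stats`. For that case, the table translates to the Python sketch below. `default_cache_dir` is a hypothetical helper that hard-codes the paths above. It ignores any overrides, such as `XDG_CACHE_HOME` on Linux, and the CLI itself may or may not honor those. The `cache_dir` field printed by `datalab cache stats` remains the authoritative answer.

```python
import os
import sys
from pathlib import Path


def default_cache_dir() -> Path:
    """Return the cache directory from the platform table (hypothetical helper)."""
    if sys.platform == "win32":
        # %LOCALAPPDATA%\datalab\cache\
        return Path(os.environ["LOCALAPPDATA"]) / "datalab" / "cache"
    if sys.platform == "darwin":
        # ~/Library/Caches/datalab/
        return Path.home() / "Library" / "Caches" / "datalab"
    # Linux (and, as an assumption here, other Unix-likes): ~/.cache/datalab/
    return Path.home() / ".cache" / "datalab"
```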
### Directory Structure
```
~/.cache/datalab/
├── responses/          # JSON API responses
│   ├── a1b2c3d4.json   # Cached response
│   ├── e5f6g7h8.json
│   └── ...
└── files/              # Binary files (filled forms, created documents)
    ├── i9j0k1l2.pdf
    └── ...
```
---
## Cache Key Generation
Cache keys are SHA256 hashes computed from:
| Component | Description |
|-----------|-------------|
| File hash | SHA256 of file contents (for local files) |
| URL | Full URL (for remote files) |
| Endpoint | API endpoint name (e.g., `convert`, `extract`) |
| Parameters | Sorted JSON of all request parameters |
This ensures:
- **Same file + same options** → Cache hit
- **Same file + different options** → Cache miss
- **Modified file** → Cache miss (different hash)
### Example
```bash
# These commands have different cache keys:
datalab convert doc.pdf --output-format markdown
datalab convert doc.pdf --output-format html
datalab convert doc.pdf --output-format markdown --mode accurate
```
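To see why the three commands above get different keys, the rules can be sketched in Python. This is an illustration, not the CLI's code. The way the components are combined (field order and the `|` separator) is an assumption. The parameter names `output_format` and `mode` are hypothetical stand-ins for the `--output-format` and `--mode` flags. Sorting the parameter keys keeps the key stable regardless of flag order.

```python
import hashlib
import json
from typing import Any, Dict


def file_sha256(path: str) -> str:
    """SHA256 of a local file's contents, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_key(source: str, endpoint: str, params: Dict[str, Any]) -> str:
    """Hash the source (file hash or URL), endpoint, and sorted params into one key.

    The field order and the "|" separator are illustrative assumptions.
    """
    params_json = json.dumps(params, sort_keys=True)
    material = "|".join([source, endpoint, params_json])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Stand-in for file_sha256("doc.pdf"); the params mirror the three example commands.
source = hashlib.sha256(b"example pdf bytes").hexdigest()
keys = {
    cache_key(source, "convert", {"output_format": "markdown"}),
    cache_key(source, "convert", {"output_format": "html"}),
    cache_key(source, "convert", {"output_format": "markdown", "mode": "accurate"}),
}
print(len(keys))  # 3
```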
---
## Bypassing the Cache
### Skip Local Cache
Use `--skip-cache` to ignore the local cache:
```bash
datalab convert document.pdf --skip-cache
```
This still uses the API's server-side cache.
### Force Reprocessing
Use `--force` to bypass the API's server-side cache:
```bash
datalab convert document.pdf --force
```
This still checks the local cache first.
### Skip Both Caches
Combine both flags for fully fresh processing:
```bash
datalab convert document.pdf --skip-cache --force
```
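The two flags are independent. `--skip-cache` controls the local cache, and `--force` controls the API's server-side cache. The tiny Python model below makes the four combinations explicit. `CachePlan` and `cache_plan` are hypothetical names for illustration only.

```python
from typing import NamedTuple


class CachePlan(NamedTuple):
    check_local_cache: bool   # consult the on-disk cache before calling the API
    allow_server_cache: bool  # let the API reuse a server-side result


def cache_plan(skip_cache: bool, force: bool) -> CachePlan:
    """Which caches a run may use, per the flag descriptions above."""
    return CachePlan(check_local_cache=not skip_cache, allow_server_cache=not force)


# Print all four flag combinations.
for skip_cache in (False, True):
    for force in (False, True):
        print(f"--skip-cache={skip_cache} --force={force} -> {cache_plan(skip_cache, force)}")
```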
---
## Managing the Cache
### View Statistics
```bash
datalab cache stats
```
Output:
```json
{
  "cache_dir": "/home/user/.cache/datalab",
  "response_count": 150,
  "response_size": 52428800,
  "file_count": 10,
  "file_size": 314572800
}
```
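The `*_size` fields appear to be in bytes: 52428800 is exactly 50 MiB, and 314572800 is exactly 300 MiB. Similar numbers can be computed from the directory layout with a short Python sketch. `cache_stats` and `mib` are hypothetical helpers, and the sketch assumes entries sit directly inside `responses/` and `files/`.

```python
from pathlib import Path
from typing import Any, Dict


def cache_stats(cache_dir: Path) -> Dict[str, Any]:
    """Count entries and sum their sizes (in bytes) under responses/ and files/."""
    stats: Dict[str, Any] = {"cache_dir": str(cache_dir)}
    for sub, prefix in (("responses", "response"), ("files", "file")):
        folder = cache_dir / sub
        entries = [p for p in folder.iterdir() if p.is_file()] if folder.is_dir() else []
        stats[f"{prefix}_count"] = len(entries)
        stats[f"{prefix}_size"] = sum(p.stat().st_size for p in entries)
    return stats


def mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 1,048,576 bytes)."""
    return n_bytes / (1024 * 1024)


print(mib(52428800), mib(314572800))  # 50.0 300.0
```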
### Clear All Cache
```bash
datalab cache clear
```
### Clear Old Entries
Remove entries older than a specified number of days:
```bash
# Clear entries older than 7 days
datalab cache clear --older-than 7
# Clear entries older than 30 days
datalab cache clear --older-than 30
```
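One way to implement an age cutoff like `--older-than N` is to delete every entry whose age exceeds N × 86,400 seconds. The Python sketch below does that for your own tooling. It assumes the `responses/` and `files/` layout shown earlier and measures age by file modification time, which may differ from how the CLI measures it. `clear_older_than` is a hypothetical helper.

```python
import time
from pathlib import Path
from typing import Optional


def clear_older_than(cache_dir: Path, days: float, now: Optional[float] = None) -> int:
    """Delete cached files last modified more than `days` days ago; return how many."""
    # Age comes from file modification time here (an assumption, not the CLI's rule).
    cutoff = (time.time() if now is None else now) - days * 86400
    removed = 0
    for sub in ("responses", "files"):
        folder = cache_dir / sub
        if not folder.is_dir():
            continue
        # Materialize the listing first so deleting entries doesn't disturb iteration.
        for entry in list(folder.iterdir()):
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                entry.unlink()
                removed += 1
    return removed
```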
---
## Cache Metadata
Each cached response includes metadata:
```json
{
  "created_at": "2024-01-15T10:30:00Z",
  "endpoint": "convert",
  "params_hash": "abc123...",
  "file_hash": "def456...",
  "file_path": "/path/to/original.pdf"
}
```
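The `created_at` field is a UTC ISO 8601 timestamp with a trailing `Z`, so it gives each entry's age directly if you inspect metadata yourself. The Python helper below computes that age. `entry_age_days` is hypothetical and not part of the CLI. Before Python 3.11, `datetime.fromisoformat` rejects the `Z` suffix, which is why it is rewritten as `+00:00`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def entry_age_days(metadata: Dict[str, Any], now: Optional[datetime] = None) -> float:
    """Age of a cache entry in days, from its ISO 8601 `created_at` field."""
    # Swap "Z" for "+00:00" so older Pythons can parse the UTC timestamp.
    created = datetime.fromisoformat(metadata["created_at"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - created).total_seconds() / 86400


meta = {"created_at": "2024-01-15T10:30:00Z", "endpoint": "convert"}
print(entry_age_days(meta, now=datetime(2024, 1, 22, 10, 30, tzinfo=timezone.utc)))  # 7.0
```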
---
## Best Practices
### During Development
Keep caching enabled to minimize API costs:
```bash
# First run: API call (~$0.01)
datalab convert document.pdf
# Subsequent runs: cached (free!)
datalab convert document.pdf
```
### For Production Pipelines
Consider cache management strategies:
```bash
# Option 1: Scheduled cleanup via a crontab entry (every Sunday at midnight)
0 0 * * 0 datalab cache clear --older-than 7
# Option 2: Fresh processing for critical documents
datalab convert important.pdf --skip-cache --force
```
### For Testing
Bypass both caches so every test run gets freshly processed results rather than stored ones:
```bash
datalab convert test.pdf --skip-cache --force
```
---
## Cache vs. Checkpoints
| Aspect | Cache | Checkpoints |
|--------|-------|-------------|
| Stored | Locally | On Datalab servers |
| Purpose | Reduce API calls | Reuse parsed documents |
| Scope | Full response | Document parse state |
| Duration | Until cleared | Server-defined retention |
| Cost | Free (local storage) | Included in API usage |
Use **cache** to avoid repeating identical requests.
Use **checkpoints** to efficiently run multiple operations on the same document.
---
## Troubleshooting
### Cache Not Working
If results aren't being cached:
1. Check that the cache directory exists and is writable
2. Verify you're not using `--skip-cache`
3. Check available disk space
### Stale Results
If you're getting outdated results:
```bash
# Clear and retry
datalab cache clear
datalab convert document.pdf
```
### Cache Too Large
If the cache is using too much disk space:
```bash
# Check size
datalab cache stats
# Clear old entries
datalab cache clear --older-than 7
# Or clear everything
datalab cache clear
```
---
## See Also
- [cache command](../commands/cache.md)
- [Checkpoints](checkpoints.md)