# Filling Forms
Learn how to fill PDF and image forms with data using the Datalab CLI.
---
## Prerequisites
- [Datalab CLI installed](../getting-started/installation.md)
- [API key configured](../getting-started/configuration.md)
---
## Basic Form Filling
Fill a form with inline field data:
```bash
datalab fill application.pdf \
--fields '{"name": "John Doe", "email": "john@example.com"}' \
--output filled.pdf
```
The CLI:
1. Analyzes the form to find fields
2. Matches your data to form fields
3. Fills matching fields
4. Returns the filled form
---
## Field Data Format
### Inline JSON
```bash
datalab fill form.pdf --fields '{"field_name": "value"}' --output filled.pdf
```
### From File
Create `data.json`:
```json
{
"first_name": "John",
"last_name": "Doe",
"email": "john@example.com",
"phone": "555-123-4567",
"date": "2024-01-15"
}
```
Run:
```bash
datalab fill form.pdf --fields data.json --output filled.pdf
```
---
## Field Matching
The CLI uses fuzzy matching to map your field names to form fields.
### How It Works
Your field names don't need to match exactly:
| Your Field Name | Form Field Label | Matched |
|---|---|---|
| `name` | `Full Name` | Yes |
| `email` | `Email Address` | Yes |
| `first_name` | `First Name` | Yes |
| `phone` | `Phone Number` | Yes |
| `dob` | `Date of Birth` | Yes |
### Confidence Threshold
Control matching strictness with `--confidence-threshold`:
```bash
# Lenient matching (more fields matched, possible mismatches)
datalab fill form.pdf --fields data.json --confidence-threshold 0.3 --output filled.pdf
# Strict matching (fewer matches, higher accuracy)
datalab fill form.pdf --fields data.json --confidence-threshold 0.8 --output filled.pdf
```
| Threshold | Behavior |
|---|---|
| 0.0 - 0.3 | Very lenient, matches loosely similar names |
| 0.4 - 0.6 | Balanced (default: 0.5) |
| 0.7 - 1.0 | Strict, requires close matches |
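To choose a threshold for a particular form, it can help to fill the same form at several thresholds and compare the outputs. The sketch below uses only the flags shown above; the function name `compare_thresholds` and the output naming scheme are hypothetical, and it assumes `datalab` exits with a non-zero status on failure.

```bash
#!/bin/bash
# Hypothetical helper (not part of the Datalab CLI): fill one form at
# several thresholds so the outputs can be compared side by side.
# Usage: compare_thresholds FORM FIELDS [THRESHOLD...]
compare_thresholds() {
  local form="$1" fields="$2"
  shift 2
  # Default to one value from each band listed above.
  if [ "$#" -eq 0 ]; then
    set -- 0.3 0.5 0.8
  fi
  local base threshold
  base="$(basename "$form" .pdf)"
  for threshold in "$@"; do
    echo "Filling $form at threshold $threshold..."
    # Assumption: a failed fill returns a non-zero exit status.
    datalab fill "$form" --fields "$fields" \
      --confidence-threshold "$threshold" \
      --output "${base}-t${threshold}.pdf" || return 1
  done
}
```

For example, `compare_thresholds form.pdf data.json` would write `form-t0.3.pdf`, `form-t0.5.pdf`, and `form-t0.8.pdf` to the current directory.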
---
## Adding Context
Help the CLI understand your form with context:
```bash
datalab fill tax-form.pdf \
--fields '{"name": "John Doe", "ssn": "123-45-6789"}' \
--context "This is a W-2 tax form for 2024" \
--output filled.pdf
```
Good context includes:
- Form type or purpose
- Year or time period
- Organization name
- Any disambiguation hints
---
## Practical Examples
### Job Application
```json
{
"full_name": "John Doe",
"email": "john.doe@example.com",
"phone": "555-123-4567",
"address": "123 Main St, Anytown, CA 90210",
"position": "Software Engineer",
"start_date": "2024-02-01",
"salary_expectation": "120000",
"years_experience": "5",
"education": "BS Computer Science, State University",
"references": "Jane Smith (jane@example.com), Bob Johnson (bob@example.com)"
}
```
```bash
datalab fill job-application.pdf --fields application-data.json --output filled-application.pdf
```
### Tax Form
```json
{
"taxpayer_name": "John Doe",
"ssn": "123-45-6789",
"address": "123 Main St",
"city": "Anytown",
"state": "CA",
"zip": "90210",
"filing_status": "Single",
"wages": "85000",
"federal_tax_withheld": "12000"
}
```
```bash
datalab fill w2-form.pdf \
--fields tax-data.json \
--context "W-2 Wage and Tax Statement for 2024" \
--output filled-w2.pdf
```
### Medical Form
```json
{
"patient_name": "John Doe",
"date_of_birth": "1990-05-15",
"insurance_provider": "Blue Cross",
"policy_number": "BC123456789",
"primary_care_physician": "Dr. Smith",
"allergies": "Penicillin",
"current_medications": "None",
"emergency_contact": "Jane Doe",
"emergency_phone": "555-987-6543"
}
```
```bash
datalab fill patient-intake.pdf --fields medical-data.json --output filled-intake.pdf
```
---
## Page Selection
Fill forms on specific pages:
```bash
# Only first 3 pages
datalab fill multi-page-form.pdf --fields data.json --max-pages 3 --output filled.pdf
# Specific pages
datalab fill form.pdf --fields data.json --page-range "0-2,5" --output filled.pdf
```
---
## Workflow: Extract Then Fill
Extract data from one document and use it to fill another:
```bash
# 1. Extract data from source document
datalab extract source.pdf --schema schema.json > extracted-data.json
# 2. Transform data if needed (using jq)
jq '{
email: .customer_email,
phone: .customer_phone
}' extracted-data.json > form-data.json
# 3. Fill the target form
datalab fill target-form.pdf --fields form-data.json --output filled-form.pdf
```
---
## Output Format
### With --output Flag
The filled form is saved directly to the file:
```bash
datalab fill form.pdf --fields data.json --output filled.pdf
```
JSON response:
```json
{
"filled_fields": 8,
"unmatched_fields": ["unknown_field"],
"output_file": "filled.pdf"
}
```
### Without --output Flag
Without `--output`, the response includes the filled PDF as a base64-encoded string:
```json
{
"filled_fields": 8,
"output_base64": "JVBERi0xLjQK..."
}
```
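A base64 response has to be decoded before the PDF can be opened. The sketch below assumes the JSON response has been saved to a file (for example by redirecting the command's standard output, which is an assumption about how the CLI prints its response). The function name `decode_filled_pdf` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper (not part of the Datalab CLI): extract the
# "output_base64" value from a saved response and decode it into a PDF.
# Usage: decode_filled_pdf RESPONSE_JSON OUTPUT_PDF
decode_filled_pdf() {
  local response="$1" output="$2" encoded
  # sed keeps this sketch dependency-free; it assumes the value is a plain
  # base64 string on a single line. With jq installed,
  # jq -r '.output_base64' "$response" is the more robust choice.
  encoded="$(sed -n 's/.*"output_base64"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$response")"
  if [ -z "$encoded" ]; then
    echo "No output_base64 field found in $response" >&2
    return 1
  fi
  printf '%s' "$encoded" | base64 --decode > "$output"
}
```

With a saved response, `decode_filled_pdf response.json filled.pdf` writes the decoded PDF to `filled.pdf`.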
---
## Handling Unmatched Fields
The CLI reports fields that couldn't be matched:
```json
{
"filled_fields": 8,
"unmatched_fields": ["special_code", "internal_id"],
"output_file": "filled.pdf"
}
```
To resolve unmatched fields:
1. **Rename fields**: Use names that closely resemble the form's labels
2. **Lower the threshold**: Try `--confidence-threshold 0.3`
3. **Add context**: Provide hints about the form with `--context`
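In a script, it can be useful to surface unmatched fields automatically rather than reading the response by hand. The sketch below assumes the JSON response shown above has been saved to a file; the function name `list_unmatched` is hypothetical, and the text-based parsing only handles simple field names.

```bash
#!/bin/bash
# Hypothetical helper (not part of the Datalab CLI): print each unmatched
# field name from a saved response, one per line.
# Returns 1 if any field was unmatched, 0 otherwise.
# Usage: list_unmatched RESPONSE_JSON
list_unmatched() {
  local response="$1" names
  # Flatten the JSON to one line, capture the array contents, then split
  # on commas and strip the quotes. Assumes names contain no commas,
  # quotes, or brackets; jq is more robust when it is available.
  names="$(tr -d '\n' < "$response" \
    | sed -n 's/.*"unmatched_fields"[[:space:]]*:[[:space:]]*\[\([^]]*\)].*/\1/p' \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*"//; s/"[[:space:]]*$//')"
  if [ -z "$names" ]; then
    return 0
  fi
  printf '%s\n' "$names"
  return 1
}
```

Because the function returns a non-zero status when fields are left over, a call such as `list_unmatched response.json || echo "Some fields need attention"` can gate later steps.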
---
## Batch Processing
Fill multiple forms with the same data:
```bash
#!/bin/bash
data="applicant-data.json"
# Create the output directory if it does not exist yet
mkdir -p filled
for form in forms/*.pdf; do
  output="filled/$(basename "$form")"
  echo "Filling $form..."
  datalab fill "$form" --fields "$data" --output "$output"
done
```
Fill multiple forms with different data:
```bash
#!/bin/bash
# Create the output directory if it does not exist yet
mkdir -p filled
for data_file in data/*.json; do
  base=$(basename "$data_file" .json)
  echo "Processing $base..."
  datalab fill "form.pdf" --fields "$data_file" --output "filled/${base}.pdf"
done
```
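For larger batches, it may be preferable to continue past individual failures and report them at the end. The sketch below is one way to do that; the function name `fill_all` is hypothetical, and it assumes `datalab fill` exits with a non-zero status when a form cannot be filled.

```bash
#!/bin/bash
# Hypothetical helper (not part of the Datalab CLI): fill every PDF in a
# directory with the same data, continuing past failures.
# Usage: fill_all FORMS_DIR DATA_JSON OUTPUT_DIR
fill_all() {
  local forms_dir="$1" data="$2" out_dir="$3"
  local form failed=0 total=0
  mkdir -p "$out_dir"
  for form in "$forms_dir"/*.pdf; do
    # Skip the literal pattern left behind when no PDFs match.
    [ -e "$form" ] || continue
    total=$((total + 1))
    # Assumption: a failed fill returns a non-zero exit status.
    if ! datalab fill "$form" --fields "$data" \
        --output "$out_dir/$(basename "$form")"; then
      echo "Failed: $form" >&2
      failed=$((failed + 1))
    fi
  done
  echo "Filled $((total - failed)) of $total forms"
  # Succeed only if every form was filled.
  [ "$failed" -eq 0 ]
}
```

A call such as `fill_all forms applicant-data.json filled` mirrors the first loop above, with a summary line and an exit status that reflects whether any form failed.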
---
## Tips and Best Practices
### Use Descriptive Field Names
Good:
```json
{
"first_name": "John",
"last_name": "Doe",
"email_address": "john@example.com"
}
```
Less effective:
```json
{
"f1": "John",
"f2": "Doe",
"f3": "john@example.com"
}
```
### Match Date Formats
Write each date in the format the target form expects. The values below illustrate three common formats:
```json
{
"date": "01/15/2024",
"birth_date": "1990-05-15",
"start_date": "January 15, 2024"
}
```
### Handle Checkboxes
For checkbox fields, use a boolean or the string `"X"`:
```json
{
"agree_to_terms": true,
"opt_in_marketing": false,
"signature_checkbox": "X"
}
```
---
## Troubleshooting
### Fields Not Filling
1. Lower the confidence threshold:
```bash
datalab fill form.pdf --fields data.json --confidence-threshold 0.3 --output filled.pdf
```
2. Add context:
```bash
datalab fill form.pdf --fields data.json --context "Employment application form" --output filled.pdf
```
3. Rename fields to match form labels more closely
### Wrong Field Filled
Increase the confidence threshold so that only close matches are filled:
```bash
datalab fill form.pdf --fields data.json --confidence-threshold 0.8 --output filled.pdf
```
### Form Not Recognized
Ensure the PDF contains fillable fields or a clear form structure. For image-based forms, the CLI attempts OCR.
---
## Next Steps
- [Extract data from documents](extract-data.md)
- [Create documents from markdown](../commands/create-document.md)
- [fill command reference](../commands/fill.md)