datalab-cli 0.1.0

A powerful CLI for converting, extracting, and processing documents using the Datalab API
# Filling Forms

Learn how to fill PDF and image forms with data using the Datalab CLI.

---

## Prerequisites

- [Datalab CLI installed](../getting-started/installation.md)
- [API key configured](../getting-started/configuration.md)

---

## Basic Form Filling

Fill a form with inline field data:

```bash
datalab fill application.pdf \
  --fields '{"name": "John Doe", "email": "john@example.com"}' \
  --output filled.pdf
```

The CLI:
1. Analyzes the form to find fields
2. Matches your data to form fields
3. Fills matching fields
4. Returns the filled form

---

## Field Data Format

### Inline JSON

```bash
datalab fill form.pdf --fields '{"field_name": "value"}' --output filled.pdf
```

### From File

Create `data.json`:
```json
{
  "first_name": "John",
  "last_name": "Doe",
  "email": "john@example.com",
  "phone": "555-123-4567",
  "date": "2024-01-15"
}
```

Run:
```bash
datalab fill form.pdf --fields data.json --output filled.pdf
```
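
Inline JSON sits inside single quotes, so a value that contains an apostrophe (for example `O'Brien`) needs awkward shell escaping. One way around this, sketched below, is to write the data file from a quoted heredoc, which the shell copies verbatim. The function name `write_fields_file` and the sample values are illustrative and not part of the CLI.

```bash
# Hypothetical helper: write field data to a JSON file without shell escaping.
# The quoted delimiter ('JSON') stops the shell from expanding or
# reinterpreting anything inside the heredoc.
write_fields_file() {
  cat > "$1" <<'JSON'
{
  "first_name": "Siobhan",
  "last_name": "O'Brien",
  "email": "siobhan@example.com"
}
JSON
}

# Usage (the fill command is the one shown above):
#   write_fields_file data.json
#   datalab fill form.pdf --fields data.json --output filled.pdf
```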

---

## Field Matching

The CLI uses fuzzy matching to map your field names to form fields.

### How It Works

Your field names don't need to match exactly:

| Your Field | Form Field | Match? |
|------------|------------|--------|
| `name` | `Full Name` | Yes |
| `email` | `Email Address` | Yes |
| `first_name` | `First Name` | Yes |
| `phone` | `Phone Number` | Yes |
| `dob` | `Date of Birth` | Yes |

### Confidence Threshold

Control matching strictness with `--confidence-threshold`:

```bash
# Lenient matching (more fields matched, possible mismatches)
datalab fill form.pdf --fields data.json --confidence-threshold 0.3 --output filled.pdf

# Strict matching (fewer matches, higher accuracy)
datalab fill form.pdf --fields data.json --confidence-threshold 0.8 --output filled.pdf
```

| Threshold | Behavior |
|-----------|----------|
| 0.0 - 0.3 | Very lenient, matches loosely similar names |
| 0.4 - 0.6 | Balanced (default: 0.5) |
| 0.7 - 1.0 | Strict, requires close matches |
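
In scripts, it can help to check the threshold against the 0.0 to 1.0 range in the table before calling the CLI, so that a typo is caught before any request is made. The following is a minimal sketch; `valid_threshold` and `fill_with_threshold` are hypothetical helper names, and only `datalab fill` with the flags shown above comes from this guide.

```bash
# Hypothetical helper: succeed only if $1 is a plain decimal number in [0, 1].
valid_threshold() {
  awk -v t="$1" 'BEGIN {
    ok = (t ~ /^[0-9]*\.?[0-9]+$/) && (t + 0 >= 0) && (t + 0 <= 1)
    exit !ok
  }'
}

# Hypothetical wrapper: validate the threshold, then call the CLI.
fill_with_threshold() {
  local form=$1 data=$2 threshold=$3 output=$4
  if ! valid_threshold "$threshold"; then
    echo "threshold must be between 0.0 and 1.0, got: $threshold" >&2
    return 2
  fi
  datalab fill "$form" --fields "$data" \
    --confidence-threshold "$threshold" --output "$output"
}

# Usage:
#   fill_with_threshold form.pdf data.json 0.8 filled.pdf
```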

---

## Adding Context

Pass `--context` to describe the form and help the CLI interpret its fields:

```bash
datalab fill tax-form.pdf \
  --fields '{"name": "John Doe", "ssn": "123-45-6789"}' \
  --context "This is a W-2 tax form for 2024" \
  --output filled.pdf
```

Good context includes:
- Form type or purpose
- Year or time period
- Organization name
- Any disambiguation hints

---

## Practical Examples

### Job Application

Save the applicant details as `application-data.json`:

```json
{
  "full_name": "John Doe",
  "email": "john.doe@example.com",
  "phone": "555-123-4567",
  "address": "123 Main St, Anytown, CA 90210",
  "position": "Software Engineer",
  "start_date": "2024-02-01",
  "salary_expectation": "120000",
  "years_experience": "5",
  "education": "BS Computer Science, State University",
  "references": "Jane Smith (jane@example.com), Bob Johnson (bob@example.com)"
}
```

```bash
datalab fill job-application.pdf --fields application-data.json --output filled-application.pdf
```

### Tax Form

Save the taxpayer details as `tax-data.json`:

```json
{
  "taxpayer_name": "John Doe",
  "ssn": "123-45-6789",
  "address": "123 Main St",
  "city": "Anytown",
  "state": "CA",
  "zip": "90210",
  "filing_status": "Single",
  "wages": "85000",
  "federal_tax_withheld": "12000"
}
```

```bash
datalab fill w2-form.pdf \
  --fields tax-data.json \
  --context "W-2 Wage and Tax Statement for 2024" \
  --output filled-w2.pdf
```

### Medical Form

Save the patient details as `medical-data.json`:

```json
{
  "patient_name": "John Doe",
  "date_of_birth": "1990-05-15",
  "insurance_provider": "Blue Cross",
  "policy_number": "BC123456789",
  "primary_care_physician": "Dr. Smith",
  "allergies": "Penicillin",
  "current_medications": "None",
  "emergency_contact": "Jane Doe",
  "emergency_phone": "555-987-6543"
}
```

```bash
datalab fill patient-intake.pdf --fields medical-data.json --output filled-intake.pdf
```

---

## Page Selection

Limit filling to specific pages with `--max-pages` or `--page-range`:

```bash
# Only first 3 pages
datalab fill multi-page-form.pdf --fields data.json --max-pages 3 --output filled.pdf

# Specific pages
datalab fill form.pdf --fields data.json --page-range "0-2,5" --output filled.pdf
```
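
The `"0-2,5"` example suggests that `--page-range` counts pages from 0, although this guide does not state it outright. Under that assumption, a small helper can translate the page numbers a person reads off a PDF viewer (starting at 1) into the indexes passed to the CLI. `to_page_range` is a hypothetical name used for illustration only.

```bash
# Hypothetical helper: convert 1-based page numbers into a comma-separated
# list of 0-based indexes.
# ASSUMPTION: --page-range counts pages from 0, as the "0-2,5" example implies.
to_page_range() {
  local range="" page
  for page in "$@"; do
    case "$page" in
      # Reject empty values, non-digits, and anything starting with 0.
      ''|*[!0-9]*|0*) echo "not a positive page number: $page" >&2; return 2 ;;
    esac
    range="${range:+$range,}$((page - 1))"
  done
  printf '%s\n' "$range"
}

# Usage: fill pages 1, 2, 3 and 6 as shown in a PDF viewer.
#   datalab fill form.pdf --fields data.json \
#     --page-range "$(to_page_range 1 2 3 6)" --output filled.pdf
```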

---

## Workflow: Extract Then Fill

Extract data from one document and use it to fill another:

```bash
# 1. Extract data from source document
datalab extract source.pdf --schema schema.json > extracted-data.json

# 2. Transform data if needed (using jq)
jq '{
  name: .customer_name,
  email: .customer_email,
  phone: .customer_phone
}' extracted-data.json > form-data.json

# 3. Fill the target form
datalab fill target-form.pdf --fields form-data.json --output filled-form.pdf
```
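
If the extraction misses a value, the transform step silently writes `null` into the form data. The sketch below wraps the same `jq` mapping in a function that fails when any mapped value is missing, so an incomplete extraction never reaches the fill step. `map_fields` is a hypothetical name, and the `customer_*` keys are the ones used in the example above.

```bash
# Hypothetical helper: read extracted JSON on stdin, write form data on stdout.
# Exits non-zero if any mapped value is null (missing in the source).
map_fields() {
  jq '{
    name: .customer_name,
    email: .customer_email,
    phone: .customer_phone
  } | if any(.[]; . == null) then error("missing source field") else . end'
}

# Usage: the fill step only runs if the mapping succeeded.
#   map_fields < extracted-data.json > form-data.json &&
#     datalab fill target-form.pdf --fields form-data.json --output filled-form.pdf
```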

---

## Output Format

### With --output Flag

The filled form is saved directly to the file:

```bash
datalab fill form.pdf --fields data.json --output filled.pdf
```

JSON response:
```json
{
  "filled_fields": 8,
  "unmatched_fields": ["unknown_field"],
  "output_file": "filled.pdf"
}
```

### Without --output Flag

The response includes the filled PDF as a base64-encoded string:

```json
{
  "filled_fields": 8,
  "output_base64": "JVBERi0xLjQK..."
}
```
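
To turn that string back into a file, decode the `output_base64` field. The sketch below assumes the CLI prints the JSON response on standard output (not confirmed by this guide) and that `jq` and `base64` are installed. `save_pdf_from_response` is a hypothetical helper name.

```bash
# Hypothetical helper: read the JSON response on stdin and write the decoded
# PDF to the path given as $1.
save_pdf_from_response() {
  local encoded
  # -e makes jq exit non-zero when the field is missing (null).
  encoded=$(jq -er '.output_base64') || {
    echo "response has no output_base64 field" >&2
    return 1
  }
  printf '%s' "$encoded" | base64 -d > "$1"
}

# Usage (assumes the response is printed on stdout):
#   datalab fill form.pdf --fields data.json | save_pdf_from_response filled.pdf
```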

---

## Handling Unmatched Fields

The CLI reports fields that couldn't be matched:

```json
{
  "filled_fields": 8,
  "unmatched_fields": ["special_code", "internal_id"],
  "output_file": "filled.pdf"
}
```

To resolve unmatched fields:

1. **Check field names**: Ensure they're similar to form labels
2. **Lower threshold**: Try `--confidence-threshold 0.3`
3. **Add context**: Provide hints about the form
4. **Rename fields**: Use names closer to form labels
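
In a script, the `unmatched_fields` array can act as a gate before the filled form is used. This sketch reads a response like the one above and fails when any field went unmatched. `report_unmatched` is a hypothetical name, and `jq` is assumed to be installed.

```bash
# Hypothetical helper: read the JSON response on stdin, print each unmatched
# field name on its own line, and return 1 if there were any.
report_unmatched() {
  local unmatched
  # Treat a missing unmatched_fields key as an empty list.
  unmatched=$(jq -r '(.unmatched_fields // [])[]') || return 2
  if [ -z "$unmatched" ]; then
    return 0
  fi
  printf '%s\n' "$unmatched"
  return 1
}

# Usage, with the response saved to a file named response.json:
#   report_unmatched < response.json || echo "Some fields need attention" >&2
```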

---

## Batch Processing

Fill multiple forms with the same data:

```bash
#!/bin/bash
data="applicant-data.json"
mkdir -p filled  # make sure the output directory exists

for form in forms/*.pdf; do
    output="filled/$(basename "$form")"
    echo "Filling $form..."
    datalab fill "$form" --fields "$data" --output "$output"
done
```

Fill multiple forms with different data:

```bash
#!/bin/bash
mkdir -p filled  # make sure the output directory exists
for data_file in data/*.json; do
    base=$(basename "$data_file" .json)
    echo "Processing $base..."
    datalab fill "form.pdf" --fields "$data_file" --output "filled/${base}.pdf"
done
```
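
The loops above keep going silently when one form fails. A variant that records failures and reports them through its exit status might look like the sketch below. It assumes `datalab fill` exits with a non-zero status on error, which this guide does not confirm. `fill_batch` is a hypothetical name.

```bash
# Hypothetical helper: fill every form given after the first two arguments.
#   $1 = JSON data file, $2 = output directory, remaining args = form files
# ASSUMPTION: `datalab fill` exits non-zero when it fails.
fill_batch() {
  local data=$1 out_dir=$2 failed=0 form
  shift 2
  mkdir -p "$out_dir" || return 1
  for form in "$@"; do
    if datalab fill "$form" --fields "$data" \
        --output "$out_dir/$(basename "$form")"; then
      echo "ok: $form"
    else
      echo "failed: $form" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every form was filled.
  [ "$failed" -eq 0 ]
}

# Usage:
#   fill_batch applicant-data.json filled forms/*.pdf
```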

---

## Tips and Best Practices

### Use Descriptive Field Names

Good:
```json
{
  "first_name": "John",
  "last_name": "Doe",
  "email_address": "john@example.com"
}
```

Harder to match:
```json
{
  "f1": "John",
  "f2": "Doe",
  "f3": "john@example.com"
}
```

### Match Date Formats

Write each date in the format its form field expects. Different fields on the same form may expect different formats:

```json
{
  "date": "01/15/2024",
  "birth_date": "1990-05-15",
  "start_date": "January 15, 2024"
}
```
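
When source data stores dates in ISO form (`2024-01-15`) but the form expects `01/15/2024`, the value can be converted before it goes into the data file. The sketch below uses only shell parameter expansion, so it does not depend on any particular `date` implementation. `iso_to_us_date` is a hypothetical helper name.

```bash
# Hypothetical helper: convert an ISO date (YYYY-MM-DD) to MM/DD/YYYY.
# Only the shape of the input is checked, not whether the date exists.
iso_to_us_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected YYYY-MM-DD, got: $1" >&2; return 2 ;;
  esac
  local year=${1%%-*} rest=${1#*-}
  printf '%s/%s/%s\n' "${rest%%-*}" "${rest#*-}" "$year"
}

# Usage:
#   iso_to_us_date 2024-01-15
```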

### Handle Checkboxes

For checkbox fields, use a boolean (`true` or `false`) or the string `"X"`:

```json
{
  "agree_to_terms": true,
  "opt_in_marketing": false,
  "signature_checkbox": "X"
}
```

---

## Troubleshooting

### Fields Not Filling

1. Lower the confidence threshold:
   ```bash
   datalab fill form.pdf --fields data.json --confidence-threshold 0.3 --output filled.pdf
   ```

2. Add context:
   ```bash
   datalab fill form.pdf --fields data.json --context "Employment application form" --output filled.pdf
   ```

3. Rename fields to match form labels more closely

### Wrong Field Filled

Increase the confidence threshold so that only close matches are filled:
```bash
datalab fill form.pdf --fields data.json --confidence-threshold 0.8 --output filled.pdf
```

### Form Not Recognized

Ensure the PDF contains fillable fields or a clear form structure. For image-based forms, the CLI will attempt OCR.

---

## Next Steps

- [Extract data from documents](extract-data.md)
- [Create documents from markdown](../commands/create-document.md)
- [fill command reference](../commands/fill.md)