jshape 0.1.0

Repair malformed JSON and render a stable, human-readable structural outline.

jshape

jshape repairs malformed JSON input and renders a stable, human-readable structural outline.

It is designed for large or messy JSON payloads where you want to understand the shape of the data quickly instead of reading the full document.

What It Does

  • Repairs malformed JSON before parsing
  • Extracts nested object and array structure
  • Preserves example values for scalar fields
  • Marks optional object fields with ?
  • Preserves object key order from the input
  • Ships with a short CLI command: jshape
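
The optional-field and key-order behaviour listed above can be illustrated with a small standalone sketch. It is not jshape's implementation: merge_keys and render_key are hypothetical helpers, and the sketch assumes each object is given only as its list of keys, with no key repeated inside one object. A key counts as optional when at least one object lacks it, and keys keep their first-seen order.

```rust
/// Merge the key lists of several objects into one outline.
/// Returns (key, optional) pairs in first-seen order.
/// Assumes no key is repeated inside a single object.
fn merge_keys(objects: &[Vec<&str>]) -> Vec<(String, bool)> {
    let mut order: Vec<String> = Vec::new(); // keys in first-seen order
    let mut counts: Vec<usize> = Vec::new(); // how many objects contain each key

    for keys in objects {
        for key in keys {
            match order.iter().position(|k| k.as_str() == *key) {
                Some(i) => counts[i] += 1,
                None => {
                    order.push(key.to_string());
                    counts.push(1);
                }
            }
        }
    }

    // A key is optional if it is missing from at least one object.
    order
        .into_iter()
        .zip(counts)
        .map(|(key, count)| (key, count < objects.len()))
        .collect()
}

/// Render a key the way the outline does: quoted, with a trailing ? if optional.
fn render_key(key: &str, optional: bool) -> String {
    format!("\"{}\"{}", key, if optional { "?" } else { "" })
}
```

Given two objects with keys event_id, tags and event_id, tags, error, this sketch reports error as optional and renders it as "error"?, leaving the other keys unmarked and in input order.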

Install

As a library:

[dependencies]
jshape = "0.1.0"

As a CLI:

cargo install jshape

Publish

This repository ships with a GitHub Actions workflow for crates.io publishing.

  1. Add a repository secret named CARGO_REGISTRY_TOKEN
  2. Open the Publish Crate workflow in GitHub Actions
  3. Click Run workflow

The same workflow also publishes automatically when you push a version tag such as v0.1.0.

CLI Usage

Read from a file:

jshape payload.json

Read from stdin:

cat payload.json | jshape

Show types instead of example values:

jshape --no-examples payload.json

Library Usage

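use jshape::analyze_json;

fn main() {
    // Deliberately malformed: 'user' is single-quoted, which strict JSON rejects.
    let input = r#"{
        'user': {
            "name": "Ada",
            "roles": ["admin", "editor"]
        }
    }"#;

    // The second argument is assumed here to toggle example values
    // (the library counterpart of the CLI's --no-examples flag).
    let outline = analyze_json(input, true).unwrap();
    println!("{}", outline);
}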

If you need lower-level access, the crate also exposes:

  • repair_and_parse_json
  • extract_schema
  • format_schema
  • Schema

Example

The painful case is usually not a tiny nested object. It is a giant export where one array contains thousands of items with almost the same structure, making the raw JSON long and hard to scan.

Below is a trimmed excerpt from a much larger analytics export. In the real file, the events array contains tens of thousands of similarly shaped records:

{
  "export_id": "exp_2026_03_27_001",
  "generated_at": "2026-03-27T03:14:15Z",
  "tenant_id": "tenant_42",
  "events": [
    {
      "event_id": "evt_000001",
      "session_id": "sess_a1",
      "user_id": "usr_1001",
      "event_type": "page_view",
      "source": "web",
      "timestamp": "2026-03-27T03:00:01Z",
      "request": {
        "method": "GET",
        "path": "/dashboard",
        "status": 200,
        "duration_ms": 42
      },
      "device": {
        "os": "macOS",
        "browser": "Chrome",
        "locale": "en-US"
      },
      "geo": {
        "country": "US",
        "region": "CA",
        "city": "San Francisco"
      },
      "metrics": {
        "cpu_ms": 12,
        "db_rows": 18,
        "cache_hit": true
      },
      "tags": ["prod", "dashboard", "page_view"]
    },
    {
      "event_id": "evt_000002",
      "session_id": "sess_a1",
      "user_id": "usr_1001",
      "event_type": "page_view",
      "source": "web",
      "timestamp": "2026-03-27T03:00:03Z",
      "request": {
        "method": "GET",
        "path": "/dashboard/usage",
        "status": 200,
        "duration_ms": 57
      },
      "device": {
        "os": "macOS",
        "browser": "Chrome",
        "locale": "en-US"
      },
      "geo": {
        "country": "US",
        "region": "CA",
        "city": "San Francisco"
      },
      "metrics": {
        "cpu_ms": 19,
        "db_rows": 44,
        "cache_hit": true
      },
      "tags": ["prod", "dashboard", "page_view"]
    },
    {
      "event_id": "evt_000003",
      "session_id": "sess_b9",
      "user_id": "usr_2048",
      "event_type": "api_call",
      "source": "api",
      "timestamp": "2026-03-27T03:00:04Z",
      "request": {
        "method": "POST",
        "path": "/v1/reports/query",
        "status": 200,
        "duration_ms": 183
      },
      "device": {
        "os": "Linux",
        "browser": "curl",
        "locale": "en-US"
      },
      "geo": {
        "country": "DE",
        "region": "BE",
        "city": "Berlin"
      },
      "metrics": {
        "cpu_ms": 98,
        "db_rows": 1200,
        "cache_hit": false
      },
      "tags": ["prod", "reports", "api"],
      "error": {
        "code": "RATE_LIMIT_NEAR",
        "retryable": true
      }
    },
    {
      "event_id": "evt_000004",
      "session_id": "sess_c2",
      "user_id": "usr_3099",
      "event_type": "page_view",
      "source": "web",
      "timestamp": "2026-03-27T03:00:05Z",
      "request": {
        "method": "GET",
        "path": "/billing/invoices",
        "status": 200,
        "duration_ms": 61
      },
      "device": {
        "os": "Windows",
        "browser": "Edge",
        "locale": "en-GB"
      },
      "geo": {
        "country": "GB",
        "region": "LND",
        "city": "London"
      },
      "metrics": {
        "cpu_ms": 21,
        "db_rows": 72,
        "cache_hit": true
      },
      "tags": ["prod", "billing", "page_view"]
    },
    ... thousands more records with the same overall shape ...
  ],
  "aggregates": {
    "event_count": 48762,
    "unique_users": 913,
    "time_range": {
      "from": "2026-03-27T00:00:00Z",
      "to": "2026-03-27T03:14:15Z"
    }
  }
}

Output after running jshape:

{
  "export_id": "exp_2026_03_27_001",
  "generated_at": "2026-03-27T03:14:15Z",
  "tenant_id": "tenant_42",
  "events": [
    {
      "event_id": "evt_000001",
      "session_id": "sess_a1",
      "user_id": "usr_1001",
      "event_type": "page_view", "api_call",
      "source": "web", "api",
      "timestamp": "2026-03-27T03:00:01Z",
      "request": {
        "method": "GET", "POST",
        "path": "/dashboard", "/dashboard/usage", "/v1/reports/query", "/billing/invoices",
        "status": 200,
        "duration_ms": 42, 57, 183, 61
      },
      "device": {
        "os": "macOS", "Linux", "Windows",
        "browser": "Chrome", "curl", "Edge",
        "locale": "en-US", "en-GB"
      },
      "geo": {
        "country": "US", "DE", "GB",
        "region": "CA", "BE", "LND",
        "city": "San Francisco", "Berlin", "London"
      },
      "metrics": {
        "cpu_ms": 12, 19, 98, 21,
        "db_rows": 18, 44, 1200, 72,
        "cache_hit": bool
      },
      "tags": [
        "prod", "dashboard", "page_view", "reports", "api", "billing"
      ],
      "error"?: {
        "code": "RATE_LIMIT_NEAR",
        "retryable": true
      }
    },
  ...  // 48762 items
  ],
  "aggregates": {
    "event_count": 48762,
    "unique_users": 913,
    "time_range": {
      "from": "2026-03-27T00:00:00Z",
      "to": "2026-03-27T03:14:15Z"
    }
  }
}

The difference is the point of the tool: the raw input repeats the same object shape thousands of times, while the output keeps one representative structure, marks optional fields, preserves a few concrete values, and tells you how large the array really is.
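
How a repeated field collapses to a handful of values can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: collect_examples and its max_examples cap are hypothetical names, values are treated as plain strings, and the source does not say how many values jshape actually keeps per field.

```rust
/// Collect distinct example values for one field across many records,
/// in first-seen order, stopping once `max_examples` values are kept.
/// `max_examples` is a hypothetical cap chosen for this sketch.
fn collect_examples<'a>(values: &[&'a str], max_examples: usize) -> Vec<&'a str> {
    let mut seen: Vec<&'a str> = Vec::new();
    for &value in values {
        if seen.len() >= max_examples {
            break; // enough examples; ignore the rest of the array
        }
        if !seen.contains(&value) {
            seen.push(value); // first time this value appears
        }
    }
    seen
}
```

Fed the four event_type values from the excerpt (page_view, page_view, api_call, page_view), the sketch keeps page_view and api_call, which matches the order shown in the outline.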

Notes

  • Output is JSON-like and stable for inspection, but it is not guaranteed to be valid JSON in every mode.
  • Optional fields are rendered with a trailing ? after the key, as in "error"?: in the example output.
  • Large arrays are summarized instead of printing every element in full.
  • This crate currently relies on json-repair = 0.4.0, which requires a nightly toolchain, or RUSTC_BOOTSTRAP=1 set in the build environment when building on stable.