headson
Head/tail for JSON — but structure‑aware. Get a compact preview that shows both the shape and representative values of your data, all within a strict character budget.
Available as:
- CLI (see Usage)
- Python library (see Python Bindings)
Install
Using Cargo:
cargo install headson
From source:
cargo build --release
target/release/headson --help
Features
- Budgeted output: specify exactly how much JSON you want to see
- Multiple output formats: json (machine‑readable), pseudo (human‑friendly), js (valid JavaScript, most detailed metadata).
- Multiple inputs: preview many files at once with a shared or per‑file budget.
- Fast: can process gigabyte-scale files in seconds (mostly disk-constrained)
- Available as a CLI app and as a Python library
Fits into command line workflows
If you’re comfortable with tools like head and tail, use headson when you want a quick, structured peek into a JSON file without dumping the entire thing.
- head/tail operate on bytes/lines; their output is not optimized for tree structures
- jq: you need to craft filters to preview large JSON files
- headson is like head/tail for trees: zero config, but it keeps structure and represents content as much as possible
Usage
headson [FLAGS] [INPUT...]
- INPUT (optional, repeatable): file path(s). If omitted, reads JSON from stdin. Multiple input files are supported.
- Prints the preview to stdout. On parse errors, exits non‑zero and prints an error to stderr.
Common flags:
- -n, --budget <BYTES>: per‑file output budget. When multiple input files are provided, the total budget equals <BYTES> * number_of_inputs.
- -N, --global-budget <BYTES>: total output budget across all inputs. Useful when you want a fixed-size preview across many files (may omit entire files). Mutually exclusive with --budget.
- -f, --template <json|pseudo|js>: output style (default: pseudo)
- -m, --compact: no indentation, no spaces, no newlines
- --no-newline: single line output
- --no-space: no space after : in objects
- --indent <STR>: indentation unit (default: two spaces)
- --string-cap <N>: max graphemes to consider per string (default: 500)
- --head: prefer the beginning of arrays when truncating (keep first N). Strings are unaffected. In pseudo/js templates the omission marker appears near the end; json remains strict. Mutually exclusive with --tail.
- --tail: prefer the end of arrays when truncating (keep last N). Strings are unaffected. In pseudo/js templates the omission marker appears at the start; json remains strict. Mutually exclusive with --head.
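The interaction between the two budget flags can be modeled in a few lines of Python. This is an illustrative sketch of the rule described above, not headson's actual code; the function name `effective_total_budget` is hypothetical.

```python
def effective_total_budget(n_inputs, budget=None, global_budget=None):
    """Model the documented budget rule for a run over `n_inputs` files."""
    if budget is not None and global_budget is not None:
        # The two flags are documented as mutually exclusive.
        raise ValueError("--budget and --global-budget are mutually exclusive")
    if global_budget is not None:
        return global_budget      # fixed total, however many files there are
    if budget is not None:
        return budget * n_inputs  # per-file budget scales with the input count
    raise ValueError("this sketch expects one of the two flags")


print(effective_total_budget(3, budget=400))          # 1200
print(effective_total_budget(3, global_budget=1200))  # 1200
```

With three inputs, `-n 400` and `-N 1200` allow the same total, but only the global form can spend the allowance unevenly across files.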
Notes:
- With multiple input files:
  - JSON template outputs a single JSON object keyed by the input file paths.
  - Pseudo and JS templates render file sections with human-readable headers when newlines are enabled.
    - If you use --compact or --no-newline (both disable newlines), fileset output falls back to standard inline rendering (no per-file headers) to remain compact.
  - Using --global-budget may truncate or omit entire files to respect the total budget.
- The tool finds the largest preview that fits the budget; if even the tiniest preview exceeds it, you still get a minimal, valid preview.
- When passing file paths, directories and binary files are ignored; a notice is printed to stderr for each (e.g., Ignored binary file: ./path/to/file). Stdin mode reads the stream as-is.
- Head vs Tail sampling: these options bias which part of arrays are kept before rendering. They guarantee the kept segment is contiguous at the chosen side (prefix for --head, suffix for --tail). Display templates may still insert additional internal gap markers inside that kept segment to honor very small budgets; json remains strict and unannotated.
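The head/tail guarantee (one contiguous run from the chosen side, with the omission marker on the opposite side) can be illustrated with a small Python sketch. The function names and the bare "…" marker are simplifications chosen for this example; headson's real templates and sampling logic may differ.

```python
def keep_side(items, keep, side="head"):
    """Return (kept_items, omitted_count) as one contiguous run from `side`."""
    if keep >= len(items):
        return list(items), 0
    omitted = len(items) - keep
    if side == "head":
        return list(items[:keep]), omitted               # prefix
    if side == "tail":
        return list(items[len(items) - keep:]), omitted  # suffix
    raise ValueError("side must be 'head' or 'tail'")


def render_array(items, keep, side="head"):
    """Render a display-style array with a single omission marker."""
    kept, omitted = keep_side(items, keep, side)
    parts = [str(x) for x in kept]
    if omitted:
        # Marker goes after a kept prefix, or before a kept suffix.
        parts = parts + ["…"] if side == "head" else ["…"] + parts
    return "[ " + ", ".join(parts) + " ]"


print(render_array(list(range(10)), 3, "head"))  # [ 0, 1, 2, … ]
print(render_array(list(range(10)), 3, "tail"))  # [ …, 7, 8, 9 ]
```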
Quick one‑liners:
- Peek a big JSON stream (keeps structure):
  zstdcat huge.json.zst | headson -n 800 -f pseudo
- Many files with a fixed overall size:
  headson -N 1200 -f json logs/*.json
- Glance at a file, JavaScript‑style comments for omissions:
  headson -n 400 -f js data.json
- Show help:
  headson --help
Examples: head vs headson
Input:
Naive cut (can break mid‑token):
# {"users":[{"id":1,"name":"Ana","roles":["admin","dev"]},{"id":2,"name":"Bo"}],"me
Structured preview with headson (pseudo):
# {
#   users: [
#     { id: 1, name: "Ana", roles: [ "admin", … ] },
#     …
#   ]
#   meta: { count: 2, … }
# }
Machine‑readable preview (json):
# {"users":[{"id":1,"name":"Ana","roles":["admin"]}],"meta":{"count":2}}
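Why a naive cut makes a poor preview can be checked directly: slicing serialized JSON at an arbitrary offset usually leaves a fragment that no longer parses. The Python sketch below uses a hypothetical sample document, loosely modeled on the previews above, and an arbitrary cut length of 80 characters.

```python
import json

# Hypothetical sample document, loosely modeled on the previews above.
doc = {
    "users": [
        {"id": 1, "name": "Ana", "roles": ["admin", "dev"]},
        {"id": 2, "name": "Bo"},
    ],
    "meta": {"count": 2, "page": 1},
}
text = json.dumps(doc, separators=(",", ":"))


def parses(s):
    """Return True if `s` is a complete, valid JSON document."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False


naive = text[:80]     # what a byte/character cut would give
print(parses(text))   # True
print(parses(naive))  # False: the cut lands in the middle of the document
```

A structure-aware preview in the json template avoids this by dropping whole nodes rather than characters, so the result stays parseable.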
Python Bindings
A thin Python extension module is available on PyPI as headson.
- Install: pip install headson (ABI3 wheels for Python 3.10+ on Linux/macOS/Windows).
- API: headson.summarize(text: str, *, template: str = "pseudo", character_budget: int | None = None, skew: str = "balanced") -> str
  - template: one of "json" | "pseudo" | "js"
  - character_budget: maximum output size in characters (default: 500)
  - skew: one of "balanced" | "head" | "tail" (focus arrays on start vs end; only affects display templates; json remains strict).
Example:
# Prefer the tail of arrays (annotations show in pseudo/js only)
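A minimal sketch of how the `summarize` signature documented above might be called for a tail-biased preview. The helper name `preview_tail` and the sample data are hypothetical. The helper takes the summarizer as an argument so it can be exercised without the extension installed; in real use you would pass `headson.summarize`.

```python
import json


def preview_tail(data, summarize, budget=200):
    """Serialize `data` and request a tail-biased pseudo preview.

    `summarize` is expected to follow the documented signature:
    summarize(text, *, template=..., character_budget=..., skew=...).
    """
    return summarize(
        json.dumps(data),
        template="pseudo",
        character_budget=budget,
        skew="tail",
    )


# Real use (requires `pip install headson`):
#   import headson
#   print(preview_tail({"events": list(range(1000))}, headson.summarize))
```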
Algorithm
%%{init: {"themeCSS": ".cluster > rect { fill: transparent; stroke: transparent; } .clusterLabel > text { font-size: 16px; font-weight: 600; } .clusterLabel span { padding: 6px 10px; font-size: 16px; font-weight: 600; }"}}%%
flowchart TD
subgraph Deserialization
direction TB
A["Input file(s)"]
A -- Single --> C["Parse into optimized tree (with array pre‑sampling) ¹"]
A -- Multiple --> D["Parse each file and wrap into a fileset object"]
D --> C
end
subgraph Prioritization
direction TB
E["Build priority order ²"]
F["Choose top N nodes ³"]
end
subgraph Serialization
direction TB
G["Render attempt ⁴"]
H["Output preview string"]
end
C --> E
E --> F
F --> G
G --> F
F --> H
%% Color classes for categories
classDef des fill:#eaf2ff,stroke:#3b82f6,stroke-width:1px,color:#0f172a;
classDef prio fill:#ecfdf5,stroke:#10b981,stroke-width:1px,color:#064e3b;
classDef ser fill:#fff1f2,stroke:#f43f5e,stroke-width:1px,color:#7f1d1d;
class A,C,D des;
class E,F prio;
class G,H ser;
style Deserialization fill:transparent,stroke:transparent
style Prioritization fill:transparent,stroke:transparent
style Serialization fill:transparent,stroke:transparent
Footnotes
- [1] Optimized tree representation: An arena‑style tree stored in flat, contiguous buffers. Each node records its kind and value plus index ranges into shared child and key arrays. Arrays are ingested in a single pass and may be deterministically pre‑sampled: the first element is always kept; additional elements are selected via a fixed per‑index inclusion test; for kept elements, original indices are stored and full lengths are counted. This enables accurate omission info and internal gap markers later, while minimizing pointer chasing.
- [2] Priority order: Nodes are scored so previews surface representative structure and values first. Arrays can favor head/mid/tail coverage (default) or strictly the head; tail preference flips head/tail when configured. Object properties are ordered by key, and strings expand by grapheme with early characters prioritized over very deep expansions.
- [3] Choose top N nodes (binary search): Iteratively picks N so that the rendered preview fits within the character budget, looping between “choose N” and a render attempt to converge quickly.
- [4] Render attempt: Serializes the currently included nodes using the selected template. Omission summaries and per-file section headers appear in display templates (pseudo/js); json remains strict. For arrays, display templates may insert internal gap markers between non‑contiguous kept items using original indices.
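Footnotes 2 to 4 describe a loop: prioritize nodes, choose the top N, render, and adjust N until the output fits. The Python sketch below illustrates that loop under loud simplifications: breadth-first order stands in for the real scoring, the renderer is a reduced pseudo-style template, and there is no array pre-sampling or string truncation. All function names are hypothetical and this is not headson's implementation.

```python
import json
from collections import deque


def bfs_paths(value):
    """Node paths in breadth-first order (assumed priority: shallow first)."""
    paths, queue = [], deque([((), value)])
    while queue:
        path, node = queue.popleft()
        paths.append(path)
        if isinstance(node, dict):
            queue.extend((path + (k,), v) for k, v in node.items())
        elif isinstance(node, list):
            queue.extend((path + (i,), v) for i, v in enumerate(node))
    return paths


def render(node, path, keep):
    """Render only nodes whose path is in `keep`; mark omissions with '…'."""
    if isinstance(node, dict):
        parts = [f"{k}: {render(v, path + (k,), keep)}"
                 for k, v in node.items() if path + (k,) in keep]
        if len(parts) < len(node):
            parts.append("…")
        return "{ " + ", ".join(parts) + " }"
    if isinstance(node, list):
        parts = [render(v, path + (i,), keep)
                 for i, v in enumerate(node) if path + (i,) in keep]
        if len(parts) < len(node):
            parts.append("…")
        return "[ " + ", ".join(parts) + " ]"
    return json.dumps(node)


def preview(value, budget):
    """Binary-search the largest N whose rendering fits `budget` characters."""
    paths = bfs_paths(value)
    best = render(value, (), {()})  # minimal preview, returned even if too long
    lo, hi = 1, len(paths)
    while lo <= hi:
        n = (lo + hi) // 2
        out = render(value, (), set(paths[:n]))
        if len(out) <= budget:
            best, lo = out, n + 1   # fits: try including more nodes
        else:
            hi = n - 1              # too long: include fewer nodes
    return best


data = {"users": [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}],
        "meta": {"count": 2}}
print(preview(data, 60))
```

The binary search relies on the rendered length never shrinking as N grows, which holds for this toy renderer because each added node replaces at most a one-character marker.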
License
MIT