# `prototext`
Lossless, bidirectional converter between binary protobuf wire format and
human-readable text. Three promises:
1. **Lossless round-trip** — `binary → text → binary` is byte-for-byte
identical for any input: well-formed, non-canonical, or malformed.
2. **protoc-compatible** — for canonical protobuf messages the text output is
identical to `protoc --decode`.
3. **Schema-aware** — supply a compiled `.pb` descriptor and a root message
type to get field names, proto types, and enum values. Works without a
schema too: every field is decoded by wire type and field number.
## Installation
### NixOS / nix-shell
Build and install from the repo:
```shell
git clone https://github.com/douzebis/prototools
cd prototools
nix-build # result/bin/prototext
```
Man page and shell completions are installed automatically.
Or enter the user shell with `prototext` on `PATH`, completions and man
page activated for the current session:
```shell
nix-shell # enters shell.nix → user-shell
```
For working on prototools itself (adds pyright, pytest, source-tree wiring):
```shell
nix-shell dev-shell.nix
```
### cargo install
```shell
cargo install --locked prototext
```
`--locked` is required: it uses the dependency versions that were tested at
release time.
> **Man pages:** `cargo install` does not install man pages. To generate
> them locally, run:
>
> ```shell
> cargo install --locked --bin prototext-gen-man prototext
> prototext-gen-man ~/.local/share/man/man1
> ```
## Shell Completions
When installed via `nix-build`, completion scripts are installed automatically.
When using `nix-shell`, bash completions are activated for the current session.
For a `cargo install` setup, add one line to your shell's startup file:
```bash
# bash (~/.bashrc)
source <(PROTOTEXT_COMPLETE=bash prototext | sed \
-e '/^\s*) )$/a\ compopt -o filenames 2>/dev/null' \
-e 's|words\[COMP_CWORD\]="$2"|local _cur="${COMP_LINE:0:${COMP_POINT}}"; _cur="${_cur##* }"; words[COMP_CWORD]="${_cur}"|')
```
```zsh
# zsh (~/.zshrc)
source <(PROTOTEXT_COMPLETE=zsh prototext)
```
```fish
# fish (~/.config/fish/config.fish)
PROTOTEXT_COMPLETE=fish prototext | source
```
## Quick start
`google.protobuf.*` types are embedded — no descriptor file needed. Decode
any `.pb` descriptor that `protoc` produces:
```
$ protoc -o timestamp.pb google/protobuf/timestamp.proto
$ prototext decode -a --type google.protobuf.FileDescriptorSet timestamp.pb
#@ prototext: protoc
file { #@ repeated FileDescriptorProto = 1
name: "google/protobuf/timestamp.proto" #@ string = 1
package: "google.protobuf" #@ string = 2
message_type { #@ repeated DescriptorProto = 4
name: "Timestamp" #@ string = 1
field { #@ repeated FieldDescriptorProto = 2
name: "seconds" #@ string = 1
number: 1 #@ int32 = 3
label: LABEL_OPTIONAL #@ Label(1) = 4
type: TYPE_INT64 #@ Type(3) = 5
json_name: "seconds" #@ string = 10
}
...
}
}
```
Pass `-a` / `--annotations` to include inline wire-type comments; omit it for
clean output compatible with `protoc --decode`.
Encode back to binary and verify the round-trip is byte-exact (annotations
are required for lossless encode):
```
byte-exact
```
**Non-canonical encoding** — protobuf varints can carry redundant continuation
bytes and still decode to the same value. Standard tools discard these bytes;
`prototext` preserves them via inline annotations (enable with `-a`):
```
$ printf '\x08\xaa\x00' | prototext decode -a
#@ prototext: protoc
1: 42 #@ varint; val_ohb: 1
```
Field 1 = 42, but encoded in three bytes instead of the canonical two
(`val_ohb: 1` records the one redundant byte). The round-trip is still
byte-exact:
```
$ printf '\x08\xaa\x00' | prototext decode -a | prototext encode | od -A n -t x1
08 aa 00
```
**Canonicality verification** — annotation modifiers flag every
wire-level deviation from the canonical encoding: overlong varints
(`tag_ohb`, `val_ohb`, `len_ohb`, `packed_ohb`), non-canonical NaN bit
patterns (`nan_bits`), truncated payloads (`MISSING`), mismatched or
open groups (`END_MISMATCH`, `OPEN_GROUP`), and out-of-range field
numbers (`TAG_OOR`). Repeated optional fields and interleaved fields
are visible as duplicate or out-of-order field names in the text output.
Together these make `prototext decode -a` a practical tool for auditing
whether a binary message conforms to the canonical encoding rules.
## Schema inference
When a descriptor DB is available, `prototext` can infer the message type
automatically:
```
$ prototext --descriptor-set my.desc list-schemas unknown.pb
- path: unknown.pb
types:
- google.type.PostalAddress
$ prototext --descriptor-set my.desc decode unknown.pb
#@ prototext: protoc
# Type: google.type.PostalAddress
# Score: 13 (matched: 13, unknown: 0, mismatches: 0, non_canonical: 0)
revision: 448
region_code: "US"
...
```
Build a descriptor DB with `reproto --build-schema-db`, or use the
pre-built `googleapis-db` Nix derivation (~8 000 types):
```shell
export GOOGLEAPIS_DB=$(nix-build -A googleapis-db --no-out-link)/googleapis.desc
prototext --descriptor-set $GOOGLEAPIS_DB list-schemas unknown.pb
```
For full usage see `man prototext` or the
[online docs](https://douzebis.github.io/prototools).