libpgs 0.6.0

Fast PGS subtitle extraction, encoding, and round-trip transformation for MKV and M2TS containers
# libpgs NDJSON Streaming & Encoding Reference

## Overview

The `libpgs stream` command extracts PGS (Presentation Graphic Stream) subtitles from MKV and M2TS containers and outputs structured data as newline-delimited JSON (NDJSON) to stdout. The `libpgs encode` command reads the same format from stdin and writes `.sup` files. Together they enable full round-trip workflows: extract, transform with any language, and write back.

Each NDJSON line is a self-contained JSON object. This enables any language to consume and produce PGS data via subprocess pipes — no temp files, no waiting for full extraction, no PGS format knowledge required.

## Usage

```bash
libpgs stream <file>                                 # All tracks
libpgs stream <file> -t 3                            # Single track
libpgs stream <file> -t 3 -t 5                       # Multiple tracks
libpgs stream <file> --raw-payloads                  # Include base64 raw segment bytes
libpgs stream <file> --start 0:05:00                 # From 5 minutes to end of file
libpgs stream <file> --start 0:05:00 --end 0:10:00   # 5-minute window only
libpgs stream <file> --with-header                   # Prepend manifest header (.sup only)
```

Timestamps accept `HH:MM:SS.ms`, `MM:SS.ms`, `SS.ms`, or plain seconds (e.g., `300`). When `--start` or `--end` is specified, libpgs seeks directly to the estimated byte offset — data before the start point is not read. If no display sets fall within the range, the stream outputs the tracks header followed by EOF (no error).
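The accepted timestamp shapes can all be normalized to seconds with a small parser. This helper is illustrative, not part of libpgs; it mirrors the formats listed above:

```python
def parse_timestamp(s):
    """Parse HH:MM:SS.ms, MM:SS.ms, SS.ms, or plain seconds into seconds."""
    parts = s.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' components in {s!r}")
    total = 0.0
    for part in parts:
        # Each colon shifts the accumulated value up by one base-60 place.
        total = total * 60 + float(part)
    return total
```

For example, `parse_timestamp("0:05:00")`, `parse_timestamp("5:00")`, and `parse_timestamp("300")` all yield `300.0`.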

Output is flushed after every line. Closing the pipe (e.g., `head -n 10`) causes a clean exit.

## Protocol

The output consists of up to three types of JSON lines:

1. **`header`** — optional, emitted only for `.sup` inputs when `--with-header` is passed. When present, it is the very first line and carries total display-set counts so consumers can show a progress denominator immediately.
2. **`tracks`** — always emitted. It is the first line unless a header precedes it (i.e., the second line on `.sup` inputs with `--with-header`).
3. **`display_set`** — one per subtitle event, for the remainder of the stream.

Check the `"type"` field to distinguish them.
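A minimal consumer can route each line on that field. The handler callbacks here are hypothetical; only the `"type"` values and field names come from the protocol described above:

```python
import json

def dispatch(line, on_tracks, on_display_set, on_header=None):
    """Parse one NDJSON line and invoke the matching handler; returns the type."""
    msg = json.loads(line)
    kind = msg["type"]
    if kind == "header" and on_header is not None:
        on_header(msg)
    elif kind == "tracks":
        on_tracks(msg["tracks"])
    elif kind == "display_set":
        on_display_set(msg)
    return kind
```

In practice you would feed it lines from `sys.stdin` at the end of a pipe such as `libpgs stream movie.mkv | python consumer.py`.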

---

## Manifest Header (`.sup` only, opt-in)

When `--with-header` is passed on a `.sup` input, libpgs runs a pre-scan of the file and prepends a single header line with total display-set counts. The flag is opt-in because the pre-scan adds an upfront latency before the first `display_set` line is emitted; consumers that don't need a progress denominator should omit the flag. Containers (MKV, M2TS) ignore the flag — counting there would require a full demux, and MKV already surfaces per-track `display_set_count` via the `tracks` line when Tags are present.

```json
{
  "type": "header",
  "total_display_sets": 1823,
  "total_content_display_sets": 1456,
  "total_clear_display_sets": 367
}
```

| Field | Type | Description |
|-------|------|-------------|
| `total_display_sets` | number | All display sets (count of END segments). |
| `total_content_display_sets` | number | PCSes with at least one composition object — visible subtitle frames. |
| `total_clear_display_sets` | number | PCSes with zero composition objects — "remove from screen" display sets. |

`total_content_display_sets + total_clear_display_sets == total_display_sets`.

The pre-scan reads only 13-byte segment headers (and tiny PCS payloads) while seeking over other payloads — ~1–2% of file bytes, completing in well under a second on multi-GB files.

---

## Track Discovery

The first line (or the second, after the header on `.sup` inputs) describes all PGS tracks found in the container.

```json
{
  "type": "tracks",
  "tracks": [
    {
      "track_id": 3,
      "language": "en",
      "container": "Matroska",
      "name": "English Subtitles",
      "is_default": true,
      "is_forced": false,
      "display_set_count": 1234,
      "indexed": true
    }
  ]
}
```

### Track fields

| Field | Type | Description |
|-------|------|-------------|
| `track_id` | `number` | Unique track identifier within the container |
| `language` | `string \| null` | BCP 47 language code (e.g., `"en"`, `"ja"`). Uses ISO 639-1 (2-letter) where available, ISO 639-2/T (3-letter) otherwise. |
| `container` | `string` | Source format: `"Matroska"`, `"M2TS"`, `"TransportStream"`, or `"SUP"` |
| `name` | `string \| null` | Track name from container metadata (MKV TrackName). `null` for M2TS. |
| `is_default` | `boolean \| null` | Whether this track is flagged as default. `null` for M2TS. |
| `is_forced` | `boolean \| null` | Whether this track is flagged as forced. `null` for M2TS. |
| `display_set_count` | `number \| null` | Expected number of display sets (from MKV Tags). `null` if unknown. |
| `indexed` | `boolean \| null` | Whether the container has a seek index for this track, enabling fast random access. `null` for M2TS. |
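A consumer typically picks one `track_id` from this line before filtering display sets. A sketch of one reasonable policy, using only the fields above (note `is_default` may be `null` for M2TS, which this treats as false):

```python
def pick_track(tracks, language="en"):
    """Prefer a default-flagged track in the language, then any track in the
    language, then the first track; None if the list is empty."""
    in_lang = [t for t in tracks if t["language"] == language]
    for t in in_lang:
        if t["is_default"]:  # null/False both fall through
            return t["track_id"]
    if in_lang:
        return in_lang[0]["track_id"]
    return tracks[0]["track_id"] if tracks else None
```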

---

## Display Sets

Each subsequent line represents one display set — a complete subtitle composition event.

### PGS background

A PGS display set defines a single screen update. It contains:
- A **composition** that describes what to show and where (screen dimensions, object placements)
- **Windows** — rectangular screen regions where objects are drawn
- **Palettes** — color lookup tables (YCrCbA format, up to 256 entries)
- **Objects** — RLE-compressed bitmap images

Display sets appear in three states:
- **`epoch_start`** — A completely new display. Contains everything needed to render from scratch.
- **`acquisition_point`** — A refresh point. Contains full replacement data for all objects. Used for mid-stream joining (e.g., seeking into a video).
- **`normal`** — An incremental update. Only contains what changed since the last composition. Commonly used to clear the screen (0 composition objects).
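A stateful consumer can honor these three states with a small per-track cache. This is a hypothetical helper, not libpgs API; it assumes a cache shaped as three dicts keyed by ID:

```python
def apply_display_set(cache, ds):
    """Update a {"palettes": {}, "objects": {}, "windows": {}} cache in place."""
    comp = ds["composition"]
    if comp is None:
        return cache
    if comp["state"] == "epoch_start":
        # A new epoch renders from scratch: drop everything cached so far.
        for part in ("palettes", "objects", "windows"):
            cache[part].clear()
    # epoch_start and acquisition_point carry full data; normal carries only
    # deltas, so merging by ID is correct in all three cases.
    for pal in ds.get("palettes", []):
        cache["palettes"][pal["id"]] = pal
    for obj in ds.get("objects", []):
        cache["objects"][obj["id"]] = obj
    for win in ds.get("windows", []):
        cache["windows"][win["id"]] = win
    return cache
```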

### Full example

```json
{
  "type": "display_set",
  "track_id": 3,
  "index": 42,
  "pts": 92863980,
  "pts_ms": 1031822.0,
  "composition": {
    "number": 430,
    "state": "epoch_start",
    "video_width": 1920,
    "video_height": 1080,
    "palette_only": false,
    "palette_id": 0,
    "objects": [
      {
        "object_id": 0,
        "window_id": 0,
        "x": 773,
        "y": 108,
        "crop": null
      },
      {
        "object_id": 1,
        "window_id": 1,
        "x": 739,
        "y": 928,
        "crop": null
      }
    ]
  },
  "windows": [
    { "id": 0, "x": 773, "y": 108, "width": 377, "height": 43 },
    { "id": 1, "x": 739, "y": 928, "width": 472, "height": 43 }
  ],
  "palettes": [
    {
      "id": 0,
      "version": 0,
      "entries": [
        { "id": 0, "luminance": 16, "cr": 128, "cb": 128, "alpha": 0 },
        { "id": 1, "luminance": 235, "cr": 128, "cb": 128, "alpha": 255 },
        { "id": 2, "luminance": 16, "cr": 128, "cb": 128, "alpha": 255 }
      ]
    }
  ],
  "objects": [
    {
      "id": 0,
      "version": 0,
      "sequence": "complete",
      "data_length": 8635,
      "width": 377,
      "height": 43,
      "bitmap": "<base64 palette indices, 377*43 = 16211 bytes>"
    },
    {
      "id": 1,
      "version": 0,
      "sequence": "complete",
      "data_length": 5210,
      "width": 472,
      "height": 43,
      "bitmap": "<base64 palette indices, 472*43 = 20296 bytes>"
    }
  ]
}
```

---

### Display set fields

| Field | Type | Description |
|-------|------|-------------|
| `type` | `string` | Always `"display_set"` |
| `track_id` | `number` | Matches a `track_id` from the tracks header |
| `index` | `number` | 0-based sequence number, counted per track |
| `pts` | `number` | Presentation timestamp in 90 kHz ticks |
| `pts_ms` | `number` | Presentation timestamp in milliseconds (`pts / 90`) |
| `composition` | `object \| null` | Composition data (from PCS segment). `null` if payload was malformed. |
| `windows` | `array` | Window definitions (from WDS segments). Empty array if none present. |
| `palettes` | `array` | Palette definitions (from PDS segments). Empty array if none present. |
| `objects` | `array` | Object definitions (from ODS segments). Empty array if none present. |

---

### Composition object

The `composition` field contains the presentation composition — the "control plane" of the display set.

| Field | Type | Description |
|-------|------|-------------|
| `number` | `number` | Composition number, incremented per graphics update |
| `state` | `string` | `"epoch_start"`, `"acquisition_point"`, or `"normal"` |
| `video_width` | `number` | Video frame width in pixels (e.g., 1920) |
| `video_height` | `number` | Video frame height in pixels (e.g., 1080) |
| `palette_only` | `boolean` | If `true`, this update only changes the palette — no new objects or positions |
| `palette_id` | `number` | ID of the palette used for this composition |
| `objects` | `array` | Placement instructions — where to draw each object on screen |

#### Composition object placements

Each entry in `composition.objects` is a placement instruction: "draw object X in window Y at position (x, y)."

| Field | Type | Description |
|-------|------|-------------|
| `object_id` | `number` | References an object in the top-level `objects` array by `id` |
| `window_id` | `number` | References a window in the `windows` array by `id` |
| `x` | `number` | Horizontal pixel offset from the top-left corner of the screen |
| `y` | `number` | Vertical pixel offset from the top-left corner of the screen |
| `crop` | `object \| null` | Cropping rectangle, or `null` if not cropped |

#### Crop object (when present)

| Field | Type | Description |
|-------|------|-------------|
| `x` | `number` | Horizontal crop offset within the object |
| `y` | `number` | Vertical crop offset within the object |
| `width` | `number` | Crop width in pixels |
| `height` | `number` | Crop height in pixels |

Cropping is used for progressive subtitle reveal (e.g., showing a few words first, then the rest).

---

### Window definitions

Each entry in `windows` defines a rectangular screen region where objects are drawn.

| Field | Type | Description |
|-------|------|-------------|
| `id` | `number` | Window ID (referenced by `composition.objects[].window_id`) |
| `x` | `number` | Horizontal pixel offset from top-left of screen |
| `y` | `number` | Vertical pixel offset from top-left of screen |
| `width` | `number` | Window width in pixels |
| `height` | `number` | Window height in pixels |

---

### Palette definitions

Each entry in `palettes` defines a color lookup table. Object bitmaps reference palette entries by ID to determine pixel color.

| Field | Type | Description |
|-------|------|-------------|
| `id` | `number` | Palette ID (referenced by `composition.palette_id`) |
| `version` | `number` | Palette version within the current epoch |
| `entries` | `array` | Color entries (up to 256) |

#### Palette entry

Colors are in YCrCb color space with alpha transparency.

| Field | Type | Description |
|-------|------|-------------|
| `id` | `number` | Entry index (0-255). Object bitmap pixels reference this ID. |
| `luminance` | `number` | Luminance / Y component (0-255) |
| `cr` | `number` | Chrominance red (0-255) |
| `cb` | `number` | Chrominance blue (0-255) |
| `alpha` | `number` | Transparency (0 = fully transparent, 255 = fully opaque) |

**Color conversion (YCrCb to RGB):**
```
R = luminance + 1.402 * (cr - 128)
G = luminance - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
B = luminance + 1.772 * (cb - 128)
```
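The same formula as a clamped Python function (a direct transcription of the equations above; the rounding and clamping choices are ours):

```python
def ycrcb_to_rgb(luminance, cr, cb):
    """Convert one palette entry's YCrCb color to an (r, g, b) tuple in 0-255."""
    r = luminance + 1.402 * (cr - 128)
    g = luminance - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = luminance + 1.772 * (cb - 128)
    clamp = lambda v: max(0, min(255, int(round(v))))
    return clamp(r), clamp(g), clamp(b)
```

Neutral chroma (`cr == cb == 128`) yields grayscale, so the white entry in the example palette (`luminance: 235`) maps to `(235, 235, 235)`.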

---

### Object definitions

Each entry in `objects` defines a subtitle image. The RLE-compressed bitmap data is automatically decoded into a flat buffer of palette indices.

| Field | Type | Description |
|-------|------|-------------|
| `id` | `number` | Object ID (referenced by `composition.objects[].object_id`) |
| `version` | `number` | Object version within the current epoch |
| `sequence` | `string` | `"complete"`, `"reassembled"`, `"first"`, `"last"`, or `"continuation"` |
| `data_length` | `number` | Total object data length in bytes (includes 4 bytes for width+height) |
| `width` | `number` | Image width in pixels |
| `height` | `number` | Image height in pixels |
| `bitmap` | `string \| null` | Base64-encoded palette indices (1 byte per pixel, row-major). `null` if decoding failed. |

#### Bitmap format

The `bitmap` field contains the decoded subtitle image as a base64-encoded buffer of palette entry indices. Each byte is an index (0–255) into the `palettes[].entries[]` array. Pixels are stored in row-major order (left to right, top to bottom). The decoded buffer is exactly `width * height` bytes.

To render the image, look up each pixel's palette entry to get its YCrCb color and alpha value. libpgs does not perform color conversion — consumers choose their own color space handling.

#### Object fragmentation

Large objects in the PGS format may be split across multiple ODS segments. libpgs automatically reassembles fragments within each display set and decodes the combined bitmap. Reassembled objects have `"sequence": "reassembled"` to distinguish them from single-segment `"complete"` objects.

| Value | Meaning |
|-------|---------|
| `"complete"` | Single-segment object (most common) |
| `"reassembled"` | Multiple fragments were combined into one object |

With `--raw-payloads`, the `payload` field of a reassembled object contains the concatenated raw payloads of all fragments.

---

## Cross-references

The data model uses ID-based cross-references between sections:

```
composition.objects[].object_id  -->  objects[].id
composition.objects[].window_id  -->  windows[].id
composition.palette_id           -->  palettes[].id
```

A composition object placement says: "draw the bitmap from `objects[id=X]` using colors from `palettes[id=Y]` inside the screen region `windows[id=Z]` at pixel position (x, y)."
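Resolving those references within a single display set can be sketched as below. Note the caveat for `normal`-state updates: a referenced object, window, or palette may have been delivered in an earlier display set, so a real consumer would fall back to a per-epoch cache when a lookup here returns `None`:

```python
def resolve_placements(ds):
    """Yield one dict per placement with object, window, and palette attached."""
    comp = ds["composition"]
    objects = {o["id"]: o for o in ds["objects"]}
    windows = {w["id"]: w for w in ds["windows"]}
    palettes = {p["id"]: p for p in ds["palettes"]}
    palette = palettes.get(comp["palette_id"])
    for placement in comp["objects"]:
        yield {
            "x": placement["x"],
            "y": placement["y"],
            "crop": placement["crop"],
            "object": objects.get(placement["object_id"]),
            "window": windows.get(placement["window_id"]),
            "palette": palette,
        }
```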

---

## Raw payloads (`--raw-payloads`)

By default, only structured data is output. Pass `--raw-payloads` to include the raw PGS segment bytes as base64-encoded strings.

When enabled, each item gains a `"payload"` field:

```json
{
  "composition": { "...": "...", "payload": "<base64>" },
  "windows": [{ "...": "...", "payload": "<base64>" }],
  "palettes": [{ "...": "...", "payload": "<base64>" }],
  "objects": [{ "...": "...", "payload": "<base64>" }]
}
```

The `payload` contains the raw segment payload bytes (after the PGS header). For ODS objects, this includes the RLE-compressed bitmap data. Use this if you need to:
- Write `.sup` files
- Decode RLE bitmaps yourself
- Pass raw data to another PGS-aware tool

If a segment's structured data could not be parsed (malformed payload), the semantic fields will be `null` but the raw `payload` is still included.
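Writing those raw payloads back into a `.sup` stream means prepending each one with the 13-byte segment header. The sketch below assumes the conventional SUP layout (`PG` magic, 32-bit big-endian PTS and DTS in 90 kHz ticks, a type byte, a 16-bit payload length) and the commonly documented type codes; verify both against your tooling before relying on this, since neither is confirmed by this reference:

```python
import struct

# Assumed segment type codes (not confirmed by this document).
SEGMENT_TYPES = {"pds": 0x14, "ods": 0x15, "pcs": 0x16, "wds": 0x17, "end": 0x80}

def write_segment(out, pts, seg_type, payload, dts=0):
    """Write one segment: 13-byte header followed by the raw payload bytes."""
    out.write(b"PG")
    out.write(struct.pack(">IIBH", pts, dts, SEGMENT_TYPES[seg_type], len(payload)))
    out.write(payload)
```

Within one display set, segments are conventionally written in the order PCS, WDS, PDS, ODS, END. If you only need a `.sup` file, `libpgs encode` (below) does all of this for you.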

---

## Common patterns

### Get subtitle timing

```bash
libpgs stream movie.mkv | jq -r 'select(.type == "display_set") | "\(.pts_ms)ms track=\(.track_id) state=\(.composition.state)"'
```

### Get object positions and sizes

```bash
libpgs stream movie.mkv | jq 'select(.type == "display_set") | .composition.objects[] | {object_id, x, y, window_id}'
```

### Count display sets per track

```bash
libpgs stream movie.mkv | jq -s '[.[] | select(.type == "display_set")] | group_by(.track_id) | map({track: .[0].track_id, count: length})'
```

### Filter epoch starts only

```bash
libpgs stream movie.mkv | jq 'select(.type == "display_set" and .composition.state == "epoch_start")'
```

### Stream a specific time range

```bash
# Get subtitles between 1:30:00 and 1:35:00
libpgs stream movie.mkv --start 1:30:00 --end 1:35:00

# Pipe a 5-minute window to a Python consumer
libpgs stream movie.mkv -t 3 --start 0:05:00 --end 0:10:00 | python process.py
```

### Extract palette colors as RGB

```bash
libpgs stream movie.mkv | jq 'select(.type == "display_set") | .palettes[].entries[] | select(.alpha > 0)'
```

### Render bitmap to image (Python)

```python
import json, base64, sys
from PIL import Image

for line in sys.stdin:
    msg = json.loads(line)
    if msg["type"] != "display_set":
        continue
    # Follow composition.palette_id -> palettes[].id rather than assuming [0]
    pal_id = msg["composition"]["palette_id"] if msg["composition"] else None
    palette = next((p["entries"] for p in msg["palettes"] if p["id"] == pal_id), [])
    for obj in msg["objects"]:
        if not obj.get("bitmap"):
            continue
        w, h = obj["width"], obj["height"]
        indices = base64.b64decode(obj["bitmap"])
        img = Image.new("RGBA", (w, h))
        for i, idx in enumerate(indices):
            entry = palette[idx] if idx < len(palette) else {"luminance": 0, "cr": 128, "cb": 128, "alpha": 0}
            y_val, cr, cb, a = entry["luminance"], entry["cr"], entry["cb"], entry["alpha"]
            r = max(0, min(255, int(y_val + 1.402 * (cr - 128))))
            g = max(0, min(255, int(y_val - 0.344136 * (cb - 128) - 0.714136 * (cr - 128))))
            b = max(0, min(255, int(y_val + 1.772 * (cb - 128))))
            img.putpixel((i % w, i // w), (r, g, b, a))
        img.save(f"subtitle_{obj['id']}.png")
        break  # first object only
    break  # first display set only
```

---

## Encoding (NDJSON → .sup)

The `libpgs encode` command reads the same NDJSON format that `stream` produces and writes a `.sup` file. This closes the round-trip loop — extract, transform with any language, and write back:

```bash
libpgs stream movie.mkv | python modify.py | libpgs encode -o modified.sup
```

### Usage

```bash
libpgs encode -o <output.sup>       # Reads NDJSON from stdin
```

### Field handling

The encode command consumes `display_set` lines and ignores `tracks` lines (and blank lines). Each display set is rebuilt from its structured fields using `DisplaySetBuilder`, which handles RLE encoding and ODS fragmentation automatically.

| Field | Handling |
|-------|----------|
| `pts` | Primary timestamp source (90 kHz ticks). Used as-is. |
| `pts_ms` | Fallback: if `pts` is absent, computes `pts = round(pts_ms * 90)`. |
| `track_id` | Honored. Multiple track IDs produce separate output files. |
| `index` | Ignored. Display sets are written in input order. |
| `composition` | Required. Display sets with `null` composition are skipped with a stderr warning. |
| `composition.state` | Required. Must be `"epoch_start"`, `"acquisition_point"`, or `"normal"`. |
| `composition.objects[]` | Honored, including optional `crop` fields. |
| `windows` | Optional. Passed through to WDS segments when present. |
| `palettes` | Optional. All entries honored (id, luminance, cr, cb, alpha). |
| `objects` | Optional. The `bitmap` field (base64 palette indices) is re-encoded to RLE. |
| `objects[].bitmap` | Required per object. Base64-decoded, then RLE-encoded and fragmented as needed. |
| `data_length` | Ignored. Recomputed from the re-encoded bitmap. |
| `sequence` | Ignored. Recomputed based on re-encoded size and fragmentation. |
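Because `encode` rebuilds everything from structured fields, you can also synthesize display sets from scratch. A sketch that emits one minimal `display_set` line (a solid box using palette entry 1), following the field-handling table above; whether `encode` accepts exactly this minimal shape should be verified against your libpgs version:

```python
import base64, json

def make_box(pts_ms, width=16, height=8):
    """Return one NDJSON display_set line drawing a solid box at (100, 100)."""
    bitmap = bytes([1]) * (width * height)  # palette indices, row-major
    return json.dumps({
        "type": "display_set",
        "pts": round(pts_ms * 90),  # 90 kHz ticks
        "composition": {
            "number": 0, "state": "epoch_start",
            "video_width": 1920, "video_height": 1080,
            "palette_only": False, "palette_id": 0,
            "objects": [{"object_id": 0, "window_id": 0,
                         "x": 100, "y": 100, "crop": None}],
        },
        "windows": [{"id": 0, "x": 100, "y": 100,
                     "width": width, "height": height}],
        "palettes": [{"id": 0, "version": 0, "entries": [
            {"id": 0, "luminance": 16, "cr": 128, "cb": 128, "alpha": 0},
            {"id": 1, "luminance": 235, "cr": 128, "cb": 128, "alpha": 255},
        ]}],
        "objects": [{"id": 0, "version": 0, "width": width, "height": height,
                     "bitmap": base64.b64encode(bitmap).decode()}],
    })
```

Piping its output into `libpgs encode -o box.sup` would then produce a `.sup` containing the synthetic event.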

### Multi-track output

If all display sets share the same `track_id` (or none is specified), the output is written directly to the `-o` path. If multiple `track_id` values appear, encode splits the output into separate files:

```
output.sup          → output_track3.sup, output_track5.sup, ...
```

### Round-trip example (Python)

```python
import subprocess, json, base64, sys

# Stream from source
stream = subprocess.Popen(
    ["libpgs", "stream", "movie.mkv"],
    stdout=subprocess.PIPE, text=True
)

# Encode to output
encode = subprocess.Popen(
    ["libpgs", "encode", "-o", "modified.sup"],
    stdin=subprocess.PIPE, text=True
)

for line in stream.stdout:
    msg = json.loads(line)
    if msg["type"] == "display_set":
        # Example: brighten all palette entries
        for palette in msg.get("palettes", []):
            for entry in palette["entries"]:
                entry["luminance"] = min(255, entry["luminance"] + 20)
    encode.stdin.write(json.dumps(msg) + "\n")

encode.stdin.close()
encode.wait()
stream.wait()
```

### Error handling

Errors include 1-based line numbers for easy debugging:

```
line 42: missing field 'composition'
line 108: 'pts' is not a number
line 203: palette entry missing 'luminance'
```

Display sets with `null` composition are skipped with a stderr warning rather than aborting, so partially malformed input can still produce output for the valid display sets.

---

## Notes

- **Timestamps** use a 90 kHz clock (standard for MPEG transport streams). Divide by 90 to get milliseconds, or use the pre-computed `pts_ms` field.
- **Palette colors** are in YCrCb, not RGB. See the conversion formula above.
- **Up to 2 objects** can be shown simultaneously per composition (e.g., top and bottom subtitle lines), though the PGS spec supports up to 64 per epoch.
- **Normal-state display sets** with 0 composition objects are "clear screen" events — they signal that the previous subtitle should be removed.
- **Palette-only updates** (`palette_only: true`) change colors without replacing objects. The screen content changes appearance but the bitmap data stays the same.