//! Random-access, seekable `.tar.zst` archives with an embedded
//! table-of-contents index.
//!
//! A tarzan archive is a valid zstd stream that divides the compressed data
//! into independently decodable chunks and appends a table of contents (TOC)
//! as a zstd skippable frame. The TOC stores filenames, permissions,
//! ownership, sizes, and per-chunk byte offsets, so contents can be listed
//! without decompression and individual files extracted by seeking directly
//! to their chunks.
//!
//! A command-line tool (`tarzan`) is also available — see the
//! [tarzan-rs repository](https://github.com/astraw/tarzan-rs).
//!
//! # File format
//!
//! A tarzan archive is a valid zstd stream with four sections:
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────┐
//! │ Identity frame (skippable, 14 bytes)                    │
//! │   Magic: 0x184D2A54  Content: "TRZN" + type + version   │
//! ├─────────────────────────────────────────────────────────┤
//! │ Compressed data frames                                  │
//! │   Independent zstd frames sized around --chunk-size,    │
//! │   each carrying a 4-byte content checksum (the low 32   │
//! │   bits of XXH64) that the standard zstd decoder         │
//! │   verifies on decompression. Large members split        │
//! │   across several frames; small members packed           │
//! │   together to share a frame.                            │
//! ├─────────────────────────────────────────────────────────┤
//! │ TOC frame (skippable)                                   │
//! │   Magic: 0x184D2A54  Content: zstd-compressed JSON TOC  │
//! ├─────────────────────────────────────────────────────────┤
//! │ Footer frame (skippable, 38 bytes)                      │
//! │   Magic: 0x184D2A54  Content: "TRZN" + type + version   │
//! │   + TOC offset (u64) + TOC size (u64) + XXHash64 (8 B)  │
//! │   Hash covers bytes 0..(file_size - 38), seeded with    │
//! │   the constant `ARCHIVE_HASH_SEED`.                     │
//! └─────────────────────────────────────────────────────────┘
//! ```
//!
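//! The skippable frame magic `0x184D2A54` is shared by the three skippable
//! sections (identity, TOC, and footer); the compressed data frames use the
//! ordinary zstd frame magic. The skippable sections are distinguished by a
//! frame-type byte in the payload (`0x01` identity, `0x02` TOC, `0x03`
//! footer). The zstd spec defines any value in `0x184D2A50`–`0x184D2A5F` as a
//! skippable frame; tarzan-aware readers identify tarzan frames via the `TRZN`
//! ASCII identifier at byte offset 8 of the frame (right after the 4-byte
//! magic and the 4-byte frame size), not by the magic number alone.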
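//!
//! The identification rule above can be sketched in a few lines. This is an
//! illustration, not the crate's API: `TarzanFrame` and `classify_frame` are
//! hypothetical names, and the type byte is assumed to sit at offset 12 of
//! every tarzan frame, as it does in the identity frame.
//!
//! ```rust
//! /// Frame kinds named in the format description (hypothetical enum).
//! #[derive(Debug, PartialEq)]
//! enum TarzanFrame {
//!     Identity,
//!     Toc,
//!     Footer,
//! }
//!
//! /// Classify the skippable frame that starts at `frame[0]`.
//! /// Returns `None` for ordinary zstd frames, foreign skippable frames,
//! /// and unknown type bytes.
//! fn classify_frame(frame: &[u8]) -> Option<TarzanFrame> {
//!     // Need magic (4) + frame size (4) + "TRZN" (4) + type byte (1).
//!     if frame.len() < 13 {
//!         return None;
//!     }
//!     // Magic numbers are stored little-endian.
//!     let magic = u32::from_le_bytes([frame[0], frame[1], frame[2], frame[3]]);
//!     // zstd reserves 0x184D2A50..=0x184D2A5F for skippable frames.
//!     if !(0x184D_2A50..=0x184D_2A5F).contains(&magic) {
//!         return None;
//!     }
//!     // The magic alone is not enough: require the ASCII identifier.
//!     if &frame[8..12] != b"TRZN" {
//!         return None;
//!     }
//!     // Assumption: the type byte follows the identifier at offset 12.
//!     match frame[12] {
//!         0x01 => Some(TarzanFrame::Identity),
//!         0x02 => Some(TarzanFrame::Toc),
//!         0x03 => Some(TarzanFrame::Footer),
//!         _ => None,
//!     }
//! }
//! ```
//!
//! Under these assumptions, the 14 identity-frame bytes of an archive
//! classify as `Some(TarzanFrame::Identity)`, while an ordinary zstd data
//! frame (magic `0xFD2FB528`) yields `None`.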
//!
//! zstd stores multi-byte fields in little-endian order, so `0x184D2A54` is
//! written as the byte sequence `54 2A 4D 18`, and the first byte of every
//! tarzan archive is ASCII `T`. A hex dump confirms the identity frame:
//!
//! ```text
//! $ xxd -l 14 archive.tar.zst
//! 00000000: 542a 4d18 0600 0000 5452 5a4e 0102 T*M.....TRZN..
//! └── 0x184D2A54 ──┘ └TRZN┘
//! ```
//!
//! The version byte at offset 13 is `0x02` for the current format.
//!
//! Opening an archive reads two regions: the 14-byte identity frame at the
//! start and the 38-byte footer at the end. The footer carries the TOC's
//! byte offset and size, so the TOC is then fetched with a single seek — no
//! scanning, regardless of TOC size.
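//!
//! As a rough sketch of the footer step, the 38 trailing bytes could be
//! decoded as follows. The field order follows the format diagram; the
//! little-endian encoding of the two `u64` fields and of the hash, and the
//! names `Footer`, `u64_le`, and `parse_footer`, are assumptions made for
//! illustration rather than the crate's actual implementation.
//!
//! ```rust
//! /// Total footer length in bytes, per the format diagram.
//! const FOOTER_LEN: usize = 38;
//!
//! /// Decoded footer fields (hypothetical struct, not the crate's type).
//! #[derive(Debug, PartialEq)]
//! struct Footer {
//!     version: u8,
//!     toc_offset: u64,
//!     toc_size: u64,
//!     archive_hash: u64,
//! }
//!
//! /// Read a little-endian u64 from the first 8 bytes of `b`.
//! fn u64_le(b: &[u8]) -> u64 {
//!     let mut buf = [0u8; 8];
//!     buf.copy_from_slice(&b[..8]);
//!     u64::from_le_bytes(buf)
//! }
//!
//! /// Parse the last `FOOTER_LEN` bytes of an archive.
//! fn parse_footer(tail: &[u8]) -> Option<Footer> {
//!     if tail.len() != FOOTER_LEN {
//!         return None;
//!     }
//!     let magic = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
//!     let payload_len = u32::from_le_bytes([tail[4], tail[5], tail[6], tail[7]]);
//!     // A skippable frame's size field counts the payload only: 38 - 8 = 30.
//!     if magic != 0x184D_2A54 || payload_len as usize != FOOTER_LEN - 8 {
//!         return None;
//!     }
//!     // "TRZN" identifier, then the footer type byte 0x03.
//!     if &tail[8..12] != b"TRZN" || tail[12] != 0x03 {
//!         return None;
//!     }
//!     Some(Footer {
//!         version: tail[13],
//!         // Assumed layout: offset, size, hash, each 8 bytes little-endian.
//!         toc_offset: u64_le(&tail[14..22]),
//!         toc_size: u64_le(&tail[22..30]),
//!         archive_hash: u64_le(&tail[30..38]),
//!     })
//! }
//! ```
//!
//! A reader built this way would seek to `file_size - 38`, read the tail,
//! and then issue one more seek to `toc_offset` to fetch `toc_size` bytes.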
//!
//! ## Integrity layers
//!
//! - **Per data frame** — zstd's built-in XXHash64 content checksum is
//! enabled on every chunk, so a corrupted compressed byte fails at
//! decompress time with no extra work on the reader's side.
//! - **Per member** — each regular-file entry's TOC record carries the
//! SHA-256 of the file's content (no headers, no padding). Format and
//! value match `sha256sum`'s output, so users can compare against
//! on-disk files without invoking tarzan.
//! - **Whole archive** — the footer carries an XXHash64 over every byte that
//! precedes the footer (bytes `0..file_size - 38`). `tarzan verify --quick`
//! re-hashes the file in one sequential pass and compares the result with
//! the stored value, giving cheap end-to-end bit-rot detection that requires
//! no decompression.
//!
//! ## TOC schema
//!
//! The TOC is a zstd-compressed JSON object:
//!
//! ```json
//! {
//!   "tarzan_version": 2,
//!   "members": [
//!     {
//!       "path": "src/main.rs",
//!       "type": "file",
//!       "size": 4301,
//!       "mode": 420,
//!       "uid": 1000,
//!       "gid": 1000,
//!       "mtime": 1730643742,
//!       "tar_offset": 1024,
//!       "content_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
//!       "chunks": [
//!         {
//!           "compressed_offset": 1024,
//!           "compressed_size": 1891,
//!           "uncompressed_size": 4301
//!         }
//!       ]
//!     }
//!   ]
//! }
//! ```
//!
//! Each chunk locates one member's bytes inside a compressed frame. A member
//! larger than the chunk size spans several chunks; small members are packed
//! together to share a frame, and the optional `frame_offset` field (omitted
//! when zero) gives the member's byte offset within that frame's decompressed
//! data.
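//!
//! To make the chunk lookup concrete, here is a minimal sketch of how a
//! reader might turn a member's chunk list into byte ranges to fetch. The
//! `Chunk` and `Member` structs mirror the JSON fields above but are
//! hypothetical, as is `read_ranges`; the crate's real types may differ.
//!
//! ```rust
//! use std::ops::Range;
//!
//! /// One chunk record, mirroring the JSON fields (hypothetical struct).
//! #[allow(dead_code)]
//! struct Chunk {
//!     compressed_offset: u64,
//!     compressed_size: u64,
//!     uncompressed_size: u64,
//!     /// Offset inside the decompressed frame; 0 when the field is omitted.
//!     frame_offset: u64,
//! }
//!
//! /// One member record, reduced to the fields this sketch needs.
//! #[allow(dead_code)]
//! struct Member {
//!     path: String,
//!     size: u64,
//!     chunks: Vec<Chunk>,
//! }
//!
//! /// Byte ranges of the archive file that hold the member's frames,
//! /// in the order the chunks are listed.
//! fn read_ranges(member: &Member) -> Vec<Range<u64>> {
//!     member
//!         .chunks
//!         .iter()
//!         .map(|c| c.compressed_offset..c.compressed_offset + c.compressed_size)
//!         .collect()
//! }
//! ```
//!
//! For the `src/main.rs` entry shown above, this yields the single range
//! `1024..2915`, so only 1891 compressed bytes need to be read.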
//!
//! ## zstd compatibility
//!
//! Every tarzan archive is a valid zstd stream. Standard decoders skip the
//! identity, TOC, and footer skippable frames and decompress the data frames
//! normally:
//!
//! ```sh
//! zstd -d archive.tar.zst | tar x
//! tar --zstd -xf archive.tar.zst
//! ```
//!
//! The decompressed tar stream is bit-for-bit identical to the original.
//! What is lost is the index: listing or extracting via standard tools
//! requires a full sequential pass.
//!
//! # Usage
//!
//! ## Creating an archive
//!
//! [`wrap`] reads a raw tar stream and writes a tarzan-formatted `.tar.zst`:
//!
//! ```no_run
//! use std::fs::File;
//! use tarzan::WrapOptions;
//!
//! let input = File::open("archive.tar")?;
//! let output = File::create("archive.tar.zst")?;
//! tarzan::wrap(input, output, WrapOptions::default())?;
//! # Ok::<(), anyhow::Error>(())
//! ```
//!
//! [`WrapOptions`] controls chunk size and zstd compression level:
//!
//! ```no_run
//! # use std::fs::File;
//! # use tarzan::WrapOptions;
//! # let (input, output) = (File::open("a.tar")?, File::create("a.tar.zst")?);
//! tarzan::wrap(input, output, WrapOptions::default()
//!     .chunk_size(1024 * 1024) // 1 MiB chunks
//!     .level(9))?;
//! # Ok::<(), anyhow::Error>(())
//! ```
//!
//! ## Reading an archive
//!
//! [`TarzanReader`] opens an archive and gives access to the TOC without
//! decompressing any data frames:
//!
//! ```no_run
//! use std::path::Path;
//! use tarzan::TarzanReader;
//!
//! let reader = TarzanReader::open(Path::new("archive.tar.zst"))?;
//! for member in reader.members() {
//!     println!("{} ({} bytes)", member.path, member.size);
//! }
//! # Ok::<(), anyhow::Error>(())
//! ```
//!
//! ## Extracting a single member
//!
//! [`TarzanReader::extract_member`] seeks directly to the member's chunks and
//! decompresses only those frames:
//!
//! ```no_run
//! # use std::path::Path;
//! # use tarzan::TarzanReader;
//! let mut reader = TarzanReader::open(Path::new("archive.tar.zst"))?;
//! let mut out = std::fs::File::create("main.rs")?;
//! reader.extract_member("src/main.rs", &mut out)?;
//! # Ok::<(), anyhow::Error>(())
//! ```
pub use crateExtractOptions;
pub use cratePathFilter;
pub use crate;
pub use crate;