tar_framing/lib.rs
//! Low-level framing of tar streams.
//!
//! This crate provides two APIs:
//!
//! - [`stream`] is a low-level, lossless per-block framing API.
//! - [`logical`] is a medium-level, assembled member reader API.
//!
//! [`stream`] provides the basic state machine enforcement for a tar
//! stream, including ensuring that any given stream is either strictly
//! pax *or* GNU and not a mix of the two. [`logical`] is layered on top
//! of [`stream`] and provides APIs for accessing the "effective" metadata
//! for each assembled member.
//!
//! This crate tries to faithfully extract pax or GNU entries without mixing the
//! two. See the sections below for compatibility notes.
//!
//! ## pax compatibility
//!
//! When decoding pax-formatted tar streams, tar-framing attempts to conform to
//! pax as specified in [POSIX.1-2024], i.e. "issue 8" of the POSIX specification.
//! See the [pax specification] for full details.
//!
//! However, there are a few small deviations from a pedantic reading of [POSIX.1-2024]
//! that are worth noting:
//!
//! - tar-framing permits a `ctime` pax record, even though [POSIX.1-2024] does not specify one.
//!   The `ctime` record was removed from pax in [POSIX.1-2004] (which is itself a minor edit
//!   of POSIX.1-2001). However, many real-world pax archives still contain it, and its
//!   presence does not compromise or introduce ambiguity during framing.
//!
//! - tar-framing rejects directory entries (typeflag `'5'`) that present a nonzero size
//!   in their ustar header or pax `size` record. pax says that this size should be treated
//!   as a filesystem allocation hint rather than a physical size, but real-world parsers vary
//!   widely in how they handle it (some ignore it, others skip over that number of bytes, etc.).
//!
//! - tar-framing rejects regular file entries (typeflag `'0'` or `'\0'`) whose names include a
//!   trailing slash (e.g. `foo.txt/`). pax is ambiguous about how to handle these cases: it notes
//!   that pre-ustar tar had no directory entry typeflag and thus a trailing slash was used
//!   to indicate a directory by convention, but does not prescribe that pax implementors
//!   honor this legacy behavior. We choose to reject it since it presents the same directory
//!   size problem mentioned above.
//!
//! - tar-framing rejects negative timestamps as well as timestamps that would exceed the
//!   range of a `u64`. pax allows both of these, although it notes that portable timestamps
//!   cannot be negative and that tools may reject such timestamps.
//!
//! - tar-framing silently discards the fractional component of parsed timestamps, truncating
//!   them to second precision.
//!
//! - tar-framing rejects typeflags that are not explicitly defined in pax. pax says to handle
//!   these as regular files (i.e. assuming their size is a physical size), but this has marginal
//!   benefit in practice.
//!
//! - tar-framing rejects `hdrcharset` pax records that aren't UTF-8 or `BINARY`. pax says
//!   that "additional names may be agreed between the originator and the recipient," but
//!   we are the recipient and we don't accept any other `hdrcharset` names.
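/*!

The two timestamp rules above can be illustrated with a minimal sketch. The helper name
`parse_pax_seconds` is hypothetical and is not part of this crate's API. The sketch also
assumes that a fractional part, when present, must contain at least one digit, which may
differ from what tar-framing actually accepts.

```rust
/// Hypothetical helper: parses a pax time value such as `"1700000000.25"`
/// into whole seconds, dropping any fractional component.
///
/// Returns `None` for negative values, malformed input, and values that do
/// not fit in a `u64`.
fn parse_pax_seconds(value: &str) -> Option<u64> {
    // Split an optional fractional component off at the first period.
    let (whole, fraction) = match value.split_once('.') {
        Some((whole, fraction)) => (whole, Some(fraction)),
        None => (value, None),
    };

    // Only plain ASCII digits are accepted, so a leading `-` (or `+`) is
    // rejected here rather than being interpreted by `str::parse`.
    let all_digits = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if !all_digits(whole) {
        return None;
    }

    // Assumption: a fraction, if present, must itself be all digits.
    if let Some(fraction) = fraction {
        if !all_digits(fraction) {
            return None;
        }
    }

    // `parse` fails on overflow, which covers the "exceeds a `u64`" case.
    whole.parse().ok()
}
```

Under these assumptions, `parse_pax_seconds("1700000000.75")` yields `Some(1700000000)`,
while `"-1"` and `"18446744073709551616"` (one past `u64::MAX`) both yield `None`.
*/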
//!
//! ## GNU compatibility
//!
//! When decoding GNU-formatted tar streams, tar-framing attempts to follow the
//! ["Basic Tar Format"] in the GNU docs. Specifically, tar-framing attempts
//! to follow the rules for the "old GNU" format, i.e. GNU tar's non-pax format.
//!
//! tar-framing intentionally only supports a subset of the GNU tar format:
//!
//! - The GNU "longname" and "longlink" (`'L'` and `'K'`) typeflags are supported,
//!   with path-precedence semantics similar to those of their pax record equivalents.
//!
//! - Other GNU-specific typeflags are **not** supported at all, and produce
//!   a framing error. This includes sparse files (`'S'`) and multivolume headers
//!   (`'M'`).
//!
//! - tar-framing accepts the GNU-specific "base-256" encoding for numbers, but rejects
//!   negative encodings as well as any value that would exceed the range of a `u64`.
//!   tar-framing also allows "base-256" encodings where the numeric value _would_ fit
//!   into an octal encoding in the allotted buffer/byte span; GNU technically says that
//!   this is reserved for future use.
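/*!

The base-256 rules above can be sketched as follows. This is an illustration rather than
this crate's decoder: the helper name `decode_base256` is hypothetical, and the sketch
assumes the layout used by GNU tar, where the high bit (`0x80`) of the first byte marks a
base-256 field, the next bit (`0x40`) acts as the sign bit, and the remaining bits form a
big-endian integer.

```rust
/// Hypothetical helper: decodes a GNU base-256 numeric header field.
///
/// Returns `None` if the field is not base-256 encoded, encodes a negative
/// number, or encodes a value that does not fit in a `u64`.
fn decode_base256(field: &[u8]) -> Option<u64> {
    let (&first, rest) = field.split_first()?;

    // The high bit of the first byte distinguishes base-256 from octal text.
    if first & 0x80 == 0 {
        return None;
    }

    // Assumption: bit 6 of the first byte is the sign bit; reject negatives.
    if first & 0x40 != 0 {
        return None;
    }

    // Accumulate the remaining bits big-endian, failing on `u64` overflow.
    let mut value = u64::from(first & 0x3f);
    for &byte in rest {
        value = value.checked_mul(256)?.checked_add(u64::from(byte))?;
    }
    Some(value)
}
```

For a 12-byte field consisting of `0x80`, nine zero bytes, and then `[0x02, 0x00]`, this
sketch returns `Some(512)`, even though 512 would also fit in an octal encoding. A field made
entirely of `0xff` bytes is treated as negative and yields `None`.
*/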
//!
//! ## General compatibility
//!
//! Because pax and GNU both use ustar as their baseline, any compatibility aspect of pax
//! that is derived from ustar also applies during GNU tar decoding.
//!
//! tar-framing accepts wholly NUL `mode`, `uid`, `gid`, and `mtime` fields by default for
//! compatibility with real-world writers in both families. These fields are represented as
//! missing rather than assigned a value. This can be disabled with
//! [`stream::TarStream::set_allow_all_nul_numeric_fields`].
//!
//! Separately, higher-level crates (like tar-codec) may choose to apply additional
//! restrictions when processing logical archive members. For example, a consumer
//! of tar-framing may choose to reject vendor-specific pax records, or member names
//! that contain forbidden characters, or any other additional restriction.
//!
//! [POSIX.1-2024]: https://pubs.opengroup.org/onlinepubs/9799919799/
//! [pax specification]: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/pax.html
//! [POSIX.1-2004]: https://pubs.opengroup.org/onlinepubs/009695399/toc.htm
//! ["Basic Tar Format"]: https://www.gnu.org/software/tar/manual/html_node/Standard.html

use std::fmt;

mod error;
pub mod header;
pub mod logical;
mod pax;
pub mod stream;
#[cfg(test)]
mod test_support;
pub mod write;

pub use error::{FrameError, FrameErrorInner};
pub use pax::{
    HdrCharset, PaxError, PaxExtension, PaxKeyword, PaxRecord, PaxState, PaxString, PaxValue,
};

/// The size of a logical tar record.
pub const BLOCK_SIZE: usize = 512;

/// The default maximum size in bytes of one local or global pax extension.
///
/// This is 256 KiB.
pub const DEFAULT_MAX_PAX_EXTENSION_SIZE: u64 = 256 * 1024;

/// The default maximum cumulative size of global pax extensions before one member.
///
/// This is 1 MiB.
pub const DEFAULT_MAX_GLOBAL_PAX_EXTENSIONS_SIZE: u64 = 4 * DEFAULT_MAX_PAX_EXTENSION_SIZE;

/// The default maximum size in bytes of one GNU metadata extension.
///
/// This is 128 KiB.
pub const DEFAULT_MAX_GNU_EXTENSION_SIZE: u64 = 128 * 1024;

/// A single tar block.
pub type Block = [u8; BLOCK_SIZE];

/// An automatically detected, mutually exclusive tar archive family.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum ArchiveFormat {
    /// pax ustar headers with optional pax extended headers.
    Pax,
    /// Old GNU tar headers with optional `L` and `K` extension entries.
    Gnu,
}

impl fmt::Display for ArchiveFormat {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Pax => formatter.write_str("pax"),
            Self::Gnu => formatter.write_str("GNU"),
        }
    }
}

/// The scope of a pax extended header.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum PaxKind {
    /// A typeflag `x` header applying to the next ordinary member.
    Local,
    /// A typeflag `g` header updating persistent global values.
    Global,
}

/// The supported GNU metadata extension kinds.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum GnuKind {
    /// A typeflag `L` extension giving a long name for the next member.
    LongName,
    /// A typeflag `K` extension giving a long link name for the next member.
    LongLink,
}

/// A supported ordinary ustar member type.
///
/// These are shared across both pax and GNU tar streams.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum UstarKind {
    /// A regular file (`'0'` or NUL).
    Regular,
    /// A hard link (`'1'`).
    HardLink,
    /// A symbolic link (`'2'`).
    SymbolicLink,
    /// A character device (`'3'`).
    CharacterDevice,
    /// A block device (`'4'`).
    BlockDevice,
    /// A directory (`'5'`).
    Directory,
    /// A FIFO (`'6'`).
    Fifo,
    /// A contiguous file (`'7'`).
    Contiguous,
}