//! Low-level framing of tar streams.
//!
//! This crate provides two APIs:
//!
//! - [`stream`] is a low-level, lossless per-block framing API.
//! - [`logical`] is a medium-level, assembled member reader API.
//!
//! [`stream`] provides the basic state machine enforcement for a tar
//! stream, including ensuring that any given stream is either strictly
//! pax *or* GNU and not a mix of the two. [`logical`] is layered on top
//! of [`stream`] and provides APIs for accessing the "effective" metadata
//! for each assembled member.
//!
//! This crate tries to faithfully extract pax or GNU entries without mixing the
//! two. See the sections below for compatibility notes.
//!
//! ## pax compatibility
//!
//! When decoding pax-formatted tar streams, tar-framing attempts to conform to
//! pax as specified in [POSIX.1-2024], i.e. "issue 8" of the POSIX specification.
//! See the [pax specification] for full details.
//!
//! However, there are a few small deviations from a pedantic reading of [POSIX.1-2024]
//! that are worth noting:
//!
//! - tar-framing permits a `ctime` pax record, even though [POSIX.1-2024] does not specify it.
//!   The ctime record was removed from pax in [POSIX.1-2004] (which is itself a minor edit
//!   of POSIX.1-2001). However, many real-world pax archives still contain it, and its
//!   presence does not compromise or introduce ambiguity during framing.
//!
//! - tar-framing rejects directory entries (typeflag `'5'`) that present a nonzero size
//!   in their ustar header or pax `size` record. pax says that this size should be treated
//!   as a filesystem allocation hint rather than a physical size, but real-world parsers vary
//!   widely in how they handle it (some ignore it, others skip over that number of bytes, etc.).
//!
//! - tar-framing rejects regular file entries (typeflag `'0'` or `'\0'`) that include a trailing
//!   slash (e.g. `foo.txt/`). pax is ambiguous about how to handle these cases: it notes that
//!   pre-ustar tar had no directory entry typeflag and thus a trailing slash was used
//!   to indicate a directory by convention, but does not prescribe that pax implementors
//!   honor this legacy behavior. We choose to reject it since it presents the same directory
//!   size problem mentioned above.
//!
//! - tar-framing rejects negative timestamps as well as timestamps that would exceed the
//!   range of a `u64`. pax allows both of these, although it notes that portable timestamps
//!   cannot be negative and that tools may reject such timestamps.
//!
//! - tar-framing silently removes fractional components from parsed timestamps. Timestamps
//!   are truncated to second precision.
//!
//! - tar-framing rejects typeflags that are not explicitly defined in pax. pax says to handle
//!   these as regular files (i.e. assuming their size is a physical size), but this has marginal
//!   benefit in practice.
//!
//! - tar-framing rejects `hdrcharset` pax records that aren't UTF-8 or `BINARY`. pax says
//!   that "additional names may be agreed between the originator and the recipient," but
//!   we are the recipient and we don't accept any other `hdrcharset` names.
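//!
//! The timestamp rules above (no negative values, nothing beyond a `u64`, fractions
//! dropped) can be illustrated with a minimal sketch. `parse_pax_seconds` is a
//! hypothetical helper written for this example and is not part of this crate's API;
//! the digits-only validation of the fraction is likewise an assumption.
//!
//! ```rust
//! // Hypothetical helper: parse a pax time value such as "1700000000.5"
//! // into whole seconds, or return `None` if the value is rejected.
//! fn parse_pax_seconds(value: &str) -> Option<u64> {
//!     // Split off an optional fractional part; only the whole seconds are kept.
//!     let (whole, fraction) = match value.split_once('.') {
//!         Some((whole, fraction)) => (whole, Some(fraction)),
//!         None => (value, None),
//!     };
//!     // Assumption: a fraction, if present, must consist of decimal digits only.
//!     if let Some(fraction) = fraction {
//!         if !fraction.bytes().all(|b| b.is_ascii_digit()) {
//!             return None;
//!         }
//!     }
//!     // Requiring plain digits rejects empty values and any sign, so negative
//!     // timestamps never reach the integer parser.
//!     if whole.is_empty() || !whole.bytes().all(|b| b.is_ascii_digit()) {
//!         return None;
//!     }
//!     // `u64` parsing fails on overflow, which rejects out-of-range timestamps.
//!     whole.parse::<u64>().ok()
//! }
//! ```
//!
//! Under these assumptions, `parse_pax_seconds("1700000000.5")` yields `Some(1700000000)`,
//! while `"-1"` and any value above `u64::MAX` yield `None`.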
//!
//! ## GNU compatibility
//!
//! When decoding GNU-formatted tar streams, tar-framing attempts to follow the
//! ["Basic Tar Format"] in the GNU docs. Specifically, tar-framing attempts
//! to follow the rules for the "old GNU" format, i.e. GNU tar's non-pax format.
//!
//! tar-framing intentionally only supports a subset of the GNU tar format:
//!
//! - The GNU "longname" and "longlink" (`'L'` and `'K'`) typeflags are supported,
//!   with path-precedence semantics similar to those of their pax record equivalents.
//!
//! - Other GNU-specific typeflags are **not** supported whatsoever, and produce
//!   a framing error. This includes sparse files (`'S'`) and multivolume headers
//!   (`'M'`).
//!
//! - tar-framing accepts the GNU-specific "base-256" encoding for numbers, but rejects
//!   negative encodings as well as any value that would exceed the range of a `u64`.
//!   tar-framing also allows "base-256" encodings where the numeric value _would_ fit
//!   into an octal encoding in the allotted buffer/byte span; GNU technically says that
//!   this is reserved for future use.
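//!
//! As a rough illustration of the base-256 rules above, the following sketch decodes
//! a numeric header field. `decode_base256` is a hypothetical helper, not this crate's
//! implementation; it assumes the common layout in which the high bit of the first
//! byte marks the encoding and the next bit is the sign of a big-endian
//! two's-complement value.
//!
//! ```rust
//! // Hypothetical helper: decode a GNU base-256 numeric field.
//! // Returns `None` for non-base-256, negative, or out-of-range fields.
//! fn decode_base256(field: &[u8]) -> Option<u64> {
//!     let (&first, rest) = field.split_first()?;
//!     // The high bit of the first byte marks the base-256 encoding.
//!     if first & 0x80 == 0 {
//!         return None;
//!     }
//!     // Assumption: bit 0x40 is the sign bit; negative encodings are rejected.
//!     if first & 0x40 != 0 {
//!         return None;
//!     }
//!     // Accumulate the remaining bits big-endian, failing once the value no
//!     // longer fits in a `u64`.
//!     let mut value = u64::from(first & 0x3f);
//!     for &byte in rest {
//!         value = value.checked_mul(256)?.checked_add(u64::from(byte))?;
//!     }
//!     Some(value)
//! }
//! ```
//!
//! Note that the sketch places no lower bound on the decoded value, so a small number
//! that would also fit in octal is accepted, matching the leniency described above.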
//!
//! ## General compatibility
//!
//! Because pax and GNU both use ustar as their baseline, any compatibility aspect of pax
//! that is derived from ustar also applies during GNU tar decoding.
//!
//! tar-framing accepts wholly NUL `mode`, `uid`, `gid`, and `mtime` fields by default for
//! compatibility with real-world writers in both families. These fields are represented as
//! missing rather than assigned a value. This can be disabled with
//! [`stream::TarStream::set_allow_all_nul_numeric_fields`].
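//!
//! The "missing rather than assigned a value" behavior can be sketched as follows.
//! `parse_numeric_field` and its `allow_all_nul` parameter are hypothetical names used
//! only for this example; the sketch also simplifies octal parsing by not accepting
//! leading spaces.
//!
//! ```rust
//! // Hypothetical helper: parse an octal ustar numeric field.
//! // A wholly NUL field becomes `Ok(None)` when `allow_all_nul` is set.
//! fn parse_numeric_field(
//!     field: &[u8],
//!     allow_all_nul: bool,
//! ) -> Result<Option<u64>, &'static str> {
//!     if field.iter().all(|&b| b == 0) {
//!         return if allow_all_nul {
//!             Ok(None)
//!         } else {
//!             Err("numeric field is wholly NUL")
//!         };
//!     }
//!     // Strip the trailing NUL or space terminators that writers emit.
//!     let end = field
//!         .iter()
//!         .rposition(|&b| b != 0 && b != b' ')
//!         .map_or(0, |i| i + 1);
//!     let digits = &field[..end];
//!     if digits.is_empty() {
//!         return Err("numeric field has no digits");
//!     }
//!     let mut value: u64 = 0;
//!     for &b in digits {
//!         if !(b'0'..=b'7').contains(&b) {
//!             return Err("invalid octal digit");
//!         }
//!         // Shift in one octal digit, failing if the value overflows a `u64`.
//!         value = value
//!             .checked_mul(8)
//!             .and_then(|v| v.checked_add(u64::from(b - b'0')))
//!             .ok_or("numeric field overflows u64")?;
//!     }
//!     Ok(Some(value))
//! }
//! ```
//!
//! Returning `Option<u64>` keeps "no value was written" distinct from an explicit zero,
//! so a consumer can decide for itself how to treat a missing field.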
//!
//! Separately, higher-level crates (like tar-codec) may choose to apply additional
//! restrictions when processing logical archive members. For example, a consumer
//! of tar-framing may choose to reject vendor-specific pax records, or member names
//! that contain forbidden characters, or any other additional restriction.
//!
//! [POSIX.1-2024]: https://pubs.opengroup.org/onlinepubs/9799919799/
//! [pax specification]: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/pax.html
//! [POSIX.1-2004]: https://pubs.opengroup.org/onlinepubs/009695399/toc.htm
//! ["Basic Tar Format"]: https://www.gnu.org/software/tar/manual/html_node/Standard.html

use std::fmt;

mod error;
pub mod header;
pub mod logical;
mod pax;
pub mod stream;
#[cfg(test)]
mod test_support;
pub mod write;

pub use error::{FrameError, FrameErrorInner};
pub use pax::{
    HdrCharset, PaxError, PaxExtension, PaxKeyword, PaxRecord, PaxState, PaxString, PaxValue,
};

/// The size of a logical tar record.
pub const BLOCK_SIZE: usize = 512;

/// The default maximum size in bytes of one local or global pax extension.
///
/// This is 256 KiB.
pub const DEFAULT_MAX_PAX_EXTENSION_SIZE: u64 = 256 * 1024;

/// The default maximum cumulative size of global pax extensions before one member.
///
/// This is 1 MiB.
pub const DEFAULT_MAX_GLOBAL_PAX_EXTENSIONS_SIZE: u64 = 4 * DEFAULT_MAX_PAX_EXTENSION_SIZE;

/// The default maximum size in bytes of one GNU metadata extension.
///
/// This is 128 KiB.
pub const DEFAULT_MAX_GNU_EXTENSION_SIZE: u64 = 128 * 1024;

/// A single tar block.
pub type Block = [u8; BLOCK_SIZE];

/// An automatically detected, mutually exclusive tar archive family.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum ArchiveFormat {
    /// pax ustar headers with optional pax extended headers.
    Pax,
    /// Old GNU tar headers with optional `L` and `K` extension entries.
    Gnu,
}

impl fmt::Display for ArchiveFormat {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Pax => formatter.write_str("pax"),
            Self::Gnu => formatter.write_str("GNU"),
        }
    }
}

/// The scope of a pax extended header.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum PaxKind {
    /// A typeflag `x` header applying to the next ordinary member.
    Local,
    /// A typeflag `g` header updating persistent global values.
    Global,
}

/// The supported GNU metadata extension kinds.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum GnuKind {
    /// A typeflag `L` extension giving a long name for the next member.
    LongName,
    /// A typeflag `K` extension giving a long link name for the next member.
    LongLink,
}

/// A supported ordinary ustar member type.
///
/// These are shared across both pax and GNU tar streams.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum UstarKind {
    /// A regular file (`'0'` or NUL).
    Regular,
    /// A hard link (`'1'`).
    HardLink,
    /// A symbolic link (`'2'`).
    SymbolicLink,
    /// A character device (`'3'`).
    CharacterDevice,
    /// A block device (`'4'`).
    BlockDevice,
    /// A directory (`'5'`).
    Directory,
    /// A FIFO (`'6'`).
    Fifo,
    /// A contiguous file (`'7'`).
    Contiguous,
}