//! Low-level framing of tar streams.
//!
//! This crate provides two APIs:
//!
//! - [`stream`] is a low-level, lossless per-block framing API.
//! - [`logical`] is a medium-level, assembled member reader API.
//!
//! [`stream`] provides the basic state machine enforcement for a tar
//! stream, including ensuring that any given stream is either strictly
//! pax *or* GNU and not a mix of the two. [`logical`] is layered on top
//! of [`stream`] and provides APIs for accessing the "effective" metadata
//! for each assembled member.
//!
//! This crate tries to faithfully extract pax or GNU entries without mixing the
//! two. See the sections below for compatibility notes.
//!
//! ## pax compatibility
//!
//! When decoding pax-formatted tar streams, tar-framing attempts to conform to
//! pax as specified in [POSIX.1-2024], i.e. "issue 8" of the POSIX specification.
//! See the [pax specification] for full details.
//!
//! However, there are a few small deviations from a pedantic reading of [POSIX.1-2024]
//! that are worth noting:
//!
//! - tar-framing permits a `ctime` pax record, even though it is not specified in
//!   [POSIX.1-2024]. The `ctime` record was removed from pax in [POSIX.1-2004] (which is
//!   itself a minor edit of POSIX.1-2001). However, many real-world pax archives still
//!   contain it, and its presence does not compromise framing or introduce ambiguity.
//!
//! - tar-framing permits implementation-defined vendor pax records to contain arbitrary
//!   bytes, including invalid UTF-8. POSIX requires extended-header values to default
//!   to UTF-8, but real-world writers including star, GNU tar, and libarchive use raw
//!   `SCHILY.xattr.*` values. Standard and reserved pax record values remain strict UTF-8.
//!
//! - tar-framing rejects directory entries (typeflag `'5'`) that present a nonzero size
//!   in their ustar header or pax `size` record. pax says that this size should be treated
//!   as a filesystem allocation hint rather than a physical size, but real-world parsers vary
//!   widely in how they handle it (some ignore it, others skip over that number of bytes, etc.).
//!
//! - tar-framing rejects regular file entries (typeflag `'0'` or `'\0'`) whose names include
//!   a trailing slash (e.g. `foo.txt/`). pax is ambiguous about how to handle these cases: it
//!   notes that pre-ustar tar had no directory entry typeflag and thus a trailing slash was
//!   used to indicate a directory by convention, but does not prescribe that pax implementors
//!   honor this legacy behavior. We choose to reject it since it presents the same directory
//!   size problem mentioned above.
//!
//! - tar-framing rejects negative timestamps as well as timestamps that would exceed the
//!   range of a `u64`. pax allows both of these, although it notes that portable timestamps
//!   cannot be negative and that tools may reject such timestamps.
//!
//! - tar-framing silently removes fractional components from parsed timestamps. Timestamps
//!   are truncated to second precision.
//!
//! - tar-framing rejects typeflags that are not explicitly defined in pax. pax says to handle
//!   these as regular files (i.e. assuming their size is a physical size), but this has marginal
//!   benefit in practice.
//!
//! - tar-framing rejects `hdrcharset` pax records that aren't UTF-8 or `BINARY`. pax says
//!   that "additional names may be agreed between the originator and the recipient," but
//!   we are the recipient and we don't accept any other `hdrcharset` names.
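//!
//! The timestamp rules above can be sketched as follows. This is a minimal illustration
//! using only the standard library; `parse_pax_time` is a hypothetical helper and not part
//! of tar-framing's API, and its handling of malformed fractions is an assumption.
//!
//! ```rust
//! /// Hypothetical helper: parse a pax time value into whole seconds.
//! fn parse_pax_time(value: &str) -> Option<u64> {
//!     // Split off the fractional part, if any; it is validated and then discarded.
//!     let whole = match value.split_once('.') {
//!         Some((whole, fraction)) => {
//!             // Assumption: a fraction must consist of at least one ASCII digit.
//!             if fraction.is_empty() || !fraction.bytes().all(|b| b.is_ascii_digit()) {
//!                 return None;
//!             }
//!             whole
//!         }
//!         None => value,
//!     };
//!     // Accept ASCII digits only, which rules out a leading '-' (or '+').
//!     if whole.is_empty() || !whole.bytes().all(|b| b.is_ascii_digit()) {
//!         return None;
//!     }
//!     // Parsing fails if the value does not fit in a `u64`.
//!     whole.parse::<u64>().ok()
//! }
//!
//! // Fractional seconds are truncated.
//! assert_eq!(parse_pax_time("1700000000.123456789"), Some(1_700_000_000));
//! // Negative timestamps are rejected.
//! assert_eq!(parse_pax_time("-1"), None);
//! // `u64::MAX + 1` is rejected.
//! assert_eq!(parse_pax_time("18446744073709551616"), None);
//! ```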
//!
//! ## GNU compatibility
//!
//! When decoding GNU-formatted tar streams, tar-framing attempts to follow the
//! ["Basic Tar Format"] in the GNU docs. Specifically, tar-framing attempts
//! to follow the rules for the "old GNU" format, i.e. GNU tar's non-pax format.
//!
//! tar-framing intentionally only supports a subset of the GNU tar format:
//!
//! - The GNU "longname" and "longlink" (`'L'` and `'K'`) typeflags are supported,
//!   with path-precedence semantics similar to those of their pax record equivalents.
//!
//! - Other GNU-specific typeflags are **not** supported whatsoever, and produce
//!   a framing error. This includes sparse files (`'S'`) and multivolume headers
//!   (`'M'`).
//!
//! - tar-framing accepts the GNU-specific "base-256" encoding for numbers, but rejects
//!   negative encodings as well as any value that would exceed the range of a `u64`.
//!   tar-framing also allows "base-256" encodings where the numeric value _would_ fit
//!   into an octal encoding in the allotted buffer/byte span; GNU technically says that
//!   this is reserved for future use.
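//!
//! As a rough illustration of the base-256 rules above, the sketch below decodes a numeric
//! field under the commonly used interpretation of the encoding: the high bit of the first
//! byte marks base-256, the next bit is the sign, and the remaining bits are a big-endian
//! integer. `decode_base256` is a hypothetical helper, not tar-framing's actual decoder.
//!
//! ```rust
//! /// Hypothetical helper: decode a non-negative base-256 numeric field.
//! fn decode_base256(field: &[u8]) -> Option<u64> {
//!     let (&first, rest) = field.split_first()?;
//!     // The high bit of the first byte marks the field as base-256.
//!     if first & 0x80 == 0 {
//!         return None;
//!     }
//!     // The next bit is the sign bit; negative encodings are rejected.
//!     if first & 0x40 != 0 {
//!         return None;
//!     }
//!     // Accumulate the remaining bits big-endian, rejecting anything beyond `u64`.
//!     let mut value = u64::from(first & 0x3f);
//!     for &byte in rest {
//!         value = value.checked_mul(256)?.checked_add(u64::from(byte))?;
//!     }
//!     Some(value)
//! }
//!
//! // A small value is accepted even though it would also fit in octal.
//! assert_eq!(decode_base256(&[0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]), Some(256));
//! // A negative encoding is rejected.
//! assert_eq!(decode_base256(&[0xff; 12]), None);
//! // 2^64 does not fit in a `u64` and is rejected.
//! assert_eq!(decode_base256(&[0x80, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]), None);
//! ```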
//!
//! ## General compatibility
//!
//! Because pax and GNU both use ustar as their baseline, any compatibility aspect of pax
//! that is derived from ustar also applies during GNU tar decoding.
//!
//! tar-framing accepts wholly NUL `mode`, `uid`, `gid`, and `mtime` fields by default for
//! compatibility with real-world writers in both families. These fields are represented as
//! missing rather than assigned a value. This can be disabled with
//! [`stream::TarStream::set_allow_all_nul_numeric_fields`].
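//!
//! The "missing rather than assigned a value" behavior can be pictured with the sketch
//! below. `parse_numeric_field` and its `allow_all_nul` parameter are hypothetical and only
//! illustrate the idea; the real parser may differ (for example, it also handles base-256).
//!
//! ```rust
//! /// Hypothetical helper: parse an octal header field, treating all-NUL as missing.
//! fn parse_numeric_field(
//!     field: &[u8],
//!     allow_all_nul: bool,
//! ) -> Result<Option<u64>, &'static str> {
//!     // A wholly NUL field is reported as missing (`None`), not as zero.
//!     if field.iter().all(|&b| b == 0) {
//!         return if allow_all_nul { Ok(None) } else { Err("all-NUL numeric field") };
//!     }
//!     // Assumption: the octal digits end at the first NUL or space terminator.
//!     let end = field
//!         .iter()
//!         .position(|&b| b == 0 || b == b' ')
//!         .unwrap_or(field.len());
//!     let digits = &field[..end];
//!     if digits.is_empty() || !digits.iter().all(|b| (b'0'..=b'7').contains(b)) {
//!         return Err("malformed octal field");
//!     }
//!     let mut value = 0u64;
//!     for &digit in digits {
//!         value = value
//!             .checked_mul(8)
//!             .and_then(|v| v.checked_add(u64::from(digit - b'0')))
//!             .ok_or("octal field overflows u64")?;
//!     }
//!     Ok(Some(value))
//! }
//!
//! assert_eq!(parse_numeric_field(&[0u8; 8], true), Ok(None));
//! assert!(parse_numeric_field(&[0u8; 8], false).is_err());
//! assert_eq!(parse_numeric_field(b"0000644\0", true), Ok(Some(0o644)));
//! ```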
//!
//! Separately, higher-level crates (like tar-codec) may choose to apply additional
//! restrictions when processing logical archive members. For example, a consumer
//! of tar-framing may choose to reject vendor-specific pax records, or member names
//! that contain forbidden characters, or any other additional restriction.
//!
//! [POSIX.1-2024]: https://pubs.opengroup.org/onlinepubs/9799919799/
//! [pax specification]: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/pax.html
//! [POSIX.1-2004]: https://pubs.opengroup.org/onlinepubs/009695399/toc.htm
//! ["Basic Tar Format"]: https://www.gnu.org/software/tar/manual/html_node/Standard.html

use std::fmt;

mod error;
pub mod header;
pub mod logical;
mod pax;
pub mod stream;
#[cfg(test)]
mod test_support;
pub mod write;

pub use error::{FrameError, FrameErrorInner};
pub use pax::{
    HdrCharset, PaxError, PaxExtension, PaxKeyword, PaxRecord, PaxState, PaxString, PaxValue,
};

/// The size of a logical tar record.
pub const BLOCK_SIZE: usize = 512;

/// The default maximum size in bytes of one local or global pax extension.
///
/// This is 256 KiB.
pub const DEFAULT_MAX_PAX_EXTENSION_SIZE: u64 = 256 * 1024;

/// The default maximum cumulative size of global pax extensions before one member.
///
/// This is 1 MiB.
pub const DEFAULT_MAX_GLOBAL_PAX_EXTENSIONS_SIZE: u64 = 4 * DEFAULT_MAX_PAX_EXTENSION_SIZE;

/// The default maximum size in bytes of one GNU metadata extension.
///
/// This is 128 KiB.
pub const DEFAULT_MAX_GNU_EXTENSION_SIZE: u64 = 128 * 1024;

/// A single tar block.
pub type Block = [u8; BLOCK_SIZE];

/// An automatically detected, mutually exclusive tar archive family.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum ArchiveFormat {
    /// pax ustar headers with optional pax extended headers.
    Pax,
    /// Old GNU tar headers with optional `L` and `K` extension entries.
    Gnu,
}

impl fmt::Display for ArchiveFormat {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Pax => formatter.write_str("pax"),
            Self::Gnu => formatter.write_str("GNU"),
        }
    }
}

/// The scope of a pax extended header.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum PaxKind {
    /// A typeflag `x` header applying to the next ordinary member.
    Local,
    /// A typeflag `g` header updating persistent global values.
    Global,
}

/// The supported GNU metadata extension kinds.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum GnuKind {
    /// A typeflag `L` extension giving a long name for the next member.
    LongName,
    /// A typeflag `K` extension giving a long link name for the next member.
    LongLink,
}

/// A supported ordinary ustar member type.
///
/// These are shared across both pax and GNU tar streams.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum UstarKind {
    /// A regular file (`'0'` or NUL).
    Regular,
    /// A hard link (`'1'`).
    HardLink,
    /// A symbolic link (`'2'`).
    SymbolicLink,
    /// A character device (`'3'`).
    CharacterDevice,
    /// A block device (`'4'`).
    BlockDevice,
    /// A directory (`'5'`).
    Directory,
    /// A FIFO (`'6'`).
    Fifo,
    /// A contiguous file (`'7'`).
    Contiguous,
}