Crate tar_framing

Low-level framing of tar streams.

This crate provides two APIs:

  • stream is a low-level, lossless per-block framing API.
  • logical is a medium-level, assembled member reader API.

stream provides the basic state machine enforcement for a tar stream, including ensuring that any given stream is either strictly pax or strictly GNU and not a mix of the two. logical is layered on top of stream and provides APIs for accessing the “effective” metadata for each assembled member.

This crate tries to faithfully extract pax or GNU entries without mixing the two. See the sections below for compatibility notes.
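The "no mixing" rule can be pictured as a small tracker that locks onto whichever family it sees first. The sketch below is illustrative only: `Family`, `FamilyTracker`, and the choice to classify purely by typeflag are assumptions made for this example, not the crate's `ArchiveFormat` or its actual detection logic.

```rust
/// Hypothetical archive family marker (not the crate's `ArchiveFormat`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Family {
    Pax,
    Gnu,
}

/// Hypothetical tracker that locks onto the first family-specific header.
#[derive(Debug, Default)]
pub struct FamilyTracker {
    seen: Option<Family>,
}

impl FamilyTracker {
    /// Assumption: `x`/`g` mark pax extended headers and `L`/`K` mark GNU
    /// longname/longlink headers; every other typeflag is family-neutral.
    fn classify(typeflag: u8) -> Option<Family> {
        match typeflag {
            b'x' | b'g' => Some(Family::Pax),
            b'L' | b'K' => Some(Family::Gnu),
            _ => None,
        }
    }

    /// Observe one header's typeflag, failing if it belongs to the family
    /// that this stream has not committed to.
    pub fn observe(&mut self, typeflag: u8) -> Result<(), String> {
        let Some(family) = Self::classify(typeflag) else {
            return Ok(());
        };
        match self.seen {
            None => {
                self.seen = Some(family);
                Ok(())
            }
            Some(prev) if prev == family => Ok(()),
            Some(prev) => Err(format!("{family:?} header in a {prev:?} stream")),
        }
    }
}
```

With this sketch, a stream that presents an `x` header followed by an `L` header fails at the `L` header, while ordinary members (`0`, `5`, and so on) never affect the tracked family.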

§pax compatibility

When decoding pax-formatted tar streams, tar-framing attempts to conform to pax as specified in POSIX.1-2024, i.e. “issue 8” of the POSIX specification. See the pax specification for full details.

However, there are a few small deviations from a pedantic reading of POSIX.1-2024 that are worth noting:

  • tar-framing permits a ctime pax record, even though POSIX.1-2024 does not specify one. The ctime record was removed from pax in POSIX.1-2004 (which is itself a minor edit of POSIX.1-2001). However, many real-world pax archives still contain it, and its presence neither compromises framing nor introduces ambiguity.

  • tar-framing rejects directory entries (typeflag '5') that present a nonzero size in their ustar header or pax size record. pax says that this size should be treated as a filesystem allocation hint rather than a physical size, but real-world parsers vary widely in how they handle it (some ignore it, others skip over that number of bytes, etc.).

  • tar-framing rejects regular file entries (typeflag '0' or '\0') that include a trailing slash (e.g. foo.txt/). pax is ambiguous about how to handle these cases: it notes that pre-ustar tar had no directory entry typeflag and thus a trailing slash was used to indicate a directory by convention, but does not prescribe that pax implementors honor this legacy behavior. We choose to reject it since it presents the same directory size problem mentioned above.

  • tar-framing rejects negative timestamps as well as timestamps that would exceed the range of a u64. pax allows both of these, although it notes that portable timestamps cannot be negative and that tools may reject such timestamps.

  • tar-framing silently truncates parsed timestamps to whole-second precision, discarding any fractional component.

  • tar-framing rejects typeflags that are not explicitly defined in pax. pax says to handle these as regular files (i.e. assuming their size is a physical size), but this has marginal benefit in practice.

  • tar-framing rejects hdrcharset pax records that aren’t UTF-8 or BINARY. pax says that “additional names may be agreed between the originator and the recipient,” but we are the recipient and we don’t accept any other hdrcharset names.
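The timestamp rules above (no negative values, nothing beyond a u64, fractional part dropped) can be sketched as a small parser. This is a minimal illustration, not the crate's implementation: `parse_pax_seconds` and its `String` error type are hypothetical names chosen for the example, and accepting an empty fraction such as `"1."` is a choice made here, not something the source specifies.

```rust
/// Hypothetical helper: parse a pax time value such as "1700000000.5"
/// into whole seconds, applying the restrictions described above.
pub fn parse_pax_seconds(value: &str) -> Result<u64, String> {
    // Negative timestamps are rejected outright.
    if value.starts_with('-') {
        return Err("negative timestamp".to_string());
    }

    // Split off an optional fractional component.
    let (whole, frac) = match value.split_once('.') {
        Some((w, f)) => (w, Some(f)),
        None => (value, None),
    };

    // The integer part must be a non-empty run of ASCII digits.
    if whole.is_empty() || !whole.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("malformed timestamp: {value:?}"));
    }

    // The fraction is validated as digits and then discarded (truncation).
    if let Some(f) = frac {
        if !f.bytes().all(|b| b.is_ascii_digit()) {
            return Err(format!("malformed fraction: {value:?}"));
        }
    }

    // Values that do not fit in a u64 are rejected.
    whole
        .parse::<u64>()
        .map_err(|e| format!("timestamp out of range: {e}"))
}
```

Under these assumptions, `"1700000000.987654321"` yields `1700000000`, while `"-1"` and `"18446744073709551616"` are both errors.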

§GNU compatibility

When decoding GNU-formatted tar streams, tar-framing attempts to follow the “Basic Tar Format” in the GNU docs. Specifically, tar-framing attempts to follow the rules for the “old GNU” format, i.e. GNU tar’s non-pax format.

tar-framing intentionally only supports a subset of the GNU tar format:

  • The GNU “longname” and “longlink” ('L' and 'K') typeflags are supported, with path-precedence semantics similar to those of their pax record equivalents.

  • Other GNU-specific typeflags are not supported whatsoever, and produce a framing error. This includes sparse files ('S') and multivolume headers ('M').

  • tar-framing accepts the GNU-specific “base-256” encoding for numbers, but rejects negative encodings as well as any value that would exceed the range of a u64. tar-framing also allows “base-256” encodings where the numeric value would fit into an octal encoding in the allotted buffer/byte span; GNU technically says that this is reserved for future use.
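As a rough illustration of the base-256 rule, the sketch below decodes a numeric header field whose first byte has the high bit set. The function name `decode_base256`, the `String` error type, and the exact bit layout (bit 0x80 as the marker, bit 0x40 as the sign, remaining bits big-endian) are assumptions for this example; the crate's own decoder may differ in detail.

```rust
/// Hypothetical helper: decode a GNU "base-256" numeric field into a u64.
///
/// Assumed layout: the top bit of the first byte (0x80) marks the encoding,
/// the next bit (0x40) is the sign, and all remaining bits form a
/// big-endian magnitude.
pub fn decode_base256(field: &[u8]) -> Result<u64, String> {
    let (&first, rest) = field
        .split_first()
        .ok_or_else(|| "empty field".to_string())?;

    if (first & 0x80) == 0 {
        return Err("field is not base-256 encoded".to_string());
    }
    // Negative encodings are rejected.
    if (first & 0x40) != 0 {
        return Err("negative base-256 value".to_string());
    }

    // Accumulate big-endian bytes, refusing anything wider than 64 bits.
    let mut value = u64::from(first & 0x3f);
    for &byte in rest {
        if (value >> 56) != 0 {
            return Err("base-256 value exceeds u64".to_string());
        }
        value = (value << 8) | u64::from(byte);
    }
    Ok(value)
}
```

Note that this sketch, like the behavior described above, accepts small values (for example a 12-byte field encoding 7) even though they would also fit in an octal encoding.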

§General compatibility

Because pax and GNU both use ustar as their baseline, any compatibility aspect of pax that is derived from ustar also applies during GNU tar decoding.

tar-framing accepts wholly NUL mode, uid, gid, and mtime fields by default for compatibility with real-world writers in both families. These fields are represented as missing rather than assigned a value. This can be disabled with stream::TarStream::set_allow_all_nul_numeric_fields.
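The "missing rather than assigned a value" behavior can be modeled by returning an `Option` from the numeric field parser. The following is a minimal sketch: `parse_octal_field` and its `allow_all_nul` parameter are hypothetical stand-ins for whatever stream::TarStream::set_allow_all_nul_numeric_fields controls internally, and the trimming rules are an assumption about typical ustar writers.

```rust
/// Hypothetical helper: parse an octal ustar numeric field.
///
/// Returns `Ok(None)` for a wholly NUL field when `allow_all_nul` is true,
/// so the caller can treat the value as missing instead of as zero.
pub fn parse_octal_field(field: &[u8], allow_all_nul: bool) -> Result<Option<u64>, String> {
    if field.iter().all(|&b| b == 0) {
        return if allow_all_nul {
            Ok(None)
        } else {
            Err("numeric field is entirely NUL".to_string())
        };
    }

    // Assumption: fields may carry leading spaces and trailing NUL/space
    // terminators, which are stripped before parsing.
    let end = field
        .iter()
        .rposition(|&b| b != 0 && b != b' ')
        .map_or(0, |i| i + 1);
    let start = field[..end]
        .iter()
        .position(|&b| b != b' ')
        .unwrap_or(end);
    let digits = &field[start..end];
    if digits.is_empty() {
        return Err("numeric field has no digits".to_string());
    }

    let mut value: u64 = 0;
    for &b in digits {
        if !(b'0'..=b'7').contains(&b) {
            return Err(format!("invalid octal digit 0x{b:02x}"));
        }
        value = value
            .checked_mul(8)
            .and_then(|v| v.checked_add(u64::from(b - b'0')))
            .ok_or_else(|| "octal value exceeds u64".to_string())?;
    }
    Ok(Some(value))
}
```

Distinguishing `None` from `Some(0)` keeps the framing lossless: a consumer can tell a writer that omitted the mode from one that wrote mode `0000000`.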

Separately, higher-level crates (like tar-codec) may choose to apply additional restrictions when processing logical archive members. For example, a consumer of tar-framing may choose to reject vendor-specific pax records, or member names that contain forbidden characters, or any other additional restriction.

Modules§

header
logical
Member-oriented reading above the lossless physical frame stream.
stream
Lossless, block-oriented tar streaming.
write
Strict POSIX-pax block construction.

Structs§

FrameError
An error encountered at an absolute position in a tar stream.
PaxExtension
One positioned parsed pax extended header.
PaxState
Unified pax metadata state applicable to one ordinary member.

Enums§

ArchiveFormat
An automatically detected, mutually exclusive tar archive family.
FrameErrorInner
Specific errors that can occur while processing tar frames.
GnuKind
The supported GNU metadata extension kinds.
HdrCharset
A character encoding for PAX pathname and user/group-name values.
PaxError
An error encountered while parsing pax extended-header records.
PaxKeyword
An owned, hashable pax extended-header keyword.
PaxKind
The scope of a pax extended header.
PaxRecord
A parsed pax extended-header record.
PaxString
A character value governed by the effective PAX HdrCharset.
PaxValue
A parsed pax value, including an explicit deletion tombstone.
UstarKind
A supported ordinary ustar member type.

Constants§

BLOCK_SIZE
The size of a logical tar record.
DEFAULT_MAX_GLOBAL_PAX_EXTENSIONS_SIZE
The default maximum cumulative size of global pax extensions before one member.
DEFAULT_MAX_GNU_EXTENSION_SIZE
The default maximum size in bytes of one GNU metadata extension.
DEFAULT_MAX_PAX_EXTENSION_SIZE
The default maximum size in bytes of one local or global pax extension.

Type Aliases§

Block
A single tar block.