Struct Parser 

pub struct Parser { /* private fields */ }

Sans-IO tar archive parser.

This parser operates as a state machine on &[u8] input slices. It performs no I/O itself; the caller is responsible for supplying data and handling the parsed events.

§Usage

The caller feeds header bytes to parse(). On an Entry event, the caller reads or skips entry.size bytes of content (plus padding to the next 512-byte boundary) from its own I/O source, then calls parse() again with the next header bytes. The parser never sees or tracks content bytes.

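let mut parser = Parser::new(Limits::default());
let mut buf = vec![0u8; 65536];
let mut filled = 0;

loop {
    match parser.parse(&buf[..filled]) {
        // The parser needs more header bytes; nothing was consumed.
        Ok(ParseEvent::NeedData { min_bytes }) => {
            let n = read_more(&mut buf[filled..])?;
            filled += n;
            // The source is exhausted but the parser still needs data.
            if n == 0 && filled < min_bytes {
                return Err("unexpected EOF");
            }
        }
        // `consumed` is the number of header bytes the parser used.
        Ok(ParseEvent::Entry { consumed, entry }) => {
            process_entry(&entry);
            // Read/skip entry.size bytes + padding, then clear buf.
            // Resetting `filled` instead of advancing by `consumed`
            // assumes buf held only this entry's header bytes.
            skip_content(entry.padded_size())?;
            filled = 0;
        }
        Ok(ParseEvent::End { .. }) => break,
        Err(e) => return Err(e),
    }
}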

Implementations§

impl Parser

pub fn new(limits: Limits) -> Self

Create a new parser with the given limits.

pub fn set_allow_empty_path(&mut self, allow: bool)

Allow entries with empty paths instead of rejecting them with ParseError::EmptyPath.

pub fn set_verify_checksums(&mut self, verify: bool)

Control whether header checksums are verified during parsing.

When set to false, the parser skips Header::verify_checksum calls, accepting headers regardless of their checksum field. This is primarily useful for fuzz testing, where random input almost never produces valid checksums, preventing the fuzzer from reaching deeper parser code paths.

Default: true.
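
For context, the check being skipped can be sketched as follows. This is the standard tar header checksum (the unsigned sum of all 512 header bytes, with the 8-byte checksum field at offset 148 counted as ASCII spaces), not necessarily how Header::verify_checksum is implemented. The names compute_checksum, stored_checksum, and checksum_matches are hypothetical and are not part of this crate.

```rust
// Assumed layout: standard 512-byte ustar header, checksum field at 148..156.
const CHKSUM_START: usize = 148;
const CHKSUM_LEN: usize = 8;

/// Sum every header byte as unsigned, counting the checksum field as spaces.
fn compute_checksum(header: &[u8; 512]) -> u32 {
    header
        .iter()
        .enumerate()
        .map(|(i, &b)| {
            if (CHKSUM_START..CHKSUM_START + CHKSUM_LEN).contains(&i) {
                u32::from(b' ')
            } else {
                u32::from(b)
            }
        })
        .sum()
}

/// Decode the stored checksum: octal digits padded with NULs and/or spaces.
fn stored_checksum(header: &[u8; 512]) -> Option<u32> {
    let field = &header[CHKSUM_START..CHKSUM_START + CHKSUM_LEN];
    let text = std::str::from_utf8(field).ok()?;
    let digits = text.trim_matches(|c: char| c == ' ' || c == '\0');
    if digits.is_empty() || !digits.bytes().all(|b| (b'0'..=b'7').contains(&b)) {
        return None;
    }
    u32::from_str_radix(digits, 8).ok()
}

/// True when the stored and computed checksums agree.
fn checksum_matches(header: &[u8; 512]) -> bool {
    stored_checksum(header) == Some(compute_checksum(header))
}
```

Random fuzzer input rarely satisfies checksum_matches, which is why disabling verification lets a fuzzer reach the code that runs after the check.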

pub fn set_ignore_pax_errors(&mut self, ignore: bool)

Control whether malformed PAX extension values are silently ignored.

When set to true, PAX values that fail to parse (invalid UTF-8, unparseable integers for uid, gid, size, mtime) are skipped instead of producing ParseError::InvalidPaxValue errors. This matches the lenient behavior of many real-world tar implementations.

Default: false (malformed values produce errors).
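
The difference between the two modes can be illustrated with a minimal sketch that extracts the size key from a PAX extended header, whose records have the form "LEN key=value\n" with LEN counting the whole record. The PaxError type and the pax_records and pax_size functions are hypothetical stand-ins, not this crate's API. The sketch also assumes that lenient mode only forgives bad values, while structurally broken records still fail; the real parser may draw that line differently.

```rust
#[derive(Debug, PartialEq)]
enum PaxError {
    /// The record framing itself is broken (hypothetical variant).
    Malformed,
    /// A known key carried a value that could not be parsed.
    InvalidValue(&'static str),
}

/// Split a PAX extended header into (key, value) byte slices.
fn pax_records(mut data: &[u8]) -> Result<Vec<(&[u8], &[u8])>, PaxError> {
    let mut out = Vec::new();
    while !data.is_empty() {
        let sp = data.iter().position(|&b| b == b' ').ok_or(PaxError::Malformed)?;
        let len: usize = std::str::from_utf8(&data[..sp])
            .ok()
            .and_then(|s| s.parse().ok())
            .ok_or(PaxError::Malformed)?;
        // LEN covers the digits, the space, "key=value", and the newline.
        if len <= sp + 1 || len > data.len() || data[len - 1] != b'\n' {
            return Err(PaxError::Malformed);
        }
        let body = &data[sp + 1..len - 1];
        let eq = body.iter().position(|&b| b == b'=').ok_or(PaxError::Malformed)?;
        out.push((&body[..eq], &body[eq + 1..]));
        data = &data[len..];
    }
    Ok(out)
}

/// Return the `size` override, if any. With `ignore_errors` set, an
/// unparseable value is skipped instead of being reported.
fn pax_size(data: &[u8], ignore_errors: bool) -> Result<Option<u64>, PaxError> {
    let mut size = None;
    for (key, value) in pax_records(data)? {
        if key != b"size" {
            continue;
        }
        match std::str::from_utf8(value).ok().and_then(|s| s.parse::<u64>().ok()) {
            Some(n) => size = Some(n),
            None if ignore_errors => {} // lenient: keep going without this value
            None => return Err(PaxError::InvalidValue("size")),
        }
    }
    Ok(size)
}
```

In strict mode the record "12 size=abc\n" is an error; in lenient mode it is skipped and the entry falls back to whatever size the regular header declares.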

pub fn with_defaults() -> Self

Create a new parser with default limits.

pub fn limits(&self) -> &Limits

Get the current limits.

pub fn is_done(&self) -> bool

Check if the parser is done (archive complete).

pub fn parse<'a>(&mut self, input: &'a [u8]) -> Result<ParseEvent<'a>>

Parse the next event from the input buffer.

Returns a ParseEvent on success. Entry and End events include a consumed field indicating how many bytes were consumed from the input; the caller should advance past that many bytes in their buffer.

§Events
  • NeedData { min_bytes }: Need at least min_bytes more data (nothing consumed)
  • Entry { consumed, entry }: A complete entry header; caller must handle content
  • End { consumed }: Archive is complete

After receiving an Entry event, the caller is responsible for reading or skipping entry.size bytes of content (plus padding to the next 512-byte boundary) before calling parse() again.
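
The caller-side skip can be sketched with the standard library alone. The helpers padded_len and skip_entry_content are hypothetical names, not part of this crate, and the sketch assumes the content comes from a std::io::Read source positioned at the first content byte.

```rust
use std::io::{self, Read};

/// Tar stores content in 512-byte blocks.
const BLOCK: u64 = 512;

/// Round `size` up to the next multiple of 512; `None` if that overflows u64.
fn padded_len(size: u64) -> Option<u64> {
    let rem = size % BLOCK;
    if rem == 0 {
        Some(size)
    } else {
        size.checked_add(BLOCK - rem)
    }
}

/// Discard an entry's content plus its padding from `reader`.
fn skip_entry_content<R: Read>(reader: &mut R, size: u64) -> io::Result<()> {
    let total = padded_len(size).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "entry size overflows u64")
    })?;
    // Copy exactly `total` bytes into a sink, i.e. read and drop them.
    let skipped = io::copy(&mut reader.by_ref().take(total), &mut io::sink())?;
    if skipped != total {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "archive truncated inside entry content",
        ));
    }
    Ok(())
}
```

A 600-byte entry therefore advances the source by 1024 bytes, after which the next bytes read are the following header. For seekable sources, seeking forward by the padded length would avoid reading the content at all.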

Trait Implementations§

impl Debug for Parser

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.