//! Fast binary file type detection
//!
//! `bindet` provides fast and safe binary file type detection, even for large files.
//!
//! The worst case for `bindet` is `O(n)`, but several tricks are applied to amortize the time
//! complexity to `O(1)`, so in most cases detection does not take `O(n)` to execute.
//!
//! ## Supported file types
//!
//! - Zip
//! - Rar (4 and 5)
//! - Tar (uncompressed)
//! - LZMA
//! - Xz (`7zXZ`)
//! - Zst
//! - Png
//! - Jpg
//! - 7-zip
//! - Opus
//! - Vorbis
//! - Mp3
//! - Webp
//! - Flac
//! - Matroska (mkv, mka, mks, mk3d, webm)
//! - Wasm
//! - Java Class
//! - Scala Tasty
//! - Mach-O
//! - Elf (Executable and Linkable Format)
//! - Wav
//! - Avi
//! - Aiff
//! - Tiff
//! - Sqlite3 (`.db`)
//! - Ico
//! - Dalvik
//! - Pdf
//! - Exe/Dll
//! - Gif
//! - Xcf
//! - Bmp
//! - Iso
//! - Swf/Swc
//! - (some may be missing, please refer to [FileType])
//!
//! ## First Step
//!
//! File detection is a two-pass process. First, it tries to find the magic number at the start
//! of the [Read]. If the magic number is found, a second pass may be done to ensure correctness of
//! the detection. For example, [`FileType::Zip`](FileType::Zip) has a **Local File Header**, which
//! starts with a 4-byte descriptor, and an **End of central directory record**, which appears at
//! the end of non-empty zip files.
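//!
/*!
As a rough illustration of the two passes, the sketch below classifies an in-memory byte slice
using the two ZIP markers. It is a simplified stand-in and not the matcher used by `bindet`:
`SketchResult` and `classify_zip` are hypothetical names, and the whole input is assumed to fit
in memory.

```rust
/// Hypothetical three-state result, loosely mirroring a full versus a probable match.
#[derive(Debug, PartialEq, Eq)]
enum SketchResult {
    NotMatched,
    Probable,
    Matched,
}

/// ZIP local file header signature, expected at offset 0 of a non-empty archive.
const LOCAL_FILE_HEADER: &[u8] = b"PK\x03\x04";
/// ZIP end of central directory record signature, expected near the end.
const END_OF_CENTRAL_DIR: &[u8] = b"PK\x05\x06";

fn classify_zip(data: &[u8]) -> SketchResult {
    // First pass: the magic number must be at the very start.
    if !data.starts_with(LOCAL_FILE_HEADER) {
        return SketchResult::NotMatched;
    }
    // Second pass: look for the end marker, scanning from the end towards the start.
    let has_end_marker = data
        .windows(END_OF_CENTRAL_DIR.len())
        .rev()
        .any(|window| window == END_OF_CENTRAL_DIR);
    if has_end_marker {
        SketchResult::Matched
    } else {
        SketchResult::Probable
    }
}
```
*/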
//!
//! Some files can be detected just by looking at the start of the file with a fixed-size buffer,
//! which guarantees `O(1)` for simple detection and an amortized `O(1)` for correctness. Other file
//! types are less strict: for [RAR SFX](https://documentation.help/WinRAR/HELPArcSFX.htm), the
//! magic number may be found anywhere from the start of the file up to the SFX module size (`1 MB`).
//! This means that in the worst case we need to slide a window over up to `1 MB` to find this value.
//! This type of check happens in the second step.
//!
//! ## Second Step
//!
//! In the first step, we use a small buffer to store the initial bytes of the data and try to detect
//! the file type. In the second step, we use a larger buffer, up to the size of the largest
//! lookup range (currently `1 MB`, which matches the RAR5 specification), and
//! use a sliding window to find a range that matches the magic number sequence.
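//!
/*!
The forward sliding window can be sketched on a byte slice as follows. This is only an
illustration under stated assumptions: `find_magic_at_start` and `max_lookup` are hypothetical
names, the data is assumed to be already buffered, and the RAR5 signature is used as the example
magic number.

```rust
/// RAR5 signature, used here only as an example magic number.
const RAR5_MAGIC: &[u8] = b"Rar!\x1a\x07\x01\x00";

/// Returns the offset of the first occurrence of `magic` that starts within
/// the first `max_lookup` bytes of `data`, if any.
fn find_magic_at_start(data: &[u8], magic: &[u8], max_lookup: usize) -> Option<usize> {
    // `windows(0)` would panic, so an empty magic number never matches.
    if magic.is_empty() {
        return None;
    }
    data.windows(magic.len())
        // Only windows starting at offsets `0..max_lookup` are inspected.
        .take(max_lookup)
        .position(|window| window == magic)
}
```
*/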
//!
//! The same strategy is applied by the [`detect_at_end`](detect_at_end) logic: it scans the
//! file backwards, using a sliding window, to find a matching sequence of bytes. This logic is
//! used to ensure correctness for file types that have a sequence of bytes that appears at the end.
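//!
/*!
A backward scan over any `Read + Seek` source might look like the sketch below.
`find_marker_from_end` is a hypothetical helper, not part of the `bindet` API, and it seeks once
per byte for clarity rather than speed.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Slides a window backwards, one byte at a time, and returns the offset (from the
/// start of the stream) of the last occurrence of `marker`, if any.
fn find_marker_from_end<R: Read + Seek>(
    reader: &mut R,
    marker: &[u8],
) -> io::Result<Option<u64>> {
    let len = reader.seek(SeekFrom::End(0))?;
    let marker_len = marker.len() as u64;
    if marker.is_empty() || len < marker_len {
        return Ok(None);
    }
    let mut window = vec![0u8; marker.len()];
    // Start with the window flush against the end, then slide towards offset 0.
    let mut offset = len - marker_len;
    loop {
        reader.seek(SeekFrom::Start(offset))?;
        reader.read_exact(&mut window)?;
        if window.as_slice() == marker {
            return Ok(Some(offset));
        }
        if offset == 0 {
            // The whole stream was traversed: this is the linear worst case.
            return Ok(None);
        }
        offset -= 1;
    }
}
```
*/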
//!
//! ### Worst-case scenario
//!
//! The [`detect`](detect) function first reads from the start and then only does the backward
//! sliding at the end for types that matched at the start. This improves the accuracy of file
//! detection, at a cost: if a marker is found at the start, the specification states that there is a
//! marker at the end, and the backward sliding window finds no marker at the end, we
//! will have traversed the entire data stream, with a time complexity of `O(n)`. So the worst case
//! of file detection is linear.
//!
//! However, even with a linear worst case, we assume that in most scenarios the marker at the
//! start will be enough to detect the file type. If it is not enough and we need to look at the end,
//! we assume that in most cases we will not need to slide the window all the way to the start of the
//! stream, because the algorithm will find the marker closer to the end than to the start.
//!
//! Further benchmarks can be done to check whether the amortized time complexity of **bindet** is
//! really `O(1)`, given a set of files to be detected.
//!
//! ### Examples
//!
//! ```
//! use std::fs::OpenOptions;
//! use std::io::{BufReader, ErrorKind};
//! use bindet::types::FileType;
//! use bindet::{FileTypeMatch, FileTypeMatches};
//!
//! let file = OpenOptions::new().read(true).open("files/test.tar").unwrap();
//! let buf = BufReader::new(file);
//!
//! let detect = bindet::detect(buf).map_err(|e| e.kind());
//! let expected: Result<Option<FileTypeMatches>, ErrorKind> = Ok(Some(FileTypeMatches::new(
//!     vec![FileType::Tar],
//!     vec![FileTypeMatch::new(FileType::Tar, true)]
//! )));
//!
//! assert_eq!(detect, expected);
//! ```
//!
//! ### Features
//!
//! #### `nightly`
//!
//! Uses [Macro MetaVar Expression Counting](https://rust-lang.github.io/rfcs/3086-macro-metavar-expr.html#count) instead of
//! [Repetition Counting through tuple slice length](https://danielkeep.github.io/tlborm/book/blk-counting.html#slice-length).
//!
//! Enabling the `nightly` feature flag has no impact on performance, neither at runtime nor at compile time:
//! both the stable and the nightly approaches get optimized and inlined at compile time, and both have the same
//! negligible compile-time cost.
//!
//! This only exists because [Macro MetaVar Expression](https://rust-lang.github.io/rfcs/3086-macro-metavar-expr.html)
//! is still being discussed[^1] (it may not reach stable 1.63 or 1.64, and it is too late for 1.62).
//! Even though `$$` and `${ignore(_)}` are targeting 1.62[^2] and
//! [will probably be delivered](https://github.com/rust-lang/rust/pull/95860#issuecomment-1094136943),
//! `${count(_)}` is one of the features left out of the stabilization because it may need more
//! refinement and discussion.
//!
//! **bindet** does not strictly need this, since it does not have enough elements to cause a compiler crash,
//! but keeping it in the source code helps us remember to make it the default when it reaches stable,
//! and to reduce the amount of hacky/tricky things we need to do with declarative macros (which already need a bunch of tricks).
//!
//! #### `mime`
//!
//! Enables conversion from [`FileType`][`types::FileType`] and [`FileRootType`][`FileRootType`] to
//! `Mime` by implementing the `TryInto<Mime>` trait.
//! There is no need to `use` any additional module; just enable the feature.
//!
//! #### `mediatype`
//!
//! Enables conversion from [`FileType`][`types::FileType`] and [`FileRootType`][`FileRootType`] to
//! `MediaTypeBuf` by implementing the `TryInto<MediaTypeBuf>` trait.
//! There is no need to `use` any additional module; just enable the feature.
//!
//! [^1]: <https://github.com/rust-lang/rust/issues/83527>
//!
//! [^2]: <https://github.com/rust-lang/rust/pull/95860>
#![cfg_attr(feature = "nightly", feature(test))]
// TODO: looking for https://github.com/rust-lang/rust/issues/83527
#![cfg_attr(feature = "nightly", feature(macro_metavar_expr))]

use crate::description::FileTypeDescription;
use crate::matcher::{FileTypeMatcher, RelativePosition, Step, TestResult};
pub use crate::types::FileRootType;
pub use crate::types::FileType;
use std::collections::HashSet;
use std::io::{ErrorKind, Read, Seek, SeekFrom};
use std::prelude::rust_2021::TryFrom;

#[cfg(any(feature = "mime", feature = "mediatype"))]
mod conv;
pub mod description;
pub mod matcher;
pub mod types;

/// Stores information about a specific [FileType] match result.
#[derive(Debug, Clone, Eq, PartialEq, Hash)]
pub struct FileTypeMatch {
    /// [FileType] that matched
    pub file_type: FileType,
    /// `true` if the file magic number matched perfectly, `false` if the match is only probable.
    pub full_match: bool,
}

impl FileTypeMatch {
    pub fn new(file_type: FileType, full_match: bool) -> FileTypeMatch {
        FileTypeMatch {
            file_type,
            full_match,
        }
    }
}

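/// Stores every [FileType] matched by a detection, along with the most likely candidates.
#[derive(Debug, Clone, Eq, PartialEq)]
pub struct FileTypeMatches {
    /// [`FileTypes`][FileType] most likely to be the real type: the perfect matches when there are any,
    /// otherwise every probable match.
    pub likely_to_be: Vec<FileType>,
    /// Every [FileType] that matched, perfectly or not.
    pub all_matches: Vec<FileTypeMatch>,
}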

impl FileTypeMatches {
    pub fn new(likely_to_be: Vec<FileType>, all_matches: Vec<FileTypeMatch>) -> FileTypeMatches {
        FileTypeMatches {
            likely_to_be,
            all_matches,
        }
    }
}

/// Detects a file type by looking at the start and at the end of the file (the end is only checked
/// for applicable file types).
///
/// Since different [FileType] detection implementations receive the same data slice, this may produce
/// more than one matching type.
pub fn detect<R>(mut read: R) -> Result<Option<FileTypeMatches>, std::io::Error>
where
    R: Read,
    R: Seek,
{
    let at_start = detect_at_start_from_ref(&mut read)?;

    if let Some(start) = at_start {
        let types: Vec<FileType> = start.all_matches.iter().map(|s| s.file_type).collect();
        let at_end = detect_variants_at_end_from_ref(&mut read, types.as_slice())?;

        if let Some(at_end) = at_end {
            let start_matches: Vec<FileType> =
                start.all_matches.iter().map(|c| c.file_type).collect();

            let perfect: Vec<FileType> = at_end
                .all_matches
                .iter()
                .map(|t| t.file_type)
                .filter(|c| start_matches.contains(c))
                .collect();

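            let mut all_likely: Vec<FileType> = vec![];
            all_likely.extend(start.likely_to_be);
            all_likely.extend(perfect);

            // `dedup_by_key` only removes consecutive duplicates, so track the seen types instead
            // to drop every repeated entry while keeping the first-seen order.
            let mut seen_likely = HashSet::new();
            all_likely.retain(|v| seen_likely.insert(*v));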

            let mut merged: Vec<FileTypeMatch> = vec![];
            merged.extend(start.all_matches);
            merged.extend(at_end.all_matches);

            let mut mapped_merged_items = merged
                .iter()
                .map(|v| FileTypeMatch {
                    file_type: v.file_type,
                    full_match: all_likely.contains(&v.file_type),
                })
                .collect::<Vec<FileTypeMatch>>();

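            // Remove every repeated file type (not just consecutive ones), keeping the first entry.
            let mut seen_matches = HashSet::new();
            mapped_merged_items.retain(|v| seen_matches.insert(v.file_type));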

            Ok(Some(FileTypeMatches {
                likely_to_be: all_likely,
                all_matches: mapped_merged_items,
            }))
        } else {
            Ok(Some(start))
        }
    } else {
        Ok(None)
    }
}

/// Detects a file type by looking at the start of the file. Types that need a second check at the end
/// may be reported with [`FileTypeMatch.full_match = false`](FileTypeMatch), signaling a probable
/// match.
///
/// This is a less reliable version of [`detect`], but with amortized `O(1)` time complexity.
///
/// Since different [FileType] detection implementations receive the same data slice, this may produce
/// more than one matching type.
///
/// This version checks against every [FileType] variant; use [`detect_variants_at_start`] to restrict
/// the file types to check against.
///
/// **This is the ownership-taking version of [`detect_at_start_from_ref`].**
///
/// Read more in [`detect_variants_at_start_from_ref`].
pub fn detect_at_start<R>(mut read: R) -> Result<Option<FileTypeMatches>, std::io::Error>
where
    R: Read,
{
    detect_variants_at_start_from_ref(&mut read, &FileType::variants())
}

/// Detects a file type by looking at the start of the file. Types that need a second check at the end
/// may be reported with [`FileTypeMatch.full_match = false`](FileTypeMatch), signaling a probable
/// match.
///
/// This is a less reliable version of [`detect`], but with amortized `O(1)` time complexity.
///
/// Since different [FileType] detection implementations receive the same data slice, this may produce
/// more than one matching type.
///
/// This version checks against every [FileType] variant; use [`detect_variants_at_start_from_ref`]
/// to restrict the file types to check against.
///
/// **This is the reference-taking version of [`detect_at_start`].**
///
/// Read more in [`detect_variants_at_start_from_ref`].
pub fn detect_at_start_from_ref<R>(read: &mut R) -> Result<Option<FileTypeMatches>, std::io::Error>
where
    R: Read,
{
    detect_variants_at_start_from_ref(read, &FileType::variants())
}

/// Detects a file type by looking at the start of the file. Types that need a second check at the end
/// may be reported with [`FileTypeMatch.full_match = false`](FileTypeMatch), signaling a probable
/// match.
///
/// This is a less reliable version of [`detect`], but with amortized `O(1)` time complexity.
///
/// Since different [FileType] detection implementations receive the same data slice, this may produce
/// more than one matching type.
///
/// This version receives a `variants` parameter that specifies which file types to check against.
///
/// **This is the ownership-taking version of [`detect_variants_at_start_from_ref`].**
///
/// Read more in [`detect_variants_at_start_from_ref`].
pub fn detect_variants_at_start<R>(
    mut read: R,
    variants: &[FileType],
) -> Result<Option<FileTypeMatches>, std::io::Error>
where
    R: Read,
{
    detect_variants_at_start_from_ref(&mut read, variants)
}

/// Detects a file type by looking at the start of the file. Types that need a second check at the end
/// may be reported with [`FileTypeMatch.full_match = false`](FileTypeMatch), signaling a probable
/// match.
///
/// This is a less reliable version of [`detect`], but with amortized `O(1)` time complexity.
///
/// Since different [FileType] detection implementations receive the same data slice, this may produce
/// more than one matching type.
///
/// This version receives a `variants` parameter that specifies which file types to check.
///
/// **This is the reference-taking version.**
///
/// ## Performance note
///
/// The time complexity is not strictly `O(1)`, because small files have fewer variants to test against and
/// bigger files have more. Bigger files are also slower to read (even when the same amount of bytes is read),
/// mainly on compressed file systems. In addition, files that perfectly match any variant
/// (i.e. are detected) are detected faster than files that do not match, regardless of size.
/// For example, a 1 GiB `AVI` can be detected by looking at the first 4 bytes, while a 100 MiB file that is
/// not detectable by this library falls through to the second detection approach, which reads more data from the file.
///
/// The worst case is always bounded by the [maximum block size of the provided `variants`](FileType::maximum_block_size_of_variants),
/// so any undetectable file bigger than this will have the same performance regardless of its size.
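///
/**
The bounded read behind this note might look like the sketch below: fill a buffer of at most
the maximum block size, retrying when a read is interrupted. `read_up_to` is a hypothetical
helper used only for illustration and is not part of this crate's API.

```rust
use std::io::{ErrorKind, Read};

/// Reads until `buf` is full or the reader reaches the end of the stream, retrying on
/// `ErrorKind::Interrupted`. Returns the number of bytes written into `buf`.
fn read_up_to<R: Read>(reader: &mut R, buf: &mut [u8]) -> std::io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break, // end of stream
            Ok(n) => filled += n,
            // An interrupted read does not mean there is no more data.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}
```

Because the buffer size is fixed, the amount of data read never exceeds `buf.len()`, no matter
how large the underlying stream is.
*/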
pub fn detect_variants_at_start_from_ref<R>(
    read: &mut R,
    variants: &[FileType],
) -> Result<Option<FileTypeMatches>, std::io::Error>
where
    R: Read,
{
    let start_position = RelativePosition::Start;
    let small = FileType::ideal_block_size_of_variants(&start_position, variants);
    let mut matches: Vec<FileTypeMatch> = vec![];

    let mut read_data: Vec<u8> = vec![];

    if let Some((size, types)) = small {
        let mut buff = vec![0u8; size];
        let buff_slice = &mut buff[..];

        let read = loop {
            // Do this because Read::read may fail on interruption, but it does not mean that there is no data to read.
            // This is mostly the same approach the `Read::read_to_end` uses.
            match read.read(buff_slice) {
                Ok(i) => break i,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };
        };

        let bytes = &buff_slice[..read];
        read_data.extend_from_slice(bytes);

        push_matched_types_into(&mut matches, bytes, &start_position, &Step::Small, &types);
    }

    let any_perfect_match = matches.iter().any(|v| v.full_match);

    if any_perfect_match {
        let perfect: Vec<FileType> = matches
            .iter()
            .filter(|v| v.full_match)
            .map(|v| v.file_type)
            .collect();

        return Ok(Some(FileTypeMatches {
            likely_to_be: perfect,
            all_matches: matches,
        }));
    }

    // Most of the time is taken here,
    // this will change with the implementation of https://gitlab.com/Kores/bindet/-/issues/5
    let big = FileType::maximum_block_size_of_variants(&start_position, variants);
    if let Some((size, types)) = big {
        // The size of the remaining data that was not read yet,
        // which is the maximum block size minus the amount of data that was already read.
        let new_size = size.saturating_sub(read_data.len());

        // Creates a buffer with the needed size to hold the remaining data.
        let mut buff = vec![0u8; new_size];
        let buff_slice = &mut buff[..];
        // Stores the total amount of data read so far, including the data already read before.
        let mut all_read = read_data.len();
        // Stores the amount of data that was written into `buff`.
        let mut filled = 0usize;

        // Compare against the full block size: `all_read` already counts the bytes from the first
        // step, so comparing against `new_size` would stop the loop before the buffer is full.
        while all_read < size {
            let read = match read.read(&mut buff_slice[filled..]) {
                Ok(i) => i,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };

            if read == 0 {
                break;
            }

            filled += read;
            all_read += read;
        }

        if filled != 0 {
            read_data.extend_from_slice(&buff_slice[..filled]);
            let bytes = &read_data[..];

            push_matched_types_into(&mut matches, bytes, &start_position, &Step::Large, &types);
        }
    }

    if !matches.is_empty() {
        let types: Vec<FileType> = matches.iter().map(|v| v.file_type).collect();
        return Ok(Some(FileTypeMatches {
            likely_to_be: types,
            all_matches: matches,
        }));
    }

    Ok(None)
}

/// Detects a file type by using a backward sliding window. This approach has `O(n)` time complexity
/// and is not meant to be used directly.
///
/// Currently this only works for [FileType::Zip].
///
/// Since different [FileType] detection implementations receive the same data slice, this may produce
/// more than one matching type.
///
/// **This is the ownership-taking version of [detect_at_end_from_ref]**.
pub fn detect_at_end<R>(mut read: R) -> Result<Option<FileTypeMatches>, std::io::Error>
where
    R: Read,
    R: Seek,
{
    detect_variants_at_end_from_ref(&mut read, &FileType::variants())
}

/// Detects a file type by using a backward sliding window. This approach has `O(n)` time complexity
/// and is not meant to be used directly.
///
/// Currently this only works for [FileType::Zip].
///
/// Since different [FileType] detection implementations receive the same data slice, this may produce
/// more than one matching type.
///
/// **This is the ownership-taking version of [detect_variants_at_end_from_ref]**.
pub fn detect_variants_at_end<R>(
    mut read: R,
    variants: &[FileType],
) -> Result<Option<FileTypeMatches>, std::io::Error>
where
    R: Read,
    R: Seek,
{
    detect_variants_at_end_from_ref(&mut read, variants)
}

/// Detects a file type by using a backward sliding window. This approach has `O(n)` time complexity
/// and is not meant to be used directly.
///
/// Currently this only works for [FileType::Zip].
///
/// Since different [FileType] detection implementations receive the same data slice, this may produce
/// more than one matching type.
///
/// **This is the reference-taking version of [detect_at_end]**.
pub fn detect_at_end_from_ref<R>(read: &mut R) -> Result<Option<FileTypeMatches>, std::io::Error>
where
    R: Read,
    R: Seek,
{
    detect_variants_at_end_from_ref(read, &FileType::variants())
}

/// Detects a file type by using a backward sliding window. This approach has `O(n)` time complexity
/// and is not meant to be used directly.
///
/// Currently this only works for [FileType::Zip].
///
/// Since different [FileType] detection implementations receive the same data slice, this may produce
/// more than one matching type.
///
/// This version receives a `variants` parameter that specifies which file types to check.
///
/// **Maintenance Note**
///
/// This algorithm should be reviewed: the backward sliding window should be optional, and file type
/// detectors should signal when to be taken out of the detection list once a detection fails,
/// so that no more time is wasted on them.
///
/// Some detections may need a backward sliding window, others may only need an exact match
/// against the bytes that appear at the end of the file.
///
/// This is also a consideration for the Milestone 1.0.0 release issues: Basic, StartScan
/// and FullScan should all be reviewed. Some file types may need a sliding window right from the start,
/// which we don't support yet. The workaround is to request the maximum possible size and do the
/// sliding window detection from there, but this does not work when the file is shorter than the requested size.
/// Also, some detectors may need to signal a stopping point. For example, the magic number may appear anywhere
/// between bytes `(0..1000000)`, but not after this point; the same applies to the backward sliding window.
///
/// Another thing to consider is arbitrary seeking, which could be a different `DetectionMode`.
/// It would look at fixed positions of the file for the magic number, and it would be good to be able
/// to create a sliding window from there to the start or to the end, with either a maximum size
/// or an undefined size, which would be the same as sliding through the entire file.
pub fn detect_variants_at_end_from_ref<R>(
    read: &mut R,
    variants: &[FileType],
) -> Result<Option<FileTypeMatches>, std::io::Error>
where
    R: Read,
    R: Seek,
{
    let end_position = RelativePosition::End;
    let small = FileType::ideal_block_size_of_variants(&end_position, variants);

    let mut matches: Vec<FileTypeMatch> = vec![];

    if let Some((size, types)) = small {
        let seek = read.seek(SeekFrom::End(0))?;

        let real_size = if seek > u64::try_from(size).unwrap() {
            size
        } else {
            usize::try_from(seek).unwrap()
        };

        let mut buff = vec![0u8; real_size];
        let buff_slice = &mut buff[..];

        let mut back = -i64::try_from(real_size).unwrap();
        let mut seek = read.seek(SeekFrom::End(back))?; // Skip real_size

        loop {
            if seek > 0 {
                // Do this because Read::read may fail on interruption, but it does not mean that there is no data to read.
                // This is mostly the same approach the `Read::read_to_end` uses.
                let read_bytes = loop {
                    match read.read(buff_slice) {
                        Ok(i) => break i,
                        Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                        Err(e) => return Err(e),
                    };
                };

                let bytes = &buff_slice[..read_bytes];

                push_matched_types_into(&mut matches, bytes, &end_position, &Step::Small, &types);

                if types.len() == matches.len() {
                    break;
                }
                back -= 1; // Move back
                seek = read.seek(SeekFrom::End(back))?;
            } else {
                break;
            }
        }
    }

    let any_perfect_match = matches.iter().any(|v| v.full_match);

    if any_perfect_match {
        let perfect: Vec<FileType> = matches
            .iter()
            .filter(|v| v.full_match)
            .map(|v| v.file_type)
            .collect();

        return Ok(Some(FileTypeMatches {
            likely_to_be: perfect,
            all_matches: matches,
        }));
    }

    // TODO: none of the FileTypes does have a maximum_block_size for windowing
    /*let big = FileType::maximum_block_size_of_variants(&end_position, variants);

    if let Some((size, types)) = big {
        let new_size = size - readed_.len();
        let mut buff = vec![0u8; new_size];
        let mut buff_slice = &mut buff[..];
        let read = read.read(buff_slice)?;

        if read != 0 {
            readed_.extend_from_slice(&buff_slice[..read]);
            let mut bytes = &readed_[..];

            push_matched_types_into(&mut matches, &bytes, &end_position, types);
        }
    }*/

    if !matches.is_empty() {
        let types: Vec<FileType> = matches.iter().map(|v| v.file_type).collect();
        return Ok(Some(FileTypeMatches {
            likely_to_be: types,
            all_matches: matches,
        }));
    }

    Ok(None)
}

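/// Tests `bytes` against each of the given `types` and pushes every match into `matches`,
/// keeping at most one match per root type.
fn push_matched_types_into(
    matches: &mut Vec<FileTypeMatch>,
    bytes: &[u8],
    relative_position: &RelativePosition,
    step: &Step,
    types: &[FileType],
) {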
    let mut matched_roots = HashSet::new();
    for file_type in types {
        let root = file_type.root();
        if matched_roots.contains(&root) {
            continue;
        }

        let matched = file_type.test(relative_position, step, bytes);
        if matched != TestResult::NotMatched {
            matches.push(FileTypeMatch {
                file_type: *file_type,
                full_match: matched == TestResult::Matched,
            });

            matched_roots.insert(root);
        }
    }
}

#[cfg(test)]
mod tests {
    use crate::types::FileType;
    use crate::{
        detect, detect_at_end_from_ref, detect_at_start_from_ref, FileTypeMatch, FileTypeMatches,
    };
    use std::collections::HashSet;
    use std::fs::OpenOptions;
    use std::io::{BufReader, Error, ErrorKind};
    use std::path::Path;

    #[test]
    fn test_zip_detect() {
        test_detect_match("files/hello.zip", FileType::Zip);
    }

    #[test]
    fn test_rar_detect() {
        test_detect_match("files/hello.rar", FileType::Rar5);
    }

    #[test]
    fn test_rar_sfx_detect() {
        test_detect_match_n(
            "files/hello-world.exe",
            vec![
                FileTypeMatch::new(FileType::Rar5, true),
                FileTypeMatch::new(FileType::DosMzExecutable, false),
            ],
        );
    }

    #[test]
    fn test_fast_rar_sfx_detect() {
        test_fast_detect_match_n(
            "files/hello-world.exe",
            vec![
                FileTypeMatch::new(FileType::Rar5, true),
                FileTypeMatch::new(FileType::DosMzExecutable, false),
            ],
        );
    }

    #[test]
    fn test_2mib_rar_sfx_detect() {
        test_fast_detect_match_n(
            "files/2mib.exe",
            vec![
                FileTypeMatch::new(FileType::Rar5, true),
                FileTypeMatch::new(FileType::DosMzExecutable, false),
            ],
        );
    }

    #[test]
    fn test_png_detect() {
        test_detect_match("files/rust-logo.png", FileType::Png);
    }

    #[test]
    fn test_jpg_detect() {
        test_detect_match("files/rust-logo.jpg", FileType::Jpg);
    }

    #[test]
    fn test_7z_detect() {
        test_detect_match("files/rust-logo.7z", FileType::_7z);
    }

    #[test]
    fn test_opus_detect() {
        test_detect_match("files/test-opus.opus", FileType::Opus);
    }

    #[test]
    fn test_vorbis_detect() {
        test_detect_match("files/test-vorbis.ogg", FileType::Vorbis);
    }

    #[test]
    fn test_mp3_detect() {
        test_detect_match("files/test-mp3.mp3", FileType::Mp3);
    }

    #[test]
    fn test_webp_detect() {
        test_detect_match("files/rust-logo.webp", FileType::Webp);
    }

    #[test]
    fn test_flac_detect() {
        test_detect_match_maybe("files/test-flac.flac", FileType::Flac);
    }

    #[test]
    fn test_wasm_detect() {
        test_detect_match("files/test-wasm.wasm", FileType::Wasm);
    }

    #[test]
    fn test_class_detect() {
        test_detect_match("files/test-class.class", FileType::Class);
    }

    #[test]
    fn test_elf_detect() {
        test_detect_match("files/test-so.so", FileType::Elf);
    }

    #[test]
    fn test_wav_detect() {
        test_detect_match("files/test-wav.wav", FileType::Wav);
    }

    #[test]
    fn test_avi_detect() {
        test_detect_match("files/test-avi.avi", FileType::Avi);
    }

    #[test]
    fn test_aiff_detect() {
        test_detect_match("files/test-aif.aif", FileType::Aiff);
    }

    #[test]
    fn test_tiff_detect() {
        test_detect_match("files/rust-logo.tiff", FileType::Tiff);
    }

    #[test]
    fn test_sqlite3_detect() {
        test_detect_match("files/test-db.db", FileType::Sqlite3);
    }

    // False positive - how to avoid?
    #[test]
    fn test_flac_txt_detect() {
        test_detect_match_maybe("files/test-flac.txt", FileType::Flac);
    }

    #[test]
    fn test_pdf_detect() {
        test_detect_match_maybe("files/rust-logo.pdf", FileType::Pdf);
    }

    #[test]
    fn test_mka_detect() {
        test_detect_match("files/test-mka.mka", FileType::Matroska);
    }

    #[test]
    fn test_ico_detect() {
        test_detect_match("files/rust-logo.ico", FileType::Ico);
    }

    #[test]
    fn test_tasty_detect() {
        test_detect_match("files/test-tasty.tasty", FileType::Tasty);
    }

    #[test]
    fn test_xcf_detect() {
        test_detect_match_maybe("files/rust-logo.xcf", FileType::Xcf);
    }

    #[test]
    fn test_gif_detect() {
        test_detect_match_maybe("files/rust-logo.gif", FileType::Gif);
    }

    #[test]
    fn test_bmp_detect() {
        test_detect_match_maybe("files/rust-logo.bmp", FileType::Bmp);
    }

    #[test]
    fn test_iso_detect() {
        test_detect_match("files/test-iso.iso", FileType::Iso);
    }

    #[test]
    fn test_txt_no_match() {
        test_detect_no_match("files/text");
    }

    #[test]
    fn test_tar_detect() {
        test_detect_match("files/hello.tar", FileType::Tar);
        test_detect_match("files/test.tar", FileType::Tar);
        test_detect_match("files/test-0.tar", FileType::Tar);
    }

    #[test]
    fn test_lzma_detect() {
        test_detect_match("files/test.tar.lzma", FileType::Lzma);
    }

    #[test]
    fn test_xz_detect() {
        test_detect_match("files/test-xz.xz", FileType::Xz);
    }

    #[test]
    fn test_zst_detect() {
        test_detect_match("files/ex.tar.zst", FileType::Zst);
    }

    #[test]
    fn test_gpg_detect() {
        test_detect_match("files/test-db.db.gpg", FileType::Gpg);
    }

    #[test]
    fn test_armored_gpg_detect() {
        test_detect_match("files/test-db.db.asc", FileType::ArmoredGpg);
    }

    fn test_detect<P>(path: P) -> Result<Option<FileTypeMatches>, Error>
    where
        P: AsRef<Path>,
    {
        let file = OpenOptions::new().read(true).open(path).unwrap();

        let buf = BufReader::new(file);

        detect(buf)
    }

    fn test_fast_detect<P>(path: P) -> Result<Option<FileTypeMatches>, Error>
    where
        P: AsRef<Path>,
    {
        let file = OpenOptions::new().read(true).open(path).unwrap();

        let mut buf = BufReader::new(file);

        detect_at_start_from_ref(&mut buf)
    }

    fn test_detect_sliding<P>(path: P) -> Result<Option<FileTypeMatches>, Error>
    where
        P: AsRef<Path>,
    {
        let file = OpenOptions::new().read(true).open(path).unwrap();

        let mut buf = BufReader::new(file);

        detect_at_end_from_ref(&mut buf)
    }

    fn test_detect_match<P>(path: P, file_type: FileType)
    where
        P: AsRef<Path>,
    {
        let detect = test_detect(path).map_err(|e| e.kind());
        let expected: Result<Option<FileTypeMatches>, ErrorKind> = Ok(Some(FileTypeMatches::new(
            vec![file_type],
            vec![FileTypeMatch::new(file_type, true)],
        )));

        assert_eq!(detect, expected);
    }

    fn test_detect_match_n<P>(path: P, file_type_match: Vec<FileTypeMatch>)
    where
        P: AsRef<Path>,
    {
        let detect = test_detect(path).map_err(|e| e.kind());
        assert!(detect.is_ok());
        let detect_option = detect.unwrap();
        assert!(detect_option.is_some());

        let types: HashSet<FileType> = file_type_match.iter().map(|v| v.file_type).collect();

        let detected = detect_option.unwrap();
        let likely_types: HashSet<FileType> = detected.likely_to_be.iter().copied().collect();

        assert_eq!(types, likely_types);

        let should_match: HashSet<FileTypeMatch> = file_type_match.iter().cloned().collect();

        let matches: HashSet<FileTypeMatch> = detected.all_matches.iter().cloned().collect();

        assert_eq!(should_match, matches);
    }

    fn test_fast_detect_match_n<P>(path: P, file_type_match: Vec<FileTypeMatch>)
    where
        P: AsRef<Path>,
    {
        let detect = test_fast_detect(path).map_err(|e| e.kind());
        assert!(detect.is_ok());
        let detect_option = detect.unwrap();
        assert!(detect_option.is_some());

        let types: HashSet<FileType> = file_type_match.iter().map(|v| v.file_type).collect();

        let detected = detect_option.unwrap();
        let likely_types: HashSet<FileType> = detected.likely_to_be.iter().copied().collect();

        assert_eq!(types, likely_types);

        let should_match: HashSet<FileTypeMatch> = file_type_match.iter().cloned().collect();

        let matches: HashSet<FileTypeMatch> = detected.all_matches.iter().cloned().collect();

        assert_eq!(should_match, matches);
    }

    fn test_detect_match_maybe<P>(path: P, file_type: FileType)
    where
        P: AsRef<Path>,
    {
        let detect = test_detect(path).map_err(|e| e.kind());
        let expected: Result<Option<FileTypeMatches>, ErrorKind> = Ok(Some(FileTypeMatches::new(
            vec![file_type],
            vec![FileTypeMatch::new(file_type, false)],
        )));

        assert_eq!(detect, expected);
    }

    fn test_detect_no_match<P>(path: P)
    where
        P: AsRef<Path>,
    {
        let detect = test_detect(path).map_err(|e| e.kind());
        let expected: Result<Option<FileTypeMatches>, ErrorKind> = Ok(None);

        assert_eq!(detect, expected);
    }

    #[allow(dead_code)]
    fn test_detect_match_sliding<P>(path: P, file_type: FileType)
    where
        P: AsRef<Path>,
    {
        let detect = test_detect_sliding(path).map_err(|e| e.kind());
        let expected: Result<Option<FileTypeMatches>, ErrorKind> = Ok(Some(FileTypeMatches::new(
            vec![file_type],
            vec![FileTypeMatch::new(file_type, true)],
        )));

        assert_eq!(detect, expected);
    }
}