Module processing


Batch processing infrastructure for efficient file analysis

General-purpose building blocks for sequential or parallel processing:

  • DataBatch: Pre-chunked raw byte data
  • FileReader: Chunks files efficiently with gzip support
  • Worker: Processes batches with extraction + database matching
  • MatchResult: Core match info with source context
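To make "pre-chunked raw byte data" concrete, here is a minimal sketch of cutting a byte stream into batches that always end on a line boundary. It uses only the standard library and is not matchy's implementation: `chunk_lines` and its `target` parameter are hypothetical names, and gzip decoding is left out.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of matchy): splits `reader` into batches of
/// raw bytes. Each batch ends on a line boundary and, except possibly the
/// last one, is at least `target` bytes long.
fn chunk_lines<R: BufRead>(mut reader: R, target: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut batches = Vec::new();
    let mut current = Vec::with_capacity(target);
    loop {
        // read_until appends bytes up to and including the next b'\n' (or EOF).
        let n = reader.read_until(b'\n', &mut current)?;
        if n == 0 {
            break; // end of input
        }
        if current.len() >= target {
            // Hand off the full batch and start a fresh buffer.
            batches.push(std::mem::replace(&mut current, Vec::with_capacity(target)));
        }
    }
    if !current.is_empty() {
        batches.push(current); // trailing partial batch
    }
    Ok(batches)
}

fn main() -> io::Result<()> {
    let data = b"alpha\nbeta\ngamma\ndelta\n";
    let batches = chunk_lines(&data[..], 8)?;
    for (i, batch) in batches.iter().enumerate() {
        println!("batch {i}: {:?}", String::from_utf8_lossy(batch));
    }
    Ok(())
}
```

Cutting only at newlines means no line is ever split across two batches, so each batch can be handed to a worker independently.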

§Sequential Example

use matchy::{Database, processing};
use matchy::extractor::Extractor;
use std::sync::Arc;

// Open the database and create the extractor.
let db = Database::from("threats.mxy").open()?;
let extractor = Extractor::new()?;

// Build a worker; the builder accepts one or more named databases.
let mut worker = processing::Worker::builder()
    .extractor(extractor)
    .add_database("threats", Arc::new(db))
    .build();

// Read the gzip-compressed log batch by batch and print every match.
let reader = processing::FileReader::new("access.log.gz", 128 * 1024)?;
for batch in reader.batches() {
    let batch = batch?;
    let matches = worker.process_batch(&batch)?;
    for m in matches {
        println!("{} - {}", m.source.display(), m.matched_text);
    }
}

§Parallel Example (native platforms only)

Reader Thread → [DataBatch queue] → Worker Pool → [Result queue] → Output Thread

Use process_files_parallel for multi-threaded file processing on native platforms.
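The pipeline shape in the diagram can be sketched with standard-library threads and channels. This is an illustration under stated assumptions, not how process_files_parallel is implemented: `run_pipeline` and `count_hits` are hypothetical names, the per-batch work is a plain substring count standing in for extraction and matching, and the calling thread plays the role of the output thread.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the real per-batch work (extraction + matching):
/// counts the lines in a batch that contain `needle`.
fn count_hits(batch: &[u8], needle: &str) -> usize {
    String::from_utf8_lossy(batch)
        .lines()
        .filter(|line| line.contains(needle))
        .count()
}

/// Reader thread -> bounded batch queue -> worker pool -> result queue -> caller.
fn run_pipeline(batches: Vec<Vec<u8>>, workers: usize, needle: &'static str) -> usize {
    assert!(workers > 0, "at least one worker is required");
    // Bounded queue so the reader cannot run far ahead of the workers.
    let (batch_tx, batch_rx) = mpsc::sync_channel::<Vec<u8>>(4);
    let (result_tx, result_rx) = mpsc::channel::<usize>();
    // std's Receiver is single-consumer, so the pool shares it behind a mutex.
    let batch_rx = Arc::new(Mutex::new(batch_rx));

    let reader = thread::spawn(move || {
        for batch in batches {
            if batch_tx.send(batch).is_err() {
                break; // all workers are gone
            }
        }
        // batch_tx is dropped here, which closes the batch queue.
    });

    let pool: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&batch_rx);
            let tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is released at the end of this statement,
                // so the batch is processed without holding the lock.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(batch) => {
                        if tx.send(count_hits(&batch, needle)).is_err() {
                            break;
                        }
                    }
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();
    drop(batch_rx);
    drop(result_tx); // only the workers' clones remain

    // Collect results until every worker has dropped its sender.
    let total: usize = result_rx.iter().sum();
    reader.join().unwrap();
    for worker in pool {
        worker.join().unwrap();
    }
    total
}

fn main() {
    let batches = vec![
        b"GET /a 10.0.0.1\nGET /b 10.0.0.2\n".to_vec(),
        b"GET /c 10.0.0.1\n".to_vec(),
    ];
    println!("hits: {}", run_pipeline(batches, 2, "10.0.0.1"));
}
```

The bounded batch queue gives backpressure: when the workers fall behind, the reader blocks on `send` instead of buffering the whole input in memory.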

Structs§

DataBatch
Pre-chunked batch of raw data ready for parallel processing
DataBatchIter
Iterator over data batches
FileReader
Reads files in chunks with compression support
MatchResult
Match result with source context
ParallelProcessingResult
Result from parallel file processing
RoutingStats
Statistics about file routing decisions made by the main thread
Worker
Worker that processes batches with extraction + database matching
WorkerBuilder
Builder for Worker with support for multiple databases
WorkerStats
Statistics from batch processing

Enums§

WorkUnit
A unit of work that can be processed independently

Functions§

process_files_parallel
Process multiple files in parallel using producer/reader/worker architecture