Batch processing infrastructure for efficient file analysis
General-purpose building blocks for sequential or parallel processing:
- DataBatch: Pre-chunked raw byte data
- FileReader: Chunks files efficiently with gzip support
- Worker: Processes batches with extraction + database matching
- MatchResult: Core match info with source context
§Sequential Example
use matchy::{Database, processing};
use matchy::extractor::Extractor;
use std::sync::Arc;

let db = Database::from("threats.mxy").open()?;
let extractor = Extractor::new()?;
let mut worker = processing::Worker::builder()
    .extractor(extractor)
    .add_database("threats", Arc::new(db))
    .build();

let reader = processing::FileReader::new("access.log.gz", 128 * 1024)?;
for batch in reader.batches() {
    let batch = batch?;
    let matches = worker.process_batch(&batch)?;
    for m in matches {
        println!("{} - {}", m.source.display(), m.matched_text);
    }
}

§Parallel Example (native platforms only)
Reader Thread → [DataBatch queue] → Worker Pool → [Result queue] → Output Thread

Use process_files_parallel for multi-threaded file processing on native platforms.
Structs§
- DataBatch - Pre-chunked batch of raw data ready for parallel processing
- DataBatchIter - Iterator over data batches
- FileReader - Reads files in chunks with compression support
- MatchResult - Match result with source context
- ParallelProcessingResult - Result from parallel file processing
- RoutingStats - Statistics about file routing decisions made by the main thread
- Worker - Worker that processes batches with extraction + database matching
- WorkerBuilder - Builder for Worker with support for multiple databases
- WorkerStats - Statistics from batch processing
Enums§
- WorkUnit - A unit of work that can be processed independently
Functions§
- process_files_parallel - Process multiple files in parallel using producer/reader/worker architecture