pub struct InvoiceExtractorBuilder { /* private fields */ }
Builder for configuring InvoiceExtractor
Provides a fluent API for configuring extraction behavior. All settings have sensible defaults for immediate use.
§Defaults
- Language: None (uses the default English patterns)
- Confidence Threshold: 0.7 (70%)
- Use Kerning: true (stored but not yet functional; see the use_kerning() docs)
§Examples
use oxidize_pdf::text::invoice::InvoiceExtractor;

// Minimal configuration
let extractor = InvoiceExtractor::builder()
    .with_language("es")
    .build();

// Full configuration
let extractor = InvoiceExtractor::builder()
    .with_language("de")
    .confidence_threshold(0.85)
    .use_kerning(false)
    .build();
Implementations§
impl InvoiceExtractorBuilder
pub fn new() -> Self
Create a new builder with default settings
Defaults:
- No language (uses English patterns)
- Confidence threshold: 0.7
- Kerning: enabled
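The defaults and the `self -> Self` signatures on this page suggest a consuming builder. The following is a minimal sketch of that pattern, not the oxidize_pdf implementation: `SketchBuilder` and `SketchConfig` are hypothetical names, and only the default values are taken from the documentation above.

```rust
// Toy model of a consuming builder. NOT the oxidize_pdf source;
// `SketchConfig` and `SketchBuilder` are hypothetical names.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchConfig {
    pub language: Option<String>,
    pub confidence_threshold: f64,
    pub use_kerning: bool,
}

#[derive(Debug, Clone)]
pub struct SketchBuilder {
    config: SketchConfig,
}

impl SketchBuilder {
    /// Start from the documented defaults: no language, 0.7, kerning enabled.
    pub fn new() -> Self {
        Self {
            config: SketchConfig {
                language: None,
                confidence_threshold: 0.7,
                use_kerning: true,
            },
        }
    }

    /// Each setter takes `self` by value and returns it, which allows chaining.
    pub fn with_language(mut self, lang: &str) -> Self {
        self.config.language = Some(lang.to_string());
        self
    }

    /// Stores the threshold, clamped to [0.0, 1.0] as the docs describe.
    pub fn confidence_threshold(mut self, threshold: f64) -> Self {
        self.config.confidence_threshold = threshold.clamp(0.0, 1.0);
        self
    }

    /// Consumes the builder and returns the finished configuration.
    pub fn build(self) -> SketchConfig {
        self.config
    }
}
```

Because every setter consumes and returns the builder, a call such as `SketchBuilder::new().with_language("de").build()` changes only the language and leaves the other defaults in place.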
pub fn with_language(self, lang: &str) -> Self
Set the language for pattern matching
Accepts the language codes "es" (Spanish), "en" (English), "de" (German), and "it" (Italian).
§Examples
use oxidize_pdf::text::invoice::InvoiceExtractor;

let extractor = InvoiceExtractor::builder()
    .with_language("es") // Spanish patterns
    .build();
pub fn confidence_threshold(self, threshold: f64) -> Self
Set the minimum confidence threshold (0.0 to 1.0)
Fields below this threshold are filtered out. Higher values reduce false positives but may miss valid fields.
Recommended values:
- 0.5: Maximum recall (may include false positives)
- 0.7: Balanced (default)
- 0.9: Maximum precision (may miss valid fields)
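To illustrate how the threshold trades recall for precision, here is a small sketch of threshold filtering. It is an assumption about the general technique, not the library's extraction code: `ScoredField` and `filter_by_confidence` are hypothetical names, and the real result type may look different.

```rust
// Hypothetical stand-in for an extracted field with a confidence score.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredField {
    pub name: String,
    pub confidence: f64,
}

/// Keep only the fields whose confidence is at or above `threshold`;
/// everything below the threshold is dropped.
pub fn filter_by_confidence(fields: Vec<ScoredField>, threshold: f64) -> Vec<ScoredField> {
    fields
        .into_iter()
        .filter(|field| field.confidence >= threshold)
        .collect()
}
```

With three candidate fields scored 0.95, 0.75, and 0.55, a threshold of 0.5 keeps all three, 0.7 keeps two, and 0.9 keeps only the highest-scoring one.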
§Examples
use oxidize_pdf::text::invoice::InvoiceExtractor;

// High precision mode
let extractor = InvoiceExtractor::builder()
    .confidence_threshold(0.9)
    .build();
§Validation
The threshold is automatically clamped to the valid range [0.0, 1.0]. Values outside this range are silently adjusted to the nearest valid value.
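The clamping described here can be reproduced with the standard library alone. The helper below is a hypothetical illustration (`clamp_threshold` is not part of the crate) that assumes the behavior matches `f64::clamp`.

```rust
/// Hypothetical helper mirroring the documented clamping to [0.0, 1.0].
pub fn clamp_threshold(threshold: f64) -> f64 {
    // `f64::clamp` returns the lower bound for smaller inputs and the
    // upper bound for larger ones; in-range values pass through unchanged.
    threshold.clamp(0.0, 1.0)
}
```

Under this sketch, 1.5 becomes 1.0 and -0.2 becomes 0.0. Note that `f64::clamp` passes NaN through unchanged; how the builder treats a NaN threshold is not stated on this page.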
pub fn use_kerning(self, enabled: bool) -> Self
Enable or disable kerning-aware text positioning (PLANNED for v2.0)
Current Behavior: This flag is stored but NOT yet used in extraction logic.
Planned Feature (v2.0): When enabled, text reconstruction will use actual font kerning pairs to calculate accurate character spacing, improving pattern matching for invoices with tight kerning (e.g., “AV”, “To”).
Why Not Implemented: Requires architectural changes to expose font metadata
in TextFragment. See struct documentation for technical details.
§Examples
use oxidize_pdf::text::invoice::InvoiceExtractor;

// Enable for future use (no effect in v1.x)
let extractor = InvoiceExtractor::builder()
    .use_kerning(true) // ⚠️ Stored but not yet functional
    .build();
pub fn with_custom_patterns(self, patterns: PatternLibrary) -> Self
Use a custom pattern library instead of language-based defaults
Allows complete control over invoice pattern matching by providing a
custom PatternLibrary. Useful for specialized invoice formats or
combining default patterns with custom additions.
Note: When using custom patterns, the with_language() setting is ignored.
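The precedence rule in the note can be pictured as a simple resolution step. This is a sketch under assumptions, not the crate's internals: `PatternSource` and `resolve_patterns` are hypothetical names, and patterns are modelled as plain strings for brevity.

```rust
// Hypothetical model of where the extractor's patterns come from.
#[derive(Debug, Clone, PartialEq)]
pub enum PatternSource {
    /// A caller-supplied library; any language setting is ignored.
    Custom(Vec<String>),
    /// Built-in patterns selected by language code.
    Language(String),
    /// No custom library and no language: fall back to the defaults.
    Default,
}

/// Custom patterns win over the language setting; the language is only
/// consulted when no custom library was provided.
pub fn resolve_patterns(custom: Option<Vec<String>>, language: Option<&str>) -> PatternSource {
    match (custom, language) {
        (Some(patterns), _) => PatternSource::Custom(patterns),
        (None, Some(lang)) => PatternSource::Language(lang.to_string()),
        (None, None) => PatternSource::Default,
    }
}
```

In this model, supplying both a custom library and "es" still resolves to the custom library, which matches the note that with_language() is ignored.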
§Examples
Example 1: Use default patterns and add custom ones
use oxidize_pdf::text::invoice::{InvoiceExtractor, PatternLibrary, FieldPattern, InvoiceFieldType, Language};

// Start with Spanish defaults
let mut patterns = PatternLibrary::default_spanish();

// Add custom pattern for your specific invoice format
patterns.add_pattern(
    FieldPattern::new(
        InvoiceFieldType::InvoiceNumber,
        r"Ref:\s*([A-Z0-9\-]+)", // Your custom format
        0.85,
        Some(Language::Spanish)
    ).unwrap()
);

let extractor = InvoiceExtractor::builder()
    .with_custom_patterns(patterns)
    .build();
Example 2: Build completely custom pattern library
use oxidize_pdf::text::invoice::{InvoiceExtractor, PatternLibrary, FieldPattern, InvoiceFieldType, Language};

let mut patterns = PatternLibrary::new();

// Add only the patterns you need
patterns.add_pattern(
    FieldPattern::new(
        InvoiceFieldType::InvoiceNumber,
        r"Order\s+#([0-9]+)",
        0.9,
        None // Language-agnostic
    ).unwrap()
);

let extractor = InvoiceExtractor::builder()
    .with_custom_patterns(patterns)
    .confidence_threshold(0.8)
    .build();
pub fn build(self) -> InvoiceExtractor
Build the InvoiceExtractor
Trait Implementations§
Auto Trait Implementations§
impl Freeze for InvoiceExtractorBuilder
impl RefUnwindSafe for InvoiceExtractorBuilder
impl Send for InvoiceExtractorBuilder
impl Sync for InvoiceExtractorBuilder
impl Unpin for InvoiceExtractorBuilder
impl UnwindSafe for InvoiceExtractorBuilder
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<R, P> ReadPrimitive<R> for P
fn read_from_little_endian(read: &mut R) -> Result<Self, Error>
Same as ReadEndian::read_from_little_endian().