§Data Matching Module
High-performance data matching, CPF/CNPJ validation, and cross-source consolidation for Brazilian data sources.
§Features
- CPF Matcher: Normalize and validate Brazilian CPF numbers
- CNPJ Matcher: Normalize and validate Brazilian CNPJ numbers
- Name Matcher: Fuzzy matching with Brazilian name conventions
- Data Matcher: Cross-source entity resolution and consolidation
- Data Pipeline: Async processing with LRU caching
- Parallel Pipeline: High-throughput concurrent processing with DashMap
- Metrics: Comprehensive observability with EMA processing times
- SQL Extractor: PostgreSQL data extraction with dynamic schema (requires the postgres feature)
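To make the CPF normalization and validation step concrete, here is a minimal sketch of the standard CPF check-digit rule (mod 11, with weights descending to 2). The helper names `normalize_cpf`, `check_digit`, and `is_valid_cpf` are hypothetical and written for illustration only; they are not the API of `CpfMatcher`, whose actual implementation may differ.

```rust
/// Hypothetical helper: keep only ASCII digits and require exactly 11 of them.
fn normalize_cpf(input: &str) -> Option<[u8; 11]> {
    let digits: Vec<u8> = input
        .bytes()
        .filter(u8::is_ascii_digit)
        .map(|b| b - b'0')
        .collect();
    // Conversion fails (returns None) unless there are exactly 11 digits.
    digits.try_into().ok()
}

/// Compute one CPF check digit over `digits`.
/// Weights run from `len + 1` down to 2; the digit is 0 when the
/// remainder mod 11 is below 2, otherwise 11 minus the remainder.
fn check_digit(digits: &[u8]) -> u8 {
    let n = digits.len() as u32;
    let sum: u32 = digits
        .iter()
        .enumerate()
        .map(|(i, &d)| u32::from(d) * (n + 1 - i as u32))
        .sum();
    let rem = sum % 11;
    if rem < 2 { 0 } else { (11 - rem) as u8 }
}

/// Hypothetical validator: accepts formatted or unformatted input.
fn is_valid_cpf(input: &str) -> bool {
    let d = match normalize_cpf(input) {
        Some(d) => d,
        None => return false,
    };
    // Sequences of one repeated digit pass the checksum but are not valid CPFs.
    if d.iter().all(|&x| x == d[0]) {
        return false;
    }
    check_digit(&d[..9]) == d[9] && check_digit(&d[..10]) == d[10]
}
```

Under these assumptions, `is_valid_cpf("529.982.247-25")` and `is_valid_cpf("52998224725")` both return `true`, while an input with the wrong number of digits or a mismatched check digit returns `false`.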
§Example
use cortexai_data::{DataMatcher, CpfMatcher, CnpjMatcher, NameMatcher};

// `sources` is assumed to be a collection of data sources loaded elsewhere.
let matcher = DataMatcher::new();
let results = matcher.match_across_sources(&sources, "Lucas Oliveira", Some("123.456.789-00"));
§SQL Extraction (with postgres feature)
use cortexai_data::sql::{PostgresExtractor, PostgresConfig};
let config = PostgresConfig::new("postgres://user:pass@localhost/db");
let pool = config.create_pool().await?;
let extractor = PostgresExtractor::new("sales", "Recent Sales", pool)
    .with_query("SELECT * FROM sales WHERE created_at > NOW() - INTERVAL '30 days'");
let data_source = extractor.extract().await?;
Re-exports§
pub use cnpj::CnpjMatcher;
pub use cpf::CpfMatcher;
pub use crossref::build_cross_reference_narrative;
pub use crossref::CrossReferenceResult;
pub use crossref::CrossReferencer;
pub use crossref::SourceSummary;
pub use matcher::DataMatcher;
pub use metrics::DataMatchingMetrics;
pub use metrics::MetricsSnapshot;
pub use name::NameMatcher;
pub use pipeline::CacheResult;
pub use pipeline::ConcurrentCache;
pub use pipeline::DataCache;
pub use pipeline::DataPipeline;
pub use pipeline::ParallelPipeline;
pub use types::*;
Modules§
- cnpj: CNPJ (Cadastro Nacional da Pessoa Jurídica) matcher
- cpf: CPF (Cadastro de Pessoa Física) matching and validation
- crossref: Cross-reference narrative generation
- matcher: Cross-source data matching and entity resolution
- metrics: Data matching metrics and observability
- name: Name matching with Brazilian conventions
- pipeline: Data pipeline with LRU caching
- types: Core types for data matching
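The metrics module is described as tracking processing times with an EMA (exponential moving average). As a rough illustration of that technique, the sketch below keeps a running average in which each new sample is weighted by a smoothing factor `alpha`. The `EmaTimer` type, its methods, and the choice of `alpha` are assumptions made for this example; they are not taken from `DataMatchingMetrics` or `MetricsSnapshot`.

```rust
/// Hypothetical EMA tracker for processing times in milliseconds.
/// Not the crate's metrics type; shown only to illustrate the update rule.
#[derive(Debug, Clone, Copy)]
pub struct EmaTimer {
    /// Smoothing factor in (0, 1]; higher values react faster to new samples.
    alpha: f64,
    /// Current average, or `None` before the first sample is recorded.
    ema_ms: Option<f64>,
}

impl EmaTimer {
    pub fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Self { alpha, ema_ms: None }
    }

    /// Record one sample and return the updated average.
    /// The first sample seeds the average; later samples apply
    /// `ema = alpha * sample + (1 - alpha) * previous`.
    pub fn record(&mut self, sample_ms: f64) -> f64 {
        let next = match self.ema_ms {
            None => sample_ms,
            Some(prev) => self.alpha * sample_ms + (1.0 - self.alpha) * prev,
        };
        self.ema_ms = Some(next);
        next
    }

    /// Current average, if any sample has been recorded.
    pub fn value(&self) -> Option<f64> {
        self.ema_ms
    }
}
```

With `alpha = 0.5`, recording 100.0 and then 200.0 yields an average of 150.0. An EMA needs only constant memory per metric, which is one plausible reason to prefer it over storing a window of raw samples.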