Crate cortexai_data

§Data Matching Module

High-performance data matching, CPF/CNPJ validation, and cross-source consolidation for Brazilian data sources.

§Features

  • CPF Matcher: Normalize and validate Brazilian CPF numbers
  • CNPJ Matcher: Normalize and validate Brazilian CNPJ numbers
  • Name Matcher: Fuzzy matching with Brazilian name conventions
  • Data Matcher: Cross-source entity resolution and consolidation
  • Data Pipeline: Async processing with LRU caching
  • Parallel Pipeline: High-throughput concurrent processing with DashMap
  • Metrics: Observability, including exponential moving average (EMA) processing times
  • SQL Extractor: PostgreSQL data extraction with dynamic schema (requires the postgres feature)
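The CPF and CNPJ validation listed above rests on the standard modulo-11 check digits. The following is a minimal standalone sketch of that arithmetic, not the crate's `CpfMatcher` or `CnpjMatcher` API: the function names `digits_of`, `mod11_check_digit`, `is_valid_cpf`, and `is_valid_cnpj` are hypothetical, and it covers only the numeric CNPJ format.

```rust
/// Keep only decimal digits, so "123.456.789-09" yields 11 digits.
/// Non-digit characters are dropped rather than rejected (a simplification).
fn digits_of(input: &str) -> Vec<u32> {
    input.chars().filter_map(|c| c.to_digit(10)).collect()
}

/// Modulo-11 check digit: weighted sum, then 0 if the remainder
/// is below 2, otherwise 11 minus the remainder.
fn mod11_check_digit(digits: &[u32], weights: &[u32]) -> u32 {
    let sum: u32 = digits.iter().zip(weights).map(|(d, w)| d * w).sum();
    let rem = sum % 11;
    if rem < 2 { 0 } else { 11 - rem }
}

/// True when every digit is identical (e.g. 111.111.111-11). Such values
/// satisfy the arithmetic but are conventionally rejected.
fn all_same(digits: &[u32]) -> bool {
    digits.windows(2).all(|w| w[0] == w[1])
}

/// Hypothetical helper: validate an 11-digit CPF.
pub fn is_valid_cpf(input: &str) -> bool {
    const W1: [u32; 9] = [10, 9, 8, 7, 6, 5, 4, 3, 2];
    const W2: [u32; 10] = [11, 10, 9, 8, 7, 6, 5, 4, 3, 2];
    let d = digits_of(input);
    d.len() == 11
        && !all_same(&d)
        && mod11_check_digit(&d[..9], &W1) == d[9]
        && mod11_check_digit(&d[..10], &W2) == d[10]
}

/// Hypothetical helper: validate a 14-digit numeric CNPJ.
pub fn is_valid_cnpj(input: &str) -> bool {
    const W1: [u32; 12] = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2];
    const W2: [u32; 13] = [6, 5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2];
    let d = digits_of(input);
    d.len() == 14
        && !all_same(&d)
        && mod11_check_digit(&d[..12], &W1) == d[12]
        && mod11_check_digit(&d[..13], &W2) == d[13]
}
```

Under this sketch, `is_valid_cpf("123.456.789-09")` holds while `is_valid_cpf("123.456.789-00")` does not, because the check digits computed from 123456789 are 0 and 9. Whether the crate's matchers apply the same rejection rules is not stated here.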

§Example

use cortexai_data::{DataMatcher, CpfMatcher, CnpjMatcher, NameMatcher};

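// `sources` is assumed to hold the data sources to search, loaded elsewhere.
let matcher = DataMatcher::new();
// Look up an entity by name and, optionally, by CPF.
let results = matcher.match_across_sources(&sources, "Lucas Oliveira", Some("123.456.789-00"));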

§SQL Extraction (with postgres feature)

use cortexai_data::sql::{PostgresExtractor, PostgresConfig};

// Runs inside an async function that returns a Result (note the `.await?`).
let config = PostgresConfig::new("postgres://user:pass@localhost/db");
let pool = config.create_pool().await?;

// Query the last 30 days of sales.
let extractor = PostgresExtractor::new("sales", "Recent Sales", pool)
    .with_query("SELECT * FROM sales WHERE created_at > NOW() - INTERVAL '30 days'");

let data_source = extractor.extract().await?;

Re-exports§

pub use cnpj::CnpjMatcher;
pub use cpf::CpfMatcher;
pub use crossref::build_cross_reference_narrative;
pub use crossref::CrossReferenceResult;
pub use crossref::CrossReferencer;
pub use crossref::SourceSummary;
pub use matcher::DataMatcher;
pub use metrics::DataMatchingMetrics;
pub use metrics::MetricsSnapshot;
pub use name::NameMatcher;
pub use pipeline::CacheResult;
pub use pipeline::ConcurrentCache;
pub use pipeline::DataCache;
pub use pipeline::DataPipeline;
pub use pipeline::ParallelPipeline;
pub use types::*;
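
The metrics types re-exported above are described under Features as tracking EMA processing times. As a rough illustration of how an exponential moving average folds in each new sample, here is a small standalone sketch. `EmaTimer`, `alpha`, and `record` are hypothetical names, not part of `DataMatchingMetrics` or `MetricsSnapshot`, and seeding with the first sample is an assumption.

```rust
/// Hypothetical EMA tracker for processing times in milliseconds.
pub struct EmaTimer {
    /// Smoothing factor in (0, 1]; higher values react faster to new samples.
    alpha: f64,
    /// Current average; `None` until the first sample arrives.
    value: Option<f64>,
}

impl EmaTimer {
    pub fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Self { alpha, value: None }
    }

    /// Fold one sample in: ema = alpha * sample + (1 - alpha) * ema.
    /// Returns the updated average.
    pub fn record(&mut self, sample_ms: f64) -> f64 {
        let next = match self.value {
            // Seed the average with the first sample.
            None => sample_ms,
            Some(prev) => self.alpha * sample_ms + (1.0 - self.alpha) * prev,
        };
        self.value = Some(next);
        next
    }
}
```

An EMA needs only one stored value per metric, which makes it cheap to update on every processed record compared with keeping a window of samples.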

Modules§

cnpj
CNPJ (Cadastro Nacional da Pessoa Jurídica) matcher
cpf
CPF (Cadastro de Pessoa Física) matching and validation
crossref
Cross-reference narrative generation
matcher
Cross-source data matching and entity resolution
metrics
Data matching metrics and observability
name
Name matching with Brazilian conventions
pipeline
Data pipeline with LRU caching
types
Core types for data matching
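
The pipeline module is described as using LRU caching. To make that eviction policy concrete, here is a minimal standalone sketch built on the standard library. It is not the crate's `DataCache` or `ConcurrentCache`, whose internals are not shown on this page; the `LruCache` type and its methods are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical minimal LRU cache for illustration only.
pub struct LruCache<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    /// Front = least recently used, back = most recently used.
    order: VecDeque<K>,
}

impl<K: Clone + Eq + Hash, V> LruCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Mark `key` as most recently used (linear scan; fine for a sketch).
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.clone());
    }

    /// Look up a value; a hit refreshes the key's recency.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Insert or update a value, evicting the least recently used
    /// entry when a new key would exceed the capacity.
    pub fn put(&mut self, key: K, value: V) {
        if !self.map.contains_key(&key) && self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.touch(&key);
        self.map.insert(key, value);
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

With capacity 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, since `a` was used more recently. A production cache would typically replace the linear scan with a linked structure for constant-time updates.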