Crate wxyc_etl

Shared ETL crate for the WXYC music data pipeline.

Provides text normalization, fuzzy matching, PostgreSQL bulk loading, pipeline orchestration, and schema contracts.

Modules

cli
Shared clap argument groups for WXYC cache-builder CLIs.
csv_writer
Multi-file CSV writer for ETL pipelines.
fuzzy
Fuzzy string matching and batch classification/resolution.
import
Column mapping and deduplication for CSV/TSV import.
logger
Sentry + structured JSON logging for WXYC ETL pipelines.
parser
Parsers for database dump formats.
pg
PostgreSQL bulk loading utilities.
pipeline
Three-stage parallel pipeline framework.
schema
Schema contracts: table names, column lists, and DDL constants.
sqlite
Batch-buffered SQLite writer with FTS5 full-text search support.
state
Pipeline state tracking for resumable ETL runs.
text
Text normalization, filtering, compilation detection, and artist splitting.
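
The pipeline module above is summarized as a three-stage parallel pipeline framework. As a rough illustration of that general pattern only, the sketch below wires three stages together with standard-library channels and threads. It does not use or reproduce the wxyc_etl API: the function name `run_pipeline`, the stage roles (produce, normalize, collect), and the trim-and-lowercase step are all assumptions made for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical three-stage pipeline (NOT the wxyc_etl API):
/// stage 1 produces raw lines, stage 2 normalizes them,
/// stage 3 collects the results on the calling thread.
fn run_pipeline(lines: Vec<String>) -> Vec<String> {
    let (raw_tx, raw_rx) = mpsc::channel::<String>();
    let (clean_tx, clean_rx) = mpsc::channel::<String>();

    // Stage 1: feed raw lines into the first channel.
    // `raw_tx` is dropped when the thread ends, which closes the channel.
    let producer = thread::spawn(move || {
        for line in lines {
            if raw_tx.send(line).is_err() {
                break; // downstream stage has gone away
            }
        }
    });

    // Stage 2: trim and lowercase each line (an assumed stand-in for
    // real normalization), skipping lines that end up empty.
    let transformer = thread::spawn(move || {
        for line in raw_rx {
            let cleaned = line.trim().to_lowercase();
            if cleaned.is_empty() {
                continue;
            }
            if clean_tx.send(cleaned).is_err() {
                break;
            }
        }
    });

    // Stage 3: drain the second channel until stage 2 finishes.
    let out: Vec<String> = clean_rx.iter().collect();

    producer.join().expect("producer thread panicked");
    transformer.join().expect("transformer thread panicked");
    out
}

fn main() {
    let input = vec![
        "  Stereolab ".to_string(),
        "".to_string(),
        "CAN".to_string(),
    ];
    println!("{:?}", run_pipeline(input));
}
```

Because each stage here runs on a single thread, output order matches input order; a framework that fans a stage out across several workers would need to handle ordering explicitly.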