Shared ETL crate for the WXYC music data pipeline.
Provides text normalization, fuzzy matching, PostgreSQL bulk loading, pipeline orchestration, and schema contracts.
Modules
- cli
- Shared clap argument groups for WXYC cache-builder CLIs.
- csv_writer
- Multi-file CSV writer for ETL pipelines.
- fuzzy
- Fuzzy string matching and batch classification/resolution.
- import
- Column mapping and deduplication for CSV/TSV import.
- logger
- Sentry + structured JSON logging for WXYC ETL pipelines.
- parser
- Parsers for database dump formats.
- pg
- PostgreSQL bulk loading utilities.
- pipeline
- Three-stage parallel pipeline framework.
- schema
- Schema contracts: table names, column lists, and DDL constants.
- sqlite
- Batch-buffered SQLite writer with FTS5 full-text search support.
- state
- Pipeline state tracking for resumable ETL runs.
- text
- Text normalization, filtering, compilation detection, and artist splitting.
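
To make the "three-stage parallel pipeline" idea from the pipeline module concrete, here is a minimal sketch using only the standard library. It does not use this crate's API: the stage split (read, transform, write) is an assumption, and `normalize` and `run_pipeline` are hypothetical names chosen for illustration. Each stage runs on its own thread, and bounded channels connect the stages so that a slow stage applies backpressure to the one before it.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical normalizer (not the crate's `text` module):
/// lowercases each word and collapses runs of whitespace to one space.
fn normalize(raw: &str) -> String {
    raw.split_whitespace()
        .map(|word| word.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Hypothetical three-stage pipeline: read -> transform -> write.
/// One thread per stage preserves input order.
fn run_pipeline(input: Vec<String>) -> Vec<String> {
    // Bounded channels: a sender blocks once 64 items are waiting.
    let (raw_tx, raw_rx) = mpsc::sync_channel::<String>(64);
    let (clean_tx, clean_rx) = mpsc::sync_channel::<String>(64);

    // Stage 1: feed raw records into the pipeline.
    let reader = thread::spawn(move || {
        for line in input {
            if raw_tx.send(line).is_err() {
                break; // downstream stage has gone away
            }
        }
        // raw_tx is dropped here, which closes the first channel.
    });

    // Stage 2: normalize each record as it arrives.
    let transformer = thread::spawn(move || {
        for line in raw_rx {
            if clean_tx.send(normalize(&line)).is_err() {
                break;
            }
        }
        // clean_tx is dropped here, which closes the second channel.
    });

    // Stage 3: collect results (a real writer would load them into a database).
    let writer = thread::spawn(move || clean_rx.into_iter().collect::<Vec<String>>());

    reader.join().expect("reader stage panicked");
    transformer.join().expect("transformer stage panicked");
    writer.join().expect("writer stage panicked")
}
```

Calling `run_pipeline` with a vector of raw artist strings returns the normalized strings in the same order. The bounded channel capacity of 64 is an arbitrary choice for the sketch; a real pipeline would tune it to record size and stage latency.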