Heisenberg
Location enrichment library for converting unstructured location data into structured administrative hierarchies.
Heisenberg transforms incomplete location data into complete administrative hierarchies using the GeoNames dataset. It resolves ambiguous place names, fills missing administrative context, and handles alternative names across 11+ million global locations.
Features
- Embedded dataset: Ships with data included, no downloads required
- Fast full-text search with Tantivy indexing
- Complete administrative hierarchy resolution (country → state → county → place)
- Multiple data sources (cities15000, cities5000, etc.) with smart fallback
- Batch processing for high-throughput applications
- Python and Rust APIs
- Configurable search behavior and scoring
- Alternative name resolution (e.g., "Deutschland" → "Germany")
Quick Start
Python
# Create searcher instance
=
# Simple search
=
# Multi-term search (largest to smallest: Country, City)
=
# Resolve complete administrative hierarchy (largest to smallest: State, City)
=
= .
# United States
# California
# San Francisco County
# San Francisco
Rust
[dependencies]
heisenberg = "0.1"
use ;
// Create searcher using embedded data (fastest, no downloads)
let searcher = new_embedded?;
// Or use specific data source with smart fallback
let searcher = initialize?;
// Simple search
let results = searcher.search?;
println!;
// Resolve complete hierarchy (largest to smallest: Country, City)
let resolved = searcher.resolve_location?;
let context = &resolved.context;
if let Some = &context.admin0
if let Some = &context.place
Examples
Real-world location data is often inconsistent and incomplete. Heisenberg resolves inputs like the following into full hierarchies:
| Input (largest → smallest) | Output |
|---|---|
| "Florida" | United States → Florida |
| ["France", "Paris"] | France → Île-de-France → Paris |
| ["CA", "San Francisco"] | United States → California → San Francisco County → San Francisco |
| "Deutschland" | Germany (resolves alternative names) |
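To make the alternative-name idea concrete, here is a minimal, self-contained sketch. It is not Heisenberg's API: the `ALIASES` table, `canonical_name`, and `normalize_terms` are hypothetical names invented for illustration, and the real library draws alternative names from GeoNames and uses the surrounding terms to disambiguate (for example, "CA" could also mean Canada).

```python
# Toy alias table (an assumption for illustration, not the library's data).
ALIASES = {
    "deutschland": "Germany",
    "ca": "California",
}


def canonical_name(term: str) -> str:
    """Return the canonical name for a term, or the trimmed term if unknown."""
    cleaned = term.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def normalize_terms(terms: list[str]) -> list[str]:
    """Canonicalize each term, preserving largest-to-smallest input order."""
    return [canonical_name(term) for term in terms]


print(canonical_name("Deutschland"))
print(normalize_terms(["CA", "San Francisco"]))
```

A plain dictionary lookup is enough to show the mapping step; ranking competing candidates is where a full-text index and scoring come in.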
Administrative Levels
- Admin0: Countries
- Admin1: States/Provinces
- Admin2: Counties/Regions
- Admin3: Local administrative divisions
- Admin4: Sub-local administrative divisions
- Places: Cities, towns, landmarks
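The levels above can be pictured as a record with one optional slot per level. The sketch below is only an illustration under that assumption: `AdminHierarchy` and its `path` method are hypothetical names, not types exported by Heisenberg.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdminHierarchy:
    """Hypothetical container for one resolved location (not the library's type)."""
    admin0: Optional[str] = None  # country
    admin1: Optional[str] = None  # state / province
    admin2: Optional[str] = None  # county / region
    admin3: Optional[str] = None  # local administrative division
    admin4: Optional[str] = None  # sub-local administrative division
    place: Optional[str] = None   # city, town, or landmark

    def path(self) -> str:
        """Join the levels that are present, largest to smallest."""
        levels = [self.admin0, self.admin1, self.admin2,
                  self.admin3, self.admin4, self.place]
        return " → ".join(level for level in levels if level is not None)


san_francisco = AdminHierarchy(
    admin0="United States",
    admin1="California",
    admin2="San Francisco County",
    place="San Francisco",
)
print(san_francisco.path())
# United States → California → San Francisco County → San Francisco
```

Keeping every level optional reflects that many locations have no Admin3 or Admin4 entry, so a consumer should skip missing levels rather than expect all six.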
Usage Examples
Batch Processing
# Note: Input order is largest to smallest (Country, City)
=
=
Configuration
# Fast search (fewer results, optimized for speed)
=
=
# Comprehensive search (more results, higher accuracy)
=
=
See examples/ for complete Rust examples and python/examples/ for Python examples.
Installation
Python
Rust
[dependencies]
heisenberg = "0.1"
Data
Embedded by Default: Heisenberg ships with the Cities15000 dataset embedded (~25MB compressed), providing instant startup with no downloads required.
Multiple Data Sources: Choose from different datasets based on your needs:
- Cities15000: Cities with population > 15,000 (default, embedded)
- Cities5000: Cities with population > 5,000
- Cities1000: Cities with population > 1,000
- Cities500: Cities with population > 500
- AllCountries: Complete GeoNames dataset (~1GB)
Smart Fallback: When you request a dataset that is not embedded, Heisenberg automatically downloads and processes it on first use, then caches the result locally.
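The fallback order described here (embedded first, then local cache, then download) can be sketched as follows. This is an assumption-laden illustration rather than Heisenberg's implementation: `locate_dataset`, `fake_download`, the `EMBEDDED` set, and the `.bin` cache file naming are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Assumption: only the default dataset is embedded in the package.
EMBEDDED = {"cities15000"}


def locate_dataset(name: str, cache_dir: Path,
                   download: Callable[[str], bytes]) -> str:
    """Return where the dataset came from: 'embedded', 'cache', or 'download'."""
    if name in EMBEDDED:
        return "embedded"  # nothing to fetch
    cached = cache_dir / f"{name}.bin"
    if cached.exists():
        return "cache"  # reuse the result of an earlier run
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(download(name))  # first use: fetch, then cache
    return "download"


def fake_download(name: str) -> bytes:
    """Stand-in for a network fetch so the sketch runs offline."""
    return f"data for {name}".encode()


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    print(locate_dataset("cities15000", cache, fake_download))  # embedded
    print(locate_dataset("cities5000", cache, fake_download))   # download
    print(locate_dataset("cities5000", cache, fake_download))   # cache
```

Passing the downloader in as a callable keeps the lookup order testable without network access.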
Development:
# Use embedded test data for development
USE_TEST_DATA=true
# Force regeneration of embedded data at build time
GENERATE_EMBEDDED_DATA=1
# Use specific data source
EMBEDDED_DATA_SOURCE=cities5000
Performance
- Instant startup: Using embedded data (no download/processing time)
- Search: ~1ms per query
- Batch processing: 10-100x faster than individual queries
- Memory: ~200MB RAM
- Storage: ~25MB embedded + indexes, or ~1GB for larger datasets
License
MIT License - see LICENSE for details.