# prefix-register
[PyPI](https://pypi.org/project/prefix-register/)
[License](LICENSE)
**Status: Beta** - API may change before the 1.0 release.
A PostgreSQL-backed namespace prefix registry for [CURIE](https://www.w3.org/TR/curie/) expansion, shortening, and prefix management.
## Features
- **Async-only** - Built for high concurrency with asyncio
- **In-memory caching** - Prefixes loaded on startup for fast CURIE expansion
- **First-prefix-wins** - Each URI can only have one registered prefix
- **Batch operations** - Efficiently process multiple prefixes/URIs in a single call
- **Longest-match shortening** - Overlapping namespaces handled correctly
- **PostgreSQL backend** - Durable, scalable storage with connection pooling
- **Startup resilience** - Optional retry with exponential backoff for container orchestration
- **Input validation** - Prevents DoS via length limits (prefix max 64, URI max 2048 chars)
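You can also enforce these limits in your own code before a value reaches the registry. The sketch below uses a hypothetical helper that is not part of prefix-register. The 64 and 2048 figures come from this README. Raising `ValueError` and counting Python characters (rather than bytes) are assumptions about how you might reject input.

```python
# Hypothetical pre-flight check mirroring the documented length limits.
# Raising ValueError is an ASSUMPTION; prefix-register's own behavior may differ.
MAX_PREFIX_LEN = 64
MAX_URI_LEN = 2048


def check_lengths(prefix: str, uri: str) -> None:
    """Raise ValueError if the prefix or URI exceeds the documented limits."""
    if len(prefix) > MAX_PREFIX_LEN:
        raise ValueError(f"prefix is {len(prefix)} chars; max is {MAX_PREFIX_LEN}")
    if len(uri) > MAX_URI_LEN:
        raise ValueError(f"URI is {len(uri)} chars; max is {MAX_URI_LEN}")


check_lengths("foaf", "http://xmlns.com/foaf/0.1/")  # within limits, so no error
```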
## Installation
```bash
pip install prefix-register
```
Requires Python 3.10+.
## Database Setup
Create the namespaces table in your PostgreSQL database:
```sql
CREATE TABLE IF NOT EXISTS namespaces (
    uri TEXT PRIMARY KEY,
    prefix TEXT NOT NULL UNIQUE
);
```
## Quick Start
```python
import asyncio
from prefix_register import PrefixRegistry


async def main():
    # Connect to PostgreSQL (loads existing prefixes into memory)
    registry = await PrefixRegistry.new(
        "postgres://user:password@localhost:5432/mydb",
        10,  # max connections in pool
    )

    # Register a namespace prefix
    await registry.store_prefix_if_new("foaf", "http://xmlns.com/foaf/0.1/")

    # Expand a CURIE to full URI
    uri = await registry.expand_curie("foaf", "Person")
    print(uri)  # http://xmlns.com/foaf/0.1/Person

    # Shorten a URI back to a CURIE
    result = await registry.shorten_uri("http://xmlns.com/foaf/0.1/Person")
    if result:
        prefix, local = result
        print(f"{prefix}:{local}")  # foaf:Person


asyncio.run(main())
```
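## Examples
The snippets in this section run inside an async function (such as `main()` from Quick Start) and reuse its `registry` object.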
### Registering Namespace Prefixes
The registry uses a "first prefix wins" rule - once a URI has a prefix, subsequent attempts to register a different prefix for the same URI are ignored.
```python
# Store a single prefix - returns True if stored, False if URI already has a prefix
stored = await registry.store_prefix_if_new("foaf", "http://xmlns.com/foaf/0.1/")
if stored:
    print("New prefix registered")
else:
    print("URI already has a prefix")

# Batch store multiple prefixes (more efficient than individual calls)
prefixes = [
    ("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
    ("rdfs", "http://www.w3.org/2000/01/rdf-schema#"),
    ("owl", "http://www.w3.org/2002/07/owl#"),
    ("xsd", "http://www.w3.org/2001/XMLSchema#"),
    ("schema", "https://schema.org/"),
    ("dc", "http://purl.org/dc/elements/1.1/"),
    ("dcterms", "http://purl.org/dc/terms/"),
    ("skos", "http://www.w3.org/2004/02/skos/core#"),
]
result = await registry.store_prefixes_if_new(prefixes)
print(f"Stored {result['stored']} new prefixes, skipped {result['skipped']}")
```
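The following small in-memory model makes the rule concrete for the two calls above. It is an illustrative sketch, not prefix-register's code. The class name `FirstPrefixWins` is hypothetical. Skipping a prefix that is already bound to a *different* URI is an assumption drawn from the `UNIQUE` constraint on `prefix` in the schema.

```python
class FirstPrefixWins:
    """Hypothetical in-memory model of the "first prefix wins" rule."""

    def __init__(self) -> None:
        self.uri_to_prefix: dict[str, str] = {}
        self.prefix_to_uri: dict[str, str] = {}

    def store_if_new(self, prefix: str, uri: str) -> bool:
        # If the URI already has a prefix, keep the first one and report False.
        # Also rejecting a prefix bound to another URI is an ASSUMPTION based on
        # the UNIQUE constraint in the schema, not documented library behavior.
        if uri in self.uri_to_prefix or prefix in self.prefix_to_uri:
            return False
        self.uri_to_prefix[uri] = prefix
        self.prefix_to_uri[prefix] = uri
        return True

    def store_many_if_new(self, pairs: list[tuple[str, str]]) -> dict[str, int]:
        # Mirrors the {"stored": n, "skipped": m} shape of store_prefixes_if_new
        stored = sum(self.store_if_new(prefix, uri) for prefix, uri in pairs)
        return {"stored": stored, "skipped": len(pairs) - stored}


model = FirstPrefixWins()
print(model.store_if_new("foaf", "http://xmlns.com/foaf/0.1/"))    # True
print(model.store_if_new("friend", "http://xmlns.com/foaf/0.1/"))  # False: URI already has "foaf"
```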
### Expanding CURIEs to Full URIs
CURIEs (Compact URIs) like `foaf:Person` are expanded by looking up the prefix and appending the local name.
```python
# Expand a single CURIE
uri = await registry.expand_curie("foaf", "Person")
if uri:
    print(uri)  # http://xmlns.com/foaf/0.1/Person
else:
    print("Unknown prefix")

# Expand multiple CURIEs in batch (more efficient for bulk operations)
curies = [
    ("foaf", "Person"),
    ("foaf", "name"),
    ("rdf", "type"),
    ("unknown", "Thing"),  # This prefix doesn't exist
]
results = await registry.expand_curie_batch(curies)
for (prefix, local), uri in zip(curies, results):
    if uri:
        print(f"{prefix}:{local} -> {uri}")
    else:
        print(f"{prefix}:{local} -> UNKNOWN PREFIX")
# Output:
# foaf:Person -> http://xmlns.com/foaf/0.1/Person
# foaf:name -> http://xmlns.com/foaf/0.1/name
# rdf:type -> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
# unknown:Thing -> UNKNOWN PREFIX
```
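`expand_curie` and `expand_curie_batch` take the prefix and local name as separate values, so CURIE strings read from a document must be split first. Here is a minimal sketch of a hypothetical `split_curie` helper (not part of prefix-register), assuming input of the form `prefix:local`:

```python
from typing import Optional


def split_curie(curie: str) -> Optional[tuple[str, str]]:
    """Split "foaf:Person" into ("foaf", "Person"); return None when there is no colon."""
    prefix, sep, local = curie.partition(":")  # splits on the first colon only
    if not sep:
        return None
    return prefix, local


pairs = [p for p in map(split_curie, ["foaf:Person", "rdf:type", "notacurie"]) if p is not None]
print(pairs)  # [('foaf', 'Person'), ('rdf', 'type')]
# These pairs could then be passed to: await registry.expand_curie_batch(pairs)
```

The helper splits on the first colon, so a full URI such as `http://example.org/x` would come back as `("http", "//example.org/x")`. If your input mixes full URIs and CURIEs, filter out the full URIs first.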
### Shortening URIs to CURIEs
Convert full URIs back to compact CURIEs. The registry uses **longest-match semantics** - if multiple registered namespaces match, the longest one wins.
```python
# Shorten a single URI - returns (prefix, local_name) tuple or None
result = await registry.shorten_uri("http://xmlns.com/foaf/0.1/Person")
if result:
    prefix, local_name = result
    print(f"{prefix}:{local_name}")  # foaf:Person
else:
    print("No matching namespace found")

# Convenience method: get a formatted CURIE string, or the original URI if no match
curie = await registry.shorten_uri_or_full("http://xmlns.com/foaf/0.1/Person")
print(curie)  # "foaf:Person"
curie = await registry.shorten_uri_or_full("http://unknown.example.org/Thing")
print(curie)  # "http://unknown.example.org/Thing" (returned as-is)

# Batch shorten multiple URIs
uris = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/name",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://unknown.example.org/Thing",  # No matching namespace
]
results = await registry.shorten_uri_batch(uris)
for uri, result in zip(uris, results):
    if result:
        prefix, local = result
        print(f"{uri} -> {prefix}:{local}")
    else:
        print(f"{uri} -> NO MATCH")
# Output:
# http://xmlns.com/foaf/0.1/Person -> foaf:Person
# http://xmlns.com/foaf/0.1/name -> foaf:name
# http://www.w3.org/1999/02/22-rdf-syntax-ns#type -> rdf:type
# http://unknown.example.org/Thing -> NO MATCH
```
### Longest-Match Semantics
When multiple registered namespaces could match a URI, the longest one wins. This handles overlapping namespaces correctly.
```python
# Register two overlapping namespaces
await registry.store_prefix_if_new("ex", "http://example.org/")
await registry.store_prefix_if_new("exdata", "http://example.org/data#")
# This URI matches both namespaces, but exdata is longer
result = await registry.shorten_uri("http://example.org/data#Person")
prefix, local = result
print(f"{prefix}:{local}") # exdata:Person (NOT ex:data#Person)
# This URI only matches the shorter namespace
result = await registry.shorten_uri("http://example.org/other/Thing")
prefix, local = result
print(f"{prefix}:{local}") # ex:other/Thing
```
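The selection rule can be sketched as a linear scan over a `{prefix: uri}` mapping, the same shape `get_all_prefixes()` returns. This illustrates the rule only. It is not prefix-register's internal implementation, and the function name is hypothetical. A scan is O(n) per URI; a real implementation might index namespaces differently.

```python
from typing import Optional


def shorten_longest_match(uri: str, namespaces: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return (prefix, local_name) using the longest namespace that `uri` starts with."""
    best: Optional[tuple[str, str]] = None
    best_len = -1
    for prefix, ns in namespaces.items():
        # A longer matching namespace always replaces a shorter one
        if uri.startswith(ns) and len(ns) > best_len:
            best, best_len = (prefix, uri[len(ns):]), len(ns)
    return best


namespaces = {"ex": "http://example.org/", "exdata": "http://example.org/data#"}
print(shorten_longest_match("http://example.org/data#Person", namespaces))  # ('exdata', 'Person')
print(shorten_longest_match("http://example.org/other/Thing", namespaces))  # ('ex', 'other/Thing')
```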
### Looking Up Prefixes and URIs
```python
# Get the URI for a known prefix
uri = await registry.get_uri_for_prefix("foaf")
if uri:
    print(f"foaf -> {uri}")  # foaf -> http://xmlns.com/foaf/0.1/

# Get the prefix for a known URI
prefix = await registry.get_prefix_for_uri("http://xmlns.com/foaf/0.1/")
if prefix:
    print(f"http://xmlns.com/foaf/0.1/ -> {prefix}")  # -> foaf

# Get all registered prefixes
all_prefixes = await registry.get_all_prefixes()
for prefix, uri in all_prefixes.items():
    print(f"{prefix}: {uri}")

# Get count of registered prefixes
count = await registry.prefix_count()
print(f"Total prefixes: {count}")
```
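The `{prefix: uri}` dict from `get_all_prefixes()` is convenient for generating document headers. In one hypothetical example (the helper is not part of prefix-register), the dict is rendered as Turtle `@prefix` declarations. In real code you would pass `await registry.get_all_prefixes()` instead of the hard-coded sample. The sketch assumes the stored URIs are valid IRIs that need no escaping.

```python
def to_turtle_prefixes(prefixes: dict[str, str]) -> str:
    """Render a {prefix: uri} mapping as Turtle @prefix lines, sorted by prefix.

    Assumes each URI is a valid IRI (no spaces or '>'), so no escaping is done.
    """
    return "\n".join(f"@prefix {prefix}: <{uri}> ." for prefix, uri in sorted(prefixes.items()))


sample = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
print(to_turtle_prefixes(sample))
# @prefix foaf: <http://xmlns.com/foaf/0.1/> .
# @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
```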
### Connection with Retry (for Container Orchestration)
When running in Kubernetes or Docker Compose, your database might not be ready when your app starts. Use `new_with_retry` to retry the initial connection with exponential backoff.
```python
# Wait for database to become available (useful in container startup)
registry = await PrefixRegistry.new_with_retry(
    "postgres://user:password@db:5432/mydb",
    max_connections=10,
    max_retries=5,          # Retry up to 5 times
    initial_delay_ms=1000,  # Start with 1 second delay
    max_delay_ms=30000,     # Cap delay at 30 seconds
)
# Delays: 1s -> 2s -> 4s -> 8s -> 16s (capped at 30s)
```
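`new_with_retry` handles retries for you. The sketch below only illustrates how the delay schedule above is produced. It is a generic, hypothetical helper, not the library's code. It assumes `max_retries` counts retries *after* the first attempt, which is the reading that yields the five delays listed, and it retries on any `Exception`.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def backoff_schedule(max_retries: int, initial_delay_ms: int, max_delay_ms: int) -> list[int]:
    """Delay (ms) before each retry: doubles every time, capped at max_delay_ms."""
    return [min(initial_delay_ms * 2**i, max_delay_ms) for i in range(max_retries)]


async def connect_with_backoff(
    connect: Callable[[], Awaitable[T]],
    max_retries: int = 5,
    initial_delay_ms: int = 1000,
    max_delay_ms: int = 30000,
) -> T:
    """Try `connect` once, then retry up to `max_retries` times with backoff (hypothetical)."""
    for delay_ms in backoff_schedule(max_retries, initial_delay_ms, max_delay_ms):
        try:
            return await connect()
        except Exception:
            await asyncio.sleep(delay_ms / 1000)  # wait before the next attempt
    return await connect()  # final attempt: let any error propagate to the caller


print(backoff_schedule(5, 1000, 30000))  # [1000, 2000, 4000, 8000, 16000]
```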
### Real-World Example: Processing RDF Data
```python
import asyncio
from prefix_register import PrefixRegistry


async def process_rdf_triples(registry, triples):
    """Convert full URIs in triples to CURIEs for display."""
    results = []

    # Collect all URIs that need shortening (objects only when they look like URIs)
    all_uris = []
    for s, p, o in triples:
        all_uris.extend([s, p, o] if isinstance(o, str) and o.startswith("http") else [s, p])

    # Batch shorten for efficiency
    shortened = await registry.shorten_uri_batch(all_uris)
    uri_to_curie = {}
    for uri, result in zip(all_uris, shortened):
        if result:
            prefix, local = result
            uri_to_curie[uri] = f"{prefix}:{local}"
        else:
            uri_to_curie[uri] = uri  # Keep original if no match

    # Format triples with CURIEs; objects not collected as URIs are literals, shown with repr()
    for s, p, o in triples:
        s_short = uri_to_curie.get(s, s)
        p_short = uri_to_curie.get(p, p)
        o_short = uri_to_curie[o] if o in uri_to_curie else repr(o)
        results.append(f"{s_short} {p_short} {o_short}")
    return results


async def main():
    registry = await PrefixRegistry.new("postgres://localhost/mydb", 10)

    # Register common prefixes
    await registry.store_prefixes_if_new([
        ("foaf", "http://xmlns.com/foaf/0.1/"),
        ("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
    ])

    # Sample RDF triples (as full URIs)
    triples = [
        ("http://example.org/john", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://xmlns.com/foaf/0.1/Person"),
        ("http://example.org/john", "http://xmlns.com/foaf/0.1/name", "John Doe"),
    ]

    formatted = await process_rdf_triples(registry, triples)
    for line in formatted:
        print(line)
    # Output:
    # http://example.org/john rdf:type foaf:Person
    # http://example.org/john foaf:name 'John Doe'


asyncio.run(main())
```
## API Reference
### PrefixRegistry
| Method | Description |
|--------|-------------|
| `new(database_url, max_connections)` | Connect to PostgreSQL and load existing prefixes |
| `new_with_retry(...)` | Connect with retry logic for transient failures |
| `store_prefix_if_new(prefix, uri)` | Store a prefix if URI doesn't have one (returns `bool`) |
| `store_prefixes_if_new(prefixes)` | Batch store prefixes (returns `{"stored": n, "skipped": m}`) |
| `get_uri_for_prefix(prefix)` | Get URI for a prefix, or `None` |
| `get_prefix_for_uri(uri)` | Get prefix for a URI, or `None` |
| `expand_curie(prefix, local_name)` | Expand CURIE to full URI, or `None` if unknown |
| `expand_curie_batch(curies)` | Batch expand (list of `str` or `None`) |
| `shorten_uri(uri)` | Shorten to `(prefix, local)` tuple, or `None` |
| `shorten_uri_or_full(uri)` | Shorten to `"prefix:local"` string, or return original URI |
| `shorten_uri_batch(uris)` | Batch shorten (list of tuples or `None`) |
| `get_all_prefixes()` | Get all prefixes as `{prefix: uri}` dict |
| `prefix_count()` | Get number of registered prefixes |
## Use Cases
- **CURIE expansion** in RDF/SPARQL processing
- **Namespace management** for semantic web applications
- **Prefix discovery** from Turtle, JSON-LD, RDF/XML documents
- **URI shortening** for human-readable output
## License
Apache-2.0