prefix-register 0.2.2

A PostgreSQL-backed namespace prefix registry for CURIE expansion and prefix management
# prefix-register

[![Crates.io](https://img.shields.io/crates/v/prefix-register.svg)](https://crates.io/crates/prefix-register)
[![PyPI](https://img.shields.io/pypi/v/prefix-register.svg)](https://pypi.org/project/prefix-register/)
[![Documentation](https://docs.rs/prefix-register/badge.svg)](https://docs.rs/prefix-register)
[![License](https://img.shields.io/crates/l/prefix-register.svg)](LICENSE)

**Status: Beta** - The API may change before the 1.0 release.

A PostgreSQL-backed namespace prefix registry for [CURIE](https://www.w3.org/TR/curie/) expansion and prefix management. Available for both **Rust** and **Python**.

## Features

- **Dual language support** - Native Rust library with Python bindings via PyO3
- **Async-only** - Built on tokio for high concurrency
- **In-memory caching** - Prefixes loaded on startup for fast CURIE expansion
- **First-prefix-wins** - Each URI can only have one registered prefix
- **Batch operations** - Efficiently store multiple prefixes in a single round trip
- **PostgreSQL backend** - Durable, scalable storage with connection pooling
- **Startup resilience** - Optional retry with exponential backoff for container orchestration
- **Input validation** - Length limits guard against oversized input (prefix max 64 characters, URI max 2048 characters)
- **Tracing instrumentation** - Built-in spans and events for observability (configure subscriber in your app)
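
To make the length limits concrete, here is a minimal Python sketch of a pre-check you could run before calling the registry. `validate_mapping` and the two constants are hypothetical names, not part of the library's API, and the library's own validation may differ in detail (for example, in how it counts characters or which error it raises).

```python
MAX_PREFIX_LEN = 64    # prefix limit quoted in the feature list
MAX_URI_LEN = 2048     # URI limit quoted in the feature list


def validate_mapping(prefix: str, uri: str) -> None:
    """Raise ValueError if either value exceeds its length limit.

    Hypothetical helper for illustration; not the library's validator.
    """
    if len(prefix) > MAX_PREFIX_LEN:
        raise ValueError(f"prefix longer than {MAX_PREFIX_LEN} characters")
    if len(uri) > MAX_URI_LEN:
        raise ValueError(f"URI longer than {MAX_URI_LEN} characters")
```

Checking on the client side like this fails fast without a database round trip. The registry still applies its own limits.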

## Use Cases

- CURIE expansion in RDF processing
- Namespace prefix management for semantic web applications
- Prefix discovery from Turtle, JSON-LD, and XML documents

## Installation

### Rust

Add to your `Cargo.toml`:

```toml
[dependencies]
prefix-register = "0.2"
# "macros" is needed for #[tokio::main] in the usage example
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
```

### Python

```bash
pip install prefix-register
```

Requires Python 3.10+.

## Database Setup

Create the namespaces table in your PostgreSQL database:

```sql
CREATE TABLE IF NOT EXISTS namespaces (
    uri TEXT PRIMARY KEY,
    prefix TEXT NOT NULL UNIQUE
);
```
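
The two constraints in this schema are what make first-prefix-wins enforceable at the storage layer: `uri` is the primary key, and `prefix` is unique. The sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so it runs without a server. `store_if_new` is a hypothetical helper, and `INSERT OR IGNORE` is an assumed analogue of PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING`; the library's actual SQL is not shown in this README.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS namespaces ("
    " uri TEXT PRIMARY KEY,"
    " prefix TEXT NOT NULL UNIQUE)"
)


def store_if_new(prefix: str, uri: str) -> bool:
    """Insert the mapping unless a constraint rejects it.

    Returns True only when a row was actually added.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO namespaces (uri, prefix) VALUES (?, ?)",
        (uri, prefix),
    )
    return cur.rowcount == 1


first = store_if_new("foaf", "http://xmlns.com/foaf/0.1/")
second = store_if_new("friend", "http://xmlns.com/foaf/0.1/")  # URI already taken
third = store_if_new("foaf", "http://example.org/other/")      # prefix already taken
rows = conn.execute("SELECT prefix, uri FROM namespaces").fetchall()
```

Only the first call adds a row. The sketch silently skips a prefix that is reused for a different URI; how the library reports that case is not documented here.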

## Usage

### Rust

```rust
use prefix_register::PrefixRegistry;

#[tokio::main]
async fn main() -> prefix_register::Result<()> {
    // Connect to PostgreSQL
    let registry = PrefixRegistry::new(
        "postgres://localhost/mydb",
        10,  // max connections
    ).await?;

    // Store a prefix (only if URI doesn't already have one)
    let stored = registry.store_prefix_if_new(
        "foaf",
        "http://xmlns.com/foaf/0.1/"
    ).await?;
    println!("Prefix stored: {}", stored);

    // Expand a CURIE
    if let Some(uri) = registry.expand_curie("foaf", "Person").await? {
        println!("foaf:Person = {}", uri);
        // Output: foaf:Person = http://xmlns.com/foaf/0.1/Person
    }

    // Batch store prefixes - returns detailed result for logging
    let prefixes = vec![
        ("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
        ("rdfs", "http://www.w3.org/2000/01/rdf-schema#"),
        ("schema", "https://schema.org/"),
    ];
    let result = registry.store_prefixes_if_new(prefixes).await?;
    println!("Stored {}, skipped {}", result.stored, result.skipped);

    // Shorten a URI (longest-match wins)
    if let Some((prefix, local)) = registry.shorten_uri("http://xmlns.com/foaf/0.1/Person").await? {
        println!("{}:{}", prefix, local);
        // Output: foaf:Person
    }

    // Or get a formatted string (returns original URI if no match)
    let curie = registry.shorten_uri_or_full("http://xmlns.com/foaf/0.1/Person").await?;
    println!("{}", curie);  // Output: foaf:Person

    Ok(())
}
```

### Python

```python
import asyncio
from prefix_register import PrefixRegistry

async def main():
    # Connect to PostgreSQL
    registry = await PrefixRegistry.new(
        "postgres://localhost/mydb",
        10  # max connections
    )

    # Store a prefix (only if URI doesn't already have one)
    stored = await registry.store_prefix_if_new(
        "foaf",
        "http://xmlns.com/foaf/0.1/"
    )
    print(f"Prefix stored: {stored}")

    # Expand a CURIE
    uri = await registry.expand_curie("foaf", "Person")
    if uri:
        print(f"foaf:Person = {uri}")
        # Output: foaf:Person = http://xmlns.com/foaf/0.1/Person

    # Batch store prefixes
    prefixes = [
        ("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
        ("rdfs", "http://www.w3.org/2000/01/rdf-schema#"),
        ("schema", "https://schema.org/"),
    ]
    result = await registry.store_prefixes_if_new(prefixes)
    print(f"Stored {result['stored']}, skipped {result['skipped']}")

    # Shorten a URI (longest-match wins)
    shortened = await registry.shorten_uri("http://xmlns.com/foaf/0.1/Person")
    if shortened:
        prefix, local = shortened
        print(f"{prefix}:{local}")  # Output: foaf:Person

    # Or get a formatted string (returns original URI if no match)
    curie = await registry.shorten_uri_or_full("http://xmlns.com/foaf/0.1/Person")
    print(curie)  # Output: foaf:Person

asyncio.run(main())
```

## API

### PrefixRegistry

| Method | Description |
|--------|-------------|
| `new(database_url, max_connections)` | Connect to PostgreSQL and load existing prefixes |
| `new_with_retry(...)` | Connect with retry logic for transient failures |
| `get_uri_for_prefix(prefix)` | Get the URI for a prefix (cache-first) |
| `get_prefix_for_uri(uri)` | Get the prefix for a URI |
| `store_prefix_if_new(prefix, uri)` | Store a prefix if the URI doesn't have one (returns `bool`) |
| `store_prefixes_if_new(prefixes)` | Batch store prefixes |
| `expand_curie(prefix, local_name)` | Expand a CURIE to a full URI |
| `expand_curie_batch(curies)` | Batch expand CURIEs (returns `None` for unknown prefixes) |
| `shorten_uri(uri)` | Shorten a URI to `(prefix, local_name)` using longest-match |
| `shorten_uri_or_full(uri)` | Shorten to `"prefix:local"` or return original URI |
| `shorten_uri_batch(uris)` | Batch shorten URIs (returns `None` for unmatched) |
| `get_all_prefixes()` | Get all registered prefix mappings |
| `prefix_count()` | Get the number of registered prefixes |
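
The expansion and longest-match behaviour described in the table can be modelled with a plain dictionary. This is an illustrative in-memory sketch, not the library's implementation: the `PREFIXES` mapping and the example namespaces are made up, and the real methods are async and backed by the cache and database.

```python
from typing import Optional

# Hypothetical prefix table with two overlapping namespaces.
PREFIXES = {
    "ex": "http://example.org/",
    "exv": "http://example.org/vocab/",
}


def expand_curie(prefix: str, local_name: str) -> Optional[str]:
    """Return namespace URI + local name, or None for an unknown prefix."""
    base = PREFIXES.get(prefix)
    return None if base is None else base + local_name


def shorten_uri(uri: str) -> Optional[tuple[str, str]]:
    """Return (prefix, local_name) using the longest matching namespace."""
    best = None
    for prefix, base in PREFIXES.items():
        if uri.startswith(base) and (best is None or len(base) > len(PREFIXES[best])):
            best = prefix
    if best is None:
        return None
    return best, uri[len(PREFIXES[best]):]
```

With these mappings, `http://example.org/vocab/Thing` shortens to `("exv", "Thing")` rather than `("ex", "vocab/Thing")`, because the longer namespace is preferred.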

### RetryConfig (Rust) / new_with_retry parameters (Python)

Configuration for startup retry behaviour:

**Rust:**
- `RetryConfig::default()` - 5 retries, 1s initial delay, 30s max delay
- `RetryConfig::none()` - No retries (fail immediately)
- `RetryConfig::new(max_retries, initial_delay, max_delay)` - Custom configuration

**Python:**
```python
registry = await PrefixRegistry.new_with_retry(
    "postgres://localhost/mydb",
    10,                    # max_connections
    max_retries=5,         # default: 5
    initial_delay_ms=1000, # default: 1000
    max_delay_ms=30000     # default: 30000
)
```
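
To get a feel for what these defaults mean in practice, the following sketch computes a capped exponential backoff schedule. It assumes the delay doubles after each failed attempt and has no jitter; the README does not state the multiplier, so treat `backoff_delays` as a hypothetical model rather than the library's exact timing.

```python
def backoff_delays(
    max_retries: int = 5,
    initial_delay_ms: int = 1000,
    max_delay_ms: int = 30000,
) -> list[int]:
    """Return the wait before each retry, in milliseconds."""
    delays = []
    delay = initial_delay_ms
    for _ in range(max_retries):
        delays.append(min(delay, max_delay_ms))  # never exceed the cap
        delay *= 2  # assumed doubling factor
    return delays
```

Under that assumption the defaults wait 1s, 2s, 4s, 8s, and 16s, about 31 seconds in total, before giving up. The 30s cap only comes into play from the sixth retry onward.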

### BatchStoreResult

Returned by `store_prefixes_if_new()` for detailed reporting:

**Rust:**
- `stored: usize` - Number of new prefixes stored
- `skipped: usize` - Number of prefixes skipped (URI already had a prefix)
- `total()` - Total prefixes processed
- `all_stored()` - Returns true if all were stored
- `none_stored()` - Returns true if none were stored

**Python:**
```python
result = await registry.store_prefixes_if_new(prefixes)
print(result["stored"])   # Number of new prefixes stored
print(result["skipped"])  # Number skipped (URI already had a prefix)
```
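
The Python example reads the result as a dict, so the Rust convenience methods can be approximated with small helpers when you want the same checks for logging. The functions below are hypothetical and assume that the total is `stored + skipped`; how the Rust methods treat an empty batch is not documented here.

```python
def total(result: dict) -> int:
    """Total prefixes processed (assumed to be stored + skipped)."""
    return result["stored"] + result["skipped"]


def all_stored(result: dict) -> bool:
    """True if nothing was skipped."""
    return result["skipped"] == 0


def none_stored(result: dict) -> bool:
    """True if nothing was stored."""
    return result["stored"] == 0
```

For example, a result of `{"stored": 2, "skipped": 1}` has a total of 3, and both `all_stored` and `none_stored` return `False`.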

## License

Apache-2.0