Matchy
Fast database for IP address and pattern matching with rich data storage.
Match IP addresses, CIDR ranges, and thousands of glob patterns in microseconds. Perfect for threat intelligence, GeoIP, domain categorization, and network security applications.
Why Matchy?
Traditional IP/domain lookups fall apart at scale:
- Sequential pattern matching: 10,000 patterns = 10,000× slower
- Hash tables only do exact matches, so there are no wildcards for domains like *.malicious.com
- Loading databases takes hundreds of milliseconds
- Running 50 worker processes means loading the same data 50 times
Matchy solves these problems with a unified database that supports both IP addresses and patterns.
Real-World Use Cases
Threat Intelligence
Query malicious IPs (1.2.3.4), suspicious domains (*.phishing-site.com), and URL patterns (http://*/admin/config.php) from a single database. Check every user interaction against 50,000+ threat indicators in ~20 microseconds.
// Check both IPs and domains with one database
db.lookup("1.2.3.4")?; // IP lookup
db.lookup("login.phishing-site.com")?; // Pattern match
GeoIP with Custom Data
Drop-in replacement for MaxMind GeoIP databases with custom metadata support. Query IP addresses and get rich JSON-like structured data:
match db.lookup?
Multi-Process Memory Efficiency
Running 64 worker processes against the same 100MB database? Matchy uses memory mapping, so the OS shares the pages automatically: 64 processes use roughly 100MB of RAM in total instead of 6.4GB, a saving of about 98%.
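The arithmetic behind that comparison can be sketched as follows. This is an idealized estimate: it assumes every page of the read-only mapping is shared and ignores per-process overhead. The function names are illustrative and not part of Matchy.

```rust
/// Total memory in MB if every process loads its own private copy.
fn private_copies_mb(processes: u64, db_mb: u64) -> u64 {
    processes * db_mb
}

/// Fraction of memory saved when all processes share one mapped copy
/// instead of holding private copies (idealized: full page sharing).
fn shared_savings(processes: u64, db_mb: u64) -> f64 {
    let private = private_copies_mb(processes, db_mb) as f64;
    1.0 - (db_mb as f64) / private
}

fn main() {
    let private = private_copies_mb(64, 100);
    let saved = shared_savings(64, 100);
    println!(
        "private copies: {} MB, shared mapping: 100 MB, saved: {:.1}%",
        private,
        saved * 100.0
    );
}
```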
Instant Loading
Memory-mapped databases load in <100 microseconds regardless of size. No deserialization overhead: queries access the on-disk structures directly.
Key Features
Unified Database
- IP addresses & CIDR ranges: Binary search tree for O(log n) lookups
- Glob patterns: Aho-Corasick automaton for O(n) matching
- Auto-detection: One query function handles both types
- Rich data: Store JSON-like structured data with each entry
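To illustrate what auto-detection could look like, here is a minimal sketch that classifies a query string using only the Rust standard library. It is an assumption about the general idea, not Matchy's actual detection logic; `QueryKind` and `classify` are hypothetical names.

```rust
use std::net::IpAddr;

/// Hypothetical classification of an incoming key or query string.
#[derive(Debug, PartialEq)]
enum QueryKind {
    Ip(IpAddr),
    Cidr(IpAddr, u8),
    Text,
}

/// Try to parse as an IP address, then as a CIDR range, else treat as text.
fn classify(query: &str) -> QueryKind {
    if let Ok(ip) = query.parse::<IpAddr>() {
        return QueryKind::Ip(ip);
    }
    if let Some((addr, len)) = query.split_once('/') {
        if let (Ok(ip), Ok(bits)) = (addr.parse::<IpAddr>(), len.parse::<u8>()) {
            // Reject prefix lengths longer than the address width.
            let max = if ip.is_ipv4() { 32 } else { 128 };
            if bits <= max {
                return QueryKind::Cidr(ip, bits);
            }
        }
    }
    QueryKind::Text
}

fn main() {
    for q in ["1.2.3.4", "10.0.0.0/8", "www.evil.com", "http://*/admin/config.php"] {
        println!("{q} -> {:?}", classify(q));
    }
}
```

Anything that does not parse as an address or range falls through to the text path, which is where pattern matching would take over.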
Performance
- 1M+ queries/second even with 50,000 patterns
- <100μs loading time via memory mapping
- Zero-copy: Direct access to on-disk structures
- Memory sharing: Automatic OS-level page sharing across processes
Compatibility
- libmaxminddb compatible (planned): Drop-in replacement for existing GeoIP code
- C/C++/Rust APIs: Stable FFI for any language
- MMDB format: Compatible with MaxMind database tools
Quick Start
Rust API
use ;
use HashMap;
// Build a database with both IP and pattern entries
let mut builder = new;
// Add IP address with data
let mut ip_data = new;
ip_data.insert;
ip_data.insert;
builder.add_entry?;
// Add CIDR range
let mut cidr_data = new;
cidr_data.insert;
builder.add_entry?;
// Add pattern with data
let mut pattern_data = new;
pattern_data.insert;
pattern_data.insert;
builder.add_entry?;
// Build and save
let database_bytes = builder.build?;
write?;
// Query the database (auto-detects IP vs pattern)
let db = open?;
// IP lookup
match db.lookup?
// Pattern matching
match db.lookup?
C API
int
Performance
Measured on M4 MacBook Air:
| Workload | Throughput | Notes |
|---|---|---|
| IP lookups | 1.5M queries/sec | Binary tree search |
| Pattern matching (10K patterns) | 1.4M queries/sec | Aho-Corasick |
| Pattern matching (50K patterns) | 1M queries/sec | Extreme scale |
| Database load time | <150μs | Memory-mapped |
| Build time (1K entries) | ~4ms | One-time cost |
See DEVELOPMENT.md for detailed benchmarks.
Architecture
flowchart TD
App["Application<br/>(C, C++, or Rust)"] --> CPP["C++ Wrapper<br/>(RAII)"] & CAPI["C API<br/>(matchy_*)"] & Rust["Rust API"]
CPP --> Core
CAPI --> Core
Rust --> Core
Core["Rust Core"] --> IPTree["IP Search Tree<br/>(Binary Trie)"]
Core --> AC["Aho-Corasick<br/>(Pattern Matching)"]
Core --> Data["Data Section<br/>(MMDB Format)"]
Core --> Mmap["Memory Mapping"]
style App fill:#e1f5ff
style Core fill:#fff3e0
style IPTree fill:#f5f5f5
style AC fill:#f5f5f5
style Data fill:#f5f5f5
style Mmap fill:#f5f5f5
Hybrid approach: IP addresses use a binary search tree for O(log n) lookups. Patterns use Aho-Corasick for O(n) simultaneous matching. Both share the same data section with automatic deduplication.
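The IP side of this design can be sketched as an in-memory binary trie with longest-prefix matching. This is a simplified illustration for IPv4 only, using heap-allocated nodes rather than Matchy's on-disk layout; `PrefixTrie` and its methods are hypothetical names.

```rust
use std::net::Ipv4Addr;

#[derive(Default)]
struct Node {
    children: [Option<Box<Node>>; 2],
    value: Option<String>,
}

/// Toy binary trie keyed on the bits of an IPv4 network prefix.
#[derive(Default)]
struct PrefixTrie {
    root: Node,
}

impl PrefixTrie {
    /// Store `value` for the network `network/prefix_len`.
    fn insert(&mut self, network: Ipv4Addr, prefix_len: u32, value: &str) {
        let bits = u32::from(network);
        let mut node = &mut self.root;
        for i in 0..prefix_len.min(32) {
            let bit = ((bits >> (31 - i)) & 1) as usize;
            let child = node.children[bit].get_or_insert_with(Box::default);
            node = &mut **child;
        }
        node.value = Some(value.to_string());
    }

    /// Walk at most 32 bits, remembering the most specific value seen.
    fn lookup(&self, addr: Ipv4Addr) -> Option<&str> {
        let bits = u32::from(addr);
        let mut node = &self.root;
        let mut best = node.value.as_deref();
        for i in 0..32u32 {
            let bit = ((bits >> (31 - i)) & 1) as usize;
            match node.children[bit].as_deref() {
                Some(child) => {
                    node = child;
                    if node.value.is_some() {
                        best = node.value.as_deref();
                    }
                }
                None => break,
            }
        }
        best
    }
}

fn main() {
    let mut trie = PrefixTrie::default();
    trie.insert(Ipv4Addr::new(10, 0, 0, 0), 8, "internal");
    trie.insert(Ipv4Addr::new(10, 1, 0, 0), 16, "lab");
    trie.insert(Ipv4Addr::new(1, 2, 3, 4), 32, "malicious");
    println!("{:?}", trie.lookup(Ipv4Addr::new(10, 1, 2, 3)));
}
```

In this sketch the lookup cost is bounded by the address width (32 steps for IPv4), independent of how many prefixes are stored.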
Building
Requirements:
- Rust 1.70+ (stable toolchain)
- C compiler (for C API consumers)
- cbindgen (installed automatically as build dependency)
# Build optimized library
cargo build --release
# Run test suite
cargo test
# Run benchmarks
cargo bench
# Generate API documentation
cargo doc --no-deps --open
The build process automatically generates include/matchy.h for C/C++ integration.
Build artifacts:
- target/release/libmatchy.dylib (macOS)
- target/release/libmatchy.so (Linux)
- target/release/libmatchy.a (static library)
- include/matchy.h (C header, auto-generated)
API Reference
C API Functions
Builder API:
- matchy_builder_t* matchy_builder_new() - Create database builder
- int matchy_builder_add(builder, key, json_data) - Add IP/CIDR/pattern with JSON data
- int matchy_builder_set_description(builder, desc) - Set metadata
- int matchy_builder_save(builder, filename) - Build and save to file
- int matchy_builder_build(builder, &buffer, &size) - Build to memory
- void matchy_builder_free(builder) - Free builder
Query API:
- matchy_t* matchy_open(filename) - Open database (memory-mapped)
- matchy_t* matchy_open_buffer(buffer, size) - Open from memory buffer
- void matchy_close(db) - Close database
- matchy_result_t matchy_query(db, query) - Unified query (auto-detects IP vs pattern)
- void matchy_free_result(&result) - Free query result
- const char* matchy_version() - Get library version
Error Codes:
- MATCHY_SUCCESS (0) - Success
- MATCHY_ERROR_FILE_NOT_FOUND (-1) - File not found
- MATCHY_ERROR_INVALID_FORMAT (-2) - Invalid database format
- MATCHY_ERROR_INVALID_PARAM (-5) - Invalid parameter
- MATCHY_ERROR_IO (-6) - I/O error
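For callers that wrap the C API from Rust, the error codes listed above could be mirrored in a small enum. This is only a sketch: `MatchyStatus` is a hypothetical type, not something the library exports, and it covers only the codes documented here, treating any other value as unknown.

```rust
/// Hypothetical Rust mirror of the documented C status codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
enum MatchyStatus {
    Success = 0,
    FileNotFound = -1,
    InvalidFormat = -2,
    InvalidParam = -5,
    Io = -6,
}

impl MatchyStatus {
    /// Convert a raw return code; codes not documented above yield None.
    fn from_code(code: i32) -> Option<Self> {
        match code {
            0 => Some(Self::Success),
            -1 => Some(Self::FileNotFound),
            -2 => Some(Self::InvalidFormat),
            -5 => Some(Self::InvalidParam),
            -6 => Some(Self::Io),
            _ => None,
        }
    }

    /// Human-readable description, taken from the list above.
    fn describe(self) -> &'static str {
        match self {
            Self::Success => "Success",
            Self::FileNotFound => "File not found",
            Self::InvalidFormat => "Invalid database format",
            Self::InvalidParam => "Invalid parameter",
            Self::Io => "I/O error",
        }
    }
}

fn main() {
    for code in [0, -1, -2, -5, -6, -3] {
        match MatchyStatus::from_code(code) {
            Some(status) => println!("{code}: {}", status.describe()),
            None => println!("{code}: unknown status code"),
        }
    }
}
```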
Rust API
Core Types:
- Database - Unified database for queries
- DatabaseBuilder - Build databases (alias for MmdbBuilder)
- QueryResult - Result enum (IP/Pattern/NotFound)
- DataValue - Rich data type (String/Int/Map/Array/etc.)
See API documentation for complete reference.
Linking
# C programs
# C++ programs
# Add to rpath (macOS)
# Add to rpath (Linux)
Database Format
Matchy uses a hybrid binary format:
┌──────────────────────────────────────┐
│ IP Search Tree (binary trie)         │ ← Fast IP lookups
├──────────────────────────────────────┤
│ Data Section (MMDB-compatible)       │ ← Shared rich data
├──────────────────────────────────────┤
│ Pattern Matcher (Aho-Corasick)       │ ← Fast pattern matching
├──────────────────────────────────────┤
│ Metadata                             │ ← Database info
└──────────────────────────────────────┘
All structures use file offsets (not pointers) for:
- Direct memory mapping without deserialization
- Cross-process page sharing via shared memory
- Safety validation before dereferencing
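The offset-based approach can be illustrated with a toy layout: a 4-byte offset that points at a length-prefixed record, with every access bounds-checked before it is followed. The layout and byte order here are invented for the example and are not Matchy's actual on-disk format; `read_u32`, `read_record`, and `sample_buffer` are hypothetical helpers.

```rust
/// Read a little-endian u32 at `offset`, or None if it is out of bounds.
fn read_u32(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = buf.get(offset..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Follow the offset stored at `slot` to a record of the form
/// [u32 length][payload bytes], validating every step.
fn read_record(buf: &[u8], slot: usize) -> Option<&[u8]> {
    let target = read_u32(buf, slot)? as usize;
    let len = read_u32(buf, target)? as usize;
    let start = target.checked_add(4)?;
    let end = start.checked_add(len)?;
    buf.get(start..end)
}

/// Build a tiny buffer: slot 0 holds offset 8, where "hello" is stored.
fn sample_buffer() -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&8u32.to_le_bytes()); // offset of the record
    buf.extend_from_slice(&[0u8; 4]); // padding
    buf.extend_from_slice(&5u32.to_le_bytes()); // record length
    buf.extend_from_slice(b"hello"); // record payload
    buf
}

fn main() {
    let buf = sample_buffer();
    println!("{:?}", read_record(&buf, 0).map(|r| String::from_utf8_lossy(r)));
}
```

Because the buffer contains only offsets, the same bytes are valid at any address, which is what allows a file to be mapped and queried without a deserialization pass.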
Pattern Syntax
Supported glob patterns:
- * - Match zero or more characters
- ? - Match exactly one character
- [abc] - Match any character in set
- [a-z] - Match any character in range
- [!abc] - Match any character not in set
Examples:
- *.evil.com - Matches www.evil.com, malware.evil.com
- test_*.log - Matches test_001.log, test_debug.log
- http://*/admin/* - Matches any URL with /admin/ path
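To make the syntax concrete, here is a minimal reference matcher for a single pattern. It is a naive backtracking sketch that works byte by byte and lets * match any character including /; it is not the Aho-Corasick based matcher Matchy uses, and `glob_match` is a hypothetical name.

```rust
/// Evaluate a `[...]` class at the start of `pat` against byte `c`.
/// Returns (matched, pattern bytes consumed), or None if the class is unterminated.
fn match_class(pat: &[u8], c: u8) -> Option<(bool, usize)> {
    let mut i = 1;
    let negate = pat.get(i) == Some(&b'!');
    if negate {
        i += 1;
    }
    let (mut matched, mut first) = (false, true);
    while i < pat.len() {
        if pat[i] == b']' && !first {
            return Some((matched != negate, i + 1));
        }
        first = false;
        if i + 2 < pat.len() && pat[i + 1] == b'-' && pat[i + 2] != b']' {
            // Range such as a-z.
            if pat[i] <= c && c <= pat[i + 2] {
                matched = true;
            }
            i += 3;
        } else {
            if pat[i] == c {
                matched = true;
            }
            i += 1;
        }
    }
    None
}

fn glob_bytes(pat: &[u8], text: &[u8]) -> bool {
    match pat.first() {
        None => text.is_empty(),
        // '*' may consume any number of bytes, including zero.
        Some(b'*') => (0..=text.len()).any(|k| glob_bytes(&pat[1..], &text[k..])),
        Some(b'?') => !text.is_empty() && glob_bytes(&pat[1..], &text[1..]),
        Some(b'[') => {
            let Some(&c) = text.first() else { return false };
            match match_class(pat, c) {
                Some((true, used)) => glob_bytes(&pat[used..], &text[1..]),
                Some((false, _)) => false,
                // Unterminated class: treat '[' as a literal character.
                None => c == b'[' && glob_bytes(&pat[1..], &text[1..]),
            }
        }
        Some(&p) => text.first() == Some(&p) && glob_bytes(&pat[1..], &text[1..]),
    }
}

/// Returns true if `text` matches `pattern` in full.
fn glob_match(pattern: &str, text: &str) -> bool {
    glob_bytes(pattern.as_bytes(), text.as_bytes())
}

fn main() {
    println!("{}", glob_match("*.evil.com", "www.evil.com"));
    println!("{}", glob_match("test_*.log", "test_debug.log"));
    println!("{}", glob_match("[!abc]at", "cat"));
}
```

A matcher like this checks one pattern at a time and can backtrack heavily on patterns with several * wildcards, which is the per-pattern cost a multi-pattern automaton is meant to avoid.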
Documentation
- API_REDESIGN.md - Complete API specification
- DEVELOPMENT.md - Architecture and implementation details
- examples/ - Example programs
- API docs: cargo doc --no-deps --open
Testing
Contributing
Contributions welcome! Please:
- Run cargo fmt and cargo clippy before submitting
- Ensure all tests pass with cargo test
- Add tests for new features
- Update documentation
Roadmap
- libmaxminddb compatibility layer (drop-in replacement)
- C++ RAII wrapper for modern C++
- Python bindings
- Streaming database updates (append-only)
- Compression support
License
BSD-2-Clause
Acknowledgments
Built on the Paraglob pattern matching algorithm with extensions for IP address lookups and rich data storage.