Advanced Usage
==============
This guide covers advanced features and performance optimization techniques for ProllyTree.
Performance Optimization
-------------------------
Batch Operations
~~~~~~~~~~~~~~~~
For better performance when inserting many items, use batch operations:
.. code-block:: python
from prollytree import ProllyTree
tree = ProllyTree()
# Instead of individual inserts
for i in range(1000):
tree.insert(f"key_{i}".encode(), f"value_{i}".encode())
# Use batch insert (much faster)
batch_data = [
(f"key_{i}".encode(), f"value_{i}".encode())
for i in range(1000)
]
tree.insert_batch(batch_data)
Storage Backends
~~~~~~~~~~~~~~~~
Choose the appropriate storage backend for your use case:
.. code-block:: python
from prollytree import ProllyTree, VersionedKvStore
# In-memory (fastest, not persistent)
tree = ProllyTree()
# File-based storage (persistent)
tree = ProllyTree(storage_type="file", path="/path/to/data")
# Versioned storage with Git-like history
store = VersionedKvStore("/path/to/versioned_data")
Tree Configuration
~~~~~~~~~~~~~~~~~~
Tune tree parameters for your workload:
.. code-block:: python
from prollytree import ProllyTree, TreeConfig
# Default configuration
config = TreeConfig()
# Custom configuration for specific workloads
config = TreeConfig(
base=8, # Higher base for wider trees (good for read-heavy)
modulus=128, # Higher modulus for deeper trees (good for write-heavy)
)
tree = ProllyTree(config=config)
Concurrent Access
-----------------
Thread Safety
~~~~~~~~~~~~~
For multi-threaded applications:
.. code-block:: python
import threading
from prollytree import ProllyTree
# Create a thread-safe tree
tree = ProllyTree(thread_safe=True)
def worker(thread_id):
for i in range(100):
key = f"thread_{thread_id}_key_{i}".encode()
value = f"value_{i}".encode()
tree.insert(key, value)
# Start multiple threads
threads = []
for i in range(4):
t = threading.Thread(target=worker, args=(i,))
threads.append(t)
t.start()
for t in threads:
t.join()
Memory Management
-----------------
LRU Cache
~~~~~~~~~
Enable LRU caching for read-heavy workloads:
.. code-block:: python
from prollytree import ProllyTree, CacheConfig
cache_config = CacheConfig(
max_size=10000, # Cache up to 10k nodes
eviction_policy="lru"
)
tree = ProllyTree(cache_config=cache_config)
Memory Monitoring
~~~~~~~~~~~~~~~~~
Monitor memory usage:
.. code-block:: python
tree = ProllyTree()
# Insert data
for i in range(10000):
tree.insert(f"key_{i}".encode(), f"value_{i}".encode())
# Get memory statistics
stats = tree.get_memory_stats()
print(f"Nodes in memory: {stats['node_count']}")
print(f"Memory usage: {stats['memory_bytes']} bytes")
print(f"Cache hit rate: {stats['cache_hit_rate']}%")
Data Serialization
-------------------
Custom Serialization
~~~~~~~~~~~~~~~~~~~~~
For complex data types:
.. code-block:: python
import json
import pickle
from prollytree import ProllyTree
tree = ProllyTree()
# JSON serialization
def store_json(tree, key, data):
serialized = json.dumps(data).encode('utf-8')
tree.insert(key.encode('utf-8'), serialized)
def load_json(tree, key):
data = tree.find(key.encode('utf-8'))
return json.loads(data.decode('utf-8')) if data else None
# Usage
complex_data = {
"user": "alice",
"scores": [95, 87, 92],
"metadata": {"premium": True, "last_login": "2023-01-01"}
}
store_json(tree, "user:alice", complex_data)
retrieved = load_json(tree, "user:alice")
SQL Advanced Queries
---------------------
Complex Joins and Aggregations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
from prollytree import ProllySQLStore
sql_store = ProllySQLStore("/path/to/sql_data")
# Create tables
sql_store.execute("""
CREATE TABLE users (
id INTEGER PRIMARY KEY,
name TEXT,
department_id INTEGER,
salary REAL
)
""")
sql_store.execute("""
CREATE TABLE departments (
id INTEGER PRIMARY KEY,
name TEXT,
budget REAL
)
""")
# Complex aggregation query
result = sql_store.execute("""
SELECT
d.name as department,
COUNT(u.id) as employee_count,
AVG(u.salary) as avg_salary,
MAX(u.salary) as max_salary,
SUM(u.salary) as total_salary
FROM departments d
LEFT JOIN users u ON d.id = u.department_id
GROUP BY d.id, d.name
HAVING COUNT(u.id) > 0
ORDER BY avg_salary DESC
""")
Error Handling and Debugging
-----------------------------
Exception Handling
~~~~~~~~~~~~~~~~~~~
.. code-block:: python
from prollytree import ProllyTree, ProllyTreeError, StorageError
try:
tree = ProllyTree(storage_type="file", path="/invalid/path")
tree.insert(b"key", b"value")
except StorageError as e:
print(f"Storage error: {e}")
except ProllyTreeError as e:
print(f"Tree operation error: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
Debug Mode
~~~~~~~~~~
.. code-block:: python
# Enable debug logging
tree = ProllyTree(debug=True, log_level="DEBUG")
# Validate tree structure
is_valid = tree.validate()
if not is_valid:
print("Tree structure is corrupted!")
# Get detailed statistics
stats = tree.get_debug_stats()
print(f"Tree height: {stats['height']}")
print(f"Node distribution: {stats['node_distribution']}")
print(f"Rebalancing events: {stats['rebalance_count']}")
Migration and Backup
---------------------
Data Export/Import
~~~~~~~~~~~~~~~~~~~
.. code-block:: python
# Export tree data
tree.export_to_file("/path/to/backup.json", format="json")
tree.export_to_file("/path/to/backup.bin", format="binary")
# Import tree data
new_tree = ProllyTree()
new_tree.import_from_file("/path/to/backup.json", format="json")
This advanced guide covers performance optimization, concurrent access patterns, memory management, complex data operations, and debugging techniques for ProllyTree.