Crate brk_indexer

Crate brk_indexer 

Source
Expand description

§brk_indexer

High-performance Bitcoin blockchain indexer with parallel processing and dual storage architecture.

Crates.io Documentation

§Overview

This crate provides a comprehensive Bitcoin blockchain indexer built on top of brk_parser. It processes raw Bitcoin blocks in parallel, extracting and indexing transactions, addresses, inputs, outputs, and metadata into optimized storage structures. The indexer maintains two complementary storage systems: columnar vectors for analytics and key-value stores for fast lookups.

Key Features:

  • Parallel block processing with multi-threaded transaction analysis
  • Dual storage architecture: columnar vectors + key-value stores
  • Address type classification and indexing for all Bitcoin script types
  • Collision detection and validation for address hashes and transaction IDs
  • Incremental processing with automatic rollback and recovery
  • Height-based synchronization with Bitcoin Core RPC validation
  • Optimized batch operations with configurable snapshot intervals

Target Use Cases:

  • Bitcoin blockchain analysis requiring full transaction history
  • Address clustering and UTXO set analysis
  • Blockchain explorers needing fast address/transaction lookups
  • Research applications requiring structured access to blockchain data

§Installation

cargo add brk_indexer

§Quick Start

use brk_indexer::Indexer;
use brk_parser::Parser;
use bitcoincore_rpc::{Client, Auth};
use vecdb::Exit;
use std::path::Path;

// Initialize Bitcoin Core RPC client
let rpc = Client::new("http://localhost:8332", Auth::None)?;
let rpc = Box::leak(Box::new(rpc));

// Create parser for raw block data
let blocks_dir = Path::new("/path/to/bitcoin/blocks");
let parser = Parser::new(blocks_dir, None, rpc);

// Initialize indexer with output directory
let outputs_dir = Path::new("./indexed_data");
let mut indexer = Indexer::forced_import(outputs_dir)?;

// Index blockchain data
let exit = Exit::default();
let starting_indexes = indexer.index(&parser, rpc, &exit, true)?;

println!("Indexed up to height: {}", starting_indexes.height);

§API Overview

§Core Types

  • Indexer: Main coordinator managing vectors and stores
  • Vecs: Columnar storage for blockchain data analytics
  • Stores: Key-value storage for fast hash-based lookups
  • Indexes: Current indexing state tracking progress across data types

§Key Methods

Indexer::forced_import(outputs_dir: &Path) -> Result<Self> Creates or opens indexer instance with automatic version management.

index(&mut self, parser: &Parser, rpc: &'static Client, exit: &Exit, check_collisions: bool) -> Result<Indexes> Main indexing function processing blocks from parser with collision detection.

§Storage Architecture

Columnar Vectors (Vecs):

  • height_to_*: Block-level data (hash, timestamp, difficulty, size, weight)
  • txindex_to_*: Transaction data (ID, version, locktime, size, RBF flag)
  • outputindex_to_*: Output data (value, type, address mapping)
  • inputindex_to_outputindex: Input-to-output relationship mapping

Key-Value Stores:

  • addressbyteshash_to_typeindex: Address hash to internal index mapping
  • blockhashprefix_to_height: Block hash prefix to height lookup
  • txidprefix_to_txindex: Transaction ID prefix to internal index
  • addresstype_to_typeindex_with_outputindex: Address type to output mappings

§Address Type Support

Complete coverage of Bitcoin script types:

  • P2PK: Pay-to-Public-Key (33-byte and 65-byte variants)
  • P2PKH: Pay-to-Public-Key-Hash
  • P2SH: Pay-to-Script-Hash
  • P2WPKH: Pay-to-Witness-Public-Key-Hash
  • P2WSH: Pay-to-Witness-Script-Hash
  • P2TR: Pay-to-Taproot
  • P2MS: Pay-to-Multisig
  • P2A: Pay-to-Address (custom type)
  • OpReturn: OP_RETURN data outputs
  • Empty/Unknown: Non-standard script types

§Examples

§Basic Indexing Operation

use brk_indexer::Indexer;
use brk_parser::Parser;
use std::path::Path;

// Initialize components
let outputs_dir = Path::new("./blockchain_index");
let mut indexer = Indexer::forced_import(outputs_dir)?;

let blocks_dir = Path::new("/Users/satoshi/.bitcoin/blocks");
let parser = Parser::new(blocks_dir, None, rpc);

// Index with collision checking enabled
let exit = vecdb::Exit::default();
let final_indexes = indexer.index(&parser, rpc, &exit, true)?;

println!("Final height: {}", final_indexes.height);
println!("Total transactions: {}", final_indexes.txindex);
println!("Total addresses: {}", final_indexes.total_address_count());

§Querying Indexed Data

use brk_indexer::Indexer;
use brk_structs::{Height, TxidPrefix, AddressBytesHash};

let indexer = Indexer::forced_import("./blockchain_index")?;

// Look up block hash by height
let height = Height::new(750000);
if let Some(block_hash) = indexer.vecs.height_to_blockhash.get(height)? {
    println!("Block 750000 hash: {}", block_hash);
}

// Look up transaction by ID prefix
let txid_prefix = TxidPrefix::from_str("abcdef123456")?;
if let Some(tx_index) = indexer.stores.txidprefix_to_txindex.get(&txid_prefix)? {
    println!("Transaction index: {}", tx_index);
}

// Query address information
let address_hash = AddressBytesHash::from(/* address bytes */);
if let Some(type_index) = indexer.stores.addressbyteshash_to_typeindex.get(&address_hash)? {
    println!("Address type index: {}", type_index);
}

§Incremental Processing

use brk_indexer::Indexer;

// Indexer automatically resumes from last processed height
let mut indexer = Indexer::forced_import("./blockchain_index")?;

let current_indexes = indexer.vecs.current_indexes(&indexer.stores, rpc)?;
println!("Resuming from height: {}", current_indexes.height);

// Process new blocks incrementally
let exit = vecdb::Exit::default();
let updated_indexes = indexer.index(&parser, rpc, &exit, true)?;

println!("Processed {} new blocks",
         updated_indexes.height.as_u32() - current_indexes.height.as_u32());

§Address Type Analysis

use brk_indexer::Indexer;
use brk_structs::OutputType;

let indexer = Indexer::forced_import("./blockchain_index")?;

// Analyze address distribution by type
for output_type in OutputType::as_vec() {
    let count = indexer.vecs.outputindex_to_outputtype
        .iter()
        .filter(|&ot| ot == output_type)
        .count();

    println!("{:?}: {} outputs", output_type, count);
}

// Query specific address type data
let p2pkh_store = &indexer.stores.addresstype_to_typeindex_with_outputindex
    .p2pkh;

println!("P2PKH addresses: {}", p2pkh_store.len());

§Architecture

§Parallel Processing

The indexer uses sophisticated parallel processing:

  • Block-Level Parallelism: Concurrent processing of transactions within blocks
  • Transaction Analysis: Parallel input/output processing with rayon
  • Address Resolution: Multi-threaded address type classification and indexing
  • Collision Detection: Parallel validation of hash collisions across address types

§Storage Optimization

Columnar Storage (vecdb):

  • Compressed vectors for space-efficient analytics queries
  • Raw vectors for frequently accessed data (heights, hashes)
  • Page-aligned storage for memory mapping efficiency

Key-Value Storage (Fjall):

  • LSM-tree architecture for write-heavy indexing workloads
  • Bloom filters for fast negative lookups
  • Transactional consistency with rollback support

§Memory Management

  • Batch Processing: 1000-block snapshots to balance memory and I/O
  • Reader Management: Static readers for consistent data access during processing
  • Collision Tracking: BTreeMap-based collision detection with memory cleanup
  • Exit Handling: Graceful shutdown with consistent state preservation

§Version Management

  • Schema Versioning: Automatic migration on version changes (currently v21)
  • Rollback Support: Automatic recovery from incomplete processing
  • State Tracking: Height-based synchronization across all storage components

§Performance Characteristics

§Processing Speed

  • Parallel Transaction Processing: Multi-core utilization for CPU-intensive operations
  • Optimized I/O: Batch operations reduce disk overhead
  • Memory Efficiency: Streaming processing without loading entire blockchain

§Storage Requirements

  • Columnar Compression: Significant space savings for repetitive blockchain data
  • Index Optimization: Bloom filters reduce lookup overhead
  • Incremental Growth: Storage scales linearly with blockchain size

§Scalability

  • Height-Based Partitioning: Enables distributed processing strategies
  • Modular Architecture: Separate vector and store systems for flexible deployment
  • Resource Configuration: Configurable batch sizes and memory limits

§Code Analysis Summary

Main Structure: Indexer coordinating Vecs (columnar analytics) and Stores (key-value lookups)
Processing Pipeline: Multi-threaded block analysis with parallel transaction/address processing
Storage Architecture: Dual system using vecdb for analytics and Fjall for lookups
Address Indexing: Complete Bitcoin script type coverage with collision detection
Synchronization: Height-based coordination with Bitcoin Core RPC validation
Parallel Processing: rayon-based parallelism for transaction analysis and address resolution
Architecture: High-performance blockchain indexer with ACID guarantees and incremental processing


This README was generated by Claude Code

Structs§

Indexer
Indexes
Stores
Vecs

Functions§

starting_index