pub struct Codebook {
    pub version: u32,
    pub dimensionality: usize,
    pub basis_vectors: Vec<BasisVector>,
    pub semantic_markers: Vec<SparseVec>,
    pub statistics: CodebookStatistics,
    pub salt: Option<[u8; 32]>,
}
The Codebook - acts as the private key for reconstruction
Fields§
- version: u32 - Version for compatibility
- dimensionality: usize - Dimensionality of basis vectors
- basis_vectors: Vec<BasisVector> - The basis vectors forming the encoding dictionary. Data is projected onto these bases.
- semantic_markers: Vec<SparseVec> - Semantic marker vectors for outlier detection
- statistics: CodebookStatistics - Statistics for adaptive encoding
- salt: Option<[u8; 32]> - Cryptographic salt for key derivation (optional)
Implementations§
impl Codebook
pub fn with_salt(dimensionality: usize, salt: [u8; 32]) -> Self
Create a codebook with cryptographic salt for key derivation
pub fn initialize_standard_basis(&mut self)
Initialize with common basis vectors for text/binary data
pub fn initialize_byte_basis(&mut self)
Initialize with byte-level basis vectors (256 vectors, one per byte value 0-255)
This creates a complete basis that can represent any byte data.
Position basis vectors (64 vectors for positions 0-63) are also added by default. Use initialize_byte_basis_with_config to control this.
pub fn initialize_byte_basis_with_config(&mut self, include_position_basis: bool)
Initialize with byte-level basis vectors with optional position basis
§Arguments
- include_position_basis: Whether to add position-aware basis vectors (64 vectors)
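The effect of the flag on the number of basis vectors can be sketched as follows. This is only an illustration: `ToyBasis`, `BasisKind`, and `toy_byte_basis` are hypothetical stand-ins rather than the crate's types, and the `u32` IDs assume the byte (0-255) and position (256-319) ranges documented for `train`.

```rust
/// Hypothetical stand-in for the crate's basis vector; only the ID and kind are modeled.
#[derive(Debug, Clone, PartialEq)]
struct ToyBasis {
    id: u32,
    kind: BasisKind,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BasisKind {
    /// One entry per byte value 0-255.
    Byte(u8),
    /// One entry per position 0-63.
    Position(u8),
}

/// Builds 256 byte-level entries and, if requested, 64 position entries.
fn toy_byte_basis(include_position_basis: bool) -> Vec<ToyBasis> {
    let mut basis: Vec<ToyBasis> = (0u32..256)
        .map(|b| ToyBasis { id: b, kind: BasisKind::Byte(b as u8) })
        .collect();
    if include_position_basis {
        // Assumed ID layout: position entries follow directly after the byte entries.
        basis.extend((0u32..64).map(|p| ToyBasis {
            id: 256 + p,
            kind: BasisKind::Position(p as u8),
        }));
    }
    basis
}
```

Under these assumptions, passing `false` yields 256 entries and passing `true` yields 320.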
pub fn train(&mut self, training_data: &[&[u8]], config: &CodebookTrainingConfig) -> usize
Train the codebook on representative data
This learns basis vectors by analyzing patterns in the training data. The algorithm:
- Chunk the data into blocks
- Find frequently occurring patterns (n-grams)
- Create basis vectors for the most common patterns
- Optionally add byte-level basis as fallback
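The pattern-finding step listed above can be sketched as a plain n-gram frequency count. This is a minimal illustration, not the crate's training code: `top_ngrams`, `n`, and `max_patterns` are hypothetical names (they are not fields of `CodebookTrainingConfig`), and the sketch skips chunking and the creation of actual basis vectors.

```rust
use std::collections::HashMap;

/// Counts every n-gram (window of `n` bytes) across all samples and returns
/// the `max_patterns` most frequent ones, most frequent first.
/// Ties are broken by byte order so the result is deterministic.
fn top_ngrams(samples: &[&[u8]], n: usize, max_patterns: usize) -> Vec<(Vec<u8>, usize)> {
    // `windows(0)` panics, so an n-gram length of zero yields no patterns.
    if n == 0 {
        return Vec::new();
    }
    let mut counts: HashMap<&[u8], usize> = HashMap::new();
    for sample in samples {
        for window in sample.windows(n) {
            *counts.entry(window).or_insert(0) += 1;
        }
    }
    let mut ranked: Vec<(Vec<u8>, usize)> = counts
        .into_iter()
        .map(|(gram, count)| (gram.to_vec(), count))
        .collect();
    // Sort by descending count, then ascending byte order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(max_patterns);
    ranked
}
```

For the samples `b"abab"` and `b"abc"` with `n = 2`, the bigram `ab` occurs three times and would be the first candidate for a learned basis vector.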
§Arguments
- training_data: Slice of training samples
- config: Training configuration
§Returns
Number of basis vectors learned
§ID Allocation
Basis vector IDs are allocated in non-overlapping ranges:
- Byte basis: 0-255
- Position basis: 256-319
- Learned patterns: 1000+
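The ranges above can be expressed as a small classifier. This is a sketch under assumptions: the `u32` ID type, the `IdRange` enum, and both helper functions are hypothetical, and the documentation does not say what IDs 320-999 are used for, so they are labeled `Unlisted` here.

```rust
/// Which documented ID range a basis vector ID falls into.
#[derive(Debug, PartialEq)]
enum IdRange {
    Byte,
    Position,
    /// 320-999: not mentioned in the documentation.
    Unlisted,
    Learned,
}

/// Maps an ID to its documented range.
fn classify_id(id: u32) -> IdRange {
    match id {
        0..=255 => IdRange::Byte,
        256..=319 => IdRange::Position,
        320..=999 => IdRange::Unlisted,
        _ => IdRange::Learned,
    }
}

/// Assumption: learned patterns are numbered sequentially starting at 1000.
fn learned_id(index: u32) -> u32 {
    1000 + index
}
```

Keeping the ranges disjoint means an ID alone is enough to tell a fallback byte or position vector from a learned pattern.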
pub fn train_from_files(&mut self, paths: &[&Path], config: &CodebookTrainingConfig) -> Result<usize>
Train codebook from files on disk
Convenience method that reads files and trains on their content.
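A convenience wrapper of this kind could plausibly look like the sketch below, which reads each file into memory and then borrows the buffers as the `&[&[u8]]` slice that `train` accepts. `read_all` and `as_samples` are hypothetical helpers, and `std::io::Result` stands in for the crate's own `Result` alias, whose error type is not shown here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads every file fully into memory, stopping at the first I/O error.
fn read_all(paths: &[&Path]) -> io::Result<Vec<Vec<u8>>> {
    paths.iter().map(|p| fs::read(p)).collect()
}

/// Borrows the owned buffers as byte slices, the shape `train` takes.
fn as_samples(buffers: &[Vec<u8>]) -> Vec<&[u8]> {
    buffers.iter().map(|b| b.as_slice()).collect()
}
```

The owned buffers must outlive the borrowed samples, which is why the two steps are kept separate; the result of `as_samples` could then be passed as `training_data`.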
pub fn project(&self, data: &[u8]) -> ProjectionResult
Project data onto the codebook basis. Returns coefficients, residual, and detected outliers.
pub fn project_with_config(&self, data: &[u8], config: &ProjectionConfig) -> ProjectionResult
Project data onto the codebook using custom configuration
pub fn reconstruct(&self, projection: &ProjectionResult, expected_size: usize) -> Vec<u8>
Reconstruct original data from projection result
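The project/reconstruct round trip can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyProjection`, `toy_project`, and `toy_reconstruct` are hypothetical, the greedy pattern matching is not the crate's projection algorithm, outlier detection is left out, and the way `expected_size` bounds the output is a guess at its role.

```rust
/// Hypothetical, simplified stand-in for `ProjectionResult`.
#[derive(Debug, Default, PartialEq)]
struct ToyProjection {
    /// (offset, pattern index) pairs for dictionary hits.
    coefficients: Vec<(usize, usize)>,
    /// (offset, byte) pairs for bytes that no pattern covered.
    residual: Vec<(usize, u8)>,
}

/// Greedy left-to-right matching of `patterns` against `data`.
fn toy_project(patterns: &[&[u8]], data: &[u8]) -> ToyProjection {
    let mut out = ToyProjection::default();
    let mut pos = 0;
    while pos < data.len() {
        // First non-empty pattern that matches at the current offset, if any.
        let hit = patterns
            .iter()
            .position(|p| !p.is_empty() && data[pos..].starts_with(p));
        match hit {
            Some(idx) => {
                out.coefficients.push((pos, idx));
                pos += patterns[idx].len();
            }
            None => {
                out.residual.push((pos, data[pos]));
                pos += 1;
            }
        }
    }
    out
}

/// Rebuilds a buffer of exactly `expected_size` bytes; writes past the end are dropped.
fn toy_reconstruct(patterns: &[&[u8]], proj: &ToyProjection, expected_size: usize) -> Vec<u8> {
    let mut buf = vec![0u8; expected_size];
    for &(offset, idx) in &proj.coefficients {
        for (i, &byte) in patterns[idx].iter().enumerate() {
            if let Some(slot) = buf.get_mut(offset + i) {
                *slot = byte;
            }
        }
    }
    for &(offset, byte) in &proj.residual {
        if let Some(slot) = buf.get_mut(offset) {
            *slot = byte;
        }
    }
    buf
}
```

In this toy model the pattern list plays the role of the codebook: without the same patterns, the coefficients alone do not recover the data, which mirrors the description of the codebook as the private key for reconstruction.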