Struct BcpEncoder

pub struct BcpEncoder { /* private fields */ }

BCP encoder — constructs a binary payload from structured blocks.

The encoder is the tool-facing API that allows agents, MCP servers, and other producers to build BCP payloads. It follows the builder pattern defined in RFC §5.6: methods like add_code, add_conversation, etc. append typed blocks to an internal list, and chainable modifiers like with_summary and with_priority annotate the most recently added block.
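The builder contract can be modeled in plain Rust: each add_* call pushes a block, and each with_* modifier annotates the last block or fails with NoBlockTarget when there is none. This is a simplified, std-only sketch, not the crate's implementation. SketchEncoder, PendingBlock, and their fields are hypothetical, and encode() here only models the EmptyPayload check.

```rust
/// Error variants mirroring the names used on this page (simplified).
#[derive(Debug, PartialEq)]
pub enum EncodeError {
    NoBlockTarget,
    EmptyPayload,
}

/// Hypothetical in-memory form of one queued block.
#[derive(Debug, Default)]
pub struct PendingBlock {
    pub kind: &'static str,
    pub body: Vec<u8>,
    pub summary: Option<String>,
}

#[derive(Debug, Default)]
pub struct SketchEncoder {
    blocks: Vec<PendingBlock>,
}

impl SketchEncoder {
    pub fn new() -> Self {
        Self::default()
    }

    /// add_* style: push a block and return `&mut Self` for chaining.
    pub fn add(&mut self, kind: &'static str, body: &[u8]) -> &mut Self {
        self.blocks.push(PendingBlock { kind, body: body.to_vec(), summary: None });
        self
    }

    /// with_* style: annotate the most recent block, or fail if there is none.
    pub fn with_summary(&mut self, summary: &str) -> Result<&mut Self, EncodeError> {
        let last = self.blocks.last_mut().ok_or(EncodeError::NoBlockTarget)?;
        last.summary = Some(summary.to_owned());
        Ok(self)
    }

    /// Placeholder for serialization: only the empty-payload check is modeled.
    pub fn encode(&self) -> Result<usize, EncodeError> {
        if self.blocks.is_empty() {
            return Err(EncodeError::EmptyPayload);
        }
        Ok(self.blocks.len())
    }

    pub fn blocks(&self) -> &[PendingBlock] {
        &self.blocks
    }
}
```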

§Compression (RFC §4.6)

Two compression modes are supported, both opt-in:

  • Per-block: call with_compression after adding a block, or compress_blocks to enable compression for every block, including blocks already added and any added later. Each block body is independently zstd-compressed if it exceeds COMPRESSION_THRESHOLD bytes and compression yields a size reduction. The block’s COMPRESSED flag (bit 1) is set when compression is applied.

  • Whole-payload: call compress_payload to zstd-compress all bytes after the 8-byte header. When enabled, per-block compression is skipped (whole-payload subsumes it). The header’s COMPRESSED flag (bit 0) is set.
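How the two modes interact can be summarized as a small decision function. This is a hypothetical helper that assumes "bit n" means the mask 1 << n. Here block_shrinks abbreviates "the body exceeds COMPRESSION_THRESHOLD and zstd actually made it smaller", and payload_shrinks is the analogous condition for the whole block stream.

```rust
/// Flag masks, assuming "bit n" on this page means `1 << n`.
pub const BLOCK_COMPRESSED: u8 = 1 << 1; // block flags, bit 1
pub const HEADER_COMPRESSED: u8 = 1 << 0; // header flags, bit 0

#[derive(Debug, PartialEq)]
pub struct CompressionFlags {
    pub block: u8,
    pub header: u8,
}

/// Hypothetical helper: which COMPRESSED flags end up set for one block
/// and for the header, given the opt-ins and whether compression paid off.
pub fn compression_flags(
    block_opted_in: bool,
    block_shrinks: bool,
    whole_payload: bool,
    payload_shrinks: bool,
) -> CompressionFlags {
    let mut flags = CompressionFlags { block: 0, header: 0 };
    if whole_payload {
        // Whole-payload mode subsumes per-block compression entirely.
        if payload_shrinks {
            flags.header |= HEADER_COMPRESSED;
        }
    } else if block_opted_in && block_shrinks {
        flags.block |= BLOCK_COMPRESSED;
    }
    flags
}
```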

§Content Addressing (RFC §4.7)

When a ContentStore is configured via set_content_store, blocks can be stored by their BLAKE3 hash rather than inline:

  • Per-block: call with_content_addressing after adding a block. The body is hashed, stored in the content store, and replaced with the 32-byte hash on the wire. The block’s IS_REFERENCE flag (bit 2) is set.

  • Auto-dedup: call auto_dedup to automatically content-address any block whose body has been seen before. First occurrence is stored inline and registered in the store; subsequent identical blocks become references.

Content addressing runs before compression — a 32-byte hash reference is always below the compression threshold, so reference blocks are never compressed.
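The per-block swap of a body for its hash can be sketched as follows. The HashMap stands in for a ContentStore, and the hash closure stands in for BLAKE3 (with the blake3 crate, something like `|b: &[u8]| *blake3::hash(b).as_bytes()`). Both are assumptions for illustration, as is reading "bit 2" as the mask 1 << 2.

```rust
use std::collections::HashMap;

/// IS_REFERENCE flag, assuming "bit 2" means `1 << 2`.
pub const IS_REFERENCE: u8 = 1 << 2;

/// Replace `body` with its 32-byte content hash, keeping the original bytes
/// in `store` so a decoder can resolve the reference later.
pub fn make_reference(
    body: &[u8],
    flags: &mut u8,
    store: &mut HashMap<[u8; 32], Vec<u8>>,
    hash: impl Fn(&[u8]) -> [u8; 32],
) -> Vec<u8> {
    let digest = hash(body);
    // Identical content hashes to the same key, so store it only once.
    store.entry(digest).or_insert_with(|| body.to_vec());
    *flags |= IS_REFERENCE;
    digest.to_vec() // the 32 bytes that go on the wire
}
```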

§Usage

use bcp_encoder::BcpEncoder;
use bcp_types::enums::{Lang, Role, Status, Priority};

let payload = BcpEncoder::new()
    .add_code(Lang::Rust, "src/main.rs", b"fn main() {}")
    .with_summary("Entry point: CLI setup and server startup.")?
    .with_priority(Priority::High)?
    .add_conversation(Role::User, b"Fix the timeout bug.")
    .add_conversation(Role::Assistant, b"I'll examine the pool config...")
    .add_tool_result("ripgrep", Status::Ok, b"3 matches found.")
    .encode()?;

§Output layout

The .encode() method serializes all accumulated blocks into a self-contained byte sequence:

┌──────────────┬──────────────────────────────────────────┐
│ [8 bytes]    │ File header (magic, version, flags, rsv) │
│ [N bytes]    │ Block 0 frame (type + flags + len + body)│
│ [N bytes]    │ Block 1 frame ...                        │
│ ...          │                                          │
│ [2-3 bytes]  │ END sentinel (type=0xFF, flags=0, len=0) │
└──────────────┴──────────────────────────────────────────┘

When whole-payload compression is enabled, the layout becomes:

┌──────────────┬──────────────────────────────────────────┐
│ [8 bytes]    │ Header (flags bit 0 = COMPRESSED)        │
│ [N bytes]    │ zstd(Block 0 + Block 1 + ... + END)      │
└──────────────┴──────────────────────────────────────────┘

The payload is ready for storage or transmission — no further framing is required.

Implementations§

impl BcpEncoder

pub fn new() -> Self

Create a new encoder with default settings (version 1.0, no flags).

The encoder starts with an empty block list, no compression, and no content store. At least one block must be added before calling .encode(); otherwise, .encode() returns EncodeError::EmptyPayload.

pub fn add_code(&mut self, lang: Lang, path: &str, content: &[u8]) -> &mut Self

Add a CODE block.

Encodes a source code file or fragment. The lang enum identifies the programming language (used by the decoder for syntax-aware rendering), path is the file path (UTF-8), and content is the raw source bytes.

For partial files, use add_code_range to include line range metadata.

pub fn add_code_range(
    &mut self,
    lang: Lang,
    path: &str,
    content: &[u8],
    line_start: u32,
    line_end: u32,
) -> &mut Self

Add a CODE block with a line range.

Same as add_code but includes line_start and line_end metadata (1-based, inclusive). The decoder can use this to display line numbers or to correlate with diagnostics.
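To make the 1-based, inclusive convention concrete: a range with line_start = 10 and line_end = 20 covers 11 lines, and line 1 is the first line of the file. The hypothetical helper below (not part of the crate) extracts such a range from source text, for example to produce the content passed alongside line_start and line_end.

```rust
/// Extract lines `line_start..=line_end` (1-based, inclusive) from `source`.
/// Returns None for an empty, zero-based, or out-of-bounds range.
pub fn slice_lines(source: &str, line_start: u32, line_end: u32) -> Option<String> {
    if line_start == 0 || line_end < line_start {
        return None;
    }
    let lines: Vec<&str> = source.lines().collect();
    // Convert to a half-open, 0-based slice range.
    let (start, end) = (line_start as usize - 1, line_end as usize);
    if end > lines.len() {
        return None;
    }
    Some(lines[start..end].join("\n"))
}
```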

pub fn add_conversation(&mut self, role: Role, content: &[u8]) -> &mut Self

Add a CONVERSATION block.

Represents a single chat turn. The role identifies the speaker (system, user, assistant, or tool) and content is the message body as raw bytes.

pub fn add_conversation_tool(
    &mut self,
    role: Role,
    content: &[u8],
    tool_call_id: &str,
) -> &mut Self

Add a CONVERSATION block with a tool call ID.

Used for tool-role messages that reference a specific tool invocation. The tool_call_id links this response back to the tool call that produced it.

pub fn add_file_tree(
    &mut self,
    root: &str,
    entries: Vec<FileEntry>,
) -> &mut Self

Add a FILE_TREE block.

Represents a directory structure rooted at root. Each entry contains a name, kind (file or directory), size, and optional nested children for recursive directory trees.
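A recursive tree like this is naturally walked depth-first. The FileEntry and EntryKind definitions below are hypothetical stand-ins that follow the description above (name, kind, size, optional children, with an empty Vec meaning no children); the crate's actual FileEntry type may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EntryKind {
    File,
    Directory,
}

/// Hypothetical stand-in for the crate's FileEntry.
#[derive(Debug, Clone)]
pub struct FileEntry {
    pub name: String,
    pub kind: EntryKind,
    pub size: u64,
    pub children: Vec<FileEntry>, // empty for files and empty directories
}

impl FileEntry {
    pub fn file(name: &str, size: u64) -> Self {
        Self { name: name.into(), kind: EntryKind::File, size, children: Vec::new() }
    }

    pub fn dir(name: &str, children: Vec<FileEntry>) -> Self {
        Self { name: name.into(), kind: EntryKind::Directory, size: 0, children }
    }
}

/// Depth-first walk returning (number of files, total file bytes).
pub fn summarize(entries: &[FileEntry]) -> (usize, u64) {
    entries.iter().fold((0, 0), |(n, bytes), e| match e.kind {
        EntryKind::File => (n + 1, bytes + e.size),
        EntryKind::Directory => {
            let (child_n, child_bytes) = summarize(&e.children);
            (n + child_n, bytes + child_bytes)
        }
    })
}
```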

pub fn add_tool_result(
    &mut self,
    name: &str,
    status: Status,
    content: &[u8],
) -> &mut Self

Add a TOOL_RESULT block.

Captures the output of an external tool invocation (e.g. ripgrep, LSP diagnostics, test runner). The status indicates whether the tool succeeded, failed, or timed out.

pub fn add_document(
    &mut self,
    title: &str,
    content: &[u8],
    format_hint: FormatHint,
) -> &mut Self

Add a DOCUMENT block.

Represents prose content — README files, documentation, wiki pages. The format_hint tells the decoder how to render the body (markdown, plain text, or HTML).

pub fn add_structured_data(
    &mut self,
    format: DataFormat,
    content: &[u8],
) -> &mut Self

Add a STRUCTURED_DATA block.

Encodes tabular or structured content — JSON configs, YAML manifests, TOML files, CSV data. The format identifies the serialization format so the decoder can syntax-highlight or parse appropriately.

pub fn add_diff(&mut self, path: &str, hunks: Vec<DiffHunk>) -> &mut Self

Add a DIFF block.

Represents code changes for a single file — from git diffs, editor changes, or patch files. Each hunk captures a contiguous range of modifications in unified diff format.
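For reference, each hunk in unified diff format begins with a header of the form @@ -old_start,old_len +new_start,new_len @@, where the old length counts context and removed lines and the new length counts context and added lines. The Hunk struct below is a hypothetical stand-in; the fields of the crate's DiffHunk are not documented on this page.

```rust
/// Hypothetical hunk shape (not the crate's DiffHunk).
pub struct Hunk {
    pub old_start: u32,
    pub new_start: u32,
    /// Body lines, each prefixed with ' ' (context), '-' (removed), or '+' (added).
    pub lines: Vec<String>,
}

impl Hunk {
    /// Render the unified-diff hunk header, deriving both line counts
    /// from the line prefixes.
    pub fn header(&self) -> String {
        let old_len = self.lines.iter().filter(|l| !l.starts_with('+')).count();
        let new_len = self.lines.iter().filter(|l| !l.starts_with('-')).count();
        format!("@@ -{},{} +{},{} @@", self.old_start, old_len, self.new_start, new_len)
    }
}
```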

pub fn add_annotation(
    &mut self,
    target_block_id: u32,
    kind: AnnotationKind,
    value: &[u8],
) -> &mut Self

Add an ANNOTATION block.

Annotations are metadata overlays that target another block by its zero-based index in the stream. The kind determines how the value payload is interpreted (priority level, summary text, or tag label).

For the common case of attaching a priority to the most recent block, prefer with_priority.

pub fn add_embedding_ref(
    &mut self,
    vector_id: &[u8],
    source_hash: &[u8],
    model: &str,
) -> &mut Self

Add an EMBEDDING_REF block.

Points to a pre-computed vector embedding stored externally (e.g. in a vector database). The vector_id is an opaque byte identifier for the vector in the external store, source_hash is the BLAKE3 hash of the content that was embedded (32 bytes), and model is the name of the embedding model (e.g. "text-embedding-3-small").

§Wire type

Block type 0x09 (EMBEDDING_REF). See RFC §4.4.

pub fn add_image(
    &mut self,
    media_type: MediaType,
    alt_text: &str,
    data: &[u8],
) -> &mut Self

Add an IMAGE block.

Encodes an image as inline binary data. The media_type identifies the image format (PNG, JPEG, etc.), alt_text provides a textual description for accessibility, and data is the raw image bytes.

pub fn add_extension(
    &mut self,
    namespace: &str,
    type_name: &str,
    content: &[u8],
) -> &mut Self

Add an EXTENSION block.

User-defined block type for custom payloads. The namespace and type_name together form a unique identifier for the extension type, preventing collisions across different tools and vendors.

pub fn with_summary(&mut self, summary: &str) -> Result<&mut Self, EncodeError>

Attach a summary to the most recently added block.

Sets the HAS_SUMMARY flag on the block and prepends the summary sub-block to the body during serialization. The summary is a compact UTF-8 description that the token budget engine can use as a stand-in when the full block content would exceed the budget.
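The stand-in behavior can be illustrated with a simple budget check. This is a hypothetical model: the token estimate is a crude bytes / 4 heuristic chosen only for the example, and the crate's token budget engine is not described on this page.

```rust
/// Pick what a budget-constrained consumer would render for one block:
/// the full body if it fits, otherwise the summary if that fits.
pub fn render_within_budget<'a>(
    body: &'a str,
    summary: Option<&'a str>,
    budget_tokens: usize,
) -> Option<&'a str> {
    // Placeholder estimate: roughly four bytes per token, rounded up.
    let approx_tokens = |s: &str| (s.len() + 3) / 4;
    if approx_tokens(body) <= budget_tokens {
        Some(body)
    } else {
        // Fall back to the summary, but only if it fits too.
        summary.filter(|s| approx_tokens(*s) <= budget_tokens)
    }
}
```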

§Errors

Returns EncodeError::NoBlockTarget if no blocks have been added yet. Use this immediately after an .add_*() call.

pub fn with_priority(
    &mut self,
    priority: Priority,
) -> Result<&mut Self, EncodeError>

Attach a priority annotation to the most recently added block.

This is a convenience method that appends an ANNOTATION block with kind=Priority targeting the last added block’s index. The annotation’s value is the priority byte (e.g. 0x02 for Priority::High).
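The index arithmetic behind with_priority can be modeled with a plain Vec. Block and NoBlockTarget are hypothetical simplifications; only the priority byte 0x02 for Priority::High comes from this page. Because the appended annotation is itself a block, it occupies the next index in the stream.

```rust
/// Hypothetical, simplified block representation.
#[derive(Debug, PartialEq)]
pub enum Block {
    Content(&'static str),
    Annotation { target: u32, kind: &'static str, value: Vec<u8> },
}

#[derive(Debug)]
pub struct NoBlockTarget;

/// Append a priority ANNOTATION that targets the previous block's
/// zero-based index; fails if the stream is empty.
pub fn with_priority(blocks: &mut Vec<Block>, priority_byte: u8) -> Result<(), NoBlockTarget> {
    let target = blocks.len().checked_sub(1).ok_or(NoBlockTarget)? as u32;
    blocks.push(Block::Annotation { target, kind: "priority", value: vec![priority_byte] });
    Ok(())
}
```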

§Errors

Returns EncodeError::NoBlockTarget if no blocks have been added yet.

pub fn with_compression(&mut self) -> Result<&mut Self, EncodeError>

Enable zstd compression for the most recently added block.

During .encode(), the block body is compressed with zstd if it exceeds COMPRESSION_THRESHOLD bytes and compression yields a size reduction. If compression doesn’t help (output >= input), the body is stored uncompressed and the COMPRESSED flag is not set.

Has no effect if compress_payload is also enabled — whole-payload compression takes precedence.
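The threshold and no-savings guard can be sketched as a small function. Here threshold stands in for COMPRESSION_THRESHOLD (its value is not given on this page) and compress stands in for zstd; with the zstd crate it might be wired as `|b: &[u8]| zstd::bulk::compress(b, 3).unwrap()`, but that wiring is an assumption.

```rust
/// Block COMPRESSED flag, assuming "bit 1" means `1 << 1`.
pub const COMPRESSED: u8 = 1 << 1;

/// Compress `body` only if it exceeds `threshold` and the result is smaller;
/// otherwise return it unchanged and leave the flag clear.
pub fn maybe_compress(
    body: Vec<u8>,
    flags: &mut u8,
    threshold: usize,
    compress: impl Fn(&[u8]) -> Vec<u8>,
) -> Vec<u8> {
    if body.len() <= threshold {
        return body; // too small to be worth compressing
    }
    let packed = compress(&body[..]);
    if packed.len() >= body.len() {
        return body; // no savings: keep the original bytes
    }
    *flags |= COMPRESSED;
    packed
}
```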

§Errors

Returns EncodeError::NoBlockTarget if no blocks have been added yet.

pub fn compress_blocks(&mut self) -> &mut Self

Enable zstd compression for all blocks added so far and all future blocks.

Equivalent to calling with_compression on every block. Individual blocks still respect the size threshold and no-savings guard.
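The "so far and future" semantics amount to a retroactive update plus a sticky default for new blocks. A minimal model with hypothetical Enc and Pending types:

```rust
/// Hypothetical per-block state: only the compression opt-in is modeled.
#[derive(Default)]
pub struct Pending {
    pub compress: bool,
}

#[derive(Default)]
pub struct Enc {
    pub blocks: Vec<Pending>,
    pub compress_all: bool,
}

impl Enc {
    pub fn add(&mut self) -> &mut Self {
        // New blocks inherit the encoder-wide default.
        self.blocks.push(Pending { compress: self.compress_all });
        self
    }

    pub fn compress_blocks(&mut self) -> &mut Self {
        self.compress_all = true; // applies to future blocks
        for b in &mut self.blocks {
            b.compress = true; // and retroactively to existing ones
        }
        self
    }
}
```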

pub fn compress_payload(&mut self) -> &mut Self

Enable whole-payload zstd compression.

When set, the entire block stream (all frames + END sentinel) is compressed as a single zstd frame. The 8-byte header is written uncompressed with HeaderFlags::COMPRESSED set so the decoder can detect compression before reading further.

When whole-payload compression is enabled, per-block compression is skipped — compressing within a compressed stream adds overhead without benefit.

If compression doesn’t reduce the total size, the payload is stored uncompressed and the header flag is not set.

Tradeoff: Whole-payload compression disables incremental streaming in StreamingDecoder — the decoder must buffer and decompress the entire payload before yielding any blocks. If streaming is important, use compress_blocks instead.
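The header-then-compressed-stream layout and the no-savings fallback can be sketched as below. The position of the flags byte inside the 8-byte header is not specified on this page, so flags_index is a hypothetical parameter, and compress again stands in for zstd.

```rust
/// Header COMPRESSED flag, assuming "bit 0" means `1 << 0`.
pub const HEADER_COMPRESSED: u8 = 1 << 0;

/// Write the 8-byte header uncompressed, then the block stream
/// (frames + END sentinel), compressed only if that saves space.
pub fn finish_payload(
    mut header: [u8; 8],
    stream: Vec<u8>,
    flags_index: usize,
    compress: impl Fn(&[u8]) -> Vec<u8>,
) -> Vec<u8> {
    let packed = compress(&stream[..]);
    let body = if packed.len() < stream.len() {
        header[flags_index] |= HEADER_COMPRESSED;
        packed
    } else {
        stream // no savings: ship uncompressed, flag stays clear
    };
    let mut out = Vec::with_capacity(8 + body.len());
    out.extend_from_slice(&header);
    out.extend_from_slice(&body);
    out
}
```

Because the body after the header is one compressed unit, a reader cannot interpret any block until the whole body has been decompressed, which is the streaming tradeoff described above.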

pub fn set_content_store(&mut self, store: Arc<dyn ContentStore>) -> &mut Self

Set the content store used for BLAKE3 content addressing.

The store is shared via Arc so the same store can be passed to both the encoder and decoder for roundtrip workflows. The encoder calls store.put() for each content-addressed block; the decoder calls store.get() to resolve references.

Must be called before .encode() if any block has content addressing enabled or if auto_dedup is set.
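A store shared through Arc typically needs interior mutability, because an Arc only hands out shared references. The trait below is a hypothetical shape (the real ContentStore's method signatures are not shown on this page), with an in-memory implementation guarded by a Mutex.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical content-store interface keyed by a 32-byte hash.
pub trait ContentStore: Send + Sync {
    fn put(&self, hash: [u8; 32], body: Vec<u8>);
    fn get(&self, hash: &[u8; 32]) -> Option<Vec<u8>>;
}

/// In-memory store; the Mutex lets `put` take `&self`,
/// which sharing through `Arc<dyn ContentStore>` requires.
#[derive(Default)]
pub struct MemoryStore {
    inner: Mutex<HashMap<[u8; 32], Vec<u8>>>,
}

impl ContentStore for MemoryStore {
    fn put(&self, hash: [u8; 32], body: Vec<u8>) {
        self.inner.lock().unwrap().insert(hash, body);
    }

    fn get(&self, hash: &[u8; 32]) -> Option<Vec<u8>> {
        self.inner.lock().unwrap().get(hash).cloned()
    }
}
```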

pub fn with_content_addressing(&mut self) -> Result<&mut Self, EncodeError>

Enable content addressing for the most recently added block.

During .encode(), the block body is hashed with BLAKE3, stored in the content store, and replaced with the 32-byte hash on the wire. The block’s IS_REFERENCE flag (bit 2) is set.

Requires a content store — call set_content_store before .encode().

Content addressing runs before compression. Since a 32-byte hash reference is always below COMPRESSION_THRESHOLD, reference blocks are never per-block compressed.

§Errors

Returns EncodeError::NoBlockTarget if no blocks have been added yet.

pub fn auto_dedup(&mut self) -> &mut Self

Enable automatic deduplication across all blocks.

When set, the encoder hashes every block body with BLAKE3 during .encode(). If the hash already exists in the content store (i.e. a previous block in this or a prior encoding had the same content), the block is automatically replaced with a hash reference. First-occurrence blocks are stored inline and registered in the store for future dedup.

Requires a content store — call set_content_store before .encode().
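The dedup decision reduces to "has this hash been seen before?". In the sketch below, a HashSet plays the role of the content store's key set, so it can persist across encodings, and hash stands in for BLAKE3. Wire and dedup are hypothetical names.

```rust
use std::collections::HashSet;

/// Wire form of one block after the dedup pass.
#[derive(Debug, PartialEq)]
pub enum Wire {
    Inline(Vec<u8>),
    Reference([u8; 32]),
}

pub fn dedup(
    bodies: &[&[u8]],
    seen: &mut HashSet<[u8; 32]>,
    hash: impl Fn(&[u8]) -> [u8; 32],
) -> Vec<Wire> {
    bodies
        .iter()
        .map(|&body| {
            let digest = hash(body);
            if seen.insert(digest) {
                // First occurrence: keep inline and register the hash.
                Wire::Inline(body.to_vec())
            } else {
                // Seen before (in this or an earlier call): emit a reference.
                Wire::Reference(digest)
            }
        })
        .collect()
}
```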

pub fn encode(&self) -> Result<Vec<u8>, EncodeError>

Serialize all accumulated blocks into a complete BCP payload.

The encode pipeline processes each PendingBlock through up to three stages:

  1. Serialize — calls BlockContent::encode_body to get the TLV-encoded body bytes. If a summary is present, it is prepended and the HAS_SUMMARY flag is set.

  2. Content address (optional) — if the block has content_address = true or auto-dedup detects a duplicate, the body is hashed with BLAKE3, stored in the content store, and replaced with the 32-byte hash. The IS_REFERENCE flag (bit 2) is set.

  3. Per-block compress (optional) — if compression is enabled for this block, whole-payload compression is NOT active, and the body is not a reference, the body is zstd-compressed if it exceeds COMPRESSION_THRESHOLD and compression yields savings. The COMPRESSED flag (bit 1) is set.

After all blocks, the END sentinel is appended. If whole-payload compression is enabled, everything after the 8-byte header is compressed as a single zstd frame and the header’s COMPRESSED flag is set.
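Stages 2 and 3 can be sketched for a single, already-serialized body (the output of stage 1). This is a simplified model with hypothetical names: hash stands in for BLAKE3, compress for zstd, and threshold for COMPRESSION_THRESHOLD. Storing the original body in the content store is omitted for brevity.

```rust
/// Block flags, assuming "bit n" means `1 << n`.
pub const COMPRESSED: u8 = 1 << 1;
pub const IS_REFERENCE: u8 = 1 << 2;

pub struct Options {
    pub content_address: bool,
    pub compress: bool,
    pub whole_payload: bool,
}

/// Returns (block flags, wire body) for one serialized block body.
pub fn process_block(
    body: Vec<u8>,
    opts: &Options,
    threshold: usize,
    hash: impl Fn(&[u8]) -> [u8; 32],
    compress: impl Fn(&[u8]) -> Vec<u8>,
) -> (u8, Vec<u8>) {
    let mut flags = 0u8;
    // Stage 2: content addressing replaces the body with its 32-byte hash.
    let body = if opts.content_address {
        flags |= IS_REFERENCE;
        hash(&body[..]).to_vec()
    } else {
        body
    };
    // Stage 3: per-block compression, skipped for references and when
    // whole-payload compression is active.
    let eligible = opts.compress && !opts.whole_payload && (flags & IS_REFERENCE) == 0;
    if eligible && body.len() > threshold {
        let packed = compress(&body[..]);
        if packed.len() < body.len() {
            return (flags | COMPRESSED, packed);
        }
    }
    (flags, body)
}
```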

§Errors

Returns EncodeError::EmptyPayload if no blocks have been added.

Trait Implementations§

impl Default for BcpEncoder

fn default() -> Self

Returns the “default value” for a type.

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.