Crate codec_rs

§codec-rs

Rust port of the Codec binary transport protocol — the functional twin of Codec.Net, @codecai/web, and codecai.

Codec carries uint32 token IDs on the wire instead of UTF-8 / JSON, deferring text decoding to the presentation layer. This crate lets a Rust client encode and decode Codec frames, parse tokenizer maps, tokenize and detokenize text, watch for tool calls, translate token streams between vocabularies, and load tokenizer maps over HTTP, with sha256 verification of the loaded maps.
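
To make "deferring text decoding to the presentation layer" concrete, here is a minimal sketch that uses only the standard library. The `toy_vocab` and `render_ids` names are hypothetical and are not part of this crate; a real tokenizer map carries much more than a flat ID-to-string table. The point is only that the transport moves integers and the text is assembled at the edge.

```rust
use std::collections::HashMap;

/// Toy vocabulary: token ID -> text piece (illustration only, not a real map).
pub fn toy_vocab() -> HashMap<u32, &'static str> {
    HashMap::from([(0, "Hello"), (1, ","), (2, " world"), (3, "!")])
}

/// Render IDs to text at the presentation layer. Unknown IDs become U+FFFD
/// so a vocabulary mismatch stays visible instead of being silently dropped.
pub fn render_ids(vocab: &HashMap<u32, &'static str>, ids: &[u32]) -> String {
    let mut out = String::new();
    for id in ids {
        match vocab.get(id) {
            Some(piece) => out.push_str(piece),
            None => out.push('\u{FFFD}'),
        }
    }
    out
}
```

Calling `render_ids(&toy_vocab(), &[0, 1, 2, 3])` yields `Hello, world!`; the IDs `[0, 1, 2, 3]` are all that would need to travel between producer and consumer.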

§Quick start

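use codec_rs::{TokenizerMap, Detokenizer, DetokenizeOptions};

// `json` (the tokenizer map document) and `frames` (the decoded frames)
// are supplied by the caller; they are not defined in this snippet.
let map = TokenizerMap::from_json(json).unwrap();
let mut detok = Detokenizer::new(&map);
for frame in frames {
    // Every frame except the one flagged `done` is rendered as partial.
    let opts = DetokenizeOptions { partial: !frame.done, render_special: false };
    let text = detok.render(&frame.ids, opts);
    print!("{text}");
}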

See the module-level docs and the project README for the full API surface.
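
The Quick start passes `partial: !frame.done` to a detokenizer that keeps state between calls. One plausible reason for that state (an assumption, not something this page confirms) is that a frame boundary can fall inside a multi-byte UTF-8 character. The sketch below shows the general technique with a hypothetical `Utf8Carry` type built on the standard library; it is not the crate's `Detokenizer`.

```rust
/// Hypothetical stand-in: buffers bytes that end mid-character so a UTF-8
/// sequence split across frames is emitted once, whole.
#[derive(Default)]
pub struct Utf8Carry {
    pending: Vec<u8>,
}

impl Utf8Carry {
    /// Feed the bytes of one frame. With `partial = true` an incomplete
    /// trailing sequence is held back; with `partial = false` everything
    /// left is flushed, replacing invalid bytes with U+FFFD.
    pub fn push(&mut self, bytes: &[u8], partial: bool) -> String {
        self.pending.extend_from_slice(bytes);
        // Find how many leading bytes form complete, valid UTF-8.
        let split = match std::str::from_utf8(&self.pending) {
            Ok(_) => Some(self.pending.len()),
            // `error_len() == None` means the input ended mid-character.
            Err(e) if e.error_len().is_none() => Some(e.valid_up_to()),
            // Genuinely invalid bytes: fall through to the lossy flush.
            Err(_) => None,
        };
        match split {
            Some(at) if partial => {
                let tail = self.pending.split_off(at);
                let head = std::mem::replace(&mut self.pending, tail);
                String::from_utf8(head).expect("prefix was validated above")
            }
            _ => {
                let out = String::from_utf8_lossy(&self.pending).into_owned();
                self.pending.clear();
                out
            }
        }
    }
}
```

Feeding `[b'a', 0xC3]` and then `[0xA9]` with `partial = true` produces `a` followed by `é`, rather than a replacement character in the middle of the stream.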

Re-exports§

pub use version_signaling::parse_version_policy_document;
pub use version_signaling::parse_version_required;
pub use version_signaling::well_known_version_policy_url;
pub use version_signaling::CodecVersionPolicyDocument;
pub use version_signaling::CodecVersionRequiredBody;
pub use version_signaling::HttpStatus;
pub use version_signaling::VersionSignalingError;
pub use version_signaling::CODEC_CLIENT_VERSION;
pub use version_signaling::CODEC_CLIENT_VERSION_HEADER;
pub use version_signaling::CODEC_MIN_VERSION_HEADER;
pub use version_signaling::CODEC_REQUIRED_FEATURES_HEADER;
pub use version_signaling::discover_version_policy_blocking;
pub use byte_encoder::decode_byte_level_token;
pub use byte_encoder::encode_byte_level_chars;
pub use byte_encoder::METASPACE;
pub use compression::hash_zstd_dict;
pub use compression::select_zstd_dict_for_response;
pub use compression::CodecZstdDictError;
pub use detokenize::Detokenizer;
pub use detokenize::DetokenizeOptions;
pub use frame::CodecFrame;
pub use frame::IMapCache;
pub use frame::MapCache;
pub use frame::MemoryMapCache;
pub use longest_match::LongestMatchTokenizer;
pub use longest_match::Tokenize;
pub use map::TokenizerMap;
pub use map::TokenizerMapError;
pub use map::ToolCallingArgsFormat;
pub use map::ToolCallingBlock;
pub use map::ToolCallingConvention;
pub use map::ToolCallingMarkers;
pub use map::ToolCallingResultFormat;
pub use map_loader::LoadError;
pub use map_loader::LoadOptions;
pub use map_loader::MapLoader;
pub use map_loader::TokenizerMapHashMismatchError;
pub use safety_policy::Category as SafetyCategory;
pub use safety_policy::CategoryAction;
pub use safety_policy::ClassifierBlock as SafetyClassifierBlock;
pub use safety_policy::ClassifierHost;
pub use safety_policy::ClientHooksBlock as SafetyClientHooksBlock;
pub use safety_policy::EngineFeature;
pub use safety_policy::PublisherBlock as SafetyPublisherBlock;
pub use safety_policy::RulesSummary as SafetyRulesSummary;
pub use safety_policy::SafetyPolicyDescriptor;
pub use safety_policy::SafetyPolicyError;
pub use safety_policy::SafetyPolicyPointer;
pub use safety_policy::POLICY_WELL_KNOWN_BASE;
pub use safety_policy::discover_safety_policy;
pub use safety_policy::load_safety_policy;
pub use stream::decode_msgpack_stream;
pub use stream::decode_protobuf_frame;
pub use stream::decode_protobuf_stream;
pub use stream::MsgpackFrameIter;
pub use stream::ProtobufFrameIter;
pub use stream::StreamError;
pub use pretok_program::run_pretok_program;
pub use pretok_program::PreTokOp;
pub use pretok_program::PreTokProgram;
pub use tokenize::BPETokenizer;
pub use tokenize::ITokenizer;
pub use tool_watcher::ToolWatcher;
pub use tool_watcher::ToolWatcherError;
pub use tool_watcher::WatcherEvent;
pub use tool_watcher::WatcherEventKind;
pub use translator::static_translation_table;
pub use translator::translate_one_shot;
pub use translator::Translator;
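
The `byte_encoder` re-exports above concern the GPT-2 byte-to-unicode mapping. As background, here is a sketch of that well-known table using only the standard library: printable bytes map to themselves, and every other byte maps to U+0100 plus a running counter. The function names `gpt2_byte_to_char`, `encode_bytes`, and `decode_chars` are hypothetical; the signatures of the crate's own `encode_byte_level_chars` and `decode_byte_level_token` are not shown on this page and may differ.

```rust
use std::collections::HashMap;

/// Build the GPT-2 byte -> char table.
pub fn gpt2_byte_to_char() -> [char; 256] {
    // Bytes GPT-2 keeps as-is: '!'..='~', U+00A1..=U+00AC, U+00AE..=U+00FF.
    let printable = |b: u8| matches!(b, b'!'..=b'~' | 0xA1..=0xAC | 0xAE..=0xFF);
    let mut table = ['\0'; 256];
    let mut next = 0u32;
    for b in 0u8..=255 {
        table[b as usize] = if printable(b) {
            char::from(b)
        } else {
            // The 68 remaining bytes are assigned U+0100, U+0101, ... in order.
            let c = char::from_u32(0x100 + next).expect("valid scalar value");
            next += 1;
            c
        };
    }
    table
}

/// Map raw bytes to their byte-level characters.
pub fn encode_bytes(bytes: &[u8]) -> String {
    let table = gpt2_byte_to_char();
    bytes.iter().map(|&b| table[b as usize]).collect()
}

/// Inverse mapping; returns `None` if a char is outside the table.
pub fn decode_chars(s: &str) -> Option<Vec<u8>> {
    let inverse: HashMap<char, u8> = gpt2_byte_to_char()
        .iter()
        .enumerate()
        .map(|(b, &c)| (c, b as u8))
        .collect();
    s.chars().map(|c| inverse.get(&c).copied()).collect()
}
```

Under this table a space (byte 0x20) becomes `Ġ` (U+0120) and a newline (byte 0x0A) becomes `Ċ` (U+010A), which is why those characters show up in byte-level BPE vocabularies.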

Modules§

byte_encoder
GPT-2 byte ↔ unicode mapping table and helpers shared by the Detokenizer and BPE encoder.
compression
Client-side helpers for the Codec compression contract.
detokenize
Stateful detokenizer: token IDs → text.
frame
CodecFrame and the pluggable map cache.
longest_match
Vocab-only longest-prefix-match tokenizer.
map
TokenizerMap — the per-model dialect record. Maps are content-addressed (sha256) and immutable.
map_loader
Fetch, verify, and cache tokenizer maps.
pretok_program
Pre-tokenizer program interpreter.
safety_policy
Safety-policy descriptor loading, validation, and discovery.
stream
Stream decoders for the two Codec wire formats.
tokenize
Pure-Rust BPE encoder. Text → token IDs.
tool_watcher
Tool-call / region watcher.
translator
Translator — cross-vocab token-stream pipe.
version_signaling
Codec v0.4 version negotiation — client-side primitives.
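
The `longest_match` module is described as a vocab-only longest-prefix-match tokenizer. The general technique can be sketched with the standard library as below. The `longest_match` function here is a hypothetical helper, not the crate's `LongestMatchTokenizer`, and its handling of unmatched input (returning `None`) is an assumption made for the example.

```rust
use std::collections::HashMap;

/// Greedy longest-prefix match over a plain vocabulary. At each position the
/// longest vocabulary entry that prefixes the remaining text wins. Returns
/// `None` if some position has no matching entry.
pub fn longest_match(vocab: &HashMap<&str, u32>, text: &str) -> Option<Vec<u32>> {
    // No candidate can be longer than the longest vocabulary key (in bytes).
    let max_len = vocab.keys().map(|k| k.len()).max().unwrap_or(0);
    let mut ids = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let rest = &text[pos..];
        // Try the longest candidate first, cutting only at char boundaries.
        let hit = (1..=rest.len().min(max_len))
            .rev()
            .filter(|&n| rest.is_char_boundary(n))
            .find_map(|n| vocab.get(&rest[..n]).map(|&id| (n, id)));
        let (n, id) = hit?;
        ids.push(id);
        pos += n;
    }
    Some(ids)
}
```

With a vocabulary containing `a`, `ab`, and `abc`, the input `abcab` tokenizes as `abc` then `ab`, never as `a` followed by shorter pieces. Greedy matching is simple and needs no merge table, at the cost of sometimes differing from what a BPE encoder would produce for the same text.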