CryptoTensors

This repository implements CryptoTensors, an LLM file format for secure model distribution. This implementation extends safetensors with encryption, signing, and access control capabilities while maintaining full backward compatibility with safetensors.

CryptoTensors provides:

🔐 Encryption: AES-GCM and ChaCha20-Poly1305 encryption for tensor data
✍️ Signing: Ed25519 signature verification for file integrity
🔑 Key Management: Flexible key provider system (environment variables, files, programmatic)
🛡️ Access Policy: Rego-based policy engine for fine-grained access control
🔄 Transparent Integration: Works seamlessly with transformers, vLLM, and other ML frameworks

This project is a derivative work based on safetensors by Hugging Face. See NOTICE for details.

This implementation is based on the idea of the following research paper: Zhu, H., Li, S., Li, Q., & Jin, Y. (2025). CryptoTensors: A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution. arXiv:2512.04580.

Installation

Pip

You can install cryptotensors via the pip manager:

pip install cryptotensors

For backward compatibility

If you want to load encrypted CryptoTensors models without modifying your code, you can use the compatible package released on GitHub Releases:

# Uninstall the original safetensors package
pip uninstall safetensors

# Install the compatible package directly from GitHub release
# Replace {tag} with the release tag (e.g., v0.1.0)
pip install https://github.com/aiyah-meloken/cryptotensors/releases/download/{tag}/safetensors-0.7.0-py3-none-any.whl

# Example for v0.1.0:
# pip install https://github.com/aiyah-meloken/cryptotensors/releases/download/v0.1.0/safetensors-0.7.0-py3-none-any.whl

After installation, your existing code will transparently support both regular safetensors files and encrypted CryptoTensors files without any code changes. The compatible package uses the safetensors namespace but internally depends on cryptotensors, enabling seamless encryption support.

From source

For the sources, you need Rust

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Make sure it's up to date and using stable channel
rustup update
git clone https://github.com/aiyah-meloken/cryptotensors
cd cryptotensors/bindings/python
pip install setuptools_rust
pip install -e .

Getting started

Basic Usage (Encryption and Decryption)

🆕 v0.2 New Config API

CryptoTensors 0.2 introduces a new, more flexible configuration system:

import torch
from cryptotensors import SerializeCryptoConfig, save_file
from cryptotensors.torch import load_file

tensors = {
   "weight1": torch.zeros((1024, 1024)),
   "weight2": torch.zeros((1024, 1024))
}

# Method 1: Direct keys (simple scenarios)
config = SerializeCryptoConfig(
    enc_key={"alg": "aes256gcm", "kid": "my-enc", "k": "base64-encoded-key"},
    sign_key={"alg": "ed25519", "kid": "my-sign", "x": "...", "d": "..."}
)
save_file(tensors, "model.cryptotensors", config=config.to_dict())

# Method 2: Using kid/jku (with global Registry)
from cryptotensors import register_direct_key_provider

register_direct_key_provider(files=["keys.jwk"])  # Register keys once
config = SerializeCryptoConfig(enc_kid="my-enc", sign_kid="my-sign")
save_file(tensors, "model.cryptotensors", config=config.to_dict())

# Load encrypted file (keys auto-retrieved from Registry)
tensors = load_file("model.cryptotensors")

Classic API (Still Supported)

The classic dict-based configuration is still fully supported:

# Old API still works
config = {
    "enc_key": enc_key,    # JWK format encryption key
    "sign_key": sign_key,  # JWK format signing key
}
save_file(tensors, "model.cryptotensors", config=config)

See KEY_MANAGEMENT_GUIDE.md for detailed key management guide and documentation for more examples.

Backward Compatibility (Safetensors Compatible)

You can use cryptotensors as a drop-in replacement for safetensors in most cases, where you can save and load unencrypted models as usual.

import torch
from cryptotensors import safe_open
from cryptotensors.torch import save_file

tensors = {
   "weight1": torch.zeros((1024, 1024)),
   "weight2": torch.zeros((1024, 1024))
}
save_file(tensors, "model.safetensors")

tensors = {}
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
   for key in f.keys():
       tensors[key] = f.get_tensor(key)

Additional Information

File Format

The file format is the same as the safetensors format, with the following additional fields:

8 bytes: N, an unsigned little-endian 64-bit integer, containing the size of the header
N bytes: a JSON UTF-8 string representing the header.
- The header data MUST begin with a { character (0x7B).
- The header data MAY be trailing padded with whitespace (0x20).
- The header is a dict like {"TENSOR_NAME": {"dtype": "F16", "shape": [1, 16, 256], "data_offsets": [BEGIN, END]}, "NEXT_TENSOR_NAME": {...}, ...},
  - data_offsets point to the tensor data relative to the beginning of the byte buffer (i.e. not an absolute position in the file), with BEGIN as the starting offset and END as the one-past offset (so total tensor byte size = END - BEGIN).
- A special key __metadata__ is allowed to contain free form string-to-string map. Arbitrary JSON is not allowed, all values must be strings.
- Cryptotensors add the following fields to the __metadata__ section:
  - __encryption__: JSON string containing per-tensor encryption information (algorithm, IVs, tags, wrapped keys, etc.). The format of this field depends on the version specified in __crypto_keys__.
  - __crypto_keys__: JSON string containing key material information in the format {"version": "1"|"2", "chunk_size": 2097152, "enc": {...}, "sign": {...}}. version "1" uses monolithic encryption, while "2" uses chunked encryption. chunk_size is only present in version "2". No secrets are stored in this field, and the metadata is used to retrieve the keys from the key providers.
  - __signature__: Base64-encoded Ed25519 signature of the file header (excluding the signature itself) for integrity verification
  - __policy__: JSON string containing access control policy in Rego format
For detailed format specifications and the differences between v1 and v2, please refer to FORMAT.md.
Rest of the file: byte-buffer.

Notes & Benefits

Two stages of encryption: the entire header is encrypted using the master decryption key, and the tensor data is encrypted using the per-tensor encryption keys.
Lazy decryption: Encrypted tensors are decrypted on-demand when accessed, maintaining the benefits of lazy loading while ensuring security. This allows loading large encrypted models without decrypting all tensors upfront, preserving memory efficiency and supporting distributed settings where only specific tensors are needed.
Zero-copy buffer passing: Decrypted tensor data is exposed to Python via the buffer protocol, allowing frameworks like PyTorch and NumPy to reference the memory directly without an extra copy.
- Note: Zero-copy buffer protocol support requires Python 3.11+ and is disabled on PyPy due to C-API constraints.
Python Support: Supports 3.11, 3.12, and 3.13.
- Note: Python 3.14 (preview) is not yet supported due to upstream dependency constraints.

Note: Unless otherwise specified, all other notes, features, and benefits of the cryptotensors format are the same as the safetensors format.

License: Apache-2.0

cryptotensors 0.2.3