libpep: Library for polymorphic pseudonymization and encryption
This library implements PEP cryptography based on ElGamal encrypted messages.
In the ElGamal scheme, a message M can be encrypted for a receiver with public key Y, which corresponds to secret key y.
This encryption is randomized: each encryption uses a fresh random b, so encrypting the same message twice results in different ciphertexts (encrypted messages).
We represent this encryption function as Enc(b, M, Y).
The library supports three homomorphic operations on a ciphertext in (= Enc(b, M, Y), encrypting message M for public key Y with random b):
- out = rekey(in, k): if in can be decrypted by secret key y, then out can be decrypted by secret key k*y. Both decryptions result in message M. Specifically, in = Enc(b, M, Y) is transformed to out = Enc(b/k, M, k*Y).
- out = reshuffle(in, s): modifies a ciphertext in (an encrypted form of M), so that after decryption of out the decrypted message will be equal to s*M. Specifically, in = Enc(b, M, Y) is transformed to out = Enc(s*b, s*M, Y).
- out = rerandomize(in, r): scrambles a ciphertext. Both in and out can be decrypted by the same secret key y, both resulting in the same decrypted message M. However, the binary forms of in and out differ. Specifically, in = Enc(b, M, Y) is transformed to out = Enc(r+b, M, Y).
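To make the three operations concrete, here is a minimal Python sketch over a tiny toy group. It is not libpep's API and it is not secure. The group parameters P, Q and G, the helper names (add, neg, mul, inv, random_scalar), and the textbook ElGamal layout Enc(b, M, Y) = (b*G, M + b*Y) are assumptions made for illustration. The toy rerandomize takes the public key Y as an explicit argument because the toy ciphertext does not carry it.

```python
import secrets

# Toy prime-order group (NOT secure): the subgroup of order Q of the integers mod P.
# The text writes the group additively, so the helpers below keep that notation.
P, Q, G = 2879, 1439, 4  # P = 2*Q + 1, and G generates the subgroup of order Q

def add(a, b):            # group element + group element
    return a * b % P

def neg(a):               # -(group element)
    return pow(a, -1, P)

def mul(s, a):            # scalar * group element
    return pow(a, s % Q, P)

def inv(s):               # scalar inverse modulo the group order
    return pow(s, -1, Q)

def random_scalar():      # non-zero scalar in 1..Q-1
    return secrets.randbelow(Q - 1) + 1

def encrypt(b, M, Y):     # Enc(b, M, Y) = (b*G, M + b*Y)
    return (mul(b, G), add(M, mul(b, Y)))

def decrypt(ct, y):       # M = C - y*B
    B, C = ct
    return add(C, neg(mul(y, B)))

def rekey(ct, k):         # Enc(b, M, Y) -> Enc(b/k, M, k*Y)
    B, C = ct
    return (mul(inv(k), B), C)

def reshuffle(ct, s):     # Enc(b, M, Y) -> Enc(s*b, s*M, Y)
    B, C = ct
    return (mul(s, B), mul(s, C))

def rerandomize(ct, Y, r):  # Enc(b, M, Y) -> Enc(r+b, M, Y)
    B, C = ct
    return (add(mul(r, G), B), add(C, mul(r, Y)))

# Usage: one key pair, one message, and random factors.
y = random_scalar()
Y = mul(y, G)
M = mul(random_scalar(), G)
b, k, s, r = (random_scalar() for _ in range(4))
ct = encrypt(b, M, Y)

assert decrypt(rekey(ct, k), k * y % Q) == M          # decrypts with k*y
assert decrypt(reshuffle(ct, s), y) == mul(s, M)      # decrypts to s*M
assert decrypt(rerandomize(ct, Y, r), y) == M         # same message
assert rerandomize(ct, Y, r) != ct                    # different binary form
```

In this sketch none of the three transformations needs the secret key y or the message M; each one only multiplies or adds public values to the two ciphertext components.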
The reshuffle(in, n) and rekey(in, k) operations can be combined into a slightly more efficient rsk(in, k, n), where n is the reshuffle factor.
Additionally, reshuffle2(in, n_from, n_to) and rekey2(in, k_from, k_to), as well as rsk2(...), can be used for bidirectional transformations between two keys, effectively applying k = k_from^-1 * k_to and n = n_from^-1 * n_to.
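The combined and bidirectional variants can be sketched on a toy group as well. This is an illustration under assumptions, not the library's implementation: the group parameters, the helper names, and the textbook ciphertext layout (b*G, M + b*Y) are made up for the example, and the argument order simply follows the text above. rsk2 is left out because its signature is not spelled out here.

```python
import secrets

P, Q, G = 2879, 1439, 4  # toy prime-order group, NOT secure

def mul(s, a):            # scalar * group element (additive notation)
    return pow(a, s % Q, P)

def inv(s):               # scalar inverse modulo the group order
    return pow(s, -1, Q)

def random_scalar():
    return secrets.randbelow(Q - 1) + 1

def rekey(ct, k):
    B, C = ct
    return (mul(inv(k), B), C)

def reshuffle(ct, n):
    B, C = ct
    return (mul(n, B), mul(n, C))

def rsk(ct, k, n):
    # Combine both factors into one scalar n/k first, so B is multiplied
    # once instead of twice.
    B, C = ct
    return (mul(n * inv(k), B), mul(n, C))

def rekey2(ct, k_from, k_to):
    # Undo k_from and apply k_to in one step: k = k_from^-1 * k_to.
    return rekey(ct, inv(k_from) * k_to % Q)

def reshuffle2(ct, n_from, n_to):
    # Undo n_from and apply n_to in one step: n = n_from^-1 * n_to.
    return reshuffle(ct, inv(n_from) * n_to % Q)

# Usage: build a ciphertext Enc(b, M, Y) and compare the variants.
y, b = random_scalar(), random_scalar()
Y, M = mul(y, G), mul(random_scalar(), G)
ct = (mul(b, G), M * mul(b, Y) % P)

k, n, k_from, k_to, n_from, n_to = (random_scalar() for _ in range(6))
assert rsk(ct, k, n) == reshuffle(rekey(ct, k), n)
assert rekey2(rekey(ct, k_from), k_from, k_to) == rekey(ct, k_to)
assert reshuffle2(reshuffle(ct, n_from), n_from, n_to) == reshuffle(ct, n_to)
```

The last two assertions show why the bidirectional forms are convenient: a ciphertext already transformed for one key or domain can be moved to another without first converting it back.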
The key idea behind this form of cryptography is that the pseudonymization and rekeying operations are applied to encrypted data. This means that during initial encryption, the ultimate receivers do not yet need to be known. Data can initially be encrypted for one key, and later rekeyed and potentially reshuffled (in the case of identifiers) for another key, leading to asynchronous end-to-end encryption with built-in pseudonymization.
Apart from a Rust crate, this library provides bindings for multiple platforms:
Language Bindings
Python
Install from PyPI:
Use with direct imports from submodules:
# Generate keys
=
# Create and work with pseudonyms
=
# Create data points
=
WebAssembly (WASM)
Install from npm:
npm install @nolai/libpep-wasm
Use in Node.js or browser applications:
import * as libpep from '@nolai/libpep-wasm';
// Generate keys
const keys = libpep.;
// Create and work with pseudonyms
const pseudonym = libpep.;
console.log;
// Create data points
const data = libpep.;
console.log;
API Structure
Both Python and WASM bindings mirror the Rust API structure with the same modules:
| Module | Description |
|---|---|
| arithmetic | Basic arithmetic operations on scalars and group elements |
| elgamal | ElGamal encryption and decryption primitives |
| primitives | Core PEP operations (rekey, reshuffle, rerandomize) |
| high_level | User-friendly API with Pseudonym and Attribute classes |
| distributed | Distributed n-PEP operations with multiple servers |
For detailed API documentation, see docs.rs/libpep.
Applications
For pseudonymization, the core operation is reshuffle with s.
It modifies a main pseudonym with a factor s that is specific to a user (or user group) receiving the pseudonym.
After applying a user specific factor s, a pseudonym is called a local pseudonym.
The factor s is typically tied to the access group or domain of a user, which we call the pseudonymization domain.
Using only a reshuffle is insufficient, as the pseudonym is still encrypted for a key the user does not possess.
To allow a user to decrypt the encrypted pseudonym, a rekey with k is needed, in combination with a protocol to hand the user the secret key k*y.
The factor k is typically tied to the current session of a user, which we call the encryption context.
When the same encrypted pseudonym is used multiple times, rerandomize is applied every time. This way, a binary comparison of the resulting ciphertexts does not reveal that they contain the same pseudonym.
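The flow described above can be sketched end to end on a toy group. Everything here is an assumption made for illustration rather than libpep's API: the group parameters, the helper names, the function transcrypt, the two secrets, and in particular derive_factor, which is a hypothetical way to turn a secret plus a domain or context label into a non-zero factor.

```python
import hashlib
import secrets

P, Q, G = 2879, 1439, 4  # toy prime-order group, NOT secure

def mul(s, a):            # scalar * group element (additive notation)
    return pow(a, s % Q, P)

def inv(s):
    return pow(s, -1, Q)

def random_scalar():
    return secrets.randbelow(Q - 1) + 1

def derive_factor(secret: bytes, label: str) -> int:
    # Hypothetical derivation: hash secret and label to a scalar in 1..Q-1.
    digest = hashlib.sha256(secret + b"|" + label.encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1

def encrypt(M, Y):
    b = random_scalar()
    return (mul(b, G), M * mul(b, Y) % P)

def decrypt(ct, y):
    B, C = ct
    return C * pow(mul(y, B), -1, P) % P

def rerandomize(ct, Y, r):
    B, C = ct
    return (mul(r, G) * B % P, C * mul(r, Y) % P)

def rsk(ct, k, s):
    B, C = ct
    return (mul(s * inv(k), B), mul(s, C))

def transcrypt(ct, Y, domain, context, pseudo_secret, enc_secret):
    s = derive_factor(pseudo_secret, domain)    # tied to the pseudonymization domain
    k = derive_factor(enc_secret, context)      # tied to the encryption context
    fresh = rerandomize(ct, Y, random_scalar()) # new binary form on every use
    return rsk(fresh, k, s)

# Usage: a main pseudonym encrypted under the global key, handed out twice.
pseudo_secret, enc_secret = b"toy-pseudonymization-secret", b"toy-encryption-secret"
y = random_scalar()
Y = mul(y, G)
main_pseudonym = mul(random_scalar(), G)
ct = encrypt(main_pseudonym, Y)

out1 = transcrypt(ct, Y, "domain-A", "session-1", pseudo_secret, enc_secret)
out2 = transcrypt(ct, Y, "domain-A", "session-1", pseudo_secret, enc_secret)

# The user receives the session secret key k*y and decrypts to the local pseudonym s*M.
session_key = derive_factor(enc_secret, "session-1") * y % Q
local = decrypt(out1, session_key)
assert local == decrypt(out2, session_key)
assert local == mul(derive_factor(pseudo_secret, "domain-A"), main_pseudonym)
```

The user in this sketch only ever sees the local pseudonym for its own domain, and the two ciphertexts it receives decrypt to the same value even though each was rerandomized independently.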
Security and Implementation
This library uses the Ristretto encoding on Curve25519, implemented in the curve25519-dalek crate, with patches by Signal for Lizard encoding of arbitrary 16-byte values into Ristretto points.
Security Considerations
- All cryptographic operations use constant-time algorithms to prevent timing attacks
- Random number generation uses cryptographically secure sources
- The library has been designed for production use but hasn't yet undergone formal security auditing
- Users should properly secure private keys and avoid exposing sensitive cryptographic material
Arithmetic Rules
There are a number of arithmetic rules for scalars and group elements: group elements can be added to and subtracted from each other.
Scalars support addition, subtraction, and multiplication.
Division can be done by multiplying with the inverse (using s.invert() for non-zero scalar s).
A scalar can be converted to a group element (by multiplying with the special generator G), but not the other way around.
Group elements can also be multiplied by a scalar.
Group elements are encoded in 32 bytes, but not every 32-byte value is a valid group element (the top bit is always zero, and some other values are invalid).
Group elements can be generated by GroupElement::random(..) or GroupElement::from_hash(..).
Scalars are also 32 bytes, and can be generated with Scalar::random(..) or Scalar::from_hash(..).
There are specific classes for ScalarNonZero and ScalarCanBeZero, since for almost all PEP operations, the scalar should be non-zero.
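These rules can be tried out on a small toy group. The sketch below is not the library's arithmetic module: the parameters P, Q and G and all function names are assumptions for illustration, and the toy stores group elements as integers modulo P while keeping the additive notation of the text.

```python
P, Q, G = 2879, 1439, 4  # toy prime-order group, NOT secure

# Scalars are integers modulo the group order Q.
def s_add(a, b):
    return (a + b) % Q

def s_sub(a, b):
    return (a - b) % Q

def s_mul(a, b):
    return (a * b) % Q

def s_invert(a):
    # Only non-zero scalars have an inverse.
    if a % Q == 0:
        raise ValueError("zero scalar has no inverse")
    return pow(a, -1, Q)

# Group elements support addition, subtraction, and multiplication by a scalar.
def g_add(A, B):
    return A * B % P

def g_sub(A, B):
    return A * pow(B, -1, P) % P

def g_mul(s, A):
    return pow(A, s % Q, P)

def from_scalar(s):
    # Scalar -> group element via the generator G. Going back would require
    # solving a discrete logarithm, so there is no inverse function.
    return g_mul(s, G)

a, b = 123, 456
A, B = from_scalar(a), from_scalar(b)

assert s_mul(s_mul(a, s_invert(b)), b) == a        # division = multiply by inverse
assert g_add(A, B) == from_scalar(s_add(a, b))     # a*G + b*G == (a+b)*G
assert g_sub(g_add(A, B), B) == A                  # subtraction undoes addition
assert g_mul(a, B) == from_scalar(s_mul(a, b))     # a*(b*G) == (a*b)*G
```

The explicit check in s_invert mirrors the reason for having a separate non-zero scalar type: most operations divide by a factor at some point, so a zero factor has to be excluded up front.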
API
We offer APIs at different abstraction levels.
- The arithmetic module (internal API) offers the basic arithmetic operations on scalars and group elements, and the elgamal module offers the ElGamal encryption and decryption operations.
- The primitives module implements the basic PEP operations such as rekey, reshuffle, and rerandomize, the extended rekey2 and reshuffle2 variants, and the combined rsk and rsk2 operations.
- The high_level module offers a more user-friendly API with many high-level data types such as Pseudonyms and Attributes.
- The distributed module additionally provides a high-level API for distributed scenarios, where multiple servers are involved in the rekeying and reshuffling operations and keys are derived from multiple master keys.
Depending on the use case, you can choose the appropriate level of abstraction.
Development
Prerequisites
- Rust 1.70+ (MSRV)
- Node.js 18+ (for WASM bindings)
- Python 3.8+ (for Python bindings)
Building and Testing
Build and test the core Rust library:
Run tests with different feature combinations:
Building Bindings
Python
To build Python bindings for testing:
WASM
To build WASM bindings for testing:
The following features are available:
- python: enables the Python bindings (mutually exclusive with wasm).
- wasm: enables the WASM library (mutually exclusive with python).
- elgamal3: enables longer ElGamal for debugging purposes or backward compatibility, at the cost of efficiency.
- legacy-pep-repo-compatible: enables the legacy PEP repository compatible mode, which uses a different function to derive scalars from domains, contexts and secrets.
- insecure-methods: enables insecure methods, to be used with care.
- build-binary: builds the peppy command-line tool to interact with the library (not recommended for production use).
Note: The python and wasm features are mutually exclusive because PyO3 (Python bindings) builds a cdylib that links to the Python interpreter, while wasm-bindgen builds a cdylib targeting WebAssembly. These have incompatible linking requirements and cannot coexist in the same build.
Install
Install using cargo:
cargo install libpep
Run peppy using cargo:
cargo run --bin peppy
License
- Authors: Bernard van Gastel and Job Doesburg
- License: Apache License 2.0
Background
Based on the article by Eric Verheul and Bart Jacobs, Polymorphic Encryption and Pseudonymisation in Identity Management and Medical Research. In Nieuw Archief voor Wiskunde (NAW), 5/18, nr. 3, 2017, p. 168-172.