Crate fuzzytags[][src]

Expand description

FuzzyTags

Anonymous messaging systems (and other privacy-preserving applications) often require a mechanism for one party to learn that another party has messaged them (“notifications”).

Many schemes rely on a bandwidth-intensive “download everything and attempt-decryption” approach. Others rely on a trusted 3rd party, or various non-collusion assumptions, to provide a “private” service. Other schemes require that parties arrange themselves in “buckets” or “mailboxes” effectively creating smaller instances of the “download everything” approach.

It would be awesome if we could get an untrusted, adversarial server to do the work for us without compromising metadata-resistance or requiring parties to split themselves into buckets (effectively dividing the anonymity set of the system)!

fuzzytags is an experimental probabilistic cryptographic tagging structure to do just that!

Instead of placing messages into deterministic buckets based on the recipient, fuzzytags allow each message to probabilistically address itself to several parties in addition to the intended party - utilizing the anonymity of the whole set of participants, instead of the ones who happen to share a bucket for a given round.

Specifically fuzzytags provide the following properties:

  • Correctness: Valid tags constructed for a specific tagging key will always validate when tested using a derived detection key.
  • Fuzziness: Tags will produce false positive matches with probability p related to the security property (γ) when tested against detection keys they were not intended for.
  • Security: An adversarial server with access to the detection key is unable to distinguish false positives from true positives. (this property is referred to as Detection Ambiguity)

Security (hic sunt dracones)

This crate provides an experimental implementation of the FMD2 scheme described in “Fuzzy Message Detection”. Using Ristretto as the prime order group.

This code has not undergone any significant review.

Further, the properties provided by this system are highly dependent on selecting a false positive rate p and scheme constant γ for your system. There is no one-size-fits-all approach.

If p is too low, then the probability of false positives for a given party will be very high.

If p is too high, then an adversarial server will be able to link messages to recipients with probability approaching 1.

Likewise a large γ means higher bandwidth costs, but a small γ reveals more of the root secret to the server while also increasing the change of perfect (but false) matches across all parties.

We are also building a simulator to understand these parameter choices in addition to other factors when deploying fuzzytags to real-world systems.

For more guidance (and warnings) on integrating fuzzytags into a privacy preserving application see documentation

Building

This crate requires experimental features currently only provided by Rust nightly:

rustup default nightly

Terminology and a more detailed System Description

There exists a metadata resistant application that uses untrusted servers to mediate communication between parties.

Each party can be identified with a set of cryptographic identifiers and there exists methods in or external to the system to distribute keys securely and authentically.

Now, instead of each party adopting a download-everything approach to metadata privacy (or invoking non-collusion or other assumptions) we can leverage fuzzytags to reduce the number of messages downloaded from the server by each party while maintaining a formalized concept of metadata privacy.

Every party generates a RootSecret, from which they can derive a DetectionKey and a TaggingKey. These keys will be generated with a parameter γ that relates to the minimum false-positive probability 2^-γ.

When submitting messages to the server for an intended recipient, the sender will generate a new tag from the recipients TaggingKey.

All parties will extract a DetectionKey from their key pair. This key will be of length n and provide a false positive detection probability of 0 <= 2^-n <= 2^-γ. This detection key can be given to an adversarial server.

When fetching new messages from the adversarial server, the server first runs a test of the tag of the message against the parties’ detection key. If the tag passes the test, the message (along with the tag) is provided to the recipient.

Finally, the recipient runs their own test of the tag against an extracted detection key such that the probability of a false positive will be 2^-n == 2^-γ. This will produce a subset of messages likely intended for the recipient, with a smaller probability of false positives.

Alternatively the recipient can simply try and decrypt every message in the subset of messages that the server provided them (depending on the efficiency of the decryption method).

Usage

A party first needs to generate RootSecret

use fuzzytags::RootSecret;
use rand::rngs::OsRng;
let mut rng = OsRng;
let secret = RootSecret::<24>::generate(&mut rng);

From the secret detection key a party can derive a DetectionKey which can be given to adversarial server to fuzzily detect tags on behalf of the party.

From the secret detection key a party can also derive a TaggingKey that can be public and given to other parties for the purpose of generating fuzzytags addressed to a given party.

The 24 in the above code is a security property (γ) in the system. For a given gamma, a tag generated for a specific public key will validate against a random public key with a maximum probability of 2^-gamma.

Generating Tags

Once in possession of a tagging key, a party in a metadata resistant app can use it to generate tags:

    use fuzzytags::RootSecret;
    use rand::rngs::OsRng;
    let mut rng = OsRng;
    let secret = RootSecret::<24>::generate(&mut rng);
    let tagging_key = secret.tagging_key();
    
    // Give public key to a another party...
    // and then they can do...
    let tag = tagging_key.generate_tag(&mut rng);

These tags can then be attached to a message in a metadata resistant system.

Testing Tags

First it is necessary to extract a detection key for a given false positive probability 0 <= 2^-n <= 2^-γ.

This extracted key can then be given to an adversarial server. The server can then test a given tag against the detection key e.g.:

    use fuzzytags::RootSecret;
    use rand::rngs::OsRng;
    let mut rng = OsRng;
    let secret = RootSecret::<24>::generate(&mut rng);
    let tagging_key = secret.tagging_key();
    // extract a detection key
    let detection_key = secret.extract_detection_key(5);
    
    // Give the tagging key to a another party...
    // and then they can do...
    let tag = tagging_key.generate_tag(&mut rng);
    
    // The server can now do this:
    if detection_key.test_tag(&tag) {
        // the message attached to this tag *might* be for the party associated with the detection key 
    } else {
        // the message attached to this tag is definitely *not* for the party associated with the detection key.
    }

Entangled Tags

When enabled with the entangled feature the TaggingKey::generate_entangled_tag function is available. This allows you to generate tags that will validate against multiple detection keys from distinct tagging keys and opens up applications like multiple broadcast and deniable sending.

   use fuzzytags::{RootSecret, TaggingKey};
   use rand::rngs::OsRng;
   let mut rng = OsRng;
   let secret_1 = RootSecret::<24>::generate(&mut rng);
   let secret_2 = RootSecret::<24>::generate(&mut rng);
   let tagging_key_1 = secret_1.tagging_key(); // give this to a sender
   let tagging_key_2 = secret_2.tagging_key(); // give this to a sender
   // Will validate for detection keys derived from both secret_1 and secret_2 up
   // to n=8
    #[cfg(feature = "entangled")]
   let tag = TaggingKey::generate_entangled_tag(vec![tagging_key_1,tagging_key_2], &mut rng, 8);

Serialization

This crate relies on serde for serialization. FuzzyTags are first compressed into a byte array of 64 bytes + γ bits, padded to the end with zeros to the nearest byte. This representation can then be exchanged using a number of different approaches e.g.:

use fuzzytags::RootSecret;
use fuzzytags::Tag;
use rand::rngs::OsRng;

let mut rng = OsRng;
let secret = RootSecret::<24>::generate(&mut rng);
let tagging_key = secret.tagging_key();

// Give public key to a another party...
// and then they can do...
let tag = tagging_key.generate_tag(&mut rng);

// An example using JSON serialization...see serde doc for other formats:
let serialized_tag = serde_json::to_string(&tag).unwrap();
println!("Serialized: {}", serialized_tag);

// We can then deserialize with:
let deserialized_tag: Result<Tag<24>, serde_json::Error> = serde_json::from_str(&serialized_tag);
println!("Deserialized: {}", deserialized_tag.unwrap());

Benchmarks

We use criterion for benchmarking, and benchmarks can run using cargo bench --bench fuzzy_tags_benches

Results will be in target/criterion/report/index.html.

To benchmark entangled tags run:

cargo bench --features "entangled" --bench entangled

AVX2

This crate has support for the avx2 under the feature simd, to take advantage of this feature it is necessary to build with RUSTFLAGS="-C target_feature=+avx2" e.g.

env RUSTFLAGS="-C target_feature=+avx2" cargo test --release --features "bulk_verify,entangled,simd"

This results in a 40%+ performance improvements on the provided benchmarks.

Credits and Contributions

  • Based on Fuzzy Message Detection by Gabrielle Beck and Julia Len and Ian Miers and Matthew Green
  • Performance & API improvements contributed by Henry de Valence
  • Universal Tag Bug found by Lee Bousfield
  • Fuzzytags Logo by Marcia Díaz Agudelo
  • Thanks to Henry de Valence, George Tankersly, Lee Bousfield and others for helpful discussions.

Integrating FuzzyTags

The properties provided by this system are highly dependent on selecting a false positive rate p. In the following sections we will cover a number of considerations you should take into account when integrating fuzzytags into a larger privacy preserving application.

How bad is it to let people select their own false-positive rates?

The short answer is “it depends”.

The longer answer:

When different parties have different false positive rates the server can calculate the skew between a party’s ideal false positive rate and observed false positive rate.

That skew leaks information, especially given certain message distributions. Specifically it leaks parties who receive a larger proportion of system messages than their ideal false positive rate.

i.e. for low false positive rates and high message volume for a specific receiver, the adversarial server can calculate a skew that leaks the recipient of individual messages - breaking privacy for that receiver.

It also removes those messages from the pool of messages that an adversarial server needs to consider for other receivers. Effectively reducing the anonymity set for everyone else.

Which brings us onto:

Differential Attacks

Any kind of differential attacks break this scheme, even for a small number of messages i.e. if you learn (through any means) that a specific set of messages are all likely for 1 party, you can diff them against all other parties keys and very quickly isolate the intended recipient - in simulations of 100-1000 parties it can take as little as 3 messages - even with everyone selecting fairly high false positive rates.

The corollary of the above being that in differential attacks your anonymity set is basically the number of users who download all messages - since you can’t diff them. This has the interesting side effect: the more parties who download everything, the more the system can safely tolerate parties with small false-positive rates.

To what extent you can actually account for this in your application is an open question.

Statistical Attacks

Using some basic binomial probability we can use the false positive rate of reach receiver tag to calculate the probability of matching on at least X tags given the false positive rate. Using this we can find statistically unlikely matches e.g. a low-false positive key matching many tags in a given period.

This can be used to find receivers who likely received messages in a given period.

If it is possible to group tags by sender then we can perform a slightly better attack and ultimately learn the underlying social graph with fairly low false positive rates (in simulations we can learn 5-10% of the underlying connections with between 5-12% false positive rates.)

For more information on statistical attacks please check out our fuzzytags simulator.

Should Senders use an anonymous communication network?

If statistical & differential attacks are likely e.g. few parties download everything and multiple messages are expected to originate from a sender to a receiver or there is other information that might otherwise link a set of messages to a sender or receiver then you may want to consider how to remove that context.

One potential way of removing context is by having senders send their message to the server through some kind of anonymous communication network e.g. a mixnet or tor.

Be warned: This may not eliminate all the context!

How bad is it to select a poor choice of p?

Consider a pareto distribution where most users only receive a few messages, and small subset of users receive a large number of messages it seems that increasing the number of parties is generally more important to overall anonymity of the system than any individual selection of p.

Under a certain threshold of parties, trivial breaks (i.e. tags that only match to a single party) are a bigger concern.

Assuming we have large number of parties (N), the following heuristic emerges:

  • Parties who only expect to receive a small number of messages can safely choose smaller false positive rates, up to a threshold θ, where θ > 2^-N. The lower the value of θ the greater the possibility of random trivial breaks for the party.
  • Parties who expect a large number of messages should choose to receive all messages for 2 reasons:
    1. Even high false positive rates for power users result in information leaks to the server (due to the large skew) i.e. a server can trivially learn what users are power users.
    2. By choosing to receive all messages, power users don’t sacrifice much in terms of bandwidth, but will provide cover for parties who receive a small number of messages and who want a lower false-positive rate.

(We consider a pareto distribution here because we expect many applications to have parties that can be modelled as such - especially over short-time horizons)

Structs

A collection of “secret” data that can be used to determine if a FuzzyTag was intended for the derived tagging key with probability p

The complete secret. Can’t directly be used for testing. Instead you will need to generate a DetectionKey using extract_detection_key

A tag is a probabilistic cryptographic structure. When constructed for a given TaggingKey it will pass the DetectionKey::test_tag 100% of the time. For other tagging keys it will pass the test with probability GAMMA related to the security parameter of the system. This system provides the following security properties:

A public identity that others can create tags for.