Crate activitypub_federation


A high-level framework for ActivityPub federation in Rust. The goal is to encapsulate all basic functionality, so that developers can easily use the protocol without any prior knowledge.

The ActivityPub protocol is a decentralized social networking protocol. It allows web servers to exchange data using JSON over HTTP. Data can be fetched on demand, and also delivered directly to inboxes for live updates.

While Activitypub is not in widespread use yet, it has the potential to form the basis of the next generation of social media. This is because it has a number of major advantages compared to existing platforms and alternative technologies:

Interoperability: Imagine being able to comment under a Youtube video directly from twitter.com, and having the comment shown under the video on youtube.com. Or following a Subreddit from Facebook. Such functionality is already available on the equivalent Fediverse platforms, thanks to common usage of Activitypub.
Ease of use: From a user perspective, decentralized social media works almost identically to existing websites: a website with email and password based login. Unlike pure peer-to-peer networks, it is not necessary to handle private keys or install any local software.
Open ecosystem: All existing Fediverse software is open source, and there are no legal or bureaucratic requirements to start federating. That means anyone can create or fork federated software. In this way different software platforms can exist in the same network according to the preferences of different user groups. It is not necessary to target the lowest common denominator as with corporate social media.
Censorship resistance: Current social media platforms are under the control of a few corporations and are actively being censored as revealed by the Twitter Files. This would be much more difficult on a federated network, as it would require the cooperation of every single instance administrator. Additionally, users who are affected by censorship can create their own websites and stay connected with the network.
Low barrier to entry: All it takes to host a federated website is a small server, a domain and a TLS certificate. All of this is easily within the reach of individual hobbyists. Some technical knowledge is also needed, but this can be avoided with managed hosting platforms.

Below you can find a complete guide that explains how to create a federated project from scratch.

Feel free to open an issue if you have any questions regarding this crate. You can also join the Matrix channel #activitystreams for discussion about Activitypub in Rust. Additionally, check out the Socialhub forum for general ActivityPub development.

§Overview

It is recommended to read the W3C Activitypub standard document which explains in detail how the protocol works. Note that it includes a section about client to server interactions, but this functionality is not implemented by any major Fediverse project. Other relevant standard documents are Activitystreams and Activity Vocabulary. It’s a good idea to keep these around as references during development.

This crate provides high level abstractions for the core functionality of Activitypub: fetching, sending and receiving data, as well as handling HTTP signatures. It was built from the experience of developing Lemmy which is the biggest Fediverse project written in Rust. Nevertheless it is very generic and appropriate for any type of application wishing to implement the Activitypub protocol.

There are two examples included to see how the library works altogether:

local_federation: Creates two instances which run on localhost and federate with each other. This setup is ideal for quick development as well as automated tests.
live_federation: A minimal application which can be deployed on a server and federate with other platforms such as Mastodon. For this it needs to run at the root of a (sub)domain which is available over HTTPS. Edit main.rs to configure the server domain and your Fediverse handle. Once started, it will automatically send a message to you and log any incoming messages.

To see how this library is used in production, have a look at the Lemmy federation code.

§Security

This framework does not inherently perform data sanitization upon receiving federated activity data.

Never place implicit trust in data received from the Fediverse. Always keep in mind that malicious entities can easily be created through anonymous Fediverse handles.

When using this crate in your application, be sure to sanitize and validate received data before storing it in your database or using it in your user interface. This significantly reduces the risk of malicious data or actions affecting your application’s security and performance.

This framework is designed to simplify your development process, but it’s your responsibility to ensure the security of your application. Always follow best practices for data handling, sanitization, and security.

§Federating users

This library intentionally doesn’t include any predefined data structures for federated data. The reason is that each federated application is different, and needs different data formats. Activitypub also doesn’t define any specific data structures, but provides a few mandatory fields and many which are optional. For this reason it works best to let each application define its own data structures, and take advantage of serde for (de)serialization. This means we don’t use json-ld which Activitypub is based on, but that doesn’t cause any problems in practice.

The first thing we need to federate are users. It’s easiest to get started by looking at the data sent by other platforms. Here we fetch an account from Mastodon, ignoring the many optional fields. This curl command is generally very helpful to inspect and debug federated services.

$ curl -H 'Accept: application/activity+json' https://mastodon.social/@LemmyDev | jq
{
    "id": "https://mastodon.social/users/LemmyDev",
    "type": "Person",
    "preferredUsername": "LemmyDev",
    "name": "Lemmy",
    "inbox": "https://mastodon.social/users/LemmyDev/inbox",
    "outbox": "https://mastodon.social/users/LemmyDev/outbox",
    "publicKey": {
        "id": "https://mastodon.social/users/LemmyDev#main-key",
        "owner": "https://mastodon.social/users/LemmyDev",
        "publicKeyPem": "..."
    },
    ...
}

The most important fields are:

id: Unique identifier for this object. At the same time it is the URL where we can fetch the object from
type: The type of this object
preferredUsername: Immutable username which was chosen at signup and is used in URLs as well as in mentions like @LemmyDev@mastodon.social
name: Displayname which can be freely changed at any time
inbox: URL where incoming activities are delivered to, treated in a later section
publicKey: Key which is used for HTTP Signatures

Refer to Activity Vocabulary for further details and description of other fields. You can also inspect many other URLs on federated platforms with the given curl command.

Based on this we can define the following minimal struct to (de)serialize a Person with serde.


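use activitypub_federation::{
    fetch::object_id::ObjectId,
    kinds::actor::PersonType,
    protocol::public_key::PublicKey,
};
use serde::{Deserialize, Serialize};
use url::Url;

// Protocol representation of a user, matching the JSON shown above.
// `DbUser` is the database type defined further below.
#[derive(Deserialize, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct Person {
    id: ObjectId<DbUser>,
    #[serde(rename = "type")]
    kind: PersonType,
    preferred_username: String,
    name: String,
    inbox: Url,
    outbox: Url,
    public_key: PublicKey,
}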

ObjectId is a wrapper for Url which helps to fetch data from a remote server, and convert it to DbUser which is the type that’s stored in our local database. It also helps with caching data so that it doesn’t have to be refetched every time.

PersonType is an enum with a single variant Person. It is used to deserialize objects in a typesafe way: If the JSON type value does not match the string Person, deserialization fails. This helps in places where we don’t know the exact data type that is being deserialized, as you will see later.
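
The same idea can be shown without serde. The following is a minimal sketch using only the standard library. The `PersonType` defined here is a hypothetical stand-in written for illustration, not the type exported by the crate, and parsing via `FromStr` is an assumption made to keep the example self-contained.

```rust
use std::str::FromStr;

/// Stand-in for a single-variant type tag (hypothetical, for illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum PersonType {
    Person,
}

impl FromStr for PersonType {
    type Err = String;

    /// Accepts exactly the string "Person" and rejects everything else.
    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value {
            "Person" => Ok(PersonType::Person),
            other => Err(format!("expected type `Person`, got `{}`", other)),
        }
    }
}
```

Because any other value is an error, a struct carrying such a field can never be built from a `Note` or `Follow` by accident.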

Besides this, we also need a second struct to represent the data which gets stored in our local database (for example PostgreSQL). This is necessary because the data format used by SQL is very different from that used by Activitypub. It is organized by an integer primary key instead of a link id. Nested structs are complicated to represent, so it is easier to flatten them. Some fields like type don’t need to be stored at all. On the other hand, the database contains fields which can’t be federated, such as the private key and a boolean indicating if the item is local or remote.


pub struct DbUser {
    pub id: i32,
    pub name: String,
    pub display_name: String,
    pub password_hash: Option<String>,
    pub email: Option<String>,
    pub federation_id: Url,
    pub inbox: Url,
    pub outbox: Url,
    pub local: bool,
    pub public_key: String,
    pub private_key: Option<String>,
    pub last_refreshed_at: DateTime<Utc>,
}

Field names and other details of this type can be chosen freely according to your requirements. It only matters that the required data is being stored. It’s important that this struct doesn’t represent only local users who registered directly on our website, but also remote users that are registered on other instances and federated to us. The local column helps to easily distinguish both. They can also be distinguished by the domain of the federation_id URL, but that would be a much more expensive operation. All users have a public_key, but only local users have a private_key. Likewise, password_hash and email are only present for local users. The inbox and outbox URLs need to be stored because each implementation is free to choose its own format for them, so they can’t be regenerated on the fly.

In larger projects it makes sense to split this data into two types: one for data relevant only to local users (password_hash, email etc.) and one for data that is shared by both local and federated users (federation_id, public_key etc.).

Finally we need to implement the traits Object and Actor for DbUser. These traits are used to convert between Person and DbUser types. Object::from_json must store the received object in the database, so that it can later be retrieved without network calls using Object::read_from_id. Refer to the documentation for more details.
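
To make the conversion between the two representations more concrete, here is a simplified sketch of the kind of mapping such a trait implementation performs. It is not an implementation of the crate’s traits: plain `String` values stand in for `Url` and `ObjectId`, the structs are reduced copies of the ones above, `to_person` is a hypothetical helper, and mapping `name` to `preferred_username` and `display_name` to `name` is an assumption based on the field descriptions.

```rust
/// Simplified protocol types; plain strings stand in for `Url` and `ObjectId`.
#[derive(Debug, PartialEq)]
pub struct PublicKey {
    pub id: String,
    pub owner: String,
    pub public_key_pem: String,
}

#[derive(Debug, PartialEq)]
pub struct Person {
    pub id: String,
    pub preferred_username: String,
    pub name: String,
    pub inbox: String,
    pub outbox: String,
    pub public_key: PublicKey,
}

/// Reduced database row containing only the fields that are federated.
pub struct DbUser {
    pub name: String,
    pub display_name: String,
    pub federation_id: String,
    pub inbox: String,
    pub outbox: String,
    pub public_key: String,
}

/// Builds the nested protocol representation from the flat database row.
pub fn to_person(user: &DbUser) -> Person {
    Person {
        id: user.federation_id.clone(),
        preferred_username: user.name.clone(),
        name: user.display_name.clone(),
        inbox: user.inbox.clone(),
        outbox: user.outbox.clone(),
        public_key: PublicKey {
            // Key id convention copied from the Mastodon response shown earlier.
            id: format!("{}#main-key", user.federation_id),
            owner: user.federation_id.clone(),
            public_key_pem: user.public_key.clone(),
        },
    }
}
```

Database-only fields such as the primary key, `local` and `private_key` never appear in the protocol type, which is why they are left out of the mapping.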

§Federating posts

We repeat the same steps taken above for users in order to federate our posts.

$ curl -H 'Accept: application/activity+json' https://mastodon.social/@LemmyDev/109790106847504642 | jq
{
    "id": "https://mastodon.social/users/LemmyDev/statuses/109790106847504642",
    "type": "Note",
    "content": "<p><a href=\"https://mastodon.social/tags/lemmy\" ...",
    "attributedTo": "https://mastodon.social/users/LemmyDev",
    "to": [
        "https://www.w3.org/ns/activitystreams#Public"
    ],
    "cc": [
        "https://mastodon.social/users/LemmyDev/followers"
    ],
    ...
}

The most important fields are:

id: Unique identifier for this object. At the same time it is the URL where we can fetch the object from
type: The type of this object
content: Post text in HTML format
attributedTo: ID of the user who created this post
to, cc: Who the object is for. The special “public” URL indicates that everyone can view it. It also gets delivered to followers of the LemmyDev account.
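
As a small illustration of how the addressing fields can be interpreted, the following sketch checks whether an object is publicly visible. The function and constant names are hypothetical, and string slices stand in for parsed URLs.

```rust
/// The special "public" collection URL from the example above.
pub const PUBLIC: &str = "https://www.w3.org/ns/activitystreams#Public";

/// Returns true if either addressing field contains the public collection.
pub fn is_public(to: &[&str], cc: &[&str]) -> bool {
    to.iter().chain(cc.iter()).any(|recipient| *recipient == PUBLIC)
}
```

For the Mastodon post above, the `to` field contains the public URL, so this check would return `true`.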

Just like for Person before, we need to implement a protocol type and a database type, then implement trait Object. See the example for details.

§Configuration

Next we need to do some configuration. Most importantly we need to specify the domain where the federated instance is running. It should be at the domain root and available over HTTPS for production. See the documentation for a list of config options. The parameter app_data is for anything that your application requires in handler functions, such as a database connection handle, configuration etc.

let config = FederationConfig::builder()
    .domain("example.com")
    .app_data(db_connection)
    .build().await?;

debug is necessary to test federation with http and localhost URLs, but it should never be used in production. url_verifier can be used to implement a domain blacklist.
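
The decision logic behind a domain blacklist could look roughly like the sketch below. It only shows the check itself and does not reproduce the crate’s url_verifier interface. `DomainBlocklist` and its methods are hypothetical names, and also rejecting subdomains of a listed domain is a design assumption rather than documented behaviour.

```rust
use std::collections::HashSet;

/// Hypothetical domain blocklist; not part of the crate's API.
pub struct DomainBlocklist {
    blocked: HashSet<String>,
}

impl DomainBlocklist {
    /// Stores the given domains in lowercase for case-insensitive matching.
    pub fn new(domains: &[&str]) -> Self {
        let blocked = domains.iter().map(|d| d.to_ascii_lowercase()).collect();
        Self { blocked }
    }

    /// Rejects a domain if it, or any of its parent domains, is on the list.
    pub fn verify(&self, domain: &str) -> Result<(), String> {
        let domain = domain.to_ascii_lowercase();
        let mut candidate = domain.as_str();
        loop {
            if self.blocked.contains(candidate) {
                return Err(format!("domain {} is blocked", domain));
            }
            // Strip the leftmost label and check the parent domain next.
            match candidate.split_once('.') {
                Some((_, parent)) => candidate = parent,
                None => return Ok(()),
            }
        }
    }
}
```

A verifier registered in the config would extract the domain from the URL it receives and then run a check of this kind.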

§HTTP endpoints

The next step is to allow other servers to fetch our actors and objects. For this we need to create an HTTP route, most commonly at the same path where the actor or object can be viewed in a web browser. On this path there should be another route which responds to requests with header Accept: application/activity+json and serves the JSON data. This needs to be done for all actors and objects. Note that only local items should be served in this way.


#[tokio::main]
async fn main() -> Result<(), Error> {
    let data = FederationConfig::builder()
        .domain("example.com")
        .app_data(DbConnection)
        .build().await?;
        
    let app = axum::Router::new()
        .route("/user/:name", get(http_get_user))
        .layer(FederationMiddleware::new(data));

    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    tracing::debug!("listening on {}", addr);
    axum::Server::bind(&addr)
        .serve(app.into_make_service())
        .await?;
    Ok(())
}

async fn http_get_user(
    header_map: HeaderMap,
    Path(name): Path<String>,
    data: Data<DbConnection>,
) -> impl IntoResponse {
    // Treat an Accept header that is not valid ASCII as absent instead of panicking.
    let accept = header_map.get("accept").and_then(|v| v.to_str().ok());
    if accept == Some(FEDERATION_CONTENT_TYPE) {
        let db_user = data.read_local_user(&name).await.unwrap();
        let json_user = db_user.into_json(&data).await.unwrap();
        FederationJson(WithContext::new_default(json_user)).into_response()
    } else {
        generate_user_html(name, data).await
    }
}

There are a couple of things going on here. Like before we are constructing the federation config with our domain and application data. We pass this to a middleware to make it available in request handlers, then listen on a port with the axum webserver.

The http_get_user function allows retrieving a user profile from /user/:name. It checks the accept header, and compares it to the one used by Activitypub (application/activity+json). If it matches, the user is read from the database and converted to Activitypub JSON format. The context field is added (WithContext, for json-ld compliance), and it is converted to a JSON response with header content-type: application/activity+json using FederationJson. It can now be retrieved with the command curl -H 'Accept: application/activity+json' ... introduced earlier, or with ObjectId.

If the accept header doesn’t match, it renders the user profile as HTML for viewing in a web browser.
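
The handler above compares the whole header value against a single media type. Clients may also send several media types or quality parameters in one header, so a more tolerant check can be useful. The following is a sketch of such a check using only the standard library. The function name is hypothetical and this helper is not provided by the crate.

```rust
/// Media type used by Activitypub, as mentioned in the text above.
pub const ACTIVITY_JSON: &str = "application/activity+json";

/// Returns true if any media type listed in the `Accept` header matches.
pub fn accepts_activity_json(accept: Option<&str>) -> bool {
    accept
        .map(|header| {
            header
                .split(',')
                // Drop parameters such as `;q=0.9` and surrounding whitespace.
                .map(|item| item.split(';').next().unwrap_or("").trim())
                .any(|media_type| media_type.eq_ignore_ascii_case(ACTIVITY_JSON))
        })
        .unwrap_or(false)
}
```

A missing header counts as "no match", so such requests would fall through to the HTML branch.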

We also need to implement a webfinger endpoint, which can resolve a handle like @nutomic@lemmy.ml into an ID like https://lemmy.ml/u/nutomic that can be used by Activitypub. Webfinger is not part of the ActivityPub standard, but the fact that Mastodon requires it makes it de-facto mandatory. It is defined in RFC 7033. Implementing it basically means handling requests of the form https://mastodon.social/.well-known/webfinger?resource=acct:LemmyDev@mastodon.social.

To do this we can implement the following HTTP handler which must be bound to path .well-known/webfinger.


#[derive(Deserialize)]
struct WebfingerQuery {
    resource: String,
}

async fn webfinger(
    Query(query): Query<WebfingerQuery>,
    data: Data<DbConnection>,
) -> Result<Json<Webfinger>, Error> {
    let name = extract_webfinger_name(&query.resource, &data)?;
    let db_user = data.read_local_user(name).await?;
    Ok(Json(build_webfinger_response(query.resource, db_user.federation_id)))
}
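
To illustrate what handling the resource parameter involves, here is a standard-library sketch that splits a value such as acct:LemmyDev@mastodon.social into its parts and checks the domain. `parse_webfinger_resource` is a hypothetical function written for this example. It is not the crate’s extract_webfinger_name, whose exact behaviour may differ.

```rust
/// Extracts the local username from a resource such as
/// `acct:LemmyDev@mastodon.social`, if it belongs to `local_domain`.
pub fn parse_webfinger_resource<'a>(
    resource: &'a str,
    local_domain: &str,
) -> Result<&'a str, String> {
    // The resource must use the `acct:` scheme.
    let account = resource
        .strip_prefix("acct:")
        .ok_or_else(|| format!("unsupported resource: {}", resource))?;
    // Split into username and domain at the first `@`.
    let (name, domain) = account
        .split_once('@')
        .ok_or_else(|| format!("missing domain in: {}", resource))?;
    if name.is_empty() {
        return Err("empty username".to_string());
    }
    // Only users on our own domain can be resolved locally.
    if !domain.eq_ignore_ascii_case(local_domain) {
        return Err(format!("{} is not a local domain", domain));
    }
    Ok(name)
}
```

Rejecting foreign domains matters because the endpoint should only answer for users that are hosted on the local instance.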

§Fetching data

After setting up our structs, implementing traits and initializing configuration, we can easily fetch data from remote servers:

let config = FederationConfig::builder()
    .domain("example.com")
    .app_data(db_connection)
    .build().await?;
let user_id = ObjectId::<DbUser>::parse("https://mastodon.social/@LemmyDev")?;
let data = config.to_request_data();
let user = user_id.dereference(&data).await;
assert!(user.is_ok());

dereference retrieves the object JSON at the given URL, and uses serde to convert it to Person. It then calls your method Object::from_json which inserts it in the database and returns a DbUser struct. The request data (data in the snippet above, created with to_request_data) contains the federation config as well as a counter of outgoing HTTP requests. If this counter exceeds the configured maximum, further requests are aborted in order to avoid recursive fetching which could allow for a denial of service attack.
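
The request limit can be pictured with the following sketch. It is an assumption about how such a counter might work, not the crate’s internal code. `RequestCounter` and `register_request` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical per-request counter; the crate's internal type may differ.
pub struct RequestCounter {
    count: AtomicU32,
    max: u32,
}

impl RequestCounter {
    pub fn new(max: u32) -> Self {
        Self { count: AtomicU32::new(0), max }
    }

    /// Registers one outgoing HTTP request, failing once the limit is exceeded.
    pub fn register_request(&self) -> Result<(), String> {
        // `fetch_add` returns the previous value, so add one for the new total.
        let total = self.count.fetch_add(1, Ordering::SeqCst).saturating_add(1);
        if total > self.max {
            Err(format!("request limit of {} exceeded", self.max))
        } else {
            Ok(())
        }
    }
}
```

Because a fresh counter belongs to each incoming request or task, a chain of objects that reference each other can only trigger a bounded number of fetches.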

After dereferencing a remote object, it is stored in the local database and can be retrieved using ObjectId::dereference_local without any network requests. This is important for performance reasons and for searching.

We can similarly dereference a user over webfinger with the following method. It fetches the webfinger response from .well-known/webfinger and then fetches the actor using ObjectId::dereference as above.

let user: DbUser = webfinger_resolve_actor("ruud@lemmy.world", &data).await?;

Note that webfinger queries don’t contain a leading @. It is possible that there are multiple Activitypub IDs returned for a single webfinger query in case of multiple actors with the same name (for example Lemmy permits group and person with the same name). In this case webfinger_resolve_actor automatically loops and returns the first item which can be dereferenced successfully to the given type.
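
The "first item which can be dereferenced" behaviour described above can be sketched generically. `first_resolvable` is a hypothetical helper, and the `resolve` closure stands in for dereferencing an ID to the requested type.

```rust
/// Tries each candidate ID in order and returns the first one that resolves.
pub fn first_resolvable<T, E>(
    candidates: &[&str],
    mut resolve: impl FnMut(&str) -> Result<T, E>,
) -> Option<T> {
    // Errors for individual candidates are ignored; only a success ends the search.
    candidates.iter().copied().find_map(|id| resolve(id).ok())
}
```

With a group and a person sharing one name, a resolver that only accepts persons would skip the group ID and return the person.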

§Sending and receiving activities

Activitypub propagates actions across servers using Activities. For this each actor has an inbox and a public/private key pair. We already defined a Person actor with keypair. What’s left is to define an activity. This is similar to the way we defined Person and Note structs before. In this case we need to implement the ActivityHandler trait.


#[derive(Deserialize, Serialize, Clone, Debug)]
#[serde(rename_all = "camelCase")]
pub struct Follow {
    pub actor: ObjectId<DbUser>,
    pub object: ObjectId<DbUser>,
    #[serde(rename = "type")]
    pub kind: FollowType,
    pub id: Url,
}

#[async_trait]
impl ActivityHandler for Follow {
    type DataType = DbConnection;
    type Error = Error;

    fn id(&self) -> &Url {
        &self.id
    }

    fn actor(&self) -> &Url {
        self.actor.inner()
    }
    
    async fn verify(&self, _data: &Data<Self::DataType>) -> Result<(), Self::Error> {
        Ok(())
    }

    async fn receive(self, data: &Data<Self::DataType>) -> Result<(), Self::Error> {
        let actor = self.actor.dereference(data).await?;
        let followed = self.object.dereference(data).await?;
        data.add_follower(followed, actor).await?;
        Ok(())
    }
}

In this case there is no need to convert to a database type, because activities don’t need to be stored in the database in full. Instead we dereference the involved user accounts, and create a follow relation in the database.

Next it’s time to set up the actual HTTP handler for the inbox. For this we first define an enum of all activities which are accepted by the actor. Then we just need to define an HTTP endpoint at the path of our choice (identical to Person.inbox defined earlier). This endpoint needs to hand received data over to receive_activity. This method verifies the HTTP signature, checks the blocklist with FederationConfigBuilder::url_verifier and more. If everything is valid, the activity is passed to the receive method we defined above.


#[derive(Deserialize, Serialize, Debug)]
#[serde(untagged)]
#[enum_delegate::implement(ActivityHandler)]
pub enum PersonAcceptedActivities {
    Follow(Follow),
}

async fn http_post_user_inbox(
    data: Data<DbConnection>,
    activity_data: ActivityData,
) -> impl IntoResponse {
    receive_activity::<WithContext<PersonAcceptedActivities>, DbUser, DbConnection>(
        activity_data,
        &data,
    )
        .await.unwrap()
}

The PersonAcceptedActivities enum works by attempting to parse the received JSON data with each variant in order. The first variant which parses without errors is used for receiving. This means you should avoid defining multiple activities in a way that they might conflict and parse the same data.
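
Why the variant order matters can be shown with a small model that does not depend on serde. Everything in the sketch below is hypothetical: key/value pairs stand in for JSON fields, and one parser function per variant mimics the "try each variant in order" behaviour.

```rust
#[derive(Debug, PartialEq)]
pub enum Accepted {
    Follow { actor: String, object: String },
    Other { actor: String },
}

/// One parser per variant, tried in the order given.
pub type Parser = fn(&[(&str, &str)]) -> Option<Accepted>;

/// Looks up a field by name in the key/value list.
fn field<'a>(fields: &[(&str, &'a str)], key: &str) -> Option<&'a str> {
    fields.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

/// Strict parser: requires type `Follow` plus actor and object.
pub fn parse_follow(fields: &[(&str, &str)]) -> Option<Accepted> {
    if field(fields, "type")? != "Follow" {
        return None;
    }
    Some(Accepted::Follow {
        actor: field(fields, "actor")?.to_string(),
        object: field(fields, "object")?.to_string(),
    })
}

/// Loose parser: accepts anything that has an actor, including a Follow.
pub fn parse_any(fields: &[(&str, &str)]) -> Option<Accepted> {
    Some(Accepted::Other { actor: field(fields, "actor")?.to_string() })
}

/// Returns the result of the first parser that accepts the data.
pub fn parse_in_order(fields: &[(&str, &str)], parsers: &[Parser]) -> Option<Accepted> {
    parsers.iter().find_map(|parse| parse(fields))
}
```

If the loose parser is listed first, it also claims Follow data and the strict variant is never reached. For that reason specific variants should come before general ones, and type fields like FollowType help keep variants from overlapping.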

Activity enums can also be nested.

§Sending activities

To send an activity we need to initialize our previously defined struct, and pick an actor for sending. We also need a list of all actors that should receive the activity.

let activity = Follow {
    actor: ObjectId::parse("https://lemmy.ml/u/nutomic")?,
    object: recipient.federation_id.clone().into(),
    kind: Default::default(),
    id: "https://lemmy.ml/activities/321".try_into()?
};
let inboxes = vec![recipient.shared_inbox_or_inbox()];

queue_activity(&activity, &sender, inboxes, &data).await?;

The list of inboxes gets deduplicated (important for shared inbox). All inboxes on the local domain and those which fail the crate::config::UrlVerifier check are excluded from delivery. For each remaining inbox a background task is created. It signs the HTTP header with the given private key. Finally the activity is delivered to the inbox.
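
The deduplication and local-domain filtering can be sketched as follows. This is an illustration with hypothetical helper names, not the crate’s code. Host extraction is deliberately simplistic (a real implementation would use a URL parser) and the UrlVerifier step is left out.

```rust
use std::collections::HashSet;

/// Very simplified host extraction (no IPv6 or userinfo support).
fn host_of(url: &str) -> Option<&str> {
    let (_, rest) = url.split_once("://")?;
    let authority = rest.split('/').next()?;
    authority.split(':').next().filter(|host| !host.is_empty())
}

/// Removes duplicate inboxes and those on `local_domain`, keeping the original order.
/// Inboxes without a recognizable host are dropped as well.
pub fn prepare_inboxes(inboxes: &[&str], local_domain: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    inboxes
        .iter()
        .copied()
        .filter(|inbox| {
            host_of(inbox).map_or(false, |host| !host.eq_ignore_ascii_case(local_domain))
        })
        // `insert` returns false for values that were already seen.
        .filter(|inbox| seen.insert(*inbox))
        .map(str::to_string)
        .collect()
}
```

Deduplication is what makes shared inboxes effective: many followers on one instance resolve to the same inbox URL, so the activity is delivered there only once.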

It is possible that delivery fails because the target instance is temporarily unreachable. In this case the task is scheduled for retry after a certain waiting time. For each task delivery is retried up to 3 times after the initial attempt. The retry intervals are as follows:

one minute, in case of service restart
one hour, in case of instance maintenance
2.5 days, in case of major incident with rebuild from backup
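
The three intervals listed above are consistent with a simple exponential schedule of 60^n seconds for the n-th retry. The sketch below assumes that formula. It is inferred from the listed values rather than taken from the crate’s source, and `retry_delay` is a hypothetical name.

```rust
use std::time::Duration;

/// Delay before retry number `retry` (1-based), assuming 60^retry seconds.
pub fn retry_delay(retry: u32) -> Duration {
    Duration::from_secs(60u64.pow(retry))
}
```

Under this assumption the third retry waits 216,000 seconds, which is 60 hours or 2.5 days.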

If crate::config::FederationConfigBuilder::debug is enabled, no background thread is used and activities are sent directly in the foreground. This makes it easier to catch delivery errors and avoids complicated steps to await delivery in tests.

In some cases you may want to bypass the builtin activity queue and implement your own, for example to specify different retry intervals or to persist retries across application restarts. You can do it with the following code:

let activity = Follow {
    actor: ObjectId::parse("https://lemmy.ml/u/nutomic")?,
    object: recipient.federation_id.clone().into(),
    kind: Default::default(),
    id: "https://lemmy.ml/activities/321".try_into()?
};
let inboxes = vec![recipient.shared_inbox_or_inbox()];

let sends = SendActivityTask::prepare(&activity, &sender, inboxes, &data).await?;
for send in sends {
    send.sign_and_send(&data).await?;
}
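
A custom queue needs its own retry policy around each send. The following synchronous sketch shows one possible shape for such a policy. It is an assumption for illustration only: `send_with_retries` is a hypothetical helper, the `attempt` closure stands in for a call like sign_and_send, and a real implementation would be async and might persist pending retries.

```rust
use std::time::Duration;

/// Runs `attempt` once, then once more after each delay in `delays` until it succeeds.
/// `sleep` is injected so that the waiting strategy can be swapped out, e.g. in tests.
pub fn send_with_retries<E>(
    mut attempt: impl FnMut() -> Result<(), E>,
    delays: &[Duration],
    mut sleep: impl FnMut(Duration),
) -> Result<(), E> {
    let mut result = attempt();
    for delay in delays {
        if result.is_ok() {
            break;
        }
        // Wait before the next attempt, then try again.
        sleep(*delay);
        result = attempt();
    }
    // Either the first success or the error from the final attempt.
    result
}
```

Passing the delays as a slice makes the schedule configurable, which is one of the reasons mentioned above for bypassing the builtin queue.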

§Fetching remote object with unknown type

It is sometimes necessary to fetch from a URL, but we don’t know the exact type of object it will return. An example is the search field in most federated platforms, which allows pasting an id URL and fetches it from the origin server. It can be implemented in the following way:


#[derive(Debug)]
pub enum SearchableDbObjects {
    User(DbUser),
    Post(DbPost)
}

#[derive(Deserialize, Serialize, Debug)]
#[serde(untagged)]
pub enum SearchableObjects {
    Person(Person),
    Note(Note)
}

#[async_trait::async_trait]
impl Object for SearchableDbObjects {
    type DataType = DbConnection;
    type Kind = SearchableObjects;
    type Error = anyhow::Error;

    async fn read_from_id(
        object_id: Url,
        data: &Data<Self::DataType>,
    ) -> Result<Option<Self>, Self::Error> {
        Ok(None)
    }

    async fn into_json(
        self,
        data: &Data<Self::DataType>,
    ) -> Result<Self::Kind, Self::Error> {
        unimplemented!();
    }
    
    async fn verify(json: &Self::Kind, expected_domain: &Url, _data: &Data<Self::DataType>) -> Result<(), Self::Error> {
        Ok(())
    }

    async fn from_json(
        json: Self::Kind,
        data: &Data<Self::DataType>,
    ) -> Result<Self, Self::Error> {
        use SearchableDbObjects::*;
        match json {
            SearchableObjects::Person(p) => Ok(User(DbUser::from_json(p, data).await?)),
            SearchableObjects::Note(n) => Ok(Post(DbPost::from_json(n, data).await?)),
        }
    }
}

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    let query = "https://example.com/id/413";
    let query_result = ObjectId::<SearchableDbObjects>::parse(query)?
        .dereference(&data)
        .await?;
    match query_result {
        SearchableDbObjects::Post(post) => {} // retrieved object is a post
        SearchableDbObjects::User(user) => {} // object is a user
    };
    Ok(())
}

This is similar to the way received activities are handled in the previous section. The remote JSON is fetched, and received using the first enum variant which can successfully deserialize the data.

Re-exports§

pub use activitystreams_kinds as kinds;

Modules§

activity_queue
Queue for signing and sending outgoing activities with retry
activity_sending
Queue for signing and sending outgoing activities with retry
actix_web
Utilities for using this library with actix-web framework
axum
Utilities for using this library with axum web framework
config
Configuration for this library, with various federation settings
error
Error messages returned by this library
fetch
Utilities for fetching data from other servers
http_signatures
Generating keypairs, creating and verifying signatures
protocol
Data structures which help to define federated messages
traits
Traits which need to be implemented for federated data types

Constants§

FEDERATION_CONTENT_TYPE
Mime type for Activitypub data, used for Accept and Content-Type HTTP headers