# X For You Feed Algorithm
This repository contains the core recommendation system powering the "For You" feed on X. It combines in-network content (from accounts you follow) with out-of-network content (discovered through ML-based retrieval) and ranks everything using a Grok-based transformer model.
> **Note:** The transformer implementation is ported from the [Grok-1 open source release](https://github.com/xai-org/grok-1) by xAI, adapted for recommendation system use cases.
## Table of Contents
- [Overview](#overview)
- [System Architecture](#system-architecture)
- [Components](#components)
- [Home Mixer](#home-mixer)
- [Thunder](#thunder)
- [Phoenix](#phoenix)
- [Candidate Pipeline](#candidate-pipeline)
- [How It Works](#how-it-works)
- [Pipeline Stages](#pipeline-stages)
- [Scoring and Ranking](#scoring-and-ranking)
- [Filtering](#filtering)
- [Key Design Decisions](#key-design-decisions)
- [License](#license)
---
## Overview
The For You feed algorithm retrieves, ranks, and filters posts from two sources:
1. **In-Network (Thunder)**: Posts from accounts you follow
2. **Out-of-Network (Phoenix Retrieval)**: Posts discovered from a global corpus
Both sources are combined and ranked together using **Phoenix**, a Grok-based transformer model that predicts engagement probabilities for each post. The final score is a weighted combination of these predicted engagements.
We have eliminated every single hand-engineered feature and most heuristics from the system. The Grok-based transformer does all the heavy lifting by understanding your engagement history (what you liked, replied to, shared, etc.) and using that to determine what content is relevant to you.
---
## System Architecture
```
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│ FOR YOU FEED REQUEST │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│ HOME MIXER │
│ (Orchestration Layer) │
├─────────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ QUERY HYDRATION │ │
│ │ ┌──────────────────────────┐ ┌──────────────────────────────────────────────┐ │ │
│ │ │ User Action Sequence │ │ User Features │ │ │
│ │ │ (engagement history) │ │ (following list, preferences, etc.) │ │ │
│ │ └──────────────────────────┘ └──────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ CANDIDATE SOURCES │ │
│ │ ┌─────────────────────────────┐ ┌────────────────────────────────┐ │ │
│ │ │ THUNDER │ │ PHOENIX RETRIEVAL │ │ │
│ │ │ (In-Network Posts) │ │ (Out-of-Network Posts) │ │ │
│ │ │ │ │ │ │ │
│ │ │ Posts from accounts │ │ ML-based similarity search │ │ │
│ │ │ you follow │ │ across global corpus │ │ │
│ │ └─────────────────────────────┘ └────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ HYDRATION │ │
│ │ Fetch additional data: core post metadata, author info, media entities, etc. │ │
│ └─────────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ FILTERING │ │
│ │ Remove: duplicates, old posts, self-posts, blocked authors, muted keywords, etc. │ │
│ └─────────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ SCORING │ │
│ │ ┌──────────────────────────┐ │ │
│ │ │ Phoenix Scorer │ Grok-based Transformer predicts: │ │
│ │ │ (ML Predictions) │ P(like), P(reply), P(repost), P(click)... │ │
│ │ └──────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────────────────┐ │ │
│ │ │ Weighted Scorer │ Weighted Score = Σ (weight × P(action)) │ │
│ │ │ (Combine predictions) │ │ │
│ │ └──────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────────────────┐ │ │
│ │ │ Author Diversity │ Attenuate repeated author scores │ │
│ │ │ Scorer │ to ensure feed diversity │ │
│ │ └──────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ SELECTION │ │
│ │ Sort by final score, select top K candidates │ │
│ └─────────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ FILTERING (Post-Selection) │ │
│ │ Visibility filtering (deleted/spam/violence/gore etc) │ │
│ └─────────────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│ RANKED FEED RESPONSE │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
```
---
## Components
### Home Mixer
**Location:** [`home-mixer/`](home-mixer/)
The orchestration layer that assembles the For You feed. It leverages the `CandidatePipeline` framework with the following stages:
| Query Hydrators | Fetch user context (engagement history, following list) |
| Sources | Retrieve candidates from Thunder and Phoenix |
| Hydrators | Enrich candidates with additional data |
| Filters | Remove ineligible candidates |
| Scorers | Predict engagement and compute final scores |
| Selector | Sort by score and select top K |
| Post-Selection Filters | Final visibility and dedup checks |
| Side Effects | Cache request info for future use |
The server exposes a gRPC endpoint (`ScoredPostsService`) that returns ranked posts for a given user.
---
### Thunder
**Location:** [`thunder/`](thunder/)
An in-memory post store and realtime ingestion pipeline that tracks recent posts from all users. It:
- Consumes post create/delete events from Kafka
- Maintains per-user stores for original posts, replies/reposts, and video posts
- Serves "in-network" post candidates from accounts the requesting user follows
- Automatically trims posts older than the retention period
Thunder enables sub-millisecond lookups for in-network content without hitting an external database.
---
### Phoenix
**Location:** [`phoenix/`](phoenix/)
The ML component with two main functions:
#### 1. Retrieval (Two-Tower Model)
Finds relevant out-of-network posts:
- **User Tower**: Encodes user features and engagement history into an embedding
- **Candidate Tower**: Encodes all posts into embeddings
- **Similarity Search**: Retrieves top-K posts via dot product similarity
#### 2. Ranking (Transformer with Candidate Isolation)
Predicts engagement probabilities for each candidate:
- Takes user context (engagement history) and candidate posts as input
- Uses special attention masking so candidates cannot attend to each other
- Outputs probabilities for each action type (like, reply, repost, click, etc.)
See [`phoenix/README.md`](phoenix/README.md) for detailed architecture documentation.
---
### Candidate Pipeline
**Location:** [`candidate-pipeline/`](candidate-pipeline/)
A reusable framework for building recommendation pipelines. Defines traits for:
| `Source` | Fetch candidates from a data source |
| `Hydrator` | Enrich candidates with additional features |
| `Filter` | Remove candidates that shouldn't be shown |
| `Scorer` | Compute scores for ranking |
| `Selector` | Sort and select top candidates |
| `SideEffect` | Run async side effects (caching, logging) |
The framework runs sources and hydrators in parallel where possible, with configurable error handling and logging.
---
## How It Works
### Pipeline Stages
1. **Query Hydration**: Fetch the user's recent engagements history and metadata (eg. following list)
2. **Candidate Sourcing**: Retrieve candidates from:
- **Thunder**: Recent posts from followed accounts (in-network)
- **Phoenix Retrieval**: ML-discovered posts from the global corpus (out-of-network)
3. **Candidate Hydration**: Enrich candidates with:
- Core post data (text, media, etc.)
- Author information (username, verification status)
- Video duration (for video posts)
- Subscription status
4. **Pre-Scoring Filters**: Remove posts that are:
- Duplicates
- Too old
- From the viewer themselves
- From blocked/muted accounts
- Containing muted keywords
- Previously seen or recently served
- Ineligible subscription content
5. **Scoring**: Apply multiple scorers sequentially:
- **Phoenix Scorer**: Get ML predictions from the Phoenix transformer model
- **Weighted Scorer**: Combine predictions into a final relevance score
- **Author Diversity Scorer**: Attenuate repeated author scores for diversity
- **OON Scorer**: Adjust scores for out-of-network content
6. **Selection**: Sort by score and select the top K candidates
7. **Post-Selection Processing**: Final validation of post candidates to be served
---
### Scoring and Ranking
The Phoenix Grok-based transformer model predicts probabilities for multiple engagement types:
```
Predictions:
├── P(favorite)
├── P(reply)
├── P(repost)
├── P(quote)
├── P(click)
├── P(profile_click)
├── P(video_view)
├── P(photo_expand)
├── P(share)
├── P(dwell)
├── P(follow_author)
├── P(not_interested)
├── P(block_author)
├── P(mute_author)
└── P(report)
```
The **Weighted Scorer** combines these into a final score:
```
Final Score = Σ (weight_i × P(action_i))
```
Positive actions (like, repost, share) have positive weights. Negative actions (block, mute, report) have negative weights, pushing down content the user would likely dislike.
---
### Filtering
Filters run at two stages:
**Pre-Scoring Filters:**
| `DropDuplicatesFilter` | Remove duplicate post IDs |
| `CoreDataHydrationFilter` | Remove posts that failed to hydrate core metadata |
| `AgeFilter` | Remove posts older than threshold |
| `SelfpostFilter` | Remove user's own posts |
| `RepostDeduplicationFilter` | Dedupe reposts of same content |
| `IneligibleSubscriptionFilter` | Remove paywalled content user can't access |
| `PreviouslySeenPostsFilter` | Remove posts user has already seen |
| `PreviouslyServedPostsFilter` | Remove posts already served in session |
| `MutedKeywordFilter` | Remove posts with user's muted keywords |
| `AuthorSocialgraphFilter` | Remove posts from blocked/muted authors |
**Post-Selection Filters:**
| `VFFilter` | Remove posts that are deleted/spam/violence/gore etc. |
| `DedupConversationFilter` | Deduplicate multiple branches of the same conversation thread |
---
## Key Design Decisions
### 1. No Hand-Engineered Features
The system relies entirely on the Grok-based transformer to learn relevance from user engagement sequences. No manual feature engineering for content relevance. This significantly reduces the complexity in our data pipelines and serving infrastructure.
### 2. Candidate Isolation in Ranking
During transformer inference, candidates cannot attend to each other—only to the user context. This ensures the score for a post doesn't depend on which other posts are in the batch, making scores consistent and cacheable.
### 3. Hash-Based Embeddings
Both retrieval and ranking use multiple hash functions for embedding lookup
### 4. Multi-Action Prediction
Rather than predicting a single "relevance" score, the model predicts probabilities for many actions.
### 5. Composable Pipeline Architecture
The `candidate-pipeline` crate provides a flexible framework for building recommendation pipelines with:
- Separation of pipeline execution and monitoring from business logic
- Parallel execution of independent stages and graceful error handling
- Easy addition of new sources, hydrations, filters, and scorers
---
## License
This project is licensed under the Apache License 2.0. See [LICENSE](LICENSE) for details.