X recently open-sourced the core system behind its “For You” feed, offering a rare, end-to-end look at how a large-scale social recommendation engine actually works in production.
What stands out is not just the ML model, but the system architecture discipline around it.
Below is a high-level walkthrough of the design and the engineering principles behind it.
1️⃣ One Unified Feed: In-Network + Out-of-Network
The feed blends two fundamentally different content sources:
- In-network (Thunder): posts from accounts you follow, served from an in-memory system for sub-millisecond access.
- Out-of-network (Phoenix Retrieval): ML-discovered posts retrieved from the global corpus via embedding similarity.
Both flows converge early and are ranked together — no separate heuristics-heavy pipelines.
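The unified-ranking idea can be sketched in a few lines (all names and fields here are hypothetical, not the repo's actual types): both sources feed one candidate pool, and a single shared score decides the order.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    post_id: str
    source: str   # "thunder" (in-network) or "phoenix" (out-of-network)
    score: float  # relevance score from the shared ranker

def blend_feed(in_network, out_of_network, k=10):
    """Merge both sources into one pool and rank them together.
    No per-source heuristics: one score orders everything."""
    pool = in_network + out_of_network
    return sorted(pool, key=lambda c: c.score, reverse=True)[:k]
```

The point is what is absent: there is no separate quota or interleaving rule per source; in-network and discovered posts compete on the same score.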
2️⃣ Home Mixer as the Orchestration Layer
The Home Mixer is the control plane of the feed:
- Orchestrates retrieval, hydration, filtering, scoring, and selection
- Exposes a gRPC service returning ranked posts
- Built on a composable candidate pipeline abstraction
This keeps business logic decoupled from execution, monitoring, and parallelism.
3️⃣ Candidate Pipeline as a First-Class Framework
The pipeline is defined through explicit roles:
- Source → retrieve candidates
- Hydrator → enrich data
- Filter → enforce eligibility
- Scorer → compute relevance
- Selector → choose top K
- SideEffect → async caching / logging
This mirrors modern stream-processing design, but applied to online recommendation.
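The roles above compose into a single pipeline object. A minimal sketch of that abstraction (the wiring and class name are illustrative, not the actual framework API; SideEffects are omitted for brevity):

```python
class CandidatePipeline:
    """Composable pipeline: Source -> Hydrator -> Filter -> Scorer -> Selector."""

    def __init__(self, source, hydrators=(), filters=(), scorer=None, k=10):
        self.source, self.hydrators, self.filters = source, hydrators, filters
        self.scorer, self.k = scorer, k

    def run(self, request):
        candidates = self.source(request)                   # Source: retrieve
        for h in self.hydrators:                            # Hydrator: enrich
            candidates = [h(c) for c in candidates]
        for f in self.filters:                              # Filter: eligibility
            candidates = [c for c in candidates if f(c)]
        scored = [(self.scorer(c), c) for c in candidates]  # Scorer: relevance
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:self.k]]              # Selector: top K
```

Because each stage is a plain function with an explicit contract, the orchestrator can parallelize, monitor, or swap stages without touching business logic.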
4️⃣ Thunder: Real-Time In-Network Serving
Thunder acts as a real-time post cache:
- Consumes Kafka events for post creation/deletion
- Maintains per-user post timelines in memory
- Eliminates DB calls for followed-account content
- Auto-trims by retention window
This is classic latency-first engineering.
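The Thunder behaviors listed above can be sketched as a small in-memory store (method names and the retention value are hypothetical; the real system consumes these events from Kafka):

```python
import time
from collections import defaultdict, deque

class ThunderCache:
    """Per-author post timelines kept in memory, trimmed by retention window."""

    def __init__(self, retention_seconds=48 * 3600):
        self.retention = retention_seconds
        self.timelines = defaultdict(deque)  # author_id -> deque of (post_id, ts)

    def on_post_created(self, author_id, post_id, ts=None):
        # Kafka create event: append newest post to the author's timeline.
        self.timelines[author_id].append((post_id, ts if ts is not None else time.time()))

    def on_post_deleted(self, author_id, post_id):
        # Kafka delete event: drop the post from the timeline.
        self.timelines[author_id] = deque(
            (p, t) for p, t in self.timelines[author_id] if p != post_id)

    def timeline(self, author_id, now=None):
        # Read path: trim expired posts, then serve straight from memory.
        now = now if now is not None else time.time()
        q = self.timelines[author_id]
        while q and now - q[0][1] > self.retention:
            q.popleft()
        return [p for p, _ in q]
```

Reads never touch a database: the write path (event consumption) does all the work up front, which is what buys the sub-millisecond access.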
5️⃣ Phoenix Retrieval: Two-Tower Discovery
Out-of-network discovery uses a two-tower model:
- User tower → encodes engagement history
- Candidate tower → encodes posts
- Dot-product similarity retrieves top-K candidates
This ensures scalable global discovery without brute-force ranking.
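Once both towers have produced embeddings offline, online retrieval reduces to nearest-neighbor search by dot product. A toy sketch (a linear scan stands in for what would be an approximate-nearest-neighbor index in production):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_k(user_embedding, candidate_embeddings, k=2):
    """Rank candidate posts by dot-product similarity to the user embedding."""
    scored = sorted(candidate_embeddings.items(),
                    key=lambda item: dot(user_embedding, item[1]),
                    reverse=True)
    return [post_id for post_id, _ in scored[:k]]
```

The towers never see each other at serving time; that decoupling is what makes the candidate side precomputable and the global corpus searchable without brute-force ranking.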
6️⃣ Phoenix Ranking: Grok-Based Transformer
Ranking is powered by a Grok-derived transformer that:
- Takes user engagement history + candidate post
- Uses candidate isolation (no cross-candidate attention)
- Predicts probabilities for many actions, not a single score
This design makes scores:
- Stable
- Cacheable
- Independent of batch composition
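Candidate isolation means each score is a pure function of (user history, candidate), never of the other candidates in the batch. That property is what makes scores cacheable, as this sketch shows (the scoring function is a hypothetical stand-in for the transformer forward pass):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def score_candidate(user_history, candidate):
    """Stand-in for the ranker: depends only on (history, candidate),
    so it can be memoized safely."""
    return sum(hash((event, candidate)) % 100 for event in user_history) / 100

def score_batch(user_history, candidates):
    # With no cross-candidate attention, batch scoring is an independent map.
    return {c: score_candidate(user_history, c) for c in candidates}
```

If attention crossed candidates, the same post could receive different scores depending on what it was batched with, and neither caching nor stable A/B comparisons would be possible.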
7️⃣ Multi-Action Optimization (Not Just “Relevance”)
Instead of predicting “relevance,” the model predicts:
- Likes, replies, reposts, clicks
- Video views, dwell time
- Negative actions (mute, block, report)
A weighted scorer combines these into a final ranking score, balancing engagement with safety and satisfaction.
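The weighted combination can be sketched directly (the weights below are invented for illustration; the real values are product decisions, with negative actions weighted heavily enough that predicted harm can outweigh predicted engagement):

```python
# Hypothetical per-action weights, not the production values.
WEIGHTS = {
    "p_like": 1.0, "p_reply": 2.0, "p_repost": 1.5,
    "p_click": 0.3, "p_video_view": 0.5, "p_dwell": 0.8,
    "p_mute": -10.0, "p_block": -30.0, "p_report": -50.0,
}

def final_score(predictions):
    """Collapse per-action probabilities into one ranking score."""
    return sum(WEIGHTS[action] * p for action, p in predictions.items())
```

Keeping the model's outputs as raw probabilities and pushing the value judgment into one weight table makes the engagement/safety trade-off explicit and tunable without retraining.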
8️⃣ Filtering Is Multi-Stage and Explicit
Filtering is not an afterthought:
- Pre-scoring filters remove duplicates, muted content, blocked authors, old posts
- Post-selection filters enforce safety, visibility, and conversation deduplication
This separation keeps ML focused on relevance, not policy.
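The two filter stages can be sketched as separate functions, which is the point: eligibility runs before the expensive ranker, policy runs after selection (field names are illustrative):

```python
def pre_scoring_filters(candidates, user):
    """Cheap eligibility checks before scoring: dedup, muted/blocked authors."""
    seen, out = set(), []
    for c in candidates:
        if c["author"] in user["blocked"] or c["author"] in user["muted"]:
            continue
        if c["id"] in seen:  # drop duplicates before wasting ranker compute
            continue
        seen.add(c["id"])
        out.append(c)
    return out

def post_selection_filters(ranked, max_per_conversation=1):
    """Policy checks after ranking: safety plus conversation deduplication."""
    per_convo, out = {}, []
    for c in ranked:
        if c.get("violates_policy"):
            continue
        n = per_convo.get(c["conversation_id"], 0)
        if n >= max_per_conversation:
            continue
        per_convo[c["conversation_id"]] = n + 1
        out.append(c)
    return out
```

The model never has to learn "don't show blocked authors" or "one post per thread"; those rules live in code where they are auditable and instantly changeable.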
9️⃣ Minimal Heuristics, Maximum Learning
One of the boldest choices: no hand-engineered relevance features.
The transformer learns directly from engagement sequences. This dramatically simplifies feature pipelines and reduces long-term system complexity.
🔟 The Bigger Takeaway
X’s recommendation system shows a clear direction for modern feeds:
- Strong separation of concerns
- ML as the brain, pipelines as the nervous system
- Explicit contracts between retrieval, ranking, and policy
- Systems thinking equal to model sophistication
This is not “just a model release.” It’s a reference architecture for large-scale, real-time recommendation systems.
If you’re designing feeds, discovery systems, or ML-powered ranking at scale, this repo is worth studying — not for copy-paste code, but for the engineering decisions behind it.