X recently open-sourced the core system behind its “For You” feed, offering a rare, end-to-end look at how a large-scale social recommendation engine actually works in production.
What stands out is not just the ML model, but the system architecture discipline around it.
Below is a high-level walkthrough of the design and the engineering principles behind it.
1️⃣ One Unified Feed: In-Network + Out-of-Network
The feed blends two fundamentally different content sources:
- In-network (Thunder): posts from accounts you follow, served from an in-memory system for sub-millisecond access.
- Out-of-network (Phoenix Retrieval): ML-discovered posts retrieved from the global corpus via embedding similarity.
Both flows converge early and are ranked together — no separate heuristics-heavy pipelines.
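The unified-ranking idea can be sketched in a few lines (all names and fields here are hypothetical, not the repo's actual types): both sources feed one candidate pool, and a single shared score decides the order.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    post_id: str
    source: str   # "thunder" (in-network) or "phoenix" (out-of-network)
    score: float  # relevance score from the shared ranker

def blend_feed(in_network, out_of_network, k=10):
    """Merge both sources into one pool and rank them together.
    No per-source heuristics: one score orders everything."""
    pool = in_network + out_of_network
    return sorted(pool, key=lambda c: c.score, reverse=True)[:k]
```

The point is what is absent: there is no separate quota or interleaving rule per source; in-network and discovered posts compete on the same score.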
2️⃣ Home Mixer as the Orchestration Layer
The Home Mixer is the control plane of the feed:
- Orchestrates retrieval, hydration, filtering, scoring, and selection
- Exposes a gRPC service returning ranked posts
- Built on a composable candidate pipeline abstraction
This keeps business logic decoupled from execution, monitoring, and parallelism.
3️⃣ Candidate Pipeline as a First-Class Framework
The pipeline is defined through explicit roles:
- Source → retrieve candidates
- Hydrator → enrich data
- Filter → enforce eligibility
- Scorer → compute relevance
- Selector → choose top K
- SideEffect → async caching / logging
This mirrors modern stream-processing design, but applied to online recommendation.
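The roles above compose into a single pipeline object. A minimal sketch of that abstraction (the wiring and class name are illustrative, not the actual framework API; SideEffects are omitted for brevity):

```python
class CandidatePipeline:
    """Composable pipeline: Source -> Hydrator -> Filter -> Scorer -> Selector."""

    def __init__(self, source, hydrators=(), filters=(), scorer=None, k=10):
        self.source, self.hydrators, self.filters = source, hydrators, filters
        self.scorer, self.k = scorer, k

    def run(self, request):
        candidates = self.source(request)                   # Source: retrieve
        for h in self.hydrators:                            # Hydrator: enrich
            candidates = [h(c) for c in candidates]
        for f in self.filters:                              # Filter: eligibility
            candidates = [c for c in candidates if f(c)]
        scored = [(self.scorer(c), c) for c in candidates]  # Scorer: relevance
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:self.k]]              # Selector: top K
```

Because each stage is a plain function with an explicit contract, the orchestrator can parallelize, monitor, or swap stages without touching business logic.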
4️⃣ Thunder: Real-Time In-Network Serving
Thunder acts as a real-time post cache:
- Consumes Kafka events for post creation/deletion
- Maintains per-user post timelines in memory
- Eliminates DB calls for followed-account content
- Auto-trims by retention window
This is classic latency-first engineering.
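The Thunder behaviors listed above can be sketched as a small in-memory store (method names and the retention value are hypothetical; the real system consumes these events from Kafka):

```python
import time
from collections import defaultdict, deque

class ThunderCache:
    """Per-author post timelines kept in memory, trimmed by retention window."""

    def __init__(self, retention_seconds=48 * 3600):
        self.retention = retention_seconds
        self.timelines = defaultdict(deque)  # author_id -> deque of (post_id, ts)

    def on_post_created(self, author_id, post_id, ts=None):
        # Kafka create event: append newest post to the author's timeline.
        self.timelines[author_id].append((post_id, ts if ts is not None else time.time()))

    def on_post_deleted(self, author_id, post_id):
        # Kafka delete event: drop the post from the timeline.
        self.timelines[author_id] = deque(
            (p, t) for p, t in self.timelines[author_id] if p != post_id)

    def timeline(self, author_id, now=None):
        # Read path: trim expired posts, then serve straight from memory.
        now = now if now is not None else time.time()
        q = self.timelines[author_id]
        while q and now - q[0][1] > self.retention:
            q.popleft()
        return [p for p, _ in q]
```

Reads never touch a database: the write path (event consumption) does all the work up front, which is what buys the sub-millisecond access.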
5️⃣ Phoenix Retrieval: Two-Tower Discovery
Out-of-network discovery uses a two-tower model:
- User tower → encodes engagement history
- Candidate tower → encodes posts
- Dot-product similarity retrieves top-K candidates
This ensures scalable global discovery without brute-force ranking.
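Once both towers have produced embeddings offline, online retrieval reduces to nearest-neighbor search by dot product. A toy sketch (a linear scan stands in for what would be an approximate-nearest-neighbor index in production):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_k(user_embedding, candidate_embeddings, k=2):
    """Rank candidate posts by dot-product similarity to the user embedding."""
    scored = sorted(candidate_embeddings.items(),
                    key=lambda item: dot(user_embedding, item[1]),
                    reverse=True)
    return [post_id for post_id, _ in scored[:k]]
```

The towers never see each other at serving time; that decoupling is what makes the candidate side precomputable and the global corpus searchable without brute-force ranking.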
6️⃣ Phoenix Ranking: Grok-Based Transformer
Ranking is powered by a Grok-derived transformer that:
- Takes user engagement history + candidate post
- Uses candidate isolation (no cross-candidate attention)
- Predicts probabilities for many actions, not a single score
This design makes scores:
- Stable
- Cacheable
- Independent of batch composition
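Candidate isolation means each score is a pure function of (user history, candidate), never of the other candidates in the batch. That property is what makes scores cacheable, as this sketch shows (the scoring function is a hypothetical stand-in for the transformer forward pass):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def score_candidate(user_history, candidate):
    """Stand-in for the ranker: depends only on (history, candidate),
    so it can be memoized safely."""
    return sum(hash((event, candidate)) % 100 for event in user_history) / 100

def score_batch(user_history, candidates):
    # With no cross-candidate attention, batch scoring is an independent map.
    return {c: score_candidate(user_history, c) for c in candidates}
```

If attention crossed candidates, the same post could receive different scores depending on what it was batched with, and neither caching nor stable A/B comparisons would be possible.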
7️⃣ Multi-Action Optimization (Not Just “Relevance”)
Instead of predicting “relevance,” the model predicts:
- Likes, replies, reposts, clicks
- Video views, dwell time
- Negative actions (mute, block, report)
A weighted scorer combines these into a final ranking score, balancing engagement with safety and satisfaction.
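The weighted combination can be sketched directly (the weights below are invented for illustration; the real values are product decisions, with negative actions weighted heavily enough that predicted harm can outweigh predicted engagement):

```python
# Hypothetical per-action weights, not the production values.
WEIGHTS = {
    "p_like": 1.0, "p_reply": 2.0, "p_repost": 1.5,
    "p_click": 0.3, "p_video_view": 0.5, "p_dwell": 0.8,
    "p_mute": -10.0, "p_block": -30.0, "p_report": -50.0,
}

def final_score(predictions):
    """Collapse per-action probabilities into one ranking score."""
    return sum(WEIGHTS[action] * p for action, p in predictions.items())
```

Keeping the model's outputs as raw probabilities and pushing the value judgment into one weight table makes the engagement/safety trade-off explicit and tunable without retraining.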
8️⃣ Filtering Is Multi-Stage and Explicit
Filtering is not an afterthought:
- Pre-scoring filters remove duplicates, muted content, blocked authors, old posts
- Post-selection filters enforce safety, visibility, and conversation deduplication
This separation keeps ML focused on relevance, not policy.
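The two filter stages can be sketched as separate functions, which is the point: eligibility runs before the expensive ranker, policy runs after selection (field names are illustrative):

```python
def pre_scoring_filters(candidates, user):
    """Cheap eligibility checks before scoring: dedup, muted/blocked authors."""
    seen, out = set(), []
    for c in candidates:
        if c["author"] in user["blocked"] or c["author"] in user["muted"]:
            continue
        if c["id"] in seen:  # drop duplicates before wasting ranker compute
            continue
        seen.add(c["id"])
        out.append(c)
    return out

def post_selection_filters(ranked, max_per_conversation=1):
    """Policy checks after ranking: safety plus conversation deduplication."""
    per_convo, out = {}, []
    for c in ranked:
        if c.get("violates_policy"):
            continue
        n = per_convo.get(c["conversation_id"], 0)
        if n >= max_per_conversation:
            continue
        per_convo[c["conversation_id"]] = n + 1
        out.append(c)
    return out
```

The model never has to learn "don't show blocked authors" or "one post per thread"; those rules live in code where they are auditable and instantly changeable.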
9️⃣ Minimal Heuristics, Maximum Learning
One of the boldest choices: no hand-engineered relevance features.
The transformer learns directly from engagement sequences. This dramatically simplifies feature pipelines and reduces long-term system complexity.
🔟 The Bigger Takeaway
X’s recommendation system shows a clear direction for modern feeds:
- Strong separation of concerns
- ML as the brain, pipelines as the nervous system
- Explicit contracts between retrieval, ranking, and policy
- Systems thinking equal to model sophistication
This is not “just a model release.” It’s a reference architecture for large-scale, real-time recommendation systems.
If you’re designing feeds, discovery systems, or ML-powered ranking at scale, this repo is worth studying — not for copy-paste code, but for the engineering decisions behind it.