X recently open-sourced the core system behind its “For You” feed, offering a rare, end-to-end look at how a large-scale social recommendation engine actually works in production.

What stands out is not just the ML model, but the architectural discipline around it.

Below is a high-level walkthrough of the design and the engineering principles behind it.

1️⃣ One Unified Feed: In-Network + Out-of-Network

The feed blends two fundamentally different content sources:

  • In-network (Thunder): posts from accounts you follow, served from an in-memory system for sub-millisecond access.
  • Out-of-network (Phoenix Retrieval): ML-discovered posts retrieved from the global corpus via embedding similarity.

Both flows converge early and are ranked together — no separate heuristics-heavy pipelines.
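The "converge early" idea can be sketched in a few lines. This is an illustrative toy, not code from the repo — the function names and candidate shape are hypothetical:

```python
def fetch_in_network(user_id):
    # Stand-in for Thunder: posts from followed accounts.
    return [{"post_id": 1, "source": "in_network"},
            {"post_id": 2, "source": "in_network"}]

def fetch_out_of_network(user_id):
    # Stand-in for Phoenix retrieval: embedding-similar posts.
    return [{"post_id": 3, "source": "out_of_network"}]

def build_feed(user_id, score_fn):
    # Both sources merge into one candidate pool before ranking,
    # so a single scorer handles them -- no per-source heuristics.
    candidates = fetch_in_network(user_id) + fetch_out_of_network(user_id)
    return sorted(candidates, key=score_fn, reverse=True)
```

The key property: downstream stages never branch on where a candidate came from.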

2️⃣ Home Mixer as the Orchestration Layer

The Home Mixer is the control plane of the feed:

  • Orchestrates retrieval, hydration, filtering, scoring, and selection
  • Exposes a gRPC service returning ranked posts
  • Built on a composable candidate pipeline abstraction

This keeps business logic decoupled from execution, monitoring, and parallelism.

3️⃣ Candidate Pipeline as a First-Class Framework

The pipeline is defined through explicit roles:

  • Source → retrieve candidates
  • Hydrator → enrich data
  • Filter → enforce eligibility
  • Scorer → compute relevance
  • Selector → choose top K
  • SideEffect → async caching / logging

This mirrors modern stream-processing design, but applied to online recommendation.
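A toy analogue of those roles makes the contract concrete. The real framework is not Python and these names are invented for illustration; this is just the shape of the abstraction:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class CandidatePipeline:
    # Hypothetical stand-in for the pipeline abstraction described above.
    source: Callable[[], List[dict]]          # retrieve candidates
    hydrators: list = field(default_factory=list)   # enrich data
    filters: list = field(default_factory=list)     # enforce eligibility
    scorer: Callable[[dict], float] = None          # compute relevance
    top_k: int = 10                                 # selector budget
    side_effects: list = field(default_factory=list)  # caching / logging

    def run(self):
        candidates = self.source()
        for hydrate in self.hydrators:
            candidates = [hydrate(c) for c in candidates]
        for keep in self.filters:
            candidates = [c for c in candidates if keep(c)]
        for c in candidates:
            c["score"] = self.scorer(c)
        selected = sorted(candidates, key=lambda c: c["score"],
                          reverse=True)[:self.top_k]
        for effect in self.side_effects:
            effect(selected)  # in production these would run asynchronously
        return selected
```

Because each role is a plain function slot, execution concerns (parallelism, timeouts, monitoring) can wrap the pipeline without touching business logic.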

4️⃣ Thunder: Real-Time In-Network Serving

Thunder acts as a real-time post cache:

  • Consumes Kafka events for post creation/deletion
  • Maintains per-user post timelines in memory
  • Eliminates DB calls for followed-account content
  • Auto-trims by retention window

This is classic latency-first engineering.

5️⃣ Phoenix Retrieval: Two-Tower Discovery

Out-of-network discovery uses a two-tower model:

  • User tower → encodes engagement history
  • Candidate tower → encodes posts
  • Dot-product similarity retrieves top-K candidates

This ensures scalable global discovery without brute-force ranking.
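The retrieval step itself is simple once the towers have produced embeddings. A sketch, with toy fixed vectors standing in for the learned encoders:

```python
def dot(u, v):
    # Dot-product similarity between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_k(user_embedding, candidate_embeddings, k):
    # Score every candidate against the user embedding, keep the k best.
    # At scale an approximate-nearest-neighbor index replaces this scan.
    scored = [(dot(user_embedding, emb), post_id)
              for post_id, emb in candidate_embeddings.items()]
    scored.sort(reverse=True)
    return [post_id for _, post_id in scored[:k]]
```

Because user and candidate towers are decoupled, candidate embeddings can be precomputed and indexed; only the user embedding is computed at request time.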

6️⃣ Phoenix Ranking: Grok-Based Transformer

Ranking is powered by a Grok-derived transformer that:

  • Takes user engagement history + candidate post
  • Uses candidate isolation (no cross-candidate attention)
  • Predicts probabilities for many actions, not a single score

This design makes scores:

  • Stable
  • Cacheable
  • Independent of batch composition
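Candidate isolation is what buys those properties. A toy illustration, with a trivial `score_candidate` standing in for the transformer forward pass (hypothetical names, not the Grok model):

```python
def score_candidate(user_history, candidate):
    # Stand-in for a forward pass over (user history + ONE candidate).
    # Crucially, no other candidate appears in the input.
    return sum(1 for h in user_history if h in candidate["topics"])

def score_batch(user_history, candidates, cache):
    # Because a score depends only on (user, candidate), it is identical
    # across batches -- so it can be cached and reused.
    scores = {}
    for c in candidates:
        if c["id"] not in cache:
            cache[c["id"]] = score_candidate(user_history, c)
        scores[c["id"]] = cache[c["id"]]
    return scores
```

With cross-candidate attention, a candidate's score would shift depending on which other posts shared the batch, making caching impossible.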

7️⃣ Multi-Action Optimization (Not Just “Relevance”)

Instead of predicting “relevance,” the model predicts:

  • Likes, replies, reposts, clicks
  • Video views, dwell time
  • Negative actions (mute, block, report)

A weighted scorer combines these into a final ranking score, balancing engagement with safety and satisfaction.
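The combination step is a weighted sum over predicted action probabilities. The weights below are invented for illustration — the real action set and values live in the repo:

```python
# Hypothetical action weights; negative actions push the score down hard.
WEIGHTS = {
    "like": 1.0,
    "reply": 2.0,
    "repost": 1.5,
    "dwell": 0.5,
    "report": -50.0,
}

def final_score(predicted_probs):
    # Weighted sum of per-action probabilities -> one ranking score.
    return sum(WEIGHTS[action] * p for action, p in predicted_probs.items())
```

Tuning the ranking objective then becomes a weight change, not a model retrain — which is exactly what separating prediction from combination buys you.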

8️⃣ Filtering Is Multi-Stage and Explicit

Filtering is not an afterthought:

  • Pre-scoring filters remove duplicates, muted content, blocked authors, old posts
  • Post-selection filters enforce safety, visibility, and conversation deduplication

This separation keeps ML focused on relevance, not policy.
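The two stages can be sketched as separate predicate chains — hypothetical predicates, but mirroring the pre-scoring vs. post-selection split:

```python
# Stage 1: eligibility, applied before the (expensive) scorer runs.
PRE_SCORING = [
    lambda c, seen: c["id"] not in seen,          # dedupe against shown posts
    lambda c, seen: not c.get("author_blocked"),  # drop blocked authors
]

# Stage 2: policy, applied after selection.
POST_SELECTION = [
    lambda c, seen: c.get("visible", True),       # safety / visibility rules
]

def apply_filters(candidates, filters, seen):
    return [c for c in candidates if all(f(c, seen) for f in filters)]
```

Pre-scoring filters shrink the candidate set cheaply; post-selection filters encode policy that must hold regardless of what the model scored.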

9️⃣ Minimal Heuristics, Maximum Learning

One of the boldest choices:

No hand-engineered relevance features

The transformer learns directly from engagement sequences. This dramatically simplifies feature pipelines and reduces long-term system complexity.

🔟 The Bigger Takeaway

X’s recommendation system shows a clear direction for modern feeds:

  • Strong separation of concerns
  • ML as the brain, pipelines as the nervous system
  • Explicit contracts between retrieval, ranking, and policy
  • Systems thinking equal to model sophistication

This is not “just a model release.” It’s a reference architecture for large-scale, real-time recommendation systems.

If you’re designing feeds, discovery systems, or ML-powered ranking at scale, this repo is worth studying — not for copy-paste code, but for the engineering decisions behind it.