Design Facebook

News feed

Posted by CodingMrWang on February 6, 2022

This post is first created by CodingMrWang, 作者 @Zexian Wang ,please keep the original link if you want to repost it.

What is Facebook’s newsfeed?

A newsfeed is the constantly updating a list of stories in the middle of facebook’s homepage. It includes status updates, photos, videos, links, app activity, and likes from people, pages and groups that a user follows on facebook.

Requirements

Functional requirements

  • Register/Login
  • User can post new feeds
  • User can view timeline/news feed
  • User can follow and unfollow a user

Non Functional requirements

  • Low latency
  • Scability
  • Reliability: user feeds data cannot loss

    Estimation

    Traffic

    150M DAU 60 reads per day 1 write per day

    data size

    1KB per feed

    QPS

    read: 150M * 60/86400 = 100k/s write: 150M/86400=2k/s

    Storage

    post per year: 150 * 365 * 1kb = 60TB

API

getUserFeed(user_id, since_id, max_id, count) user_id: id of user for whom the system will generate the newsfeed. sicne_id: return results with an ID higher than the ID count: number of feeds want to retrieve max_id: return results with an ID less than the id return: Json containing a list of feed items

Service

User Service: register, login Post Service: post a feed, timeline Media Service: upload image, upload vedio FriendShip Service: follow, unfollow.

Diagram

Storage

  • Relational database: MySQL
    • User table (not change frequently)
  • NoSQL
    • Feeds
    • Social graph
  • DFS
    • Socail media (CDN)
  • Cache
    • cache hot feeds

Database

User Table (Postgre)

id int
username varchar
email varchar

Friendship Table (dynamoDB)

from_user_id
to_user_id
created_at

Post table (HBase)

id
user_id
content
create_at
media_id

Media service (S3)

System design

  • Pull model
    • when user check timeline, retrieve 100 news feed from all friends and do k sorted array merge
    • N times read
    • getNewsFeed
      • followings = DB.getFollowings(user = request.user)
      • for follow in followings:
        • posts = DB.getTweets(follow.to_user, 100)
        • news_feeds.merge(posts)
        • return news_feed[:100]
    • cons: at the beginning, long time to load timeline, especially when has many people to follow.
  • Push Model
    • Create a list for each user to store news feed
    • when a user post a new post, it will be pushed to all followers news feed list
    • read = 1DB read
    • write = N DB writes (N is number of followers)
    • benefits; this can be done asynchronizely
    • cons: extra storage needed. For user has a lot of followers, a lot of writes.
  • Pull model + Push model
    • for users which do not have a lot of followers, push post_id to follower’s list, for users have a lot of followers, follower need to pull feeds from db.

Improvement

Cache For pull model

  • cache each user’s timeline
  • cache each user’s follow
  • cache 100 feeds, return first 20 feeds

Push model

  • cache timeline for each user 1000 * 8B * 100M = 800 GB

Data Partitioning

Share feed data by user id. If we use push model, we can shard timeline in cache by user id and shard post by post id.