This post is first created by CodingMrWang, 作者 @Zexian Wang ,please keep the original link if you want to repost it.
What is Facebook’s newsfeed?
A newsfeed is the constantly updating a list of stories in the middle of facebook’s homepage. It includes status updates, photos, videos, links, app activity, and likes from people, pages and groups that a user follows on facebook.
Requirements
Functional requirements
- Register/Login
- User can post new feeds
- User can view timeline/news feed
- User can follow and unfollow a user
Non Functional requirements
- Low latency
- Scability
- Reliability: user feeds data cannot loss
Estimation
Traffic
150M DAU 60 reads per day 1 write per day
data size
1KB per feed
QPS
read: 150M * 60/86400 = 100k/s write: 150M/86400=2k/s
Storage
post per year: 150 * 365 * 1kb = 60TB
API
getUserFeed(user_id, since_id, max_id, count) user_id: id of user for whom the system will generate the newsfeed. sicne_id: return results with an ID higher than the ID count: number of feeds want to retrieve max_id: return results with an ID less than the id return: Json containing a list of feed items
Service
User Service: register, login Post Service: post a feed, timeline Media Service: upload image, upload vedio FriendShip Service: follow, unfollow.
Diagram
Storage
- Relational database: MySQL
- User table (not change frequently)
- NoSQL
- Feeds
- Social graph
- DFS
- Socail media (CDN)
- Cache
- cache hot feeds
Database
User Table (Postgre)
id | int |
---|---|
username | varchar |
varchar |
Friendship Table (dynamoDB)
from_user_id |
---|
to_user_id |
created_at |
Post table (HBase)
id |
---|
user_id |
content |
create_at |
media_id |
Media service (S3)
System design
- Pull model
- when user check timeline, retrieve 100 news feed from all friends and do k sorted array merge
- N times read
- getNewsFeed
- followings = DB.getFollowings(user = request.user)
- for follow in followings:
- posts = DB.getTweets(follow.to_user, 100)
- news_feeds.merge(posts)
- return news_feed[:100]
- cons: at the beginning, long time to load timeline, especially when has many people to follow.
- Push Model
- Create a list for each user to store news feed
- when a user post a new post, it will be pushed to all followers news feed list
- read = 1DB read
- write = N DB writes (N is number of followers)
- benefits; this can be done asynchronizely
- cons: extra storage needed. For user has a lot of followers, a lot of writes.
- Pull model + Push model
- for users which do not have a lot of followers, push post_id to follower’s list, for users have a lot of followers, follower need to pull feeds from db.
Improvement
Cache For pull model
- cache each user’s timeline
- cache each user’s follow
- cache 100 feeds, return first 20 feeds
Push model
- cache timeline for each user 1000 * 8B * 100M = 800 GB
Data Partitioning
Share feed data by user id. If we use push model, we can shard timeline in cache by user id and shard post by post id.