Proximity Server/Yelp - System Design

First created 2021-01-17

A read-heavy, write-light system

 

placeId 8 bytes

name 256 bytes

latitude 4 bytes

longitude 4 bytes

category 1 byte

description 512 bytes
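
As a quick check on the field sizes above, here is a minimal sketch that packs one place into a fixed-width record with Python's `struct` module. The format string, the `pack_place` helper, and the choice of 4-byte floats for the coordinates are assumptions made for illustration, not part of the original design.

```python
import struct

# Hypothetical fixed-width layout mirroring the field sizes listed above.
# "<" disables alignment padding, so the record size is the plain sum of the fields.
# Q = 8-byte placeId, 256s = name, f = 4-byte latitude, f = 4-byte longitude,
# B = 1-byte category, 512s = description
PLACE_FORMAT = "<Q256sffB512s"

RECORD_SIZE = struct.calcsize(PLACE_FORMAT)  # 8 + 256 + 4 + 4 + 1 + 512 bytes


def pack_place(place_id, name, lat, lon, category, description):
    """Serialize one place; short strings are zero-padded, long ones truncated."""
    return struct.pack(
        PLACE_FORMAT,
        place_id,
        name.encode("utf-8"),
        lat,
        lon,
        category,
        description.encode("utf-8"),
    )


record = pack_place(1, "Cafe", 37.77, -122.42, 3, "Coffee and pastries")
print(RECORD_SIZE, len(record))
```

Under these assumptions a full place record comes to 785 bytes.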

 

Functions

Function1(Core): Given a location, a user can search for all nearby places within a given radius.

Function2: A user can add his/her favorite places.

Function3: A user can post feedback/a review for a place: rating (required) + text/pictures (optional).

 

Function1(Core): GET /search

parameters: keyword, radius, category (optional), max_return (optional), sortOrder (optional), filter (optional), next_token (optional, for later development)

return: A JSON containing information about a list of popular places matching the search query. Each result entry will have the place name, address, category, rating, and image. (Copied from reference link)
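
To make the parameter list concrete, the following sketch builds a query string for `GET /search`. The `SearchRequest` class, the meters unit for `radius`, and the explicit `latitude`/`longitude` fields are assumptions added for illustration; the parameter list above does not say how the user's location is passed.

```python
from dataclasses import dataclass, asdict
from typing import Optional
from urllib.parse import urlencode


@dataclass
class SearchRequest:
    keyword: str
    radius: int                       # assumed unit: meters
    latitude: float                   # assumption: the location is sent explicitly
    longitude: float
    category: Optional[str] = None
    max_return: Optional[int] = None
    sortOrder: Optional[str] = None
    filter: Optional[str] = None
    next_token: Optional[str] = None  # pagination cursor, if implemented later

    def to_url(self, base="/search"):
        # Optional parameters that were not set are left out of the query string.
        params = {k: v for k, v in asdict(self).items() if v is not None}
        return base + "?" + urlencode(params)


url = SearchRequest("pizza", 500, 37.77, -122.42, max_return=20).to_url()
print(url)
```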

 

Function2: POST /add, DELETE /remove (to/from the favorites list)

parameters: placeId

 

Function3: POST /add

parameters: placeId, rating, review text(optional), photo_urls(optional)
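
A server would likely validate this payload before storing it. The sketch below is one possible check; the `validate_review` helper, the `review_text` key name, and the 1 to 5 rating scale are assumptions, since the notes only say that the rating is required and the rest is optional.

```python
def validate_review(payload):
    """Return a list of problems; an empty list means the payload is acceptable."""
    errors = []

    if not isinstance(payload.get("placeId"), int):
        errors.append("placeId is required and must be an integer")

    # Assumption: ratings are whole stars from 1 to 5.
    rating = payload.get("rating")
    if not isinstance(rating, int) or not 1 <= rating <= 5:
        errors.append("rating is required and must be an integer from 1 to 5")

    # Optional fields are only checked when present.
    text = payload.get("review_text")
    if text is not None and not isinstance(text, str):
        errors.append("review_text must be a string")

    for url in payload.get("photo_urls", []):
        if not (isinstance(url, str) and url.startswith(("http://", "https://"))):
            errors.append("photo_urls must contain http(s) URLs")
            break

    return errors
```

For example, `validate_review({"placeId": 42, "rating": 5})` returns an empty list, while a payload without a rating returns one error.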

 

## QuadTree

### Introduction

A tree in which each node has four child nodes. Starting from the root node, which represents the whole world, we keep splitting each node until no node holds more than 500 locations.

All place information is stored in the leaf nodes. (A leaf node represents a grid with no more than 500 places.)

With a doubly linked list connecting the leaf nodes and a parent pointer in each node, we can find the neighboring grids of a given grid.

### Workflow

First we find the node that contains the user’s location.

If that node contains enough places, we can return them to the user.

If not, we keep expanding to the neighboring nodes (through either the parent pointers or the doubly linked list) until we either find the required number of places or reach the maximum radius.
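
The splitting rule and the search can be sketched as follows. This is a minimal illustration, not the reference design: instead of walking neighbor and parent pointers, it descends from the root and skips cells that fall outside an approximate bounding box around the search circle, a simpler traversal than the neighbor expansion described above. The names `Node`, `search`, `haversine_km`, and the `max_depth` guard are assumptions, and the bounding box ignores wrap-around at the 180° meridian.

```python
import math
from dataclasses import dataclass, field

CAPACITY = 500  # maximum places per leaf, as in the text


@dataclass
class Node:
    south: float
    west: float
    north: float
    east: float
    depth: int = 0
    places: list = field(default_factory=list)    # (placeId, lat, lon), leaves only
    children: list = field(default_factory=list)  # four sub-cells once split

    def _child_for(self, place):
        _, lat, lon = place
        mid_lat = (self.south + self.north) / 2
        mid_lon = (self.west + self.east) / 2
        # Children are stored in the order SW, SE, NW, NE.
        return self.children[(2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)]

    def insert(self, place, capacity=CAPACITY, max_depth=24):
        if self.children:
            self._child_for(place).insert(place, capacity, max_depth)
            return
        self.places.append(place)
        # Split an overflowing leaf; max_depth guards against many identical points.
        if len(self.places) > capacity and self.depth < max_depth:
            mid_lat = (self.south + self.north) / 2
            mid_lon = (self.west + self.east) / 2
            d = self.depth + 1
            self.children = [
                Node(self.south, self.west, mid_lat, mid_lon, d),  # SW
                Node(self.south, mid_lon, mid_lat, self.east, d),  # SE
                Node(mid_lat, self.west, self.north, mid_lon, d),  # NW
                Node(mid_lat, mid_lon, self.north, self.east, d),  # NE
            ]
            pending, self.places = self.places, []
            for p in pending:
                self._child_for(p).insert(p, capacity, max_depth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def search(root, lat, lon, radius_km, limit):
    """Return up to `limit` placeIds within radius_km, nearest first."""
    # Approximate bounding box: about 111 km per degree of latitude.
    dlat = radius_km / 111.0
    widest_lat = min(abs(lat) + dlat, 89.9)
    dlon = radius_km / (111.0 * math.cos(math.radians(widest_lat)))
    hits, stack = [], [root]
    while stack:
        n = stack.pop()
        if n.north < lat - dlat or n.south > lat + dlat or n.east < lon - dlon or n.west > lon + dlon:
            continue  # cell cannot contain any place within the radius
        if n.children:
            stack.extend(n.children)
        else:
            for pid, plat, plon in n.places:
                dist = haversine_km(lat, lon, plat, plon)
                if dist <= radius_km:
                    hits.append((dist, pid))
    hits.sort()
    return [pid for _, pid in hits[:limit]]
```

A root covering the whole world would be `Node(-90.0, -180.0, 90.0, 180.0)`; places are added with `root.insert((place_id, lat, lon))` and queried with `search(root, lat, lon, radius_km, limit)`.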

Storage: 24 bytes per place x 500M places = 12 GB (potentially)
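
The arithmetic behind this estimate, plus the minimum number of leaf nodes implied by the 500-places-per-leaf limit, can be checked with a few lines. The constant names are illustrative, and the leaf count is only a lower bound that assumes every leaf is completely full.

```python
BYTES_PER_PLACE = 24          # figure from the text
TOTAL_PLACES = 500_000_000    # 500M places
MAX_PER_LEAF = 500            # QuadTree leaf capacity

total_bytes = BYTES_PER_PLACE * TOTAL_PLACES
total_gb = total_bytes / 10**9              # decimal gigabytes
min_leaves = TOTAL_PLACES // MAX_PER_LEAF   # lower bound: every leaf full

print(f"{total_gb:.0f} GB, at least {min_leaves:,} leaf nodes")
```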

 

Schema

Business Profile DB

placeId (Primary Key) | placeName | Address | Category  | Description | [ReviewInfo(ReviewText, [mediaId1,2,3,...])]

Business Media DB

mediaId (Primary Key) | placeId | mediaURL | userId

Review DB

reviewId (Primary Key) | placeId | text | userId
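
One way to try out these tables is with SQLite, as in the sketch below. It is an assumption-laden illustration: the table names, column types, and index are made up, and the embedded `ReviewInfo` list from the Business Profile row is replaced here by a lookup on the review table via `placeId`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE business_profile (
    placeId     INTEGER PRIMARY KEY,
    placeName   TEXT NOT NULL,
    address     TEXT,
    category    INTEGER,
    description TEXT
);
CREATE TABLE business_media (
    mediaId  INTEGER PRIMARY KEY,
    placeId  INTEGER NOT NULL REFERENCES business_profile(placeId),
    mediaURL TEXT NOT NULL,
    userId   INTEGER NOT NULL
);
CREATE TABLE review (
    reviewId INTEGER PRIMARY KEY,
    placeId  INTEGER NOT NULL REFERENCES business_profile(placeId),
    text     TEXT,
    userId   INTEGER NOT NULL
);
-- Reviews are usually fetched per place, so index that column.
CREATE INDEX idx_review_place ON review(placeId);
"""


def open_db(path=":memory:"):
    """Create the three tables in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
conn.execute("INSERT INTO business_profile VALUES (1, 'Cafe', '1 Main St', 3, 'Coffee')")
conn.execute("INSERT INTO review VALUES (10, 1, 'Great espresso', 7)")
rows = conn.execute(
    "SELECT p.placeName, r.text FROM review r JOIN business_profile p USING (placeId)"
).fetchall()
print(rows)
```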

 

Partition

1. By region or zipcode

This can create hot partitions: a region with many popular places sends too many requests to one server in the cluster, which is what we want to avoid.

2. By locationId

A single request may then have to query many shard servers, which may not be efficient.

3. By Geohash/Google S2

To avoid hot shards, keep the cells within one shard contiguous and adjust the number of cells per shard according to the number of places they contain.
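
The Geohash option can be sketched like this: encode each place into a Geohash cell, then cut the sorted list of cells into contiguous runs that hold roughly equal numbers of places. `geohash` follows the standard public encoding; `build_shard_boundaries` and `shard_for` are hypothetical helpers, and treating lexicographically adjacent cells as spatially close is only an approximation.

```python
import bisect

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet


def geohash(lat, lon, precision=5):
    """Encode a coordinate by alternately bisecting longitude and latitude."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True  # first bit is longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = bits * 2 + int(bit)
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def build_shard_boundaries(cell_counts, num_shards):
    """Return the first cell of every shard after the first.

    Cells are walked in sorted order and a new shard starts once the places
    seen so far reach the next multiple of the per-shard target.
    """
    target = sum(cell_counts.values()) / num_shards
    boundaries, running = [], 0
    for cell in sorted(cell_counts):
        if len(boundaries) < num_shards - 1 and running >= target * (len(boundaries) + 1):
            boundaries.append(cell)
        running += cell_counts[cell]
    return boundaries


def shard_for(cell, boundaries):
    """Index of the shard whose contiguous cell range contains `cell`."""
    return bisect.bisect_right(boundaries, cell)


# Illustrative place counts per cell (made-up numbers).
counts = {"9q8y": 100, "9q8z": 10, "9q9p": 10, "dr5r": 80}
boundaries = build_shard_boundaries(counts, num_shards=2)
print(boundaries, shard_for("9q8y", boundaries), shard_for("dr5r", boundaries))
```

In this made-up example the dense cell gets a shard to itself while the three sparser cells share the other, so both shards hold 100 places.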

 

Cache

In-memory cache service

Between client and application servers

Between application servers and backend servers
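
A cache sitting between the application servers and the backend might look like the sketch below. The notes do not name an eviction policy or a cache key, so the least-recently-used policy, the `(cell, category)` key, and the `cached_search` helper are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Small in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


def cached_search(cache, cell, category, backend):
    """Serve a search from the cache, calling the backend only on a miss."""
    key = (cell, category)
    result = cache.get(key)
    if result is None:
        result = backend(cell, category)
        cache.put(key, result)
    return result
```

Here `backend` stands for any callable that performs the real lookup, for example a query against the shard that owns `cell`.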

 

Load Balance

Round Robin (distributed equally)

More intelligent: also take traffic, load, and server status into consideration by periodically querying the backend servers and adjusting their weights
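
Both strategies can be illustrated with one small class. With equal weights it behaves as plain round robin; once health reports arrive, busy servers get a smaller share and unhealthy ones are skipped. This is a sketch using a smooth weighted round-robin scheme; the `WeightedBalancer` name, the `report` method, and the mapping from load to weight are assumptions, not something the notes specify.

```python
class WeightedBalancer:
    def __init__(self, servers):
        self.weights = {s: 1 for s in servers}   # equal weights: plain round robin
        self.current = {s: 0 for s in servers}   # running score per server

    def report(self, server, healthy, load):
        """Record a periodic health check; `load` is a fraction between 0 and 1."""
        if healthy:
            # Assumed mapping: idle servers get weight 10, busy ones at least 1.
            self.weights[server] = max(1, round(10 * (1 - load)))
        else:
            self.weights[server] = 0
            self.current[server] = 0

    def pick(self):
        """Choose the next server, spreading picks in proportion to the weights."""
        candidates = [s for s, w in self.weights.items() if w > 0]
        if not candidates:
            raise RuntimeError("no healthy servers")
        total = sum(self.weights[s] for s in candidates)
        for s in candidates:
            self.current[s] += self.weights[s]
        best = max(candidates, key=lambda s: self.current[s])
        self.current[best] -= total
        return best
```

For example, after `report("a", True, 0.0)` and `report("b", True, 0.8)`, server `a` has weight 10 and `b` has weight 2, so twelve consecutive picks send ten requests to `a` and two to `b`.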

 

CDN (Other)

A CDN is a system of globally distributed servers that delivers web content to a user based on the geographic location of the user, the origin of the web page, and the content delivery server. CDNs replicate content in multiple places, so a user can fetch content from the nearest CDN server.

Types: push CDN / pull CDN

Disadvantages: expensive; may serve stale data
 

Reference

[1] https://medium.com/swlh/design-a-proximity-server-like-nearby-or-yelp-part-1-c8fe2951c534

[2] https://codeburst.io/design-a-proximity-server-like-yelp-part-2-d430879203a5

[3] https://www.educative.io/courses/grokking-the-system-design-interview/B8rpM8E16LQ

posted @ 2021-01-18 16:08  YBgnAW