TinyURL - System Design

CREATED 2021/09/15 22:15 

Design Tiny URL System

Functional requirements

1. URL shortening: given a long URL, return a short URL.

2. URL redirection: given a short URL, redirect to the original long URL.

3. Users can optionally pick a custom alias.

4. Users can optionally set an expiry time for a short URL. (Open question: should every short URL get a default expiry time?)

5. Can one long URL map to several short URLs, i.e. will the system return two different short URLs when given the same long URL twice? This depends on whether we require user authentication. To be discussed.

 

Non-Functional requirements

The service should be highly available, fault tolerant, and scalable.

Redirection should happen in real time with minimal latency.

Shortened URLs should not be guessable or predictable.

 

Estimations

Write Bandwidth : 100KB/s

Read Bandwidth : 10MB/s
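A quick back-of-envelope check of the two bandwidth figures. It assumes 20K redirects/s and about 0.5 KB per record (the figures used in the cache estimate) together with the 100:1 read/write ratio; the variable names are illustrative.

```python
# Back-of-envelope bandwidth estimate (inputs taken from elsewhere in these notes)
READS_PER_SEC = 20_000   # redirect requests per second
READ_WRITE_RATIO = 100   # read-heavy system, 100:1
RECORD_BYTES = 500       # ~0.5 KB per <shortURL, longURL, expiry> record

writes_per_sec = READS_PER_SEC // READ_WRITE_RATIO  # new short URLs per second
write_bandwidth = writes_per_sec * RECORD_BYTES     # bytes per second written
read_bandwidth = READS_PER_SEC * RECORD_BYTES       # bytes per second read

print(f"writes/s: {writes_per_sec}")                          # writes/s: 200
print(f"write bandwidth: {write_bandwidth / 1e3:.0f} KB/s")   # write bandwidth: 100 KB/s
print(f"read bandwidth: {read_bandwidth / 1e6:.0f} MB/s")     # read bandwidth: 10 MB/s
```

So the 100 KB/s and 10 MB/s figures correspond to roughly 200 shortening requests and 20K redirects per second.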

 

APIs

POST api/v1/data/shorten

  Request parameters {
    string : longURL
    long   : expiry time (optional)
    string : custom alias (optional)
  } -> {shortURL}

  Response: 201 Created, returning the shortURL.

GET api/v1/getUrl {shortURL} -> {longURL, used for the HTTP redirect}

  Response: 301 Moved Permanently (the browser caches the redirect, so repeat visits may not reach our servers) or 302 Found (temporary redirect; every visit reaches our servers, which is useful for click analytics).
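A minimal in-memory sketch of the two endpoints as plain functions, to make the status codes concrete. The class name, the `tiny.example` domain, the random 7-character key, and the 409 response for a taken alias are all assumptions for illustration; the expiry parameter is left out here.

```python
import secrets
import string
from typing import Dict, Optional, Tuple

ALPHABET = string.ascii_letters + string.digits  # Base62 characters


class TinyUrlApi:
    """Hypothetical handlers for POST api/v1/data/shorten and GET api/v1/getUrl."""

    def __init__(self, domain: str = "https://tiny.example/", permanent: bool = False):
        self.domain = domain
        self.permanent = permanent  # True -> 301 (cached by browser), False -> 302
        self.table: Dict[str, str] = {}  # short key -> long URL

    def post_shorten(self, long_url: str, alias: Optional[str] = None) -> Tuple[int, str]:
        """Return (status, shortURL or error message)."""
        if alias is not None:
            if alias in self.table:
                return 409, "alias already taken"  # assumed conflict status
            key = alias
        else:
            key = self._random_key()
        self.table[key] = long_url
        return 201, self.domain + key

    def get_url(self, short_key: str) -> Tuple[int, Dict[str, str]]:
        """Return (status, response headers) for the redirect."""
        long_url = self.table.get(short_key)
        if long_url is None:
            return 404, {}
        return (301 if self.permanent else 302), {"Location": long_url}

    def _random_key(self, length: int = 7) -> str:
        # Retry until we find a key that is not in use yet
        while True:
            key = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if key not in self.table:
                return key
```

Defaulting to 302 keeps every click visible to the service; switching `permanent` to True trades that visibility for fewer requests reaching the servers.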

 

Read/Write ratio 100:1. This is a read-heavy system.

 

Store/DB

Key-value store (hash table): <shortURL (6 bytes per shortURL), longURL, expiry time (optional)>

 

Encoding the actual long URL

Algorithm 1 - Use MD5/SHA1

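E.g. MD5 produces a 128-bit hash. Since each Base64 character encodes 6 bits, the hash becomes 22 Base64 characters (128 / 6 = 21.3, rounded up; standard Base64 also appends two '=' padding characters).

1) Take the first 6 (or 7, 8) characters as the key. Different long URLs can collide on the same key; to resolve a collision, choose some other characters from the encoded string (or swap some characters) and check again.

SHA-1 or SHA-2 (e.g. SHA-256) can be used the same way; they just produce longer hashes.

Key space with Base62 (a-z, A-Z, 0-9): 62^6 = 56.8 billion keys for 6 characters, 62^7 = 3.5 trillion keys for 7 characters.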
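A sketch of Algorithm 1 with the "pick other characters on collision" rule implemented as a sliding window over the 22-character encoding. Using the URL-safe Base64 alphabet (`-` and `_` instead of `+` and `/`) and a plain dict as the store are assumptions. Returning the existing key when the same long URL is shortened again is one possible answer to the open question about multiple short URLs per long URL.

```python
import base64
import hashlib


def md5_base64(long_url: str) -> str:
    """Base64-encode the 128-bit MD5 digest (22 characters, padding stripped)."""
    digest = hashlib.md5(long_url.encode("utf-8")).digest()  # 16 bytes = 128 bits
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def shorten_by_hash(long_url: str, store: dict, key_len: int = 7) -> str:
    """Take key_len characters of the encoded hash; on a collision, slide the window."""
    encoded = md5_base64(long_url)
    for start in range(len(encoded) - key_len + 1):
        key = encoded[start:start + key_len]
        existing = store.get(key)
        if existing is None:  # free key: claim it
            store[key] = long_url
            return key
        if existing == long_url:  # this URL was already shortened: reuse its key
            return key
    raise RuntimeError("every window collided; fall back to another strategy")
```

In a real deployment the "check, then insert" step would need to be atomic in the DB (e.g. an insert that fails if the key exists), otherwise two writers could claim the same key.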

 

Algorithm 2 - Generating shortURLs offline
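One common way to realize offline generation is a standalone key generator that creates random keys ahead of time and hands out unused ones on demand, so the write path needs no hashing or collision check. The sketch below assumes that design; the `KeyPool` class, the in-memory sets (standing in for a key DB), and the 6-character length are illustrative.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters


class KeyPool:
    """Hypothetical offline key generator backed by two in-memory sets."""

    def __init__(self, key_len: int = 6):
        self.key_len = key_len
        self.unused: set = set()  # generated but not yet handed out
        self.used: set = set()    # already assigned to a long URL

    def pregenerate(self, count: int) -> None:
        """Offline step: add `count` fresh random keys to the pool."""
        target = len(self.unused) + count
        while len(self.unused) < target:
            key = "".join(secrets.choice(ALPHABET) for _ in range(self.key_len))
            if key not in self.used:  # duplicates in `unused` are absorbed by the set
                self.unused.add(key)

    def take(self) -> str:
        """Online step: move one key from unused to used and return it."""
        key = self.unused.pop()  # raises KeyError if the pool is empty
        self.used.add(key)
        return key
```

Because keys are marked used at the moment they are handed out, two concurrent shortening requests can never receive the same key, provided `take` runs under a lock or in a single process.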

Algorithm 3 - Unique ID Generator & Base 62 conversion

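1) UUID: 128 bits = 16 bytes, usually written as 32 hexadecimal (4-bit) characters. Too long to use directly; even in Base62 it takes 22 characters.

2) Auto-increment IDs from multiple MySQL servers: randomly pick a server for each request. To keep IDs unique across servers, give each server a different starting offset and the same step (MySQL's auto_increment_offset and auto_increment_increment settings).

3) Twitter Snowflake ID: 64 bits = 1 unused sign bit + 41-bit timestamp (milliseconds) + 10-bit machine ID + 12-bit sequence number. Then convert the ID to Base62 (or Base64).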

E.g. gk.link/a/3hcCxy
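A sketch of option 3 followed by the Base62 conversion. The bit layout follows the notes (41-bit timestamp, 10-bit machine ID, 12-bit sequence); the custom epoch of 2021-01-01 UTC, the Base62 alphabet order, and the class name are assumptions.

```python
import threading
import time

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
EPOCH_MS = 1609459200000  # assumed custom epoch: 2021-01-01 00:00:00 UTC


def base62_encode(n: int) -> str:
    """Convert a non-negative integer to a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class Snowflake:
    """64-bit IDs: sign bit 0 | 41-bit ms timestamp | 10-bit machine | 12-bit sequence."""

    def __init__(self, machine_id: int):
        if not 0 <= machine_id < (1 << 10):
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:  # 4096 IDs used in this ms: wait for the next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence
```

Usage: `base62_encode(Snowflake(1).next_id())`. Note that a 64-bit ID can need up to 11 Base62 characters, so Snowflake keys are longer than the 6 to 7 characters of the hash-based keys.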

 

Data Partition

Purpose: Scale out DB

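Split the DB by hashing the key (shortURL) and mapping the hashes to servers with consistent hashing.

Consistent hashing: both the keys and the DB nodes are hashed onto the same ring, and each key is stored on the first node clockwise from its position, so each node owns a range of the ring. Adding or removing a node only moves the keys in the affected range instead of reshuffling all data.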
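A small consistent-hashing ring to illustrate the partitioning. Using MD5 for ring positions and 100 virtual nodes per server (to spread load more evenly) are assumptions, as are the node names.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """64-bit ring position derived from an MD5 digest."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self.positions = []  # sorted ring positions of all virtual nodes
        self.owner = {}      # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            bisect.insort(self.positions, pos)
            self.owner[pos] = node

    def remove_node(self, node: str) -> None:
        self.positions = [p for p in self.positions if self.owner[p] != node]
        self.owner = {p: n for p, n in self.owner.items() if n != node}

    def node_for(self, key: str) -> str:
        """First virtual node clockwise from the key's position (wrapping around)."""
        pos = ring_hash(key)
        idx = bisect.bisect_right(self.positions, pos) % len(self.positions)
        return self.owner[self.positions[idx]]
```

When a node is added, the only keys that move are the ones that now land on the new node; every other shortURL stays on the server it was on.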

 

Cache

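Cache size: 20K requests/s * 3600 * 24 = 1.728 billion requests per day. Following the 80/20 rule, cache the hottest 20% of them at about 0.5 KB each: 1.728 billion * 0.5 KB * 20% = 172.8 GB ~ 170 GB. Eviction policy: LRU or LFU.

Based on this estimate, the cache fits in the memory of a single machine. We use the LRU strategy.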
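A minimal LRU cache for shortURL -> longURL lookups, built on `collections.OrderedDict`, plus a read path that checks the cache before the DB. The cache-aside read path and the plain dict standing in for the DB are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: "OrderedDict[str, str]" = OrderedDict()  # oldest entry first

    def get(self, short_url: str) -> Optional[str]:
        if short_url not in self.data:
            return None
        self.data.move_to_end(short_url)  # mark as most recently used
        return self.data[short_url]

    def put(self, short_url: str, long_url: str) -> None:
        self.data[short_url] = long_url
        self.data.move_to_end(short_url)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


def resolve(short_url: str, cache: LRUCache, db: dict) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    long_url = cache.get(short_url)
    if long_url is None:
        long_url = db.get(short_url)
        if long_url is not None:
            cache.put(short_url, long_url)
    return long_url
```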

 

Rate Limiter

Ban a user who makes too many requests within a short period of time.
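One way to detect "too many requests in a short period" is a sliding-window log per user. The sketch below assumes that technique (the notes do not name one); it rejects requests over the limit rather than banning permanently, and the limits in the usage line are placeholders.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowRateLimiter:
    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)  # user_id -> timestamps of accepted requests

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.history[user_id]
        while q and now - q[0] >= self.window:
            q.popleft()  # forget requests that fell out of the window
        if len(q) >= self.max_requests:
            return False  # over the limit: reject (e.g. HTTP 429)
        q.append(now)
        return True
```

Usage: `limiter = SlidingWindowRateLimiter(max_requests=100, window_seconds=60)`, then call `limiter.allow(user_id)` before handling each request.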

 

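Expiry - Lazy cleanup: expired entries are not deleted proactively. When a request reads a short URL that has expired, the entry is deleted at that moment and the request is treated as if the short URL does not exist.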
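A sketch of lazy expiry on top of the <shortURL, longURL, expiry time> table. The in-memory dict, the record type, and storing expiry as Unix seconds are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UrlRecord:
    long_url: str
    expires_at: Optional[float] = None  # Unix seconds; None means it never expires


class UrlStore:
    def __init__(self):
        self.table: Dict[str, UrlRecord] = {}

    def put(self, short_url: str, long_url: str, expires_at: Optional[float] = None) -> None:
        self.table[short_url] = UrlRecord(long_url, expires_at)

    def resolve(self, short_url: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        record = self.table.get(short_url)
        if record is None:
            return None
        if record.expires_at is not None and now >= record.expires_at:
            del self.table[short_url]  # lazy cleanup: delete only when an expired key is read
            return None
        return record.long_url
```

Expired keys that are never read again stay in the table, which is the trade-off of lazy cleanup: no background work, but some storage is held longer.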

 

 

System Graph

(Source: https://www.raychase.net/6460)

 

Reference

[1] https://zybuluo.com/ysongzybl/note/95360

[2] https://www.educative.io/courses/grokking-the-system-design-interview/m2ygV4E81AR

[3] https://www.raychase.net/6460

posted @ 2021-09-16 13:23  YBgnAW