TinyURL - System Design
CREATED 2021/09/15 22:15
Design Tiny URL System
Functional requirements
1. URL shortening: given a long url, return a short url
2. URL redirecting: given a short url, redirect to the original long url
3. The user can choose a special (custom) alias.
4. The user can define a specific expiry time for a short URL. Open question: should we add an expiry time for shortURLs at all?
5. Can one longURL map to multiple shortURLs? That is, will the system return two different short URLs when given the same long URL twice? This depends on whether we need user authentication. To be discussed.
Non-Functional requirements
High availability, fault tolerance, and scalability.
Real-time redirection with minimal latency.
Shortened URLs should not be guessable or predictable.
Estimations
Write Bandwidth : 100KB/s
Read Bandwidth : 10MB/s
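The two bandwidth figures can be reproduced from numbers that appear elsewhere in these notes. This is a rough sketch of the arithmetic, assuming 20K read requests/s and 0.5 KB per entry (both taken from the Cache section) and the 100:1 read/write ratio; the variable names are made up for illustration.

```python
# Assumed inputs: 20K reads/s and 0.5 KB per entry come from the Cache section,
# the 100:1 read/write ratio from the API notes.
READS_PER_SEC = 20_000
READ_WRITE_RATIO = 100
ENTRY_SIZE_KB = 0.5

writes_per_sec = READS_PER_SEC / READ_WRITE_RATIO             # 200 writes/s
write_bandwidth_kb = writes_per_sec * ENTRY_SIZE_KB           # KB/s
read_bandwidth_mb = READS_PER_SEC * ENTRY_SIZE_KB / 1000      # MB/s (1 MB = 1000 KB)

print(f"Write bandwidth: {write_bandwidth_kb:.0f} KB/s")
print(f"Read bandwidth: {read_bandwidth_mb:.0f} MB/s")
```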
APIs
POST api/v1/data/shorten
request parameters{
string : longURL
long : expiry time (optional)
string : special alias (optional)
} -> {shortURL}
Response: 201 Created, returns the shortURL
GET api/v1/getUrl {shortURL} -> {longURL for HTTP redirection}
Response: 301 (permanent redirect; the browser caches the response) or 302 (temporary redirect)
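A minimal sketch of the two endpoints as plain functions, to make the request/response shapes concrete. Everything here is an assumption for illustration: the in-memory dict stands in for the real store, the counter is only a placeholder key source, and the 409 and 404 error codes are not specified in these notes.

```python
import itertools
from typing import Optional, Tuple

_store = {}                     # hypothetical in-memory stand-in for the DB
_counter = itertools.count(1)   # placeholder key source, not a real encoder


def post_shorten(long_url: str, alias: Optional[str] = None) -> Tuple[int, dict]:
    """POST api/v1/data/shorten -> 201 with the shortURL (409 is an assumed error code)."""
    if alias is not None:
        if alias in _store:
            return 409, {"error": "alias already in use"}
        key = alias
    else:
        key = f"u{next(_counter)}"
        while key in _store:    # skip keys already claimed as aliases
            key = f"u{next(_counter)}"
    _store[key] = long_url
    return 201, {"shortURL": key}


def get_url(short_url: str, permanent: bool = False) -> Tuple[int, dict]:
    """GET api/v1/getUrl -> 301 or 302 with a Location header (404 is assumed)."""
    long_url = _store.get(short_url)
    if long_url is None:
        return 404, {}
    return (301 if permanent else 302), {"Location": long_url}
```

Usage: `post_shorten("https://example.com/a", alias="abc")` stores the mapping, and `get_url("abc")` then answers with a 302 and the long URL in `Location`.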
Read/Write ratio 100:1. This is a read-heavy system.
Store/DB
Hash table: <shortURL (6 bytes per shortURL), longURL, expiry time (optional)>
Encoding actual longURL
Algorithm 1 - Use MD5/SHA1
E.g. MD5 produces a 128-bit hash -> 22 Base64 characters (each Base64 character encodes 6 bits of the hash value)
1) Take the first 6 (or 7 or 8) characters as the key. This can produce duplicate keys; to resolve a collision, choose other characters from the encoded string or swap some characters.
SHA-1 or SHA-2 (e.g. SHA-256) can be used instead of MD5.
With a 62-character alphabet, 7 characters give 62^7 ≈ 3.5 trillion possible keys.
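Algorithm 1 can be sketched as follows. This is only one way to do it: the URL-safe Base64 alphabet and the "slide the 6-character window along the encoded string" collision strategy are assumptions (the notes only say to choose other characters), and `md5_base64` and `pick_key` are hypothetical helper names.

```python
import base64
import hashlib


def md5_base64(long_url: str) -> str:
    """Hash the URL with MD5 (128 bits) and encode it as 22 URL-safe Base64 characters."""
    digest = hashlib.md5(long_url.encode("utf-8")).digest()   # 16 bytes
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def pick_key(long_url: str, taken: set, length: int = 6) -> str:
    """Return the first `length`-character window of the encoding that is not taken."""
    encoded = md5_base64(long_url)
    for start in range(len(encoded) - length + 1):
        candidate = encoded[start:start + length]
        if candidate not in taken:
            return candidate
    raise RuntimeError("every window collides; fall back to another algorithm")


taken_keys = set()
key = pick_key("https://example.com/some/long/path", taken_keys)
taken_keys.add(key)
```

Note that the same long URL always hashes to the same string, so a second request for it only gets a different key if the first key is already in `taken`; this is where requirement 5 has to be decided.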
Algorithm 2 - Generating shortURLs offline
Algorithm 3 - Unique ID Generator & Base 62 conversion
1) UUID. 32 hexadecimal characters (4 bits each), i.e. 128 bits = 16 bytes.
2) Multiple MySQL servers generate IDs. Pick a server at random for each request.
3) Twitter Snowflake ID - 64 bits: 1 sign bit + 41 timestamp bits + 10 machine ID bits + 12 sequence bits. Then convert the ID to base 62 or base 64.
E.g. gk.link/a/3hcCxy
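A sketch of Algorithm 3 with a Snowflake-style ID, using the 41/10/12 bit split from the list above. The alphabet order (digits, then lowercase, then uppercase) is an assumption, and the function names are hypothetical; a real generator would also manage the clock and the per-millisecond sequence counter.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def snowflake_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack 41 timestamp bits, 10 machine bits and 12 sequence bits; the top bit stays 0."""
    if not (0 <= timestamp_ms < 1 << 41 and 0 <= machine_id < 1 << 10 and 0 <= sequence < 1 << 12):
        raise ValueError("field out of range")
    return (timestamp_ms << 22) | (machine_id << 12) | sequence


def base62_encode(n: int) -> str:
    """Convert a non-negative integer to a base-62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

One consequence worth noting: a full 63-bit ID needs up to 11 base-62 characters, so keys built this way are longer than the 6 to 7 characters used in Algorithm 1.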
Data Partition
Purpose: Scale out DB
Split the DB using a hash function with a consistent hashing mechanism.
Consistent hashing: places the data managed by a distributed system on a ring. Each node on the ring is assigned a range of the data.
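The ring described above can be sketched like this. It is a minimal illustration, not a production partitioner: the virtual nodes (`vnodes`), the use of MD5 for ring positions, and the shard names are all assumptions added for the example.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: a key belongs to the first node position clockwise from it."""

    def __init__(self, nodes, vnodes=100):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node is placed on the ring `vnodes` times to even out the ranges.
        self._ring = sorted(
            (self._position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key: str) -> int:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        idx = bisect.bisect_right(self._positions, self._position(key))
        return self._ring[idx % len(self._ring)][1]   # wrap around the ring


ring = HashRing(["db0", "db1", "db2"])
shard = ring.node_for("3hcCxy")
```

The point of the ring is that adding a shard (say `db3`) only moves the keys that now fall into the new shard's ranges; keys owned by the other shards stay where they are.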
Cache
Cache size: 20K requests/s * 3600 * 24 = 1.728 billion requests per day. Caching the hottest 20% at 0.5 KB each: 1.728 billion * 0.5 KB * 20% = 172.8 GB ~ 170 GB.
Based on this estimate, the cache can fit on a single machine. Eviction policy: LRU (LFU is an alternative).
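The sizing arithmetic and the LRU policy can be sketched together. This is an illustrative in-process version only; the class name and the tiny capacity in the usage are made up, and a real deployment would more likely use a dedicated cache server.

```python
from collections import OrderedDict

# Sizing from the estimate above: 20K requests/s for a day, 0.5 KB (500 bytes) each, 20% cached.
requests_per_day = 20_000 * 3600 * 24            # 1,728,000,000
cache_bytes = requests_per_day * 500 * 20 // 100
cache_gb = cache_bytes / 1e9                     # roughly 170 GB


class LRUCache:
    """Evicts the least recently used shortURL once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, short_url):
        if short_url not in self._data:
            return None
        self._data.move_to_end(short_url)        # mark as most recently used
        return self._data[short_url]

    def put(self, short_url, long_url):
        self._data[short_url] = long_url
        self._data.move_to_end(short_url)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)       # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", "https://example.com/a")
cache.put("b", "https://example.com/b")
cache.get("a")                                   # "a" is now the most recent
cache.put("c", "https://example.com/c")          # evicts "b"
```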
Rate Limiter
Throttle or ban a user who makes too many requests in a short period of time.
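One possible way to detect "too many requests in a short period" is a sliding-window log per user. The notes do not pick an algorithm, so this choice, the class name, and the limits in the usage are assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `max_requests` per user within the last `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)          # user_id -> timestamps of accepted requests

    def allow(self, user_id: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        while hits and hits[0] <= now - self.window:
            hits.popleft()                       # forget requests outside the window
        if len(hits) >= self.max_requests:
            return False                         # over the limit: reject
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
results = [limiter.allow("user1", now=t) for t in (0, 1, 2, 10.5)]
```

In the usage above the third request is rejected because two requests already fall inside the 10-second window; by `t = 10.5` the first one has aged out, so the fourth is accepted.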
Expiry - Lazy Cleanup
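Lazy cleanup means an expired entry is not removed by a background job at the moment it expires; it is removed when a read finds it expired. A minimal sketch, assuming an in-memory dict as the store and a hypothetical `UrlStore` class:

```python
import time
from typing import Optional


class UrlStore:
    """Maps shortURL -> (longURL, expiry); expired rows are deleted on access."""

    def __init__(self):
        self._rows = {}

    def put(self, short_url: str, long_url: str, expires_at: Optional[float] = None):
        self._rows[short_url] = (long_url, expires_at)

    def get(self, short_url: str, now: Optional[float] = None) -> Optional[str]:
        row = self._rows.get(short_url)
        if row is None:
            return None
        long_url, expires_at = row
        now = time.time() if now is None else now
        if expires_at is not None and now >= expires_at:
            del self._rows[short_url]            # lazy cleanup happens here
            return None
        return long_url


store = UrlStore()
store.put("abc", "https://example.com", expires_at=100.0)
before = store.get("abc", now=99.0)
after = store.get("abc", now=100.0)
```

The trade-off is that entries which are never read again stay in storage, so a periodic sweep may still be needed to reclaim space.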
System Graph
(Source : https://www.raychase.net/6460)
Reference
[1] https://zybuluo.com/ysongzybl/note/95360
[2] https://www.educative.io/courses/grokking-the-system-design-interview/m2ygV4E81AR
[3] https://www.raychase.net/6460