Common Knowledge of System Design
First created: 2021-02-12 21:00
https://medium.com/i0exception/rendezvous-hashing-8c00e2fb58b0
Consistent Hashing
In consistent hashing, both the keys and the buckets/servers are hashed onto the same ring by a standard hash function. A key is stored on the first bucket whose hash value is greater than the key's hash, wrapping around the ring if needed. Finding the bucket responsible for a key is simple: precompute the hash values of all buckets and sort them, hash the key, then binary-search in O(log n) for the lowest bucket hash that is higher than the key's hash. When buckets are added or removed, only about K/n keys need to move on average, which is ideal.
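A minimal sketch of such a ring in Python, assuming MD5 as the hash function and skipping virtual nodes; the node and key names below are made up.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Map a string to a point on the ring using MD5 (any uniform hash works).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes=()):
        self._hashes = []   # sorted hash values of the nodes
        self._nodes = {}    # hash value -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        h = _hash(node)
        bisect.insort(self._hashes, h)
        self._nodes[h] = node

    def remove_node(self, node: str) -> None:
        h = _hash(node)
        self._hashes.remove(h)
        del self._nodes[h]

    def get_node(self, key: str) -> str:
        # Binary search (O(log n)) for the first node hash >= hash(key),
        # wrapping around to the beginning of the ring.
        h = _hash(key)
        idx = bisect.bisect_left(self._hashes, h)
        if idx == len(self._hashes):
            idx = 0
        return self._nodes[self._hashes[idx]]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))   # node responsible for this key
```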
Rendezvous Hashing (Highest Random Weight Hashing)
We hash the key together with each machine and pick the machine with the highest hash value to store that key. When a bucket is removed, its keys (on average K/n of the total) are spread out over all remaining buckets, instead of landing on just one bucket or on the virtual nodes assigned to a single machine. The biggest drawback of rendezvous hashing is that each lookup runs in O(n) instead of O(log n).
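A minimal rendezvous-hashing sketch in Python; MD5 and the node names are arbitrary choices, not from the article above.

```python
import hashlib

def rendezvous_pick(key: str, nodes) -> str:
    # Combine each node with the key, hash the pair, and pick the node
    # with the highest hash ("highest random weight"). O(n) per lookup.
    def weight(node: str) -> int:
        return int(hashlib.md5(f"{node}:{key}".encode()).hexdigest(), 16)
    return max(nodes, key=weight)

nodes = ["cache-a", "cache-b", "cache-c"]
print(rendezvous_pick("user:42", nodes))   # owner of this key
# Removing a node only remaps the keys that node owned; those keys are
# spread evenly across the surviving nodes.
```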
Write-through & write-back (Updated 2021/05/10)
In a write-through cache, every write to the cache causes a synchronous write to the backing store.
In a write-back (or write-behind) cache, writes are not immediately mirrored to the store. Instead, the cache tracks which of its locations have been written over and marks these locations as dirty. The data in these locations is written back to the backing store only when it is evicted from the cache, an effect referred to as a lazy write.
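A toy Python sketch of the two policies, with an in-memory dict standing in for the backing store, just to make the dirty-tracking concrete; the class and method names are made up.

```python
class BackingStore:
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

class WriteThroughCache:
    def __init__(self, store: BackingStore):
        self.store = store
        self.cache = {}

    def put(self, key, value):
        # Every write goes to the cache AND synchronously to the store.
        self.cache[key] = value
        self.store.write(key, value)

class WriteBackCache:
    def __init__(self, store: BackingStore):
        self.store = store
        self.cache = {}
        self.dirty = set()

    def put(self, key, value):
        # Only the cache is updated; the location is marked dirty.
        self.cache[key] = value
        self.dirty.add(key)

    def evict(self, key):
        # Lazy write: dirty data reaches the store only on eviction (or flush).
        if key in self.dirty:
            self.store.write(key, self.cache[key])
            self.dirty.discard(key)
        self.cache.pop(key, None)
```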
Cache Aside Pattern
Miss: the application first tries the cache; if the data is not there, it reads from the database and, on success, puts the result into the cache.
Hit: the application reads the data from the cache and returns it.
Update: write the data to the database first; once that succeeds, invalidate the cache entry.
From https://1o24bbs.com/t/topic/17552
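A sketch of the cache-aside flow above, assuming a Redis-like cache client (get/set/delete) and a hypothetical database layer; the helper names and TTL are illustrative only.

```python
import json

TTL_SECONDS = 300  # assumed cache TTL

def get_user(cache, db, user_id):
    cached = cache.get(f"user:{user_id}")
    if cached is not None:                 # cache hit: return directly
        return json.loads(cached)
    user = db.query_user(user_id)          # cache miss: read from the database
    cache.set(f"user:{user_id}", json.dumps(user), ex=TTL_SECONDS)
    return user

def update_user(cache, db, user_id, fields):
    db.update_user(user_id, fields)        # 1. update the database first
    cache.delete(f"user:{user_id}")        # 2. then invalidate the cache entry
```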
Interview Answer Template / Flow (Updated 2021/05/10)
Functional Requirements/ Non-Functional Requirements
DAU/Storage estimate
High Level Structure of service/API/Schema of data
Sharding/Fault Tolerance/Primary Secondary mechanism/Load Balancing
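The DAU/storage step above is usually a quick back-of-envelope calculation; a tiny sketch with purely hypothetical numbers:

```python
# Every number here is a made-up assumption for illustration.
dau = 10_000_000                  # assumed daily active users
writes_per_user_per_day = 10      # assumed writes per user per day
avg_record_size_bytes = 1_000     # ~1 KB per record

daily_storage = dau * writes_per_user_per_day * avg_record_size_bytes
print(f"~{daily_storage / 1e9:.0f} GB/day, "
      f"~{daily_storage * 365 / 1e12:.0f} TB/year")
# -> ~100 GB/day, ~37 TB/year, before replication and indexes
```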
1point3acres Reference:
https://www.1point3acres.com/bbs/thread-706795-1-1.html [[Experience] How I Crush System Design Instantly: An Opinionated Approach to the System Design Interview]
https://www.1point3acres.com/bbs/thread-715044-1-1.html [[Study Material] Learning System Design from Company Engineering Blogs]
https://www.1point3acres.com/bbs/thread-776466-1-1.html [[Job Hunting] [Job Info] System Design Templates Even a Monkey Can Understand]
(Updated 2021/08/02)
Leader write & follower read -> con: followers can serve stale (inconsistent) data because of replication lag.
Kafka: in-sync replica set (ISR): if a follower fails to keep up with the leader within a time threshold, it is removed from the ISR.
Kafka considers that a record is committed when all replicas in the In-Sync Replica set (ISR) have confirmed that they have written the record to disk. The acks=all setting requests that an ack is sent once all in-sync replicas (ISR) have the record.
The ISR is simply all the replicas of a partition that are "in-sync" with the leader. The definition of "in-sync" depends on the topic configuration, but by default, it means that a replica is or has been fully caught up with the leader in the last 10 seconds. The setting for this time period is: replica.lag.time.max.ms and has a server default which can be overridden on a per topic basis.
If a follower fails, it stops sending fetch requests and, after the default 10 seconds, is removed from the ISR. Likewise, if a follower slows down, perhaps because of a network issue or constrained server resources, it is removed from the ISR as soon as it has lagged behind the leader for more than 10 seconds. The minimum in-sync replicas setting defaults to 1 in CloudKarafka.
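A hedged producer sketch using the kafka-python client with acks='all'; the broker address and topic name are placeholders.

```python
from kafka import KafkaProducer

# acks='all': the leader acknowledges only after every in-sync replica
# has confirmed the record; combined with the topic's min.insync.replicas,
# this limits how much data can be lost on a leader failure.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",   # placeholder broker address
    acks="all",
    retries=3,
)

future = producer.send("events", b"hello")        # placeholder topic
record_metadata = future.get(timeout=10)          # blocks until the ack arrives (or raises)
producer.flush()
```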
At Least Once
(One or More Message Deliveries, Duplicates Possible)
1. A producer sends a batch of messages to Kafka. The broker never sends back an ACK, so the producer sends the batch again.
2. A producer processes a large file containing events. Halfway through processing the file it dies and then restarts. It then starts processing the file again from the beginning and only marks it as processed (sent to offset storage) when the whole file has been read.
3. A consumer receives a batch of messages from Kafka, transforms these and writes the results to a database. If the consumer fails after writing the data to the database but before saving the offsets back to Kafka, it will reprocess the same records next time it runs.
Action: (1) set ‘enable.auto.commit’ to false, or (2) set ‘enable.auto.commit’ to true with a higher ‘auto.commit.interval.ms’.
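A sketch of option (1) with kafka-python: auto-commit disabled and the offset committed only after processing, so a crash leads to re-delivery rather than loss. process() and the connection details are placeholders.

```python
from kafka import KafkaConsumer

def process(value: bytes) -> None:
    # Placeholder for real business logic (e.g. write to a database).
    print(value)

consumer = KafkaConsumer(
    "events",                              # placeholder topic
    bootstrap_servers="localhost:9092",    # placeholder broker address
    group_id="demo-group",
    enable_auto_commit=False,
)

for record in consumer:
    process(record.value)                  # 1. process the record first
    consumer.commit()                      # 2. then commit the offset
    # Crash between 1 and 2 -> the record is re-delivered (duplicate possible).
```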
At Most Once
1. The producer performs a ‘fire-and-forget’ approach sending a message to Kafka with no retries and ignoring any response from the broker.
2. The producer saves its progress reading from a source system first, then writes data into Kafka. If the producer crashes before the second step, the data will never be delivered to Kafka.
3. The consumer commits the offsets before processing the messages. If the consumer fails after saving the offsets back to Kafka but before processing, it will skip these records the next time it runs and the data will be lost.
Action: (1) set ‘enable.auto.commit’ to true, or (2) set ‘enable.auto.commit’ to true with a lower ‘auto.commit.interval.ms’.
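One way to make the at-most-once semantics explicit with kafka-python is to commit the offset before processing (the auto-commit variants in the action above approximate the same behaviour); again, process() and the connection details are placeholders.

```python
from kafka import KafkaConsumer

def process(value: bytes) -> None:
    # Placeholder for real business logic.
    print(value)

consumer = KafkaConsumer(
    "events",                              # placeholder topic
    bootstrap_servers="localhost:9092",    # placeholder broker address
    group_id="demo-group",
    enable_auto_commit=False,
)

for record in consumer:
    consumer.commit()                      # 1. offset saved first
    process(record.value)                  # 2. then process
    # Crash between 1 and 2 -> the record is skipped on restart (possible loss).
```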
References
https://medium.com/@andy.bryant/processing-guarantees-in-kafka-12dd2e30be0e
https://dzone.com/articles/kafka-clients-at-most-once-at-least-once-exactly-o
(Updated 2021/08/17)
Erasure Codes
Erasure coding makes it possible to protect data without having to fully replicate it because the data can be reconstructed from parity fragments.
N (erasure code set size) = K (data symbols) + M (parity symbols), e.g. 12 = 8 + 4 or 6 + 6.
Normally K >= M (equivalently N <= 2 * K), so the storage overhead is no worse than keeping two full copies.
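A toy illustration of the K + M idea using the simplest possible code: K data blocks plus a single XOR parity block (M = 1) can reconstruct any one lost block. Production systems use Reed-Solomon codes so that M > 1 failures can be tolerated.

```python
def xor_blocks(blocks):
    # XOR together byte strings of equal length.
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# K = 3 data blocks, M = 1 parity block -> N = 4
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing data[1]: XOR of the surviving data blocks and the parity
# reconstructs it, because x ^ x = 0.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```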
Reference: https://www.youtube.com/watch?v=CryhjBWQHvM
https://eng.uber.com/supply-demand-big-data-platform/
(Updated 2021/09/14)
From https://www.1point3acres.com/bbs/forum.php?mod=viewthread&tid=769975&ctid=9
