Top K - System Design

Fast Path (e.g., 1-minute or 5-minute windows)

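Use a count-min sketch, which estimates how often each item occurs by incrementing one counter per hash function in a small fixed-size table, and aggregate data over a short time window. Because the sketch uses a fixed amount of memory, the data does not need to be partitioned.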
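A minimal Python sketch of the fast path, assuming string item IDs and arbitrary sketch dimensions (`width=1000`, `depth=5`). Each row's hash function is derived from `hashlib.blake2b` with the row number as salt, which is just one convenient choice. `TopKTracker` is a hypothetical helper that keeps at most `k` candidates next to the sketch so memory stays bounded. Estimates can overcount because of hash collisions but never undercount.

```python
import hashlib
from typing import Dict, List, Tuple


class CountMinSketch:
    """Count-min sketch: `depth` rows of `width` counters, one hash function per row."""

    def __init__(self, width: int = 1000, depth: int = 5) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # A different hash function per row: blake2b salted with the row number.
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(16, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum over rows is the tightest estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


class TopKTracker:
    """Hypothetical helper: a count-min sketch plus at most k candidate items."""

    def __init__(self, k: int, width: int = 1000, depth: int = 5) -> None:
        self.k = k
        self.sketch = CountMinSketch(width, depth)
        self.top: Dict[str, int] = {}

    def add(self, item: str) -> None:
        self.sketch.add(item)
        est = self.sketch.estimate(item)
        if item in self.top or len(self.top) < self.k:
            self.top[item] = est
            return
        # Evict the weakest candidate only if the new item's estimate beats it.
        weakest = min(self.top, key=self.top.__getitem__)
        if est > self.top[weakest]:
            del self.top[weakest]
            self.top[item] = est

    def top_k(self) -> List[Tuple[str, int]]:
        return sorted(self.top.items(), key=lambda kv: kv[1], reverse=True)
```

For a 1-minute or 5-minute view, one tracker would be started per window and discarded afterwards; that windowing logic is not shown here.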

Slow Path (e.g., 1-hour or 1-day windows)

Data partitioners parse batches of events into individual events, hash-partition them by key, and send each event as a message to its partition.
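A minimal sketch of a partitioner, assuming a hypothetical batch format of one item ID per line. `zlib.crc32` is used because, unlike Python's built-in `hash` for strings, it gives the same value in every process, so all partitioner instances route a given key to the same partition. Actually sending the messages to a queue is left out; the function just returns the per-partition lists.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def parse_batch(batch: str) -> List[str]:
    # Hypothetical batch format: one item ID per line; blank lines are ignored.
    return [line.strip() for line in batch.splitlines() if line.strip()]


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash so the same key always lands in the same partition.
    return zlib.crc32(key.encode()) % num_partitions


def partition_batch(batch: str, num_partitions: int) -> Dict[int, List[str]]:
    """Split a batch into individual events and group them by target partition."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for event in parse_batch(batch):
        partitions[partition_for(event, num_partitions)].append(event)
    # In a real pipeline each list would be sent as messages to its partition's queue.
    return dict(partitions)
```

Partitioning by key matters here: all occurrences of an item reach the same processor, so each processor's counts for its items are complete.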

Data processors aggregate the events in each partition and write the results to the file system.
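A minimal sketch of a data processor, assuming it counts one partition's events in memory and flushes them as a single file. The `item<TAB>count` line format and the `path` argument are hypothetical placeholders for whatever file system and format the pipeline actually uses.

```python
from collections import Counter
from typing import Iterable


def aggregate(events: Iterable[str]) -> Counter:
    """In-memory aggregation of one partition's events over a time window."""
    return Counter(events)


def serialize(counts: Counter) -> str:
    # Hypothetical file format: one 'item<TAB>count' line per item, sorted for determinism.
    return "".join(f"{item}\t{count}\n" for item, count in sorted(counts.items()))


def flush_to_file(events: Iterable[str], path: str) -> Counter:
    """Aggregate the events, then write the result as one file at `path`."""
    counts = aggregate(events)
    with open(path, "w", encoding="utf-8") as f:
        f.write(serialize(counts))
    return counts
```

Pre-aggregating before writing keeps the files small: each file holds one line per distinct item rather than one line per event.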

MapReduce jobs then compute the frequency counts and select the top K items.
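A single-process simulation of the two MapReduce steps, assuming the processors wrote the hypothetical `item<TAB>count` lines from the previous step. It is not Hadoop code. The frequency-count step maps, shuffles, and sums. The top-K step takes a local top K per chunk, as separate mappers might, and merges those lists. That merge is only correct because totals are fully summed first, so each item appears in exactly one chunk.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterable[Tuple[str, int]]:
    # Each input line is a hypothetical 'item<TAB>count' record written by a processor.
    for line in lines:
        item, count = line.rstrip("\n").split("\t")
        yield item, int(count)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group partial counts by item, as the MapReduce framework would between phases.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for item, count in pairs:
        grouped[item].append(count)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    # Frequency count: sum the partial counts for each item.
    return {item: sum(counts) for item, counts in grouped.items()}


def top_k(totals: Dict[str, int], k: int, num_chunks: int = 2) -> List[Tuple[str, int]]:
    # Local top K per chunk, then merge the local lists into the global top K.
    items = sorted(totals.items())
    chunks = [items[i::num_chunks] for i in range(num_chunks)]
    local = [heapq.nlargest(k, chunk, key=lambda kv: kv[1]) for chunk in chunks]
    return heapq.nlargest(k, (kv for lst in local for kv in lst), key=lambda kv: kv[1])
```

For example, files containing `a 2, b 1`, `a 3, c 4`, and `b 5` sum to `a=5, b=6, c=4`, so the top 2 is `b` then `a`.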

Papers

A Survey of Top-k Query Processing Techniques in Relational Database Systems
http://www.cs.umd.edu/~samir/498/topk.pdf

Efficient Computation of Frequent and Top-k Elements in Data Streams
http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf

Continuous Monitoring of Top-K Queries over Sliding Windows
http://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1546&context=sis_research


Posted @ 2021-11-09 14:02 by YBgnAW
