[Big Data] Week 3 (Basic)
Question 1
Your Answer | | Score | Explanation |
---|---|---|---|
The fraction of 1's is 79/99. | | | |
The fraction of 1's is 1 - e^(-20/99). | Correct | 1.00 | |
The fraction of 1's is 20/99. | | | |
The fraction of 0's is 20/99. | | | |
Total | | 1.00 / 1.00 | |
Question 2
The method of Section 4.2.4 will be used. User ID's will be hashed to a bucket number, from 0 to 999,999. At all times, there will be a threshold t such that the 100-byte records for all the users whose ID's hash to t or less will be retained, and other users' records will not be retained. You may assume that each user generates emails at exactly the same rate as other users. As a function of n, the number of emails in the stream so far, what should the threshold t be in order that the selected records will not exceed the 10^10 bytes available to store records? From the list below, identify the true statement about a value of n and its value of t.
Your Answer | | Score | Explanation |
---|---|---|---|
n = 10^9; t = 999 | | | |
n = 10^12; t = 999 | | | |
n = 10^13; t = 9 | Correct | 1.00 | |
n = 10^11; t = 1000 | | | |
Total | | 1.00 / 1.00 | |
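The retention rule described in the question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`bucket_of`, `is_retained`, `lower_threshold`) are hypothetical, and SHA-256 stands in for whatever hash function a real system would use; the question does not specify one.

```python
import hashlib

NUM_BUCKETS = 1_000_000  # buckets 0..999,999, as stated in the question


def bucket_of(user_id: str) -> int:
    """Map a user ID to a bucket number in [0, NUM_BUCKETS).

    SHA-256 is an assumed stand-in for the (unspecified) hash function.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def is_retained(user_id: str, t: int) -> bool:
    """A user's records are kept only if the ID hashes to t or less."""
    return bucket_of(user_id) <= t


def lower_threshold(sample: dict, new_t: int) -> dict:
    """When the store fills up, lower t and drop users above the new threshold."""
    return {uid: recs for uid, recs in sample.items() if bucket_of(uid) <= new_t}
```

Because the decision depends only on the user ID, a user is either entirely in the sample or entirely out of it, which is why the threshold can be lowered later without leaving partial records behind.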
From the problem we know that there are n emails in the stream so far and 10^6 buckets. Since every user generates emails at the same rate, the emails spread evenly over the buckets, so each bucket receives about n/10^6 emails.
We also know that each email record needs 100 bytes, hence the space requirement per bucket is (n/10^6) ∗ 100 bytes.
The sample must fit when every email that hashes to a retained bucket is kept.
Records are retained for buckets 0 through t, which is (t + 1) buckets since the bucket count starts from 0.
So (space requirement per bucket) ∗ (number of retained buckets) <= total available space:
(n/10^6) ∗ 100 ∗ (t + 1) <= 10^10
Multiplying both sides by 10^6 / (100 ∗ n) gives t + 1 <= 10^14 / n, that is
t <= (10^14 / n) - 1
For n = 10^13 this gives t <= 9, which matches the answer marked correct.
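As a quick check, the bound above can be evaluated for each answer option. The sketch below is not part of the quiz; the function name `max_threshold` and the constants are assumptions introduced for illustration, with n standing for the number of emails in the stream so far.

```python
NUM_BUCKETS = 10**6      # buckets 0..999,999
RECORD_BYTES = 100       # size of one email record
CAPACITY_BYTES = 10**10  # space available for the sample


def max_threshold(n: int) -> int:
    """Largest integer t with (n / NUM_BUCKETS) * RECORD_BYTES * (t + 1) <= CAPACITY_BYTES.

    Integer arithmetic avoids rounding error; the result is capped at the
    highest bucket number because t cannot exceed it.
    """
    t = (CAPACITY_BYTES * NUM_BUCKETS) // (RECORD_BYTES * n) - 1
    return min(t, NUM_BUCKETS - 1)


# The four answer options from the question as (n, proposed t) pairs.
options = [(10**9, 999), (10**12, 999), (10**13, 9), (10**11, 1000)]
for n, proposed_t in options:
    print(n, proposed_t, max_threshold(n), proposed_t == max_threshold(n))
```

Only the option n = 10^13, t = 9 agrees with the bound. The option n = 10^11, t = 1000 misses by one, since the bound there is 999.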