[Big Data] Week 2: LSH (Basic)
Question 1
Your Answer | | Score | Explanation |
---|---|---|---|
There are 3 pairs at distance 1. | |||
There is 1 pair at distance 4. | Correct | 1.00 | |
There are 4 pairs at distance 5. | |||
There is 1 pair at distance 3. | |||
Total | 1.00 / 1.00 |
Question 2
Row | C1 | C2 | C3 | C4 |
---|---|---|---|---|
R1 | 0 | 1 | 1 | 0 |
R2 | 1 | 0 | 1 | 1 |
R3 | 0 | 1 | 0 | 1 |
R4 | 0 | 0 | 1 | 0 |
R5 | 1 | 0 | 1 | 0 |
R6 | 0 | 1 | 0 | 0 |
Perform a minhashing of the data, with the order of rows: R4, R6, R1, R3, R5, R2. Which of the following is the correct minhash value of the stated column? Note: we give the minhash value in terms of the original name of the row, rather than the order of the row in the permutation. These two schemes are equivalent, since we only care whether hash values for two columns are equal, not what their actual values are.
Your Answer | | Score | Explanation |
---|---|---|---|
The minhash value for C1 is R6 | |||
The minhash value for C3 is R4 | Correct | 1.00 | |
The minhash value for C1 is R2 | |||
The minhash value for C3 is R5 | |||
Total | 1.00 / 1.00 |
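
The minhash rule in Question 2 can be checked with a short script. This is a minimal sketch, not the course's reference solution: the names `matrix`, `order`, and `minhash` are chosen here for illustration. It walks the rows in the permuted order and, for each column, records the first row that contains a 1.

```python
# Characteristic matrix from Question 2: row name -> bits for columns C1..C4.
matrix = {
    "R1": [0, 1, 1, 0],
    "R2": [1, 0, 1, 1],
    "R3": [0, 1, 0, 1],
    "R4": [0, 0, 1, 0],
    "R5": [1, 0, 1, 0],
    "R6": [0, 1, 0, 0],
}
# Permuted row order given in the question.
order = ["R4", "R6", "R1", "R3", "R5", "R2"]
columns = ["C1", "C2", "C3", "C4"]


def minhash(matrix, order, columns):
    """Return, per column, the first row (in permuted order) holding a 1."""
    result = {}
    for j, col in enumerate(columns):
        # None would mean the column is all zeros; that does not occur here.
        result[col] = next((row for row in order if matrix[row][j] == 1), None)
    return result


signature = minhash(matrix, order, columns)
print(signature)
# {'C1': 'R5', 'C2': 'R6', 'C3': 'R4', 'C4': 'R3'}
```

The output agrees with the marked answer: the minhash value for C3 is R4, while C1 maps to R5 rather than R6 or R2.
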
Question 3
C1 | C2 | C3 | C4 | C5 | C6 | C7 |
---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | 2 | 5 | 4 |
2 | 3 | 4 | 2 | 3 | 2 | 2 |
3 | 1 | 2 | 3 | 1 | 3 | 2 |
4 | 1 | 3 | 1 | 2 | 4 | 4 |
5 | 2 | 5 | 1 | 1 | 5 | 1 |
6 | 1 | 6 | 4 | 1 | 1 | 4 |
Suppose we use locality-sensitive hashing with three bands of two rows each. Assume there are enough buckets available that the hash function for each band can be the identity function (i.e., columns hash to the same bucket if and only if they are identical in the band). Find all the candidate pairs, and then identify one of them in the list below.
Your Answer | | Score | Explanation |
---|---|---|---|
C2 and C3 | |||
C2 and C5 | Correct | 1.00 | |
C4 and C5 | |||
C2 and C7 | |||
Total | 1.00 / 1.00 |
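
The banding step in Question 3 can be sketched as follows. This assumes the table is read exactly as printed above, with seven signature columns C1 to C7 and six rows, and it uses the identity hash described in the question (two columns share a bucket only if they are identical within the band). The function name `candidate_pairs` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict
from itertools import combinations

# Signature matrix from Question 3: one list per row, one entry per column C1..C7.
signatures = [
    [1, 2, 1, 1, 2, 5, 4],
    [2, 3, 4, 2, 3, 2, 2],
    [3, 1, 2, 3, 1, 3, 2],
    [4, 1, 3, 1, 2, 4, 4],
    [5, 2, 5, 1, 1, 5, 1],
    [6, 1, 6, 4, 1, 1, 4],
]


def candidate_pairs(signatures, rows_per_band):
    """Return column pairs that are identical in at least one band."""
    n_cols = len(signatures[0])
    pairs = set()
    for start in range(0, len(signatures), rows_per_band):
        band = signatures[start:start + rows_per_band]
        buckets = defaultdict(list)
        for c in range(n_cols):
            # Identity hash: the bucket key is the column's values in this band.
            key = tuple(row[c] for row in band)
            buckets[key].append(c)
        for members in buckets.values():
            for a, b in combinations(members, 2):
                pairs.add((f"C{a + 1}", f"C{b + 1}"))
    return sorted(pairs)


pairs = candidate_pairs(signatures, rows_per_band=2)
print(pairs)
# [('C1', 'C3'), ('C1', 'C4'), ('C1', 'C6'), ('C2', 'C5'), ('C4', 'C7')]
```

Of the four options listed, only C2 and C5 appears among the candidate pairs, which matches the marked answer.
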
Question 4
Find the set of 2-shingles for the "document":
ABRACADABRA
and also for the "document":
BRICABRAC
Answer the following questions:
- How many 2-shingles does ABRACADABRA have?
- How many 2-shingles does BRICABRAC have?
- How many 2-shingles do they have in common?
- What is the Jaccard similarity between the two documents?
Then, find the true statement in the list below.
Your Answer | | Score | Explanation |
---|---|---|---|
ABRACADABRA has 10 2-shingles. | |||
ABRACADABRA has 9 2-shingles. | |||
There are 5 shingles in common. | Correct | 1.00 | |
There are 4 shingles in common. | |||
Total |
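
The four sub-questions in Question 4 can be worked out with a few lines of code. This is a minimal sketch that treats each document as a plain string and a 2-shingle as any two consecutive characters, counted as a set so that repeats collapse. The helper name `shingles` is chosen here for illustration.

```python
from fractions import Fraction


def shingles(text, k=2):
    """Return the set of distinct k-character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


a = shingles("ABRACADABRA")
b = shingles("BRICABRAC")
common = a & b
# Jaccard similarity: size of the intersection over size of the union.
jaccard = Fraction(len(common), len(a | b))

print(len(a), len(b), len(common), jaccard)
# 7 7 5 5/9
```

Each document has 7 distinct 2-shingles, 5 of them are shared (AB, BR, RA, AC, CA), and the Jaccard similarity is 5/9, consistent with the marked answer.
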
Question 5
Your Answer | | Score | Explanation |
---|---|---|---|
(53,15) | Correct | 1.00 | |
(58,13) | |||
(52,13) | |||
(54,8) | |||
Total | 1.00 / 1.00 |