need to work on item
ML:
http://www.countrysideinfo.co.uk/simpsons.htm
Linear discriminant analysis
Q6 –
Linear regression with an interaction term: how do you handle heterogeneous variance (heteroscedasticity) in the errors?
Q7 – pa
Explain what linear regression is.
Q8 –
Why use feature selection? If two predictors are highly correlated, what is the effect on the coefficients in the logistic regression? What are the confidence intervals of the coefficients?
Q12 –
Describe the process of data analysis.
Q13 –
Open-ended conceptual questions on how to identify and fight spam, etc. What estimation method would you use for this model?
Q8 – expectation 2 times
Second-price auction. Two people bid, each bid uniform on [0,1]; the higher bidder wins and pays the lower bid. What is the expected amount of money received?
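A minimal simulation sketch for the question above, assuming the two bids are independent and that "money received" means the price the winner pays, i.e. the lower bid. The function name and sample size are illustrative choices, not part of the question. Analytically, E[min(U1, U2)] = 1/3 for two independent Uniform(0,1) draws, so the estimate should land near that value.

```python
import random

def simulate_second_price(n_rounds=200_000, seed=0):
    """Monte Carlo estimate of the expected price paid in a two-bidder
    second-price auction with independent Uniform(0, 1) bids."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rounds):
        bid_1, bid_2 = rng.random(), rng.random()
        total += min(bid_1, bid_2)  # the winner pays the lower bid
    return total / n_rounds

estimate = simulate_second_price()
print(round(estimate, 3))  # compare with the analytic value 1/3
```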
Ads second-price auction: the winner pays the second price. You and I are bidding.
My bid is unknown to you, but your own bid and the outcome are available to you. You also know that my bid follows an exponential distribution with parameter lambda. How would you estimate lambda from this?
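A dataset with two columns. The first column is click, 0 or 1; the second column is cost, a continuous value. average cost per click = sum of cost / sum of click. How would you give a confidence interval for average cost per click? (hint: there is no closed-form formula to compute it)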
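Since the hint rules out a closed-form formula, one common approach is a percentile bootstrap over rows. The sketch below is only an illustration under assumptions: the data are synthetic, cost is taken to be nonzero only on clicked rows, and the function name and number of resamples are arbitrary.

```python
import random

def bootstrap_cpc_ci(clicks, costs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for sum(cost) / sum(click), resampling rows."""
    rng = random.Random(seed)
    n = len(clicks)
    ratios = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        total_clicks = sum(clicks[i] for i in idx)
        if total_clicks == 0:
            continue  # ratio undefined for this resample, skip it
        ratios.append(sum(costs[i] for i in idx) / total_clicks)
    ratios.sort()
    lo = ratios[int((alpha / 2) * len(ratios))]
    hi = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return lo, hi

# Synthetic (hypothetical) data: about 30% click rate, cost only on clicks.
data_rng = random.Random(1)
clicks = [1 if data_rng.random() < 0.3 else 0 for _ in range(300)]
costs = [data_rng.expovariate(1.0) if c else 0.0 for c in clicks]

point = sum(costs) / sum(clicks)
lo, hi = bootstrap_cpc_ci(clicks, costs)
print(point, (lo, hi))
```

Resampling whole rows keeps each click paired with its cost, which is what lets the bootstrap capture the correlation between numerator and denominator.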
100 bundles of purchases, covering three models of cars. You know each bundle's total price and the number of cars of each model in it. How would you estimate the price of each model?
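One way to read this is as a linear regression without intercept: total price = sum over models of (count x unit price) + noise, solved by least squares. The sketch below uses pure Python and the normal equations; the "true" prices, the noise level and the helper names are all hypothetical, made up only to generate test data.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def estimate_prices(counts, totals):
    """Least-squares unit prices via the normal equations (X^T X) p = X^T y."""
    k = len(counts[0])
    xtx = [[sum(row[i] * row[j] for row in counts) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(counts, totals)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic (hypothetical) data: 100 bundles, 3 car models.
rng = random.Random(0)
true_prices = [20000.0, 35000.0, 50000.0]
counts = [[rng.randint(0, 5) for _ in range(3)] for _ in range(100)]
totals = [sum(c * p for c, p in zip(row, true_prices)) + rng.gauss(0, 500)
          for row in counts]

prices = estimate_prices(counts, totals)
print([round(p) for p in prices])
```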
Q11
Explain a probability distribution that is not normal and how to apply it.
Q11 – sampling
How to verify whether x, y, z follow multivariate normal distribution with known population mean and covariance?
How to check whether it’s randomly sampled?
Q12 –
N people have coordinates (xi, yi) and want to meet at one place. Q: What is the meeting place?
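The answer depends on which cost is being minimized, which the question leaves open. As a hedged sketch: the centroid minimizes the sum of squared Euclidean distances, and the coordinate-wise median minimizes the sum of Manhattan distances. (Minimizing the sum of plain Euclidean distances gives the geometric median, which has no closed form and is not shown here.) The function names are illustrative.

```python
from statistics import mean, median

def meeting_place_squared(points):
    """Centroid: minimizes the sum of squared Euclidean distances."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return mean(xs), mean(ys)

def meeting_place_manhattan(points):
    """Coordinate-wise median: minimizes the sum of Manhattan distances."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return median(xs), median(ys)

points = [(0, 0), (2, 0), (10, 3)]
print(meeting_place_squared(points))    # centroid
print(meeting_place_manhattan(points))  # coordinate-wise median
```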
Q13 – coins
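1) Toss a coin N times and get n heads; estimate the probability of heads (point est., CI).
2) Toss the same coin again; find the expected number of tosses needed to get the first head.
3) The first question asks: after a coin has come up 0 (or 1) several times in a row, what is the expected number of tosses until the first 1 (or 0) appears?
4) 50 people each toss a coin 100 times; find the expectation of the maximum number of heads and a CI. What if the coin is biased?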
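A rough sketch of how the coin questions above could be approached, under assumptions: a normal-approximation (Wald) interval for the proportion, the geometric-distribution mean 1/p for the waiting time (which, by independence of tosses, is unchanged after any run of previous outcomes), and a plain simulation for the maximum over 50 people. All names, the 1.96 multiplier and the simulation size are illustrative choices.

```python
import math
import random

def proportion_ci(n_heads, n_tosses, z=1.96):
    """Point estimate and Wald (normal-approximation) CI for P(heads)."""
    p = n_heads / n_tosses
    half = z * math.sqrt(p * (1 - p) / n_tosses)
    return p, (max(0.0, p - half), min(1.0, p + half))

def expected_tosses_to_first_head(p):
    """Mean of a geometric distribution with success probability p."""
    return 1 / p

def simulate_max_heads(n_people=50, n_tosses=100, p=0.5, n_sims=500, seed=0):
    """Simulated mean and central 95% range of the maximum head count."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        best = max(sum(rng.random() < p for _ in range(n_tosses))
                   for _ in range(n_people))
        maxima.append(best)
    maxima.sort()
    mean_max = sum(maxima) / n_sims
    interval = (maxima[int(0.025 * n_sims)], maxima[int(0.975 * n_sims) - 1])
    return mean_max, interval

p_hat, ci = proportion_ci(60, 100)
mean_max, interval = simulate_max_heads()
print(p_hat, ci, expected_tosses_to_first_head(p_hat), mean_max, interval)
```

For a biased coin, the same simulation can be rerun with a different `p`, for example the estimate from part 1.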
Q14 -
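Capture-release-recapture model: the first capture catches N animals, of which a are marked; the second capture catches M animals, of which b are marked.
1) Find a point estimate of the population size.
2) Find a CI for the population size.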
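One standard starting point is the Lincoln-Petersen estimator, a * M / b, which equates the marked share in the second sample (b / M) with the marked share in the population. The sketch below gets an approximate interval by putting a Wald interval on b / M and inverting it; that is one of several possible approaches and assumes marked animals mix evenly after release. The function name and example numbers are made up.

```python
import math

def lincoln_petersen(a, M, b, z=1.96):
    """Point estimate a*M/b and an approximate CI for the population size.

    a: marked animals released, M: size of second sample,
    b: marked animals found in the second sample.
    """
    p = b / M                       # estimated marked fraction
    point = a / p                   # same as a * M / b
    half = z * math.sqrt(p * (1 - p) / M)
    p_lo = max(p - half, 1e-12)     # guard against division by zero
    p_hi = min(p + half, 1.0)
    return point, (a / p_hi, a / p_lo)

point, (lo, hi) = lincoln_petersen(a=100, M=80, b=20)
print(point, lo, hi)
```

The interval is asymmetric around the point estimate because it comes from inverting an interval on the marked fraction.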
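Q14 - What are the t test and the z test, and how are they used? To compare the CTR of two groups, would you use a z test or a t test? What is the formula of the test statistic, and how do you determine the sample size?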
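For CTR, each impression is a 0/1 outcome and samples are usually large, so a two-proportion z test is a common choice. The sketch below shows the pooled z statistic and a textbook per-group sample-size approximation; the click counts, baseline rate, effect size, alpha and power are hypothetical inputs, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Pooled two-sided z test for a difference in click-through rates."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def sample_size_per_group(p_base, min_diff, alpha=0.05, power=0.8):
    """Approximate n per group needed to detect a CTR change of min_diff."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_alt = p_base + min_diff
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / min_diff ** 2)

z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
n_needed = sample_size_per_group(p_base=0.02, min_diff=0.005)
print(round(z, 2), round(p_value, 4), n_needed)
```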
Q15 –
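A continuous function f, which when plotted is a curve: what method would you use to find the area under the curve? The function f itself is unknown, but there is another function g(x, y), where g(x, y) = 1 if f(x) < y and g(x, y) = 0 if f(x) >= y.
First answer: split the range [a, b] of x into 100 equal subintervals; on each subinterval take the midpoint x0, use bisection (on y, via g) to find the value f(x0), estimate the area under the curve on that subinterval as length * f(x0), then sum over all subintervals to estimate the total area.
The interviewer asked whether there are other methods. Answer: simulation. The ranges are known: x in [a, b], f(x) in [c, d]. Generate random points (x, y) uniformly, pass them to g, and count how many give g(x, y) = 0.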
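The simulation answer can be sketched as below. This assumes c >= 0, so the area down to the x-axis is the rectangle of height c plus the sampled part between c and d; the hidden f(x) = x^2 used to build g is only a stand-in for testing, and the function name and sample size are arbitrary.

```python
import random

def area_under_curve(g, a, b, c, d, n_samples=200_000, seed=0):
    """Monte Carlo area under f on [a, b], using only the oracle g.

    g(x, y) returns 0 when f(x) >= y (point on or below the curve), else 1.
    Assumes c <= f(x) <= d on [a, b] and c >= 0.
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(n_samples):
        x = rng.uniform(a, b)
        y = rng.uniform(c, d)
        if g(x, y) == 0:
            below += 1
    box_area = (b - a) * (d - c)
    return (b - a) * c + box_area * below / n_samples

# Stand-in oracle built from a hidden f(x) = x**2 (hypothetical example).
def g(x, y):
    return 1 if x ** 2 < y else 0

area = area_under_curve(g, a=0.0, b=1.0, c=0.0, d=1.0)
print(round(area, 3))  # the exact area for this stand-in f is 1/3
```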
Q16 – QA
If you want to know the total population of bears in a national park, what could you do?
Q17 – QA
How do you estimate the probability that a person is left-handed and owns a bicycle?
Q18 -
Make an unfair coin fair - Use multiple tosses to determine outcome
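One well-known way to do this with multiple tosses is the von Neumann trick: toss twice, keep the first result if the two differ, and retry if they match. It works because "heads then tails" and "tails then heads" have the same probability p(1-p) when tosses are independent. The note does not name the method, so treat this as one possible reading; the 0.8 bias below is a made-up test value.

```python
import random

def fair_flip(biased_flip):
    """Return a fair 0/1 outcome from an independent biased 0/1 coin."""
    while True:
        first, second = biased_flip(), biased_flip()
        if first != second:   # (1, 0) and (0, 1) are equally likely
            return first
        # both tosses equal: discard the pair and toss again

rng = random.Random(0)

def biased():
    """Hypothetical unfair coin: 1 with probability 0.8."""
    return 1 if rng.random() < 0.8 else 0

flips = [fair_flip(biased) for _ in range(20_000)]
share_of_ones = sum(flips) / len(flips)
print(round(share_of_ones, 3))
```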
Q19 -
Find the width of the confidence interval.