# regression | p-value | Simple (bivariate) linear model | linear regression | multiple testing | FDR | BH | R code

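P122. This is the first assignment for the IQR method course: use a statistical test to decide whether x and y have a significant linear relationship.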

Assignment 1

1) Find a small bivariate dataset (preferably from your own discipline) and produce a scatterplot (this is easy).
2) Use any statistics tool (a calculator, spreadsheet or statistical package) to calculate the best-fitting regression line and test whether the population slope (= B) is zero.

Notes:

1. Testing whether the population slope (= B) is zero is different from testing whether the estimated slope (= b) is zero.


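This is the same as the straight-line equation from basic algebra; statistics is simply more rigorous and builds an error term into the model.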

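A and B can be regarded as the population parameters, and a and b as their estimates from the sample. We estimate a and b by making the residuals as small as possible, i.e. by minimizing the sum of squared residuals (least squares).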
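Written out, the population model, the fitted line and the least-squares criterion described above look like this (standard textbook notation; ε is the error term that plain algebra leaves out):

```latex
\begin{aligned}
\text{population model:}\quad & y_i = A + B x_i + \varepsilon_i \\
\text{fitted line:}\quad & \hat{y}_i = a + b x_i \\
\text{least squares:}\quad & (a, b) = \arg\min_{a,\,b} \sum_{i=1}^{n} \left(y_i - a - b x_i\right)^2 \\
\text{solution:}\quad & b = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2},
  \qquad a = \bar{y} - b\,\bar{x}
\end{aligned}
```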

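```r
# Example data: height (cm) and body mass (kg) for 10 observations
height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)
plot(bodymass, height)  # quick default scatterplot
# Same plot with filled blue points, a title and axis labels
plot(bodymass, height, pch = 16, cex = 1.3, col = "blue",
     main = "HEIGHT PLOTTED AGAINST BODY MASS",
     xlab = "BODY MASS (kg)", ylab = "HEIGHT (cm)")
```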
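The least-squares estimates for this scatterplot can be checked by hand. A minimal base-R sketch: it recomputes a and b from the closed-form formulas b = Sxy / Sxx and a = mean(y) - b * mean(x), compares them with lm(), and draws the fitted line. The names `a_hat` and `b_hat` are just illustrative.

```r
# Same example data as the scatterplot
height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)
x <- bodymass
y <- height

# Closed-form least-squares estimates
b_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a_hat <- mean(y) - b_hat * mean(x)

# lm() minimises the same sum of squared residuals, so the estimates agree
fit <- lm(height ~ bodymass)
print(c(a = a_hat, b = b_hat))
print(coef(fit))

# Scatterplot with the fitted regression line added
plot(bodymass, height, pch = 16, col = "blue")
abline(fit, col = "red")
```

The hand formulas are only practical for one predictor; for anything larger, lm() is the tool to use.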


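```r
# Regress eruption duration on waiting time (built-in faithful data set)
eruption.lm <- lm(eruptions ~ waiting, data = faithful)
summary(eruption.lm)  # coefficient table with t tests, R-squared and the F test
help(summary.lm)      # documentation for every field of the summary
```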

```
Call:
lm(formula = eruptions ~ waiting, data = faithful)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2992 -0.3769  0.0351  0.3491  1.1933

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.87402    0.16014   -11.7   <2e-16 ***
waiting      0.07563    0.00222    34.1   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.497 on 270 degrees of freedom
Multiple R-squared: 0.811,      Adjusted R-squared: 0.811
F-statistic: 1.16e+03 on 1 and 270 DF,  p-value: <2e-16
```


Decide whether there is a significant relationship between the variables in the linear regression model of the data set faithful at the .05 significance level.

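Null hypothesis: there is no linear relationship between x and y, i.e. the population slope B is zero. The p-value for the slope of waiting is < 2e-16, far below 0.05, so we reject the null hypothesis: there is a significant relationship between eruptions and waiting.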
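To see where the slope's p-value in the summary comes from, the test can be reproduced by hand: t = b / SE(b), compared with a t distribution on n - 2 = 270 degrees of freedom (two-sided). A minimal base-R sketch; `alpha`, `rdf` and the other variable names are illustrative.

```r
# Fit the model on the built-in faithful data
eruption.lm <- lm(eruptions ~ waiting, data = faithful)
coefs <- coef(summary(eruption.lm))  # columns: Estimate, Std. Error, t value, Pr(>|t|)

b   <- coefs["waiting", "Estimate"]
se  <- coefs["waiting", "Std. Error"]
rdf <- df.residual(eruption.lm)      # n - 2 = 270 for a simple regression

t_stat  <- b / se                                        # H0: population slope B = 0
p_value <- 2 * pt(abs(t_stat), rdf, lower.tail = FALSE)  # two-sided p-value

alpha <- 0.05
cat("t =", t_stat, " p =", p_value, "\n")
if (p_value < alpha) {
  cat("Reject H0: significant linear relationship at the 0.05 level\n")
} else {
  cat("Fail to reject H0\n")
}
```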

Lecture 10: Multiple Testing (the slides are easy to follow)

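Bonferroni: simply divide α = 0.05 by the number of tests (say 10,000) and use the result as the significance threshold. This greatly increases the type II error rate (false negatives), so we miss a lot of useful information.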
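A small illustration of the Bonferroni rule, using five made-up p-values (hypothetical, not from any study). Comparing each p-value with α/m is equivalent to comparing `p.adjust(p, method = "bonferroni")`, which multiplies each p-value by m and caps it at 1, with α.

```r
# Hypothetical p-values from m = 5 tests
p <- c(0.03, 0.0001, 0.2, 0.004, 0.019)
alpha <- 0.05
m <- length(p)

# Bonferroni: compare each p-value with alpha / m (0.01 here)
threshold <- alpha / m
reject_manual <- p <= threshold

# Equivalent: inflate the p-values (p * m, capped at 1) and compare with alpha
reject_adjusted <- p.adjust(p, method = "bonferroni") <= alpha
print(reject_manual)

# With 10,000 tests the per-test threshold becomes tiny
print(alpha / 10000)
```

Only two of the five hypotheses survive here, even though 0.019 and 0.03 would each be "significant" on their own.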

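FDR, the false discovery rate, is the expected proportion of false positives among all results declared significant (the discoveries), i.e. false positives / (false positives + true positives). Never confuse it with the false positive rate, which is false positives divided by the number of true null hypotheses.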
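The difference between the two rates is easiest to see with counts. The numbers below are hypothetical: 1,000 tests, 900 true nulls and 100 real effects, with 45 false positives and 80 true positives among the results called significant. Strictly, the FDR is the expected value of the false discovery proportion (FDP) computed here for a single experiment.

```r
# Hypothetical outcome of 1,000 tests
n_null <- 900  # hypotheses where H0 is true
n_alt  <- 100  # hypotheses with a real effect
fp <- 45       # true nulls wrongly called significant (false positives)
tp <- 80       # real effects correctly called significant (true positives)

discoveries <- fp + tp       # everything called significant
fdp <- fp / discoveries      # false discovery proportion: 45 / 125
fpr <- fp / n_null           # false positive rate: 45 / 900
power <- tp / n_alt          # share of real effects found

cat("FDP =", fdp, " FPR =", fpr, " power =", power, "\n")
```

A 5% false positive rate here still means that more than a third of the reported hits are false.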

Medium articles

How to Interpret Regression Analysis Results: P-values and Coefficients

Null hypothesis: the coefficient is 0. If the p-value is less than 0.05, we can reject the null hypothesis.
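Applied to the height/body-mass data, this rule answers part 2 of the assignment for that data set. A minimal base-R sketch: it reads the p-value column of `coef(summary(fit))`, compares each value with 0.05, and cross-checks the slope with its 95% confidence interval (rejecting at 0.05 is equivalent to the interval excluding 0).

```r
# Same height / body-mass data as the scatterplot example
height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

fit <- lm(height ~ bodymass)
coefs <- coef(summary(fit))        # one row per coefficient
p_values <- coefs[, "Pr(>|t|)"]    # named vector of two-sided p-values

# H0 for each row: the population coefficient is 0
for (term in rownames(coefs)) {
  decision <- if (p_values[term] < 0.05) "reject H0" else "cannot reject H0"
  cat(sprintf("%-12s estimate = %8.4f  p = %.3g  -> %s\n",
              term, coefs[term, "Estimate"], p_values[term], decision))
}

# Equivalent check: the 95% confidence interval for the slope excludes 0
print(confint(fit)["bodymass", ])
```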

multiple testing

Benjamini and Hochberg's method
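The Benjamini-Hochberg step-up procedure sorts the m p-values, finds the largest k with p_(k) ≤ (k/m)·q, and rejects the k smallest p-values. A minimal sketch; `bh_reject` is a hypothetical helper name and the five p-values are made up. Base R's `p.adjust(p, method = "BH")` should give the same decisions.

```r
# Benjamini-Hochberg step-up procedure at FDR level q
bh_reject <- function(p, q = 0.05) {
  m <- length(p)
  o <- order(p)                                    # positions of p sorted ascending
  passes <- p[o] <= seq_len(m) / m * q             # is p_(k) <= k/m * q ?
  k <- if (any(passes)) max(which(passes)) else 0  # largest k that passes
  reject <- logical(m)                             # all FALSE to start
  reject[o[seq_len(k)]] <- TRUE                    # reject the k smallest p-values
  reject
}

p <- c(0.03, 0.0001, 0.2, 0.004, 0.019)            # hypothetical p-values
print(bh_reject(p))                                # BH decisions
print(p.adjust(p, method = "BH") <= 0.05)          # same decisions via adjusted p-values
print(p.adjust(p, method = "bonferroni") <= 0.05)  # Bonferroni, for comparison
```

On these values BH rejects four hypotheses while Bonferroni rejects only two, which illustrates why Bonferroni is described above as losing a lot of useful information.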

aggregated FDR

FDR with group info

Hu, James X., Hongyu Zhao, and Harrison H. Zhou. "False discovery rate control with groups." Journal of the American Statistical Association 105.491 (2010): 1215-1227.

Pak said: this is extremely important for statistics in the big-data era.

posted @ 2019-04-01 13:39 Life·Intelligence