Bootstrapping
In this R Markdown code, the following snippets embody the bootstrapping idea:
- This snippet uses bootstrapping to estimate the median of the `ave` column in the Active and Repressed states:
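```r
# Bootstrap the median of `ave` for the Active and Repressed groups
active_med <- c()
repress_med <- c()
for (b in 1:100) {
  # Resample each group with replacement at its original size
  # (length() of a data frame counts columns, so measure the column itself)
  active_sample <- sample(active_rep$ave, size = length(active_rep$ave), replace = TRUE)
  repress_sample <- sample(repress_rep$ave, size = length(repress_rep$ave), replace = TRUE)
  active_med <- c(active_med, median(active_sample))
  repress_med <- c(repress_med, median(repress_sample))
}
```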
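A more compact sketch of the same idea uses `replicate()` and resamples the two groups together so that it bootstraps the *difference* of their medians directly. The data frames below are hypothetical toy data (exponential draws) that are only assumed to have an `ave` column like the original. The 1,000 replicates and the percentile interval are illustrative choices.

```r
set.seed(42)
# Hypothetical toy data standing in for active_rep / repress_rep
active_rep  <- data.frame(ave = rexp(80, rate = 0.5))
repress_rep <- data.frame(ave = rexp(60, rate = 1))

B <- 1000
# Each replicate resamples both groups with replacement, each at its own size,
# and records the difference of their medians
med_diff <- replicate(B, {
  a <- sample(active_rep$ave, nrow(active_rep), replace = TRUE)
  r <- sample(repress_rep$ave, nrow(repress_rep), replace = TRUE)
  median(a) - median(r)
})

# 95% percentile interval for the difference of medians
ci <- quantile(med_diff, probs = c(0.025, 0.975))
print(ci)
# An interval that excludes 0 suggests the two medians differ
```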
- This snippet resamples repeatedly to estimate the distribution of the number of 1s in `result`, then computes its mean and standard deviation:
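```r
num_count <- c()
for (b in 1:1000) {
  # Resample at the original size of `result`, with replacement
  sample_num <- sample(result, length(result), replace = TRUE)
  num_count <- c(num_count, sum(sample_num == 1))  # count the 1s
}
c(mean = mean(num_count), sd = sd(num_count))
```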
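Why the mean and SD come out as they do: resampling n values with replacement from a 0/1 vector that holds k ones makes every draw a 1 with probability p = k/n, independently. The count of 1s is therefore Binomial(n, p), with mean n·p = k and SD sqrt(n·p·(1 − p)). The sketch below checks this numerically. The split of 67 ones out of 267 is a hypothetical example.

```r
set.seed(1)
# Hypothetical 0/1 vector: 67 ones out of 267 observations
result <- c(rep(0, 200), rep(1, 67))
n <- length(result)
p <- mean(result)  # observed proportion of 1s

num_count <- replicate(10000, sum(sample(result, n, replace = TRUE)))

# Bootstrap mean and SD of the count
boot_mean <- mean(num_count)
boot_sd <- sd(num_count)

# Binomial reference values: n * p and sqrt(n * p * (1 - p))
theory_mean <- n * p
theory_sd <- sqrt(n * p * (1 - p))
print(c(boot_mean = boot_mean, theory_mean = theory_mean,
        boot_sd = boot_sd, theory_sd = theory_sd))
```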
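- This snippet uses a nested loop to compute the lower and upper bounds of a 95% confidence interval for each movie in the `movie` dataset. This is another application of bootstrapping: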
```r
min_list <- c()
max_list <- c()
n_total <- 267  # total sample size used in the original code
for (i in seq_along(movie$students)) {
  # Rebuild the raw 0/1 data: 1 for each of the movie$students[i] students, 0 for the rest
  size1 <- movie$students[i]
  size0 <- n_total - size1
  result <- c(rep(0, size0), rep(1, size1))
  num_count <- c()
  for (b in 1:1000) {
    # Resample at the original size, with replacement
    sample_num <- sample(result, length(result), replace = TRUE)
    num_count <- c(num_count, sum(sample_num == 1))
  }
  # 2.5% and 97.5% percentiles of the bootstrap counts form a 95% interval
  quan <- quantile(num_count, probs = c(0.025, 0.975))
  min_list <- c(min_list, quan[[1]])
  max_list <- c(max_list, quan[[2]])
}
```
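The nested loop can also be written as a small function applied to each movie with `vapply()`. This is a sketch only: the `movie` data frame below is hypothetical toy data, `count_ci` is a name introduced here for illustration, and the total of 267 is taken from the original code.

```r
# Hypothetical stand-in for the movie data: counts out of 267 students
movie <- data.frame(students = c(30, 67, 120))
n_total <- 267

# Hypothetical helper: 95% percentile interval for the count of 1s
count_ci <- function(k, n, B = 1000) {
  result <- c(rep(0, n - k), rep(1, k))
  num_count <- replicate(B, sum(sample(result, n, replace = TRUE)))
  quantile(num_count, probs = c(0.025, 0.975), names = FALSE)
}

set.seed(1)
# One column per movie: row 1 = lower bound, row 2 = upper bound
cis <- vapply(movie$students, count_ci, numeric(2), n = n_total)
movie$ci_lower <- cis[1, ]
movie$ci_upper <- cis[2, ]
print(movie)
```

Keeping the per-movie work in a function avoids sharing loop variables such as `result` between iterations. It also makes it easy to change the number of replicates through `B`.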
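Bootstrapping is a statistical method that estimates the distribution of a statistic by repeatedly resampling the dataset with replacement. In the code above, it is used to estimate medians, to estimate the distribution of a count, and to build confidence intervals.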
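The general recipe can be wrapped in one reusable function: resample, recompute the statistic, then read off the standard error and percentile interval. This is a minimal sketch under stated assumptions. `bootstrap_ci` is a hypothetical name, and the sample `x` is simulated. It indexes with `sample.int()` to avoid the quirk where `sample(x)` on a numeric vector of length 1 samples from `1:x` instead.

```r
# Hypothetical generic percentile bootstrap for a statistic of a numeric vector
bootstrap_ci <- function(x, statistic, B = 1000, level = 0.95) {
  # Resample indices with replacement, recompute the statistic B times
  stats <- replicate(B, statistic(x[sample.int(length(x), replace = TRUE)]))
  alpha <- (1 - level) / 2
  list(
    estimate = statistic(x),
    se = sd(stats),  # bootstrap standard error
    ci = quantile(stats, probs = c(alpha, 1 - alpha), names = FALSE)
  )
}

set.seed(7)
x <- rnorm(50, mean = 10, sd = 2)  # hypothetical sample
med <- bootstrap_ci(x, median)
avg <- bootstrap_ci(x, mean)
str(med)
```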
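Using bootstrapping instead of a chi-squared test when the chi-squared test's assumptions are not met (for example, when some expected cell counts are below 5):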
- Null hypothesis: there is no difference between the proportions of students who are satisfied with an early versus a late opening time
- Alternative hypothesis: there is a difference between the two proportions, or, as the one-tailed equivalent, students prefer a late opening time
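Build vectors of the responses, then use bootstrapping to plot the distributions and confidence intervals.
When comparing bootstrap results, use proportions rather than raw counts where possible (that is, normalize), because the two groups may have different totals.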
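A small illustration of why proportions are preferable: with 0/1 response vectors of different lengths, raw counts mostly reflect group size, while the mean of each vector gives a proportion on a common scale. The counts below are hypothetical. The vectors are named `first_results` and `second_results` to match the loop that follows, which assumes 0/1 data.

```r
# Hypothetical responses: 1 = satisfied, 0 = not; group sizes differ
first_results  <- c(rep(1, 45), rep(0, 55))  # early: 100 responses
second_results <- c(rep(1, 90), rep(0, 60))  # late: 150 responses

# Raw counts mostly reflect group size
counts <- c(early = sum(first_results), late = sum(second_results))
print(counts)  # early 45, late 90
# Proportions (the mean of a 0/1 vector) are on a common scale
props <- c(early = mean(first_results), late = mean(second_results))
print(props)   # early 0.45, late 0.60
```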
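```r
# Bootstrap the proportion of satisfied presses for each opening time
# (first_results / second_results are assumed to be 0/1 vectors)
first_bootstraps <- c()
second_bootstraps <- c()
for (a in 1:100) {
  first_sample <- mean(sample(first_results, length(first_results), replace = TRUE))
  second_sample <- mean(sample(second_results, length(second_results), replace = TRUE))
  first_bootstraps <- c(first_bootstraps, first_sample)
  second_bootstraps <- c(second_bootstraps, second_sample)
}
# Upper 97.5% bound for early vs lower 2.5% bound for late:
# if first_upper < second_lower, the two 95% intervals do not overlap
first_upper <- quantile(first_bootstraps, probs = 0.975)
second_lower <- quantile(second_bootstraps, probs = 0.025)
boxplot(
  first_bootstraps,
  second_bootstraps,
  notch = TRUE,
  names = c('early', 'late'),
  ylab = 'Prop. of satisfied button presses'
)
```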
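Checking whether two 95% intervals overlap is a conservative criterion, because overlapping intervals can still correspond to a real difference. The sketch below shows that check and an alternative that bootstraps the difference of proportions directly. Pairing the two sets of independent replicates by index gives draws of the difference. The data are hypothetical. `p_one_sided` is only a rough one-sided summary for the "late is preferred" alternative, not a formal p-value computed under the null hypothesis.

```r
set.seed(11)
# Hypothetical 0/1 responses (1 = satisfied) for the two opening times
first_results  <- c(rep(1, 45), rep(0, 55))
second_results <- c(rep(1, 90), rep(0, 60))

B <- 1000
first_bootstraps  <- replicate(B, mean(sample(first_results, length(first_results), replace = TRUE)))
second_bootstraps <- replicate(B, mean(sample(second_results, length(second_results), replace = TRUE)))

# Conservative check: do the two 95% intervals fail to overlap?
first_upper  <- quantile(first_bootstraps, probs = 0.975)
second_lower <- quantile(second_bootstraps, probs = 0.025)
no_overlap <- unname(first_upper < second_lower)

# Alternative: bootstrap the difference directly (late minus early)
diff_boot <- second_bootstraps - first_bootstraps
diff_ci <- quantile(diff_boot, probs = c(0.025, 0.975))
# Share of replicates in which late is not higher than early
p_one_sided <- mean(diff_boot <= 0)
print(list(no_overlap = no_overlap, diff_ci = diff_ci, p_one_sided = p_one_sided))
```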
