bootstrapping

In this R Markdown code, the following snippets embody the idea of bootstrapping:

  1. This snippet uses bootstrapping to estimate the median of the ave column in the Active and Repressed states:
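active_med <- c()
repress_med <- c()
for (rep in 1:100) {
  # Resample as many values as there are observations in ave;
  # length(active_rep) would count the columns if active_rep is a data frame
  active_sample <- sample(active_rep$ave, size = length(active_rep$ave), replace = TRUE)
  repress_sample <- sample(repress_rep$ave, size = length(repress_rep$ave), replace = TRUE)
  # Store the median of each bootstrap sample
  active_med <- c(active_med, median(active_sample))
  repress_med <- c(repress_med, median(repress_sample))
}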
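  2. This snippet repeatedly resamples result to estimate the distribution of the number of 1s it contains, from which the mean and standard deviation can be computed: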
num_count <- c()
for (rep in 1:1000) {
  # Resample with replacement at the same size as result
  sample_num <- sample(result, length(result), replace = TRUE)
  # Count the 1s in this bootstrap sample
  num_count <- c(num_count, sum(sample_num == 1))
}
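  3. This snippet uses a nested loop to compute the lower and upper bounds of a 95% confidence interval for each movie in the movie dataset, which is another application of bootstrapping: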
min_list <- c()
max_list <- c()
for (i in seq_along(movie$students)) {
  # Build the 0/1 sample for this movie: movie$students[i] ones, the rest zeros
  size1 <- movie$students[i]
  size0 <- 267 - size1
  result <- c(rep(0, size0), rep(1, size1))
  num_count <- c()
  for (rep in 1:1000) {
    # Resample with replacement at the same size as result
    sample_num <- sample(result, length(result), replace = TRUE)
    num_count <- c(num_count, sum(sample_num == 1))
  }
  # The 2.5% and 97.5% quantiles bound the 95% percentile interval
  quan <- quantile(num_count, probs = c(0.025, 0.975))
  min_list <- c(min_list, quan[[1]])
  max_list <- c(max_list, quan[[2]])
}

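Bootstrapping is a statistical method that estimates the distribution of a statistic by repeatedly drawing random samples, with replacement, from the data set. In the code above it is used to estimate medians, to estimate the distribution of a count, and to build confidence intervals.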
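The three snippets share one pattern: resample, recompute the statistic, then summarise the replicates. Below is a minimal sketch of that pattern, assuming a 0/1 vector as input. The data and the helper names `bootstrap_stat` and `count_ones` are hypothetical and do not come from the original code.

```r
# Hypothetical data: 40 ones and 227 zeros (not from the original post).
set.seed(42)
observed <- c(rep(1, 40), rep(0, 227))

# Hypothetical helper: resample x with replacement at its original size
# and recompute the statistic, n_boot times.
bootstrap_stat <- function(x, stat, n_boot = 1000) {
  replicate(n_boot, stat(sample(x, size = length(x), replace = TRUE)))
}

# Statistic of interest: the number of 1s in a sample.
count_ones <- function(x) sum(x == 1)

boot_counts <- bootstrap_stat(observed, count_ones)

boot_mean <- mean(boot_counts)  # centre of the bootstrap distribution
boot_sd   <- sd(boot_counts)    # bootstrap estimate of the standard error
boot_ci   <- quantile(boot_counts, probs = c(0.025, 0.975))  # 95% percentile interval
```

Passing `median` instead of `count_ones` would give the median-based variant from the first snippet; `replicate()` also avoids growing a vector inside a `for` loop.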

Bootstrapping can be used in place of the chi-squared test (chisq) when the requirements of the chi-squared test are not met.

  • Null hypothesis: there is no difference between the proportions of students who are satisfied with an early or a late opening time
  • Alternative hypothesis: there is a difference between the two proportions, or students prefer a late opening time (the latter would be the equivalent of a one-tailed test)

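You can build vectors from the data, then use bootstrapping to plot the distributions and confidence intervals.
When comparing bootstrapping results, prefer proportions over raw counts (that is, normalise), because the two groups may have different totals.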
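To illustrate why proportions are preferable, here is a small sketch that builds the 0/1 vectors from counts. The counts are hypothetical and chosen only so that the two groups differ in size; the names `first_results` and `second_results` follow the loop further down.

```r
# Hypothetical counts, not from the original post.
early_satisfied <- 30
early_total     <- 60
late_satisfied  <- 45
late_total      <- 120

# One element per button press: 1 = satisfied, 0 = not satisfied.
first_results  <- c(rep(1, early_satisfied), rep(0, early_total - early_satisfied))
second_results <- c(rep(1, late_satisfied),  rep(0, late_total - late_satisfied))

# Raw counts make "late" look better, but the groups differ in size.
sum(first_results)    # 30
sum(second_results)   # 45

# Proportions put both groups on the same scale.
mean(first_results)   # 0.5
mean(second_results)  # 0.375
```

Because the mean of a 0/1 vector is the proportion of 1s, bootstrapping `mean()` on these vectors yields bootstrap proportions directly.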

first_bootstraps <- c()
second_bootstraps <- c()
for (a in 1:100) {
  # Proportion of 1s in a bootstrap sample of each group
  first_sample <- mean(sample(first_results, length(first_results), replace = TRUE))
  second_sample <- mean(sample(second_results, length(second_results), replace = TRUE))
  first_bootstraps <- c(first_bootstraps, first_sample)
  second_bootstraps <- c(second_bootstraps, second_sample)
}
# Upper bound of the first group's 95% interval, lower bound of the second group's
first_upper <- quantile(first_bootstraps, probs = c(0.975))
second_lower <- quantile(second_bootstraps, probs = c(0.025))

boxplot(
first_bootstraps,
second_bootstraps,
notch = T,
names = c('early', 'late'),
ylab = 'Prop. of satisfied button presses'
)
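One way to turn the bootstrap output into a decision between the two hypotheses is sketched below, assuming hypothetical 0/1 data (1 = satisfied). The first check mirrors the `first_upper` and `second_lower` bounds; the second bootstraps the difference in proportions. The names `boot_prop`, `intervals_separate`, `diff_ci`, and `share_not_higher` are illustrative and not part of the original code.

```r
set.seed(1)
# Hypothetical data, not from the original post.
first_results  <- c(rep(1, 30), rep(0, 30))  # early: 30 of 60 satisfied
second_results <- c(rep(1, 90), rep(0, 30))  # late: 90 of 120 satisfied

n_boot <- 1000
# Proportion of 1s in one bootstrap sample of x.
boot_prop <- function(x) mean(sample(x, length(x), replace = TRUE))
first_bootstraps  <- replicate(n_boot, boot_prop(first_results))
second_bootstraps <- replicate(n_boot, boot_prop(second_results))

# Check 1: does the early upper bound stay below the late lower bound?
first_upper  <- quantile(first_bootstraps, probs = 0.975)
second_lower <- quantile(second_bootstraps, probs = 0.025)
intervals_separate <- unname(first_upper < second_lower)

# Check 2: bootstrap distribution of the difference (late minus early).
diff_bootstraps <- second_bootstraps - first_bootstraps
diff_ci <- quantile(diff_bootstraps, probs = c(0.025, 0.975))

# Share of replicates in which late is not higher than early,
# a rough one-sided indicator rather than an exact p-value.
share_not_higher <- mean(diff_bootstraps <= 0)
```

If `diff_ci` excludes 0, the data are hard to reconcile with the null hypothesis of equal proportions. Non-overlapping intervals point the same way, although that check is more conservative than looking at the difference directly.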
posted @ 2024-05-30 09:11  chen生信