ggplot的boxplot/violin plot添加显著性 | Add P-values and Significance Levels to ggplots | 定量分析

 

2022年10月11日

显著性符号的意义 

Symbol Meaning
ns P > 0.05
* P ≤ 0.05
** P ≤ 0.01
*** P ≤ 0.001
****  P ≤ 0.0001

 

参考:What is the meaning of * or ** or *** in reports of statistical significance from Prism or InStat?

文章里的描述:Expression of regional markers TOX3 (r5) and PRDX6 (r6) between HCO and HBSO. Wilcoxon rank sum test was applied. P-values < 0.0001 is marked by ****. 

 


 

2022年10月07日

越高分的文章,对数据的严格的统计处理就越重要,存在一些经典必用的图:

  • boxplot,自带四分位信息,最好加上jitter让人看到你的数据点
  • violin plot,在单细胞里很火,可以直接看到数据的分布,可以叠加boxplot使用
  • 线性拟合回归,lm,我们目前绝对无法handle非线性的回归

这些经典分析必须搭配显著性测试,必须在图里显示P-value,或者P-value对应的符号(*、**、***、NS)。

 

目前在ggplot里添加显著性的主要方法:

  • ggpubr的stat_compare_means,也是最为主流的方法。
    • 优点:可以应对多组比对,理论上可以无限标注分组,快速
    • 缺点:数据处理整合在ggplot函数里,无法自定义处理数据
  • ggpubr的第二个函数stat_pvalue_manual,适合有一个固定的ref
    • 优点:可以自定义处理数据
    • 缺点:没有comparisons参数,多组比较不够自如
  • 自己检验,然后用geom_text手动标注
    • 优点:完全自定义,随心所欲,超批量作图
    • 缺点:不适合多组比对,适合两组对比

 

难点:

  • 在数据零值过多时,所有scale都失效了,此时只能用geom_jitter来显示差异【2022-10-08】
  • 两个control,HSCR paper里,必须后期用inkscape来修改,也算方便。
  • facet_wrap可以做到真正的free_y,但stat_compare_means跟不上,所以得用stat_pvalue_manual(参考:Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/neuronal_subset.ipynb)
  • facet_grid里无法做到真正的free_y,可以用facet_wrap和cowplot的拼图来实现
  • 也可以将数据scale到0-1,这样label时就可以使用固定参数(参考:EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb)

 

方法一的方法参考下面的文章

 

方法二:

参考:Data_center/analysis/HBSO_7Ala_Elly_2022_Jul/Seurat_integration_all.ipynb

 

方法三:

现实案例代码:EllyLab/mouse/singleCell/case/Vcl_ENCC/Vcl_ENCCs_aggregate_analysis.ipynb【会封装为函数】

 

 


 

参考:Add P-values and Significance Levels to ggplots

 

ggpubr的包比较局限,能用的test也比较局限,但是做起来快速简单。

当情况特殊时ggpubr就不能用了,可以自己做了显著性test之后再显示在图上。

# show lable in facet grid plot
dat_text <- data.frame()
for (i in names(paired_list)) {
    # Compute t-test
    res <- t.test(value ~ group, data = paired_list[[i]], paired = TRUE)
    dat_text <- rbind(dat_text, data.frame(variable=i, pvalue=res$p.value))
}

dat_text$label <- paste("P", round(dat_text$pvalue, 3), sep="=")

dat_text[dat_text$pvalue<0.05 & dat_text$pvalue>0.01,]$label <- paste("*", 
                                  dat_text[dat_text$pvalue<0.05 & dat_text$pvalue>0.01,]$label, sep=" ")

dat_text[dat_text$pvalue<0.01,]$label <- paste("**", "P<0.01", sep=" ")

    library(ggplot2) 
    options(repr.plot.width=8, repr.plot.height=12) # 8x8
    g2 <- ggplot(data=genes_expr_melt, aes(x=pseudotime, y=value, fill=group, color=group)) +
        geom_point(size=0.01, alpha=0.5, aes(color=group, fill=group)) +
        labs(x = "Pseudotime", y = "Relative expression", title = "Neuronal lineage") +
        geom_smooth(method = 'loess',se=F,size=0.15,span = 0.7) + # ,alpha=0.05, weight=0.1,
        facet_wrap( ~ variable, ncol=3, labeller = label_context, scales = "free_y") + # 
        geom_text(size    = 5, data    = dat_text, mapping = aes(x = Inf, y = Inf, label = label),
            hjust   = 1.05,vjust   = 1.5, color=ifelse(dat_text$pvalue < 0.05,'red','black')) +
        # themes
        theme(strip.background = element_blank(),
             panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
             panel.spacing=unit(.4, "lines"),panel.border = element_rect(color = "black", fill = NA, size = 0.5))+
        theme(axis.text.x  = element_text(face="plain", angle=0, size = 10, color = "black", vjust=0.5),
            axis.text.y  = element_text(face="plain", size = 10, color = "black"),
            axis.title =element_text(size = 15)) +
        theme(strip.background = element_rect(fill = "gray90", color = NA))+
          # theme(legend.position = "none") + # must remove legend
          theme(strip.placement = "outside", strip.text.x = element_text(face="plain", size = 13),
                  strip.text.y = element_text(face="plain", size = 11)) +
        theme(strip.text.x = element_text(margin = margin(1,0,1,0, "mm"))) +
        scale_color_manual(values=c("deepskyblue","red","gray50")) +
        scale_fill_manual(values=c("deepskyblue","red","gray50"))
    plot(g2)
# }

 

多组比较,挑选感兴趣的显示显著性。

data("ToothGrowth")
head(ToothGrowth)  
library(ggpubr)
my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
options(repr.plot.width=4, repr.plot.height=4)
ggplot(ToothGrowth, aes(x=as.character(dose), y=len, fill=dose)) +
  geom_boxplot(outlier.size=NA, size=0.01, outlier.shape = NA) +
  geom_jitter(width = 0.3, size=0.01) +# , aes(color=supp) +
  stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
  stat_compare_means(label.y = 50, label.x = 1.5)     # Add global p-value

还可以设定一个ref group来显示显著性差异,只需要改一下设定。

  stat_compare_means(method = "anova", label.y = 1.3, label.x = 3)+ # Add pairwise comparisons p-value
  # # Add global p-value
  stat_compare_means(label = "p.signif", method = "t.test", ref.group = "hNP-D20", label.y = 1.1) + 

  

 

生物学的强烈推荐看看Y叔的公众号里的统计相关的文章,非常的基础和实用。

统计

 

代码例子:

options(repr.plot.width=7, repr.plot.height=6)
# facet boxplot
bp <- ggplot(expr_data2, aes(x=group, y=expression, fill=NA)) + 
  geom_boxplot(outlier.size=NA, size=0.01, outlier.shape = NA) + 
  geom_jitter(width = 0.3, size=0.01, aes(color=cluster)) +
  # + geom_boxplot( + 
  facet_grid( cluster ~ gene, switch="y") + # , scales = "free"
  theme_bw() + 
  stat_compare_means(aes(group = group, label = ..p.signif..), label.x = 1.3,label.y = 1.3, 
                     method = "wilcox.test", hide.ns = T) + # label = "p.format",
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  labs(x = "", y = "", title = "") +
  theme(panel.spacing=unit(.3, "lines"),panel.border = element_rect(color = "black", fill = NA, size = 0.2)) +
  theme(axis.ticks.x = element_blank(), axis.ticks = element_line(size = 0.1), 
        axis.text.x  = element_text(face="plain", angle=90, size = 8, color = "black", vjust=0.5),
        axis.text.y  = element_text(face="plain", size = 4, color = "black"),
        axis.title =element_text(size = 12)) +
  theme(strip.background = element_rect(fill = "gray97", color = NA))+
  theme(legend.position = "none") +
  theme(strip.placement = "outside", strip.text.x = element_text(face="italic", size = 11),
          strip.text.y = element_text(face="plain", size = 11)) +
  scale_y_continuous(position="right", limits = c(-0.5,1.5)) +
  scale_fill_manual(values=brewer.pal(8,"Set2")[c(2,3,7,1,5,6)]) +
  scale_color_manual(values=brewer.pal(8,"Set2")[c(2,3,7,1,5,6)])
bp

  

posted @ 2019-05-31 15:51  Life·Intelligence  阅读(5895)  评论(0编辑  收藏  举报
TOP