gg_sankey

 

好像有好久没有更新了,一直想自己用ggplot2实现一下sankey图,就着手做了一下最简单的.

一般的sankey图长这样,左边一列,右边一列,中间的条带是左右两个状态之间的转变.

那么,首先我们就需要构建左右两边的bar,在每个柱的中间标注上所占的比例:

library(ggplot2)
color_list <- c("#f38181", "#fce38a", "#61c0bf", "#95e1d3")
bar_data <- data.frame(
  x = c(1, 1, 1, 11, 11,11,11),
  type = c("a", "b", "c", "a", "b", "c","d"),
  y = c(0.2, 0.3, 0.5, 0.1, 0.5, 0.2, 0.2)
)
text_data_create <- function(bar_data){
  x = bar_data$x
  text = bar_data$y
  y = apply(
    matrix(names(table(x)), ncol = 1),
    1,
    function(x_group){
      index = which(x == as.numeric(x_group))
      start = cumsum(text[index])
      end = c(0, start[1:(length(start)-1)])
      
      return((1-(start + end)/2))
    }
  )
  text_data =  data.frame(
    x = x,
    y = unlist(y), 
    text = text
  )
}
bar_p <- ggplot(data = bar_data) +
  geom_bar(position = "fill", stat = "identity", aes(fill = type, x,y), colour = "white", width = 0.8) +
  geom_text(data = text_data_create(bar_data), aes(x, y, label = text)) +
  scale_fill_manual(values = color_list) 

结果如图:

接下去就是中间引流线的构建,简单来说其实就是确定上线和下线,为了美观,我用 \(X^{3}\)给线加上弧度:

river_data_create <- function(start_y_upper, end_y_upper, start_y_lower, end_y_lower){
  x = seq((1 + 0.8/2), (11 - 0.8/2), length = 10000)
  mean_y_upper = (start_y_upper + end_y_upper)/2
  y_upper = (start_y_upper - mean_y_upper)/(4.6^3)*(-x + 6)^3 + mean_y_upper
  mean_y_lower = (start_y_lower + end_y_lower)/2
  y_lower = (start_y_lower - mean_y_lower)/(4.6^3)*(-x + 6)^3 + mean_y_lower
  river_data = data.frame(
    x,
    y_upper,
    y_lower
  )
  text_data = data.frame(
    x = 6,
    y = (start_y_upper + end_y_lower) / 2,
    text = as.character(start_y_upper - start_y_lower)
  )
  return(list(line = river_data, text = text_data))
}

这样就完成了计算导流线的点坐标,之后就利用 geom_ribbon 往图层上添加即可.

river_data <- river_data_create(1,0.9,0.9, 0.8)
sankey_p <- bar_p + 
  geom_ribbon(data = river_data$line, aes(x, ymin = y_lower, ymax = y_upper), fill = color_list[1], colour = "white", alpha = 0.2) +
  geom_text(data = river_data$text, aes(x,y,label = text))

river_data <- river_data_create(0.5,0.6, 0.4, 0.5)
sanky_p <- sanky_p + 
  geom_ribbon(data = river_data$line, aes(x, ymin = y_lower, ymax = y_upper), fill = color_list[3], colour = "white", alpha = 0.2) +
  geom_text(data = river_data$text, aes(x,y,label = text))

river_data <- river_data_create(0.2,0.2, 0, 0)
sanky_p <- sanky_p + 
  geom_ribbon(data = river_data$line, aes(x, ymin = y_lower, ymax = y_upper), fill = color_list[3], colour = "white", alpha = 0.2) +
  geom_text(data = river_data$text, aes(x,y,label = text))

最后就是对theme的调整,把一些没用的线去掉:

虽然现在已经有很多包可以实现 sankey 图的绘画, 比如 riverplot, 但是实现一次还是挺有意思的.

最后,祝您

身体健康.

posted @ 2019-07-03 16:04  PeRl`  阅读(215)  评论(0编辑  收藏  举报