Reshaping data in R
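When processing data for plotting in R, you often need to convert wide data into long data. The author of the post linked below explains this in detail; following it step by step is enough to do the conversion.
The R package reshape2: easy conversion between long and wide data tables https://blog.csdn.net/qazplm12_3/article/details/83618497
Here I only describe the problem I ran into during the format conversion, which may well be a beginner's problem.
First, the processing steps on the example dataset: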
> library(reshape2)    # load the reshape2 package
> attach(airquality)   # attach the built-in airquality dataset
> dim(airquality)      # number of rows and columns
[1] 153   6
> head(airquality)     # first few rows of the dataset
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
> class(airquality)    # class of the dataset
[1] "data.frame"
> str(airquality)      # internal structure of the dataset
'data.frame': 153 obs. of 6 variables:
 $ Ozone  : int 41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int 67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int 5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int 1 2 3 4 5 6 7 8 9 10 ...
> aql <- melt(airquality, id.vars=c("Month","Day"))   # keep the "Month" and "Day" columns as ids; melt the other columns
> head(aql)            # first few rows of the melted data
  Month Day variable value
1     5   1    Ozone    41
2     5   2    Ozone    36
3     5   3    Ozone    12
4     5   4    Ozone    18
5     5   5    Ozone    NA
6     5   6    Ozone    28
> aql <- melt(airquality, id.vars=c("Month","Day"), variable.name = "REGION", value.name = "VALUE")   # "variable" and "value" can be given names of your own
> head(aql)
  Month Day REGION VALUE
1     5   1  Ozone    41
2     5   2  Ozone    36
3     5   3  Ozone    12
4     5   4  Ozone    18
5     5   5  Ozone    NA
6     5   6  Ozone    28
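As a quick sanity check on the melt step, the long table can be cast back to wide form with dcast from the same package. The following is a minimal sketch, assuming reshape2 is installed; the object names `long` and `wide` are my own and do not come from the linked post.

```r
library(reshape2)

# Wide -> long: Month and Day identify each row, every other column is melted.
long <- melt(airquality, id.vars = c("Month", "Day"))

# Long -> wide: one row per Month/Day pair, one column per measured variable.
wide <- dcast(long, Month + Day ~ variable, value.var = "value")

dim(long)    # 4 measured variables x 153 days
dim(wide)    # same shape as the original airquality table
names(wide)
```

If the round trip gives back a table of the original shape, the id columns were chosen sensibly.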
When I read in my own data, its class and internal structure differed from those of the example dataset airquality, so melt did not work as intended.
> a <- read.table("file:///C:/Users/lenovo/Desktop/test.txt", header = F, sep="\t", row.names = 1)   # read in my own data
> a
                 V2          V3
TC      0.412574939 0.386191484
AG      0.408958034 0.389692067
TG      0.000941386 0.000988865
CT       0.07967101 0.105907482
GT      0.000545013 0.001443743
GA      0.091859486 0.107845658
TA      0.002130506 0.002650159
CG      0.000842293 0.000830647
CA       0.00044592 0.001067975
AC      0.000247733 0.000909756
GC      0.000545013 0.000969088
AT      0.001238666 0.001503075
samples         SLE          HC
> class(a)
[1] "data.frame"
> str(a)
'data.frame': 13 obs. of 2 variables:
 $ V2: Factor w/ 12 levels "0.000247733",..: 11 10 5 8 3 9 7 4 2 1 ...
 $ V3: Factor w/ 13 levels "0.000830647",..: 11 12 4 9 6 10 8 1 5 2 ...
> b <- t(a)    # transpose the table
> class(b)
[1] "matrix"   # the class has changed
> str(b)       # the internal structure has changed as well
 chr [1:2, 1:13] "0.412574939" "0.386191484" "0.408958034" "0.389692067" "0.000941386" "0.000988865" "0.07967101" "0.105907482" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "V2" "V3"
  ..$ : chr [1:13] "TC" "AG" "TG" "CT" ...
> b
   TC            AG            TG            CT            GT            GA            TA            CG            CA            AC
V2 "0.412574939" "0.408958034" "0.000941386" "0.07967101"  "0.000545013" "0.091859486" "0.002130506" "0.000842293" "0.00044592"  "0.000247733"
V3 "0.386191484" "0.389692067" "0.000988865" "0.105907482" "0.001443743" "0.107845658" "0.002650159" "0.000830647" "0.001067975" "0.000909756"
   GC            AT            samples
V2 "0.000545013" "0.001238666" "SLE"
V3 "0.000969088" "0.001503075" "HC"
> b1 <- melt(b, id.vars=c("samples"), variable.name = "mismat", value.name = "VALUE")   # intended: keep the samples column as the id and melt the other columns
> b1           # the result is not what was intended
   Var1    Var2       VALUE
1    V2      TC 0.412574939
2    V3      TC 0.386191484
3    V2      AG 0.408958034
4    V3      AG 0.389692067
5    V2      TG 0.000941386
6    V3      TG 0.000988865
7    V2      CT  0.07967101
8    V3      CT 0.105907482
9    V2      GT 0.000545013
10   V3      GT 0.001443743
11   V2      GA 0.091859486
12   V3      GA 0.107845658
13   V2      TA 0.002130506
14   V3      TA 0.002650159
15   V2      CG 0.000842293
16   V3      CG 0.000830647
17   V2      CA  0.00044592
18   V3      CA 0.001067975
19   V2      AC 0.000247733
20   V3      AC 0.000909756
21   V2      GC 0.000545013
22   V3      GC 0.000969088
23   V2      AT 0.001238666
24   V3      AT 0.001503075
25   V2 samples         SLE
26   V3 samples          HC
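As far as I can tell, the reason is that melt is a generic function: a matrix is handled by the matrix/array method, which melts over the row and column names and does not use id.vars at all. The sketch below reproduces this on a tiny character matrix. The matrix `m` and its contents are a made-up stand-in for the transposed table, and the claim that the extra arguments are silently ignored is my reading of reshape2's behaviour, not something stated in the linked post.

```r
library(reshape2)

# Made-up stand-in for the transposed table: a character matrix whose
# last column holds the sample labels.
m <- matrix(c("0.41", "0.38", "0.40", "0.39", "SLE", "HC"),
            nrow = 2,
            dimnames = list(c("V2", "V3"), c("TC", "AG", "samples")))

# id.vars and variable.name are passed, but the matrix method melts over
# the dimnames, so the label column is treated like any other column.
long_m <- melt(m, id.vars = "samples", variable.name = "mismat",
               value.name = "VALUE")

names(long_m)   # Var1, Var2, VALUE rather than samples, mismat, VALUE
long_m
```

Only value.name takes effect here, which matches the Var1/Var2/VALUE header seen in the failed conversion.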
Solution: convert the matrix back to a data frame and restore the numeric columns before calling melt.
> b
   TC            AG            TG            CT            GT            GA            TA            CG            CA            AC
V2 "0.412574939" "0.408958034" "0.000941386" "0.07967101"  "0.000545013" "0.091859486" "0.002130506" "0.000842293" "0.00044592"  "0.000247733"
V3 "0.386191484" "0.389692067" "0.000988865" "0.105907482" "0.001443743" "0.107845658" "0.002650159" "0.000830647" "0.001067975" "0.000909756"
   GC            AT            samples
V2 "0.000545013" "0.001238666" "SLE"
V3 "0.000969088" "0.001503075" "HC"
> b <- data.frame(b)   # convert the matrix to a data frame
> str(b)               # check the internal structure
'data.frame': 2 obs. of 13 variables:
 $ TC     : Factor w/ 2 levels "0.386191484",..: 2 1
  ..- attr(*, "names")= chr "V2" "V3"
 $ AG     : Factor w/ 2 levels "0.389692067",..: 2 1
  ..- attr(*, "names")= chr "V2" "V3"
 $ TG     : Factor w/ 2 levels "0.000941386",..: 1 2
  ..- attr(*, "names")= chr "V2" "V3"
 $ CT     : Factor w/ 2 levels "0.07967101","0.105907482": 1 2
  ..- attr(*, "names")= chr "V2" "V3"
 $ GT     : Factor w/ 2 levels "0.000545013",..: 1 2
  ..- attr(*, "names")= chr "V2" "V3"
 $ GA     : Factor w/ 2 levels "0.091859486",..: 1 2
  ..- attr(*, "names")= chr "V2" "V3"
 $ TA     : Factor w/ 2 levels "0.002130506",..: 1 2
  ..- attr(*, "names")= chr "V2" "V3"
 $ CG     : Factor w/ 2 levels "0.000830647",..: 2 1
  ..- attr(*, "names")= chr "V2" "V3"
 $ CA     : Factor w/ 2 levels "0.00044592","0.001067975": 1 2
  ..- attr(*, "names")= chr "V2" "V3"
 $ AC     : Factor w/ 2 levels "0.000247733",..: 1 2
  ..- attr(*, "names")= chr "V2" "V3"
 $ GC     : Factor w/ 2 levels "0.000545013",..: 1 2
  ..- attr(*, "names")= chr "V2" "V3"
 $ AT     : Factor w/ 2 levels "0.001238666",..: 1 2
  ..- attr(*, "names")= chr "V2" "V3"
 $ samples: Factor w/ 2 levels "HC","SLE": 2 1
  ..- attr(*, "names")= chr "V2" "V3"
> b[,-13] <- apply(b[,-13],2, as.character)   # convert factor to character
> b[,-13] <- apply(b[,-13],2, as.numeric)     # convert character to numeric
> str(b)               # check the structure again: samples is a factor and the other columns are num, matching the original data
'data.frame': 2 obs. of 13 variables:
 $ TC     : num 0.413 0.386
 $ AG     : num 0.409 0.39
 $ TG     : num 0.000941 0.000989
 $ CT     : num 0.0797 0.1059
 $ GT     : num 0.000545 0.001444
 $ GA     : num 0.0919 0.1078
 $ TA     : num 0.00213 0.00265
 $ CG     : num 0.000842 0.000831
 $ CA     : num 0.000446 0.001068
 $ AC     : num 0.000248 0.00091
 $ GC     : num 0.000545 0.000969
 $ AT     : num 0.00124 0.0015
 $ samples: Factor w/ 2 levels "HC","SLE": 2 1
  ..- attr(*, "names")= chr "V2" "V3"
> b
            TC          AG          TG          CT          GT          GA          TA          CG          CA          AC          GC
V2 0.412574939 0.408958034 0.000941386  0.07967101 0.000545013 0.091859486 0.002130506 0.000842293  0.00044592 0.000247733 0.000545013
V3 0.386191484 0.389692067 0.000988865 0.105907482 0.001443743 0.107845658 0.002650159 0.000830647 0.001067975 0.000909756 0.000969088
            AT samples
V2 0.001238666     SLE
V3 0.001503075      HC
> b2 <- melt(b, id.vars=c("samples"), variable.name = "mismat", value.name = "VALUE")   # reshape to long format
> b2                   # the conversion succeeded
   samples mismat       VALUE
1      SLE     TC 0.412574939
2       HC     TC 0.386191484
3      SLE     AG 0.408958034
4       HC     AG 0.389692067
5      SLE     TG 0.000941386
6       HC     TG 0.000988865
7      SLE     CT 0.079671010
8       HC     CT 0.105907482
9      SLE     GT 0.000545013
10      HC     GT 0.001443743
11     SLE     GA 0.091859486
12      HC     GA 0.107845658
13     SLE     TA 0.002130506
14      HC     TA 0.002650159
15     SLE     CG 0.000842293
16      HC     CG 0.000830647
17     SLE     CA 0.000445920
18      HC     CA 0.001067975
19     SLE     AC 0.000247733
20      HC     AC 0.000909756
21     SLE     GC 0.000545013
22      HC     GC 0.000969088
23     SLE     AT 0.001238666
24      HC     AT 0.001503075
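A possible alternative that avoids the factor round trip is to read every field as character, split off the label row, and convert only the measurement rows to numeric before building the data frame. This is only a sketch under assumptions: `txt` is an inline stand-in holding three rows of the file plus the samples row (it is not the real test.txt), the names `raw`, `num`, `wide` and `long` are my own, and the file is assumed to have the layout shown above.

```r
library(reshape2)

# Inline stand-in for test.txt (assumption: tab-separated, labels in the last row).
txt <- paste(
  "TC\t0.412574939\t0.386191484",
  "AG\t0.408958034\t0.389692067",
  "TG\t0.000941386\t0.000988865",
  "samples\tSLE\tHC",
  sep = "\n"
)

# Read everything as character so that no factors are created.
raw <- read.table(text = txt, header = FALSE, sep = "\t", row.names = 1,
                  colClasses = "character", stringsAsFactors = FALSE)

# Separate the label row from the measurement rows.
is_label <- rownames(raw) == "samples"
labels   <- unlist(raw[is_label, ], use.names = FALSE)

# Convert the measurements to a numeric matrix; dimnames are preserved.
num <- as.matrix(raw[!is_label, , drop = FALSE])
mode(num) <- "numeric"

# One row per sample, one numeric column per mismatch type, plus the labels.
wide <- data.frame(samples = labels, t(num), stringsAsFactors = FALSE)

long <- melt(wide, id.vars = "samples",
             variable.name = "mismat", value.name = "VALUE")
long
```

With the real file, `text = txt` would be replaced by the file path. The design choice here is to keep the labels out of the numeric block from the start, so no column ever has to be converted from factor back to number.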
The conversions above were run in RStudio with R version 3.5.1 (2018-07-02).