Reshaping data in R with reshape2

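  When preparing data for plotting in R, you often need to convert wide-format data to long format. The author of the post linked below explains this in detail; following it step by step is enough to do the conversion.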

R reshape2: easy conversion between long and wide data tables https://blog.csdn.net/qazplm12_3/article/details/83618497

  Here I only cover the problems I ran into while converting my own data, which may well be beginner problems.

First, the workflow on the example dataset:

> library(reshape2)  # load the reshape2 package
> attach(airquality)  # attach airquality so its columns are on the search path (not needed for melt)
> dim(airquality)      # number of rows and columns
[1] 153   6
> head(airquality)     # first few rows of the dataset
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6
> class(airquality)    # class of the object
[1] "data.frame"
> str(airquality)       # internal structure of the object
'data.frame':    153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
> aql <- melt(airquality, id.vars=c("Month","Day")) # keep the "Month" and "Day" columns as identifiers and melt the other columns
> head(aql)  # first few rows of the melted data
  Month Day variable value
1     5   1    Ozone    41
2     5   2    Ozone    36
3     5   3    Ozone    12
4     5   4    Ozone    18
5     5   5    Ozone    NA
6     5   6    Ozone    28
> aql <- melt(airquality, id.vars=c("Month","Day"), variable.name = "REGION", value.name = "VALUE") # the "variable" and "value" columns can be given names of your choice
> head(aql)
  Month Day REGION VALUE
1     5   1  Ozone    41
2     5   2  Ozone    36
3     5   3  Ozone    12
4     5   4  Ozone    18
5     5   5  Ozone    NA
6     5   6  Ozone    28
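
A quick way to confirm that the melt did what was expected is to check the shape of the result. The following is a minimal sketch, assuming reshape2 is installed; the helper variable n_measured is introduced here only for illustration.

```r
library(reshape2)

# Melt the built-in airquality data, keeping Month and Day as identifiers.
aql <- melt(airquality, id.vars = c("Month", "Day"),
            variable.name = "REGION", value.name = "VALUE")

# Every measured column contributes one long row per original row.
n_measured <- ncol(airquality) - 2   # all columns except Month and Day
stopifnot(nrow(aql) == nrow(airquality) * n_measured)

# The former column names become the levels of the REGION factor.
levels(aql$REGION)
```

If the row count does not match, the id.vars were probably not the ones intended.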

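  When I read in my own data, its class and internal structure differed from those of the example airquality dataset, so melt did not produce the result I wanted.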

> a <- read.table("file:///C:/Users/lenovo/Desktop/test.txt", header = F, sep="\t", row.names = 1) # read in my own data
> a
                 V2          V3
TC      0.412574939 0.386191484
AG      0.408958034 0.389692067
TG      0.000941386 0.000988865
CT       0.07967101 0.105907482
GT      0.000545013 0.001443743
GA      0.091859486 0.107845658
TA      0.002130506 0.002650159
CG      0.000842293 0.000830647
CA       0.00044592 0.001067975
AC      0.000247733 0.000909756
GC      0.000545013 0.000969088
AT      0.001238666 0.001503075
samples         SLE          HC

> class(a)
[1] "data.frame"
> str(a)
'data.frame':    13 obs. of  2 variables:
 $ V2: Factor w/ 12 levels "0.000247733",..: 11 10 5 8 3 9 7 4 2 1 ...
 $ V3: Factor w/ 13 levels "0.000830647",..: 11 12 4 9 6 10 8 1 5 2 ...
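
The columns are not numeric because the last row of the file ("samples") contains text, so each column mixes numbers and strings. The following sketch reproduces this with a three-line excerpt of the data typed inline; the excerpt and the name a_demo are assumptions made for illustration, not the real test.txt.

```r
# Hypothetical three-line excerpt of test.txt, inlined so the example runs on its own.
txt <- "TC\t0.412574939\t0.386191484
AG\t0.408958034\t0.389692067
samples\tSLE\tHC"

a_demo <- read.table(text = txt, header = FALSE, sep = "\t", row.names = 1)

# The text row prevents both columns from being parsed as numbers.
sapply(a_demo, class)
sapply(a_demo, is.numeric)
```

On R 3.5.1 the columns come back as factors, as in the str() output above; on R 4.0.0 and later they should come back as character, because strings are no longer converted to factors by default.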
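> b <- t(a)  # transpose; t() returns a matrix
> class(b)
[1] "matrix"  # the class has changed from data.frame to matrix
> str(b)        # the internal structure has changed as well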
 chr [1:2, 1:13] "0.412574939" "0.386191484" "0.408958034" "0.389692067" "0.000941386" "0.000988865" "0.07967101" "0.105907482" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "V2" "V3"
  ..$ : chr [1:13] "TC" "AG" "TG" "CT" ...
> b
   TC            AG            TG            CT            GT            GA            TA            CG            CA            AC           
V2 "0.412574939" "0.408958034" "0.000941386" "0.07967101"  "0.000545013" "0.091859486" "0.002130506" "0.000842293" "0.00044592"  "0.000247733"
V3 "0.386191484" "0.389692067" "0.000988865" "0.105907482" "0.001443743" "0.107845658" "0.002650159" "0.000830647" "0.001067975" "0.000909756"
   GC            AT            samples
V2 "0.000545013" "0.001238666" "SLE"  
V3 "0.000969088" "0.001503075" "HC"   
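> b1<- melt(b, id.vars=c("samples"), variable.name = "mismat", value.name = "VALUE")   # intent: keep the samples column and melt the other columns
> b1  # not the intended result: b is a matrix, so id.vars and variable.name are ignored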
   Var1    Var2       VALUE
1    V2      TC 0.412574939
2    V3      TC 0.386191484
3    V2      AG 0.408958034
4    V3      AG 0.389692067
5    V2      TG 0.000941386
6    V3      TG 0.000988865
7    V2      CT  0.07967101
8    V3      CT 0.105907482
9    V2      GT 0.000545013
10   V3      GT 0.001443743
11   V2      GA 0.091859486
12   V3      GA 0.107845658
13   V2      TA 0.002130506
14   V3      TA 0.002650159
15   V2      CG 0.000842293
16   V3      CG 0.000830647
17   V2      CA  0.00044592
18   V3      CA 0.001067975
19   V2      AC 0.000247733
20   V3      AC 0.000909756
21   V2      GC 0.000545013
22   V3      GC 0.000969088
23   V2      AT 0.001238666
24   V3      AT 0.001503075
25   V2 samples         SLE
26   V3 samples          HC
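
No error is raised here; melt simply treats a matrix differently from a data frame. The sketch below, assuming reshape2 is installed, shows the difference on a tiny character matrix shaped like b. The objects m, long_m and long_df are made up for illustration.

```r
library(reshape2)

# Small character matrix shaped like t(a): rows V2/V3, one value column plus samples.
m <- matrix(c("0.41", "0.38", "SLE", "HC"), nrow = 2,
            dimnames = list(c("V2", "V3"), c("TC", "samples")))

# For a matrix, melt() uses its array method. id.vars and variable.name are
# not arguments of that method, so only value.name takes effect.
long_m <- melt(m, id.vars = "samples", variable.name = "mismat", value.name = "VALUE")
names(long_m)

# Converting to a data frame first makes melt() use the data-frame method.
long_df <- melt(as.data.frame(m), id.vars = "samples",
                variable.name = "mismat", value.name = "VALUE")
names(long_df)
```

The first call yields the Var1/Var2 columns seen in the output above, while the second keeps samples as an identifier column.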

Solution:

> b 
   TC            AG            TG            CT            GT            GA            TA            CG            CA            AC           
V2 "0.412574939" "0.408958034" "0.000941386" "0.07967101"  "0.000545013" "0.091859486" "0.002130506" "0.000842293" "0.00044592"  "0.000247733"
V3 "0.386191484" "0.389692067" "0.000988865" "0.105907482" "0.001443743" "0.107845658" "0.002650159" "0.000830647" "0.001067975" "0.000909756"
   GC            AT            samples
V2 "0.000545013" "0.001238666" "SLE"  
V3 "0.000969088" "0.001503075" "HC"
> b <- data.frame(b) # convert the matrix back to a data frame
> str(b)             # inspect the internal structure
'data.frame':    2 obs. of  13 variables:
 $ TC     : Factor w/ 2 levels "0.386191484",..: 2 1
  ..- attr(*, "names")= chr  "V2" "V3"
 $ AG     : Factor w/ 2 levels "0.389692067",..: 2 1
  ..- attr(*, "names")= chr  "V2" "V3"
 $ TG     : Factor w/ 2 levels "0.000941386",..: 1 2
  ..- attr(*, "names")= chr  "V2" "V3"
 $ CT     : Factor w/ 2 levels "0.07967101","0.105907482": 1 2
  ..- attr(*, "names")= chr  "V2" "V3"
 $ GT     : Factor w/ 2 levels "0.000545013",..: 1 2
  ..- attr(*, "names")= chr  "V2" "V3"
 $ GA     : Factor w/ 2 levels "0.091859486",..: 1 2
  ..- attr(*, "names")= chr  "V2" "V3"
 $ TA     : Factor w/ 2 levels "0.002130506",..: 1 2
  ..- attr(*, "names")= chr  "V2" "V3"
 $ CG     : Factor w/ 2 levels "0.000830647",..: 2 1
  ..- attr(*, "names")= chr  "V2" "V3"
 $ CA     : Factor w/ 2 levels "0.00044592","0.001067975": 1 2
  ..- attr(*, "names")= chr  "V2" "V3"
 $ AC     : Factor w/ 2 levels "0.000247733",..: 1 2
  ..- attr(*, "names")= chr  "V2" "V3"
 $ GC     : Factor w/ 2 levels "0.000545013",..: 1 2
  ..- attr(*, "names")= chr  "V2" "V3"
 $ AT     : Factor w/ 2 levels "0.001238666",..: 1 2
  ..- attr(*, "names")= chr  "V2" "V3"
 $ samples: Factor w/ 2 levels "HC","SLE": 2 1
  ..- attr(*, "names")= chr  "V2" "V3"
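> b[,-13] <- apply(b[,-13],2, as.character)  # convert factor to character (all columns except samples)
> b[,-13] <- apply(b[,-13],2, as.numeric)    # convert character to numeric
> str(b)  # check the structure again: samples is a factor and the other columns are num, matching the original data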
'data.frame':    2 obs. of  13 variables:
 $ TC     : num  0.413 0.386
 $ AG     : num  0.409 0.39
 $ TG     : num  0.000941 0.000989
 $ CT     : num  0.0797 0.1059
 $ GT     : num  0.000545 0.001444
 $ GA     : num  0.0919 0.1078
 $ TA     : num  0.00213 0.00265
 $ CG     : num  0.000842 0.000831
 $ CA     : num  0.000446 0.001068
 $ AC     : num  0.000248 0.00091
 $ GC     : num  0.000545 0.000969
 $ AT     : num  0.00124 0.0015
 $ samples: Factor w/ 2 levels "HC","SLE": 2 1
  ..- attr(*, "names")= chr  "V2" "V3"
> b
            TC          AG          TG          CT          GT          GA          TA          CG          CA          AC          GC
V2 0.412574939 0.408958034 0.000941386  0.07967101 0.000545013 0.091859486 0.002130506 0.000842293  0.00044592 0.000247733 0.000545013
V3 0.386191484 0.389692067 0.000988865 0.105907482 0.001443743 0.107845658 0.002650159 0.000830647 0.001067975 0.000909756 0.000969088
            AT samples
V2 0.001238666     SLE
V3 0.001503075      HC
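
The conversion goes through character on purpose. A minimal sketch of why, using the two TC values from above (the name f is made up for illustration):

```r
# A factor whose labels look like numbers (the two TC values shown above).
f <- factor(c("0.412574939", "0.386191484"))

# as.numeric() on a factor returns the internal level codes, not the labels.
as.numeric(f)

# Going through character first recovers the actual numbers.
as.numeric(as.character(f))
```

The first call returns the codes 2 and 1, the same codes that str(b) printed for the TC column, so calling as.numeric directly on the factor columns would silently replace the measurements with level codes.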

> b2 <- melt(b, id.vars=c("samples"), variable.name = "mismat", value.name = "VALUE") # wide-to-long conversion
> b2 # the conversion succeeded
   samples mismat       VALUE
1      SLE     TC 0.412574939
2       HC     TC 0.386191484
3      SLE     AG 0.408958034
4       HC     AG 0.389692067
5      SLE     TG 0.000941386
6       HC     TG 0.000988865
7      SLE     CT 0.079671010
8       HC     CT 0.105907482
9      SLE     GT 0.000545013
10      HC     GT 0.001443743
11     SLE     GA 0.091859486
12      HC     GA 0.107845658
13     SLE     TA 0.002130506
14      HC     TA 0.002650159
15     SLE     CG 0.000842293
16      HC     CG 0.000830647
17     SLE     CA 0.000445920
18      HC     CA 0.001067975
19     SLE     AC 0.000247733
20      HC     AC 0.000909756
21     SLE     GC 0.000545013
22      HC     GC 0.000969088
23     SLE     AT 0.001238666
24      HC     AT 0.001503075
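
An alternative that avoids the factor round trip is to separate the text row from the numeric rows before transposing. This is only a sketch under assumptions: the inline excerpt stands in for test.txt, reshape2 is assumed to be installed, and the names raw, labels, values, wide and long are made up for illustration.

```r
library(reshape2)

# Hypothetical excerpt of test.txt, inlined so the example runs on its own.
txt <- "TC\t0.412574939\t0.386191484
AG\t0.408958034\t0.389692067
samples\tSLE\tHC"

raw <- read.table(text = txt, header = FALSE, sep = "\t", row.names = 1,
                  stringsAsFactors = FALSE)

# Split the label row from the measurement rows.
labels <- unlist(raw["samples", ], use.names = FALSE)
values <- raw[rownames(raw) != "samples", , drop = FALSE]

# Transpose the measurements, convert each column to numeric, and add the labels.
wide <- as.data.frame(lapply(as.data.frame(t(values), stringsAsFactors = FALSE),
                             as.numeric))
wide$samples <- labels

# Melt the data frame, keeping samples as the identifier column.
long <- melt(wide, id.vars = "samples", variable.name = "mismat", value.name = "VALUE")
long
```

Setting stringsAsFactors = FALSE explicitly keeps the behaviour the same on R 3.x and R 4.x.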

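The conversion above was done in RStudio with R version 3.5.1 (2018-07-02). In R 4.0.0 and later, strings are no longer converted to factors by default, so the str() output would show chr where Factor appears here.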

 

posted on 2019-04-11 21:12  Najierida
