"R in Action": A Beginner's Study Notes (Part 2)

This part covers R's data types and the basics of data input.

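Factors

Variables in R can be classified as nominal, ordinal, or continuous.

Nominal: variables with no inherent order, such as the weather (overcast, sunny, and so on).

Ordinal: variables whose values have an order but no quantitative relationship, such as mood (good, so-so, bad).

Continuous: variables that carry both order and magnitude. Note that "continuous" here is not meant in the mathematical sense; it also covers discrete numeric data.

Nominal and ordinal variables are called factors in R.

The factor() function is introduced below.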

diabetes <- c("Type1", "Type2", "Type1", "Type1")
diabetes <- factor(diabetes)
# factor() stores this vector as (1, 2, 1, 1) and internally associates 1 = Type1, 2 = Type2.
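To make the internal encoding described above visible, here is a minimal sketch using only base R functions (levels(), as.integer(), and table()). It rebuilds the same factor so that it can be run on its own.

```r
# Build the same factor as in the notes
diabetes <- factor(c("Type1", "Type2", "Type1", "Type1"))

# The distinct category labels, sorted alphabetically by default
levels(diabetes)      # "Type1" "Type2"

# The integer codes that R stores internally for each element
as.integer(diabetes)  # 1 2 1 1

# Frequency of each level
table(diabetes)
```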

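For ordinal variables, specify ordered=TRUE in the factor() call.

status <- c("Poor", "Improved", "Excellent", "Poor")
status <- factor(status, ordered=TRUE)
# The vector is encoded as (3, 2, 1, 3), because levels default to alphabetical order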

But how do we make sure that 1 = Poor, 2 = Improved, 3 = Excellent? Pass the levels explicitly:

status <- factor(status, ordered=TRUE, levels = c("Poor", "Improved", "Excellent"))
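The effect of the levels argument can be checked by comparing the integer codes with and without it. This is a small sketch; the variable names status_raw, by_default, and by_levels are made up for illustration.

```r
status_raw <- c("Poor", "Improved", "Excellent", "Poor")

# Default: levels are sorted alphabetically (Excellent < Improved < Poor)
by_default <- factor(status_raw, ordered = TRUE)
as.integer(by_default)   # 3 2 1 3

# Explicit levels: Poor < Improved < Excellent
by_levels <- factor(status_raw, ordered = TRUE,
                    levels = c("Poor", "Improved", "Excellent"))
as.integer(by_levels)    # 1 2 3 1
levels(by_levels)        # "Poor" "Improved" "Excellent"
```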

So what is the difference between an ordered factor and an ordinary factor? Consider the following program:

patientID <- c(1,2,3,4)
age<- c(25,34,28,52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "excellent", "Poor")
diabetes <- factor(diabetes)
status <- factor(status, order = TRUE)
patientdata <- data.frame(patientID, age, diabetes, status)
str(patientdata)
# Output:

'data.frame':   4 obs. of  4 variables:
 $ patientID: num  1 2 3 4
 $ age      : num  25 34 28 52
 $ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
 $ status   : Ord.factor w/ 3 levels "excellent"<"Improved"<..: 3 2 1 3

summary(patientdata)
# Output:

   patientID         age         diabetes      status
 Min.   :1.00   Min.   :25.00   Type1:3   excellent:1
 1st Qu.:1.75   1st Qu.:27.25   Type2:1   Improved :1
 Median :2.50   Median :31.00             Poor     :2
 Mean   :2.50   Mean   :34.75
 3rd Qu.:3.25   3rd Qu.:38.50
 Max.   :4.00   Max.   :52.00

Because diabetes and status are factors, summary() reports frequency counts for them instead of numeric summaries.
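Beyond how str() and summary() label them, one practical difference is that ordered factors support comparisons, while ordinary factors do not. The sketch below illustrates this with base R only; the names lv, x, plain, and ordered_f are hypothetical, and explicit levels are used so that the order is Poor < Improved < Excellent.

```r
lv <- c("Poor", "Improved", "Excellent")
x  <- c("Poor", "Improved", "Excellent", "Poor")

plain     <- factor(x, levels = lv)                  # ordinary (nominal) factor
ordered_f <- factor(x, levels = lv, ordered = TRUE)  # ordered factor

is.ordered(plain)       # FALSE
is.ordered(ordered_f)   # TRUE

# Comparisons follow the level order for an ordered factor
ordered_f < "Excellent" # TRUE TRUE FALSE TRUE
max(ordered_f)          # Excellent

# For an ordinary factor, "<" is not meaningful: R warns and returns NA
unordered_cmp <- suppressWarnings(plain < "Excellent")
unordered_cmp           # NA NA NA NA
```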

Lists

Don't underestimate lists: a list in R can contain vectors, matrices, data frames, and even other lists.

mylist <- list(object1, object2, ...)

mylist <- list(name1 = object1, name2 = object2, ...)

For example:

g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title=g, ages=h, j,k)
mylist
# Output:

$title
[1] "My First List"

$ages
[1] 25 26 18 39

[[3]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

[[4]]
[1] "one" "two" "three"

The elements are, in order: a character string, a numeric vector, a matrix, and a character vector.
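Once a list is built, its elements can be pulled out by name or by position. The following sketch rebuilds the list from the example above and shows the common access forms in base R.

```r
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)

mylist$ages         # access by name with $
mylist[["ages"]]    # same element, using [[ ]] and the name
mylist[[2]]         # same element, using its position

mylist[[3]][2, 2]   # row 2, column 2 of the matrix: 7

# Unnamed elements get an empty name
names(mylist)       # "title" "ages" "" ""
```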

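Tips:

1. R has no scalars; a single value is just a vector of length one.

2. Indices in R start at 1.

3. Variables cannot be declared; they come into existence when first assigned.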
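The three tips can be checked directly at the console. This is a tiny sketch; x and v are arbitrary names chosen for illustration.

```r
# Tip 3: no declaration step; assignment creates the variable
x <- 5

# Tip 1: a "scalar" is really a vector of length one
is.vector(x)   # TRUE
length(x)      # 1

# Tip 2: the first element has index 1
v <- c(10, 20, 30)
v[1]           # 10
```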

Data input

1. Entering data from the keyboard

mydata <- data.frame(age=numeric(0), gender=character(0), weight=numeric(0))
mydata <- edit(mydata)  # or fix(mydata)

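2. Delimited text files

mydataframe <- read.table(file, header=logical_value, sep="delimiter", row.names="name")

Here file is a delimited ASCII text file, header is a logical value indicating whether the first line contains variable names, sep specifies the delimiter that separates the data values, and row.names is an optional argument naming one or more variables that serve as row identifiers.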

Example:

grade <- read.table("studentgrades.csv", header=TRUE, sep=",", row.names="STUDENTID")
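Since studentgrades.csv is not included with these notes, the following sketch writes a small stand-in file to a temporary path and reads it back the same way. The file contents, the NAME and SCORE columns, and the variable names csv_path and grades are all assumptions made for illustration.

```r
# Create a small comma-delimited file (contents are made up)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("STUDENTID,NAME,SCORE",
             "101,Ann,90",
             "102,Bob,85"),
           csv_path)

# First line holds variable names; STUDENTID becomes the row identifier
grades <- read.table(csv_path, header = TRUE, sep = ",",
                     row.names = "STUDENTID")

rownames(grades)          # "101" "102"
grades["102", "SCORE"]    # 85
str(grades)

# Clean up the temporary file
unlink(csv_path)
```

Because STUDENTID is used for row names, it no longer appears as a regular column, and rows can be looked up by student ID as shown.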

Posted 2014-10-02 13:00 by 程序员阿力
