Computing for Data Analysis by Roger D. Peng @ Hopkins: Notes -- Data Types and Basic Operations
Posted on 2013-01-08 00:20 by shirley_cst
Overview and History of R
R vs. S
Introduction to the R Language
Data Types and Basic Operations
1. Data Types
(1) Five basic/atomic classes of objects: numeric, integer, complex, character, logical
(2) Objects
a. vector vs. list
b. matrix vs. data.frame
c. factors
(3) Numbers
a. numeric vs. integer
b. Inf, NaN
(4) Attributes: names/dimnames, dimensions, class, length, other user-defined attributes/metadata
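The points above can be checked at the console. This is only a small sketch; all object names (x_num, v, l, m, df, f) are made up for illustration.

```r
# The five atomic classes
x_num <- 1.5      # numeric
x_int <- 2L       # integer (the L suffix asks for an integer)
x_cpx <- 1 + 2i   # complex
x_chr <- "a"      # character
x_lgl <- TRUE     # logical
classes <- c(class(x_num), class(x_int), class(x_cpx), class(x_chr), class(x_lgl))

# A vector holds a single class, so mixed input is coerced;
# a list lets each element keep its own class.
v <- c(1, "a", TRUE)      # everything becomes character
l <- list(1, "a", TRUE)   # numeric, character, logical

# A matrix is a vector with a dim attribute (one class only);
# a data frame may have a different class in each column.
m <- matrix(1:6, nrow = 2, ncol = 3)
df <- data.frame(id = 1:2, name = c("x", "y"))

# A factor stores categorical data together with its levels
f <- factor(c("yes", "no", "yes"))
levels(f)   # "no" "yes" (sorted alphabetically by default)

# Special numbers
1 / 0   # Inf
0 / 0   # NaN

# Attributes of the matrix: only dim is set here
attributes(m)
```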
2. Operations
(1) <- assignment
(2) : create integer sequences
(3) [], [[]], $ subset
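a. subset a vector: by numeric index, or by a logical expression/vector
b. subset a matrix: a single element (or a single row/column) comes back as a vector rather than a 1*1 matrix unless drop = FALSE; select a given element, row, or column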
c. subset list: [ vs [[, subset multiple elements, computable names, subset nested element, partial matching
(4) remove NA values
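a. is.na() with ! and []: x[!is.na(x)]
b. complete.cases() with []: x[complete.cases(x, y)]
c. complete.cases() also works on a matrix or data frame: it returns one logical value per row, so m[complete.cases(m), ] keeps the rows with no NA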
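A short sketch of the subsetting and NA-removal operations listed above. The vectors, matrix and lists are invented for illustration only.

```r
# Vector: numeric index vs. logical expression
x <- c(10, 20, 30, 40)
x[2:3]      # 20 30
x[x > 15]   # 20 30 40

# Matrix: single element, row, column, and the drop argument
m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, 2]                 # 3, returned as a length-1 vector
m[1, 2, drop = FALSE]   # still a 1 x 1 matrix
m[1, ]                  # first row: 1 3 5
m[, 2]                  # second column: 3 4

# List: [ returns a list, [[ and $ return the element itself
lst <- list(foo = 1:4, bar = 0.6, baz = list(10, 20))
lst[1]           # a list containing foo
lst[["foo"]]     # the integer vector 1:4
lst$bar          # 0.6
lst[c(1, 2)]     # multiple elements: foo and bar
name <- "foo"
lst[[name]]      # computed name works with [[ but not with $
lst[[c(3, 2)]]   # nested element: 20

# Partial matching: $ allows it, [[ needs exact = FALSE
p <- list(aardvark = 1:5)
p$a                      # 1 2 3 4 5
p[["a"]]                 # NULL
p[["a", exact = FALSE]]  # 1 2 3 4 5

# Removing NA values
y <- c(1, 2, NA, 4, NA, 5)
z <- c("a", "b", NA, "d", NA, "f")
y[!is.na(y)]                  # 1 2 4 5
good <- complete.cases(y, z)  # TRUE where both y and z are present
y[good]                       # 1 2 4 5

# complete.cases() on a matrix: one logical per row
mm <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)
mm_clean <- mm[complete.cases(mm), ]   # drops row 2
```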
3. Functions
attributes(), class(), dim(), names(), dimnames(), levels()
print(), table()
vector(), c(), matrix(), read.csv(), read.table(), list()
cbind(), rbind()
as.numeric(), as.logical(), as.character(), as.complex(), as.integer()
4. Missing Values
NA, NaN: is.na(), is.nan() (is.na() is also TRUE for NaN, but is.nan() is FALSE for NA)
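To make the functions and the NA/NaN distinction concrete, here is a small sketch. The input values are arbitrary and chosen only for illustration.

```r
# NA vs. NaN
x <- c(1, NA, 0 / 0)   # 0 / 0 evaluates to NaN
is.na(x)               # FALSE TRUE TRUE  (NaN counts as missing)
is.nan(x)              # FALSE FALSE TRUE (NA is not NaN)

# Explicit coercion with the as.*() functions
as.numeric("3.14")        # 3.14
as.logical(c(0, 1, 2))    # FALSE TRUE TRUE
as.character(1:3)         # "1" "2" "3"
as.integer(2.9)           # 2 (truncates toward zero)
as.complex(2)             # 2+0i
# Coercion that makes no sense yields NA (with a warning, silenced here)
bad <- suppressWarnings(as.numeric("abc"))

# cbind() and rbind() build matrices from vectors
by_col <- cbind(1:3, 4:6)   # 3 rows, 2 columns
by_row <- rbind(1:3, 4:6)   # 2 rows, 3 columns
```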
Vectorized Operations
1. + * /
2. > >= ==
3. %*% (matrix multiplication; * on its own is element-wise)
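A minimal sketch of the three kinds of vectorized operation, using made-up vectors and matrices:

```r
x <- 1:4
y <- 6:9

# Arithmetic is applied element by element
x + y   # 7 9 11 13
x * y   # 6 14 24 36
x / y

# Comparisons return a logical vector of the same length
x > 2    # FALSE FALSE TRUE TRUE
x >= 2   # FALSE TRUE TRUE TRUE
y == 8   # FALSE FALSE TRUE FALSE

# Matrices: * is element-wise, %*% is true matrix multiplication
a <- matrix(1:4, nrow = 2, ncol = 2)
b <- matrix(rep(10, 4), nrow = 2, ncol = 2)
elementwise <- a * b     # rows: (10, 30) and (20, 40)
matprod <- a %*% b       # rows: (40, 40) and (60, 60)
```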
Reading and Writing Data
| Reading File | Writing File | Note |
| read.table(), read.csv() | write.table() | tabular data |
| readLines() | writeLines() | lines of a text file |
| source() | dump() | R code files |
| dget() | dput() | R code files |
| load() | save() | workspaces |
| unserialize() | serialize() | R objects in binary form |
Reading data files with read.table()
1. arguments
file, header(logical), sep(a character string), colClasses(a character vector), nrows(integer), comment.char(a character string), skip(integer), stringsAsFactors(logical)
2. default arguments/actions: comment.char, nrows, colClasses
(1). comment.char "#"
(2). nrows: R works out the number of rows itself
(3). colClasses: R infers the class of each column itself
3. reading in larger dataset with read.table
Telling R all these things directly makes R run faster and more efficiently.
(1). Set comment.char = "" if no comments in data file
(2). Setting colClasses often makes reading about twice as fast.
initial <- read.table("foo.txt", nrows=100)
classes <- sapply(initial, class)
data <- read.table("foo.txt", colClasses = classes)
(3). Roughly calculate the memory the dataset needs and compare it with the available RAM.
(4). Set nrows: this helps with memory usage.
(5). Know the system: OS, 32-bit/64-bit, current status (number of users, running programs, available memory, etc.)
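The rough memory calculation in (3) can be done directly in R. The dataset size below (1,500,000 rows by 120 numeric columns) is a hypothetical example, not a figure from these notes; the only fixed fact is that a numeric value takes 8 bytes.

```r
# Hypothetical dataset: all columns numeric
n_rows <- 1500000
n_cols <- 120
bytes_per_numeric <- 8   # a double takes 8 bytes

total_bytes <- n_rows * n_cols * bytes_per_numeric   # 1.44e9 bytes
total_mb <- total_bytes / 2^20                       # about 1373 MB
total_gb <- total_mb / 2^10                          # about 1.34 GB

round(total_gb, 2)
```

This counts only the raw data. A commonly quoted rule of thumb is to allow roughly twice that amount of RAM for read.table() to work with.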
4. read.table() vs. read.csv()
default sep: whitespace (sep = "") for read.table() vs. "," for read.csv(); read.csv() also defaults to header = TRUE
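A small sketch comparing the two functions on temporary files. The file contents and column names are invented for illustration, and the files are written with writeLines() so that the example is self-contained.

```r
# A comma-separated file with a header line
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,ann,9.5", "2,bob,7.0"), csv_path)

# read.csv(): sep = "," and header = TRUE are the defaults
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.table() needs those spelled out; the extra arguments are the
# speed and memory hints from the notes above
from_table <- read.table(csv_path,
                         header = TRUE,
                         sep = ",",
                         comment.char = "",
                         colClasses = c("integer", "character", "numeric"),
                         nrows = 2,
                         stringsAsFactors = FALSE)

# A whitespace-separated file matches the default sep of read.table()
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id name score", "1 ann 9.5", "2 bob 7.0"), txt_path)
from_ws <- read.table(txt_path, header = TRUE)
```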
dput()/dget(), dump()/source() R objects
dput()/dget(): single R object
dump()/source(): single R object or multiple R objects
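A minimal round-trip sketch for both pairs. The objects x and y and the temporary file paths are made up for illustration; the dumped objects are restored into a separate environment (restored) so the originals are left untouched.

```r
# dput()/dget(): one object, written as R code and read back
y <- data.frame(a = 1, b = "a", stringsAsFactors = FALSE)
obj_path <- tempfile(fileext = ".R")
dput(y, file = obj_path)
new_y <- dget(obj_path)

# dump()/source(): several objects, referred to by name
x <- "foo"
dump_path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = dump_path)

# source() re-runs the assignments; here they land in a fresh environment
restored <- new.env()
source(dump_path, local = restored)
ls(restored)   # "x" "y"
```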
File Connections
file(), gzfile(), bzfile(), url()
readLines()/writeLines()
readLines() returns, and writeLines() takes, a character vector in which each element is one line
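A short sketch of reading and writing through connections. The paths are temporary files and the text is invented; url() is used the same way as file() but is left out here because it needs network access.

```r
# Write three lines to a plain text file
txt_path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), txt_path)

# Open a connection explicitly and read only the first two lines
con <- file(txt_path, "r")
first_two <- readLines(con, n = 2)
close(con)

# The same functions work on a gzip-compressed file through gzfile()
gz_path <- tempfile(fileext = ".gz")
gz_con <- gzfile(gz_path, "w")
writeLines(c("alpha", "beta"), gz_con)
close(gz_con)

gz_con <- gzfile(gz_path, "r")
from_gz <- readLines(gz_con)
close(gz_con)
```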