shirley_cst


Overview and History of R

R vs. S

Introduction to the R Language

Data Types and Basic Operations

1. Data Types

  (1) Five basic/atomic classes of objects: numeric, integer, complex, character, logical

  (2) Objects

    a. vector vs. list

    b. matrix vs. data.frame

    c. factors

  (3) Numbers

    a. numeric vs. integer

    b. Inf, NaN

  (4) Attributes: names/dimnames, dimensions, class, length, other user-defined attributes/metadata
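
The outline above can be made concrete with a short sketch. All variable names here are made up for illustration; only base R functions are used.

```r
# One example value for each of the five atomic classes
x_num <- 3.14      # numeric (double precision)
x_int <- 3L        # integer (the L suffix asks for an integer)
x_cpx <- 1 + 2i    # complex
x_chr <- "hello"   # character
x_lgl <- TRUE      # logical
classes <- c(class(x_num), class(x_int), class(x_cpx),
             class(x_chr), class(x_lgl))

# A vector holds a single class, so mixed values are coerced
v <- c(1.5, "a", TRUE)      # all elements become character
# A list can hold elements of different classes
l <- list(1.5, "a", TRUE)

# A matrix is a vector with a dim attribute
m <- matrix(1:6, nrow = 2, ncol = 3)
m_attrs <- attributes(m)

# A data frame can hold a different class in each column
df <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)

# A factor stores categorical data as integer codes plus levels
f <- factor(c("yes", "no", "yes"))
f_levels <- levels(f)

# Special numbers
pos_inf <- 1 / 0   # Inf
not_num <- 0 / 0   # NaN
```
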

2. Operations

  (1) <- assignment

  (2) : create integer sequences

  (3) [], [[]], $ subset

    a. subset a vector: by numeric index, or by logical expression/vector

    b. subset matrix: a single element is returned as a vector by default vs. a 1*1 matrix with drop = FALSE; select a certain element/row/column

    c. subset list: [ vs [[, subset multiple elements, computable names, subset nested element, partial matching

  (4) remove NA values

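    a. negate is.na() and index with []: x[!is.na(x)]

    b. index with complete.cases(): x[complete.cases(x, y)]

    c. complete.cases() also accepts a matrix or data frame and returns one logical value per row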
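
A minimal sketch of the subsetting and NA-removal operations listed above. The data values and variable names are invented for the example.

```r
# Vector: subset by position and by logical expression
x <- c(5, 12, 7, 20)
by_index   <- x[1:2]       # first two elements
by_logical <- x[x > 10]    # elements greater than 10

# Matrix: a single element comes back as a vector unless drop = FALSE
m <- matrix(1:6, nrow = 2, ncol = 3)
elem_vec  <- m[1, 2]                 # integer vector of length 1
elem_mat  <- m[1, 2, drop = FALSE]   # stays a 1 x 1 matrix
first_row <- m[1, ]                  # whole first row

# List: [ keeps the list wrapper, [[ and $ extract the element itself
l <- list(foo = 1:4, bar = 0.6, nested = list(10, 20))
as_list    <- l["foo"]
as_element <- l[["foo"]]
name <- "foo"
by_computed_name <- l[[name]]    # [[ accepts a computed name, $ does not
several <- l[c("foo", "bar")]    # multiple elements require [
inner   <- l[["nested"]][[2]]    # nested element
partial <- l$f                   # $ allows partial matching of names

# Remove NA values from a single vector
y <- c(1, 2, NA, 4, NA, 5)
y_clean <- y[!is.na(y)]

# Keep only positions where every vector is non-missing
a <- c(1, 2, NA, 4)
b <- c("a", NA, "c", "d")
good <- complete.cases(a, b)
a_clean <- a[good]
b_clean <- b[good]

# For a matrix, complete.cases() gives one logical value per row
mat <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)
mat_clean <- mat[complete.cases(mat), ]
```
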

3. Functions

  attributes(), class(), dim(), names(), dimnames(), levels()

  print(), table()

  vector(), c(), matrix(), read.csv(), read.table(), list()

  cbind(), rbind()

  as.numeric(), as.logical(), as.character(), as.complex(), as.integer()
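
A few of the functions listed above in use, as a rough sketch with invented values:

```r
# Combine vectors by column or by row
x <- 1:3
y <- 10:12
cb <- cbind(x, y)   # 3 rows, 2 columns named x and y
rb <- rbind(x, y)   # 2 rows, 3 columns

# Explicit coercion between classes
z <- 0:3
z_lgl <- as.logical(z)      # 0 becomes FALSE, everything else TRUE
z_chr <- as.character(z)
n     <- as.numeric("2.5")

# Coercion that makes no sense yields NA (with a warning, silenced here)
bad <- suppressWarnings(as.integer("b"))

# Count the occurrences of each factor level
f <- factor(c("yes", "no", "yes"))
counts <- table(f)
```
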

4. Missing Values

  NA, NaN: is.na(), is.nan() (a NaN value is also NA, but an NA value is not NaN)
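
A small sketch of how the two test functions differ on the same vector (the values are arbitrary):

```r
x <- c(1, NA, NaN, 4)

na_flags  <- is.na(x)    # TRUE for both NA and NaN
nan_flags <- is.nan(x)   # TRUE only for NaN

# NA values also carry a class
na_class_default <- class(NA)            # "logical"
na_class_integer <- class(NA_integer_)   # "integer"
```
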

Vectorized Operations

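1. Arithmetic, applied element-wise: +  *  /

2. Comparison, applied element-wise: >  >=  ==

3. %*% (true matrix multiplication, as opposed to the element-wise *)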
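
The operators above, applied to small example vectors and matrices chosen for illustration:

```r
x <- 1:4
y <- 6:9

# Arithmetic works element by element, no loop needed
sum_xy  <- x + y
prod_xy <- x * y
quot_xy <- x / y

# Comparisons return a logical vector of the same length
gt  <- x > 2
ge  <- x >= 2
eq8 <- y == 8

# For matrices, * is element-wise and %*% is matrix multiplication
a <- matrix(1:4, 2, 2)
b <- matrix(rep(10, 4), 2, 2)
elementwise <- a * b
matprod     <- a %*% b
```
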

Reading and Writing Data

Reading/writing files

Reading function          Writing function   Used for
read.table(), read.csv()  write.table()      tabular data
readLines()               writeLines()       lines of a text file
source()                  dump()             R code files
dget()                    dput()             R code files (a single deparsed R object)
load()                    save()             workspaces
unserialize()             serialize()        R objects in binary form
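
To show how the reading and writing functions pair up, here is a round-trip sketch for three of the pairs. The data frame and the temporary file paths are made up for the example.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# Tabular data: write.table() then read.table()
tab_file <- tempfile(fileext = ".txt")
write.table(df, tab_file, sep = "\t", row.names = FALSE)
df_tab <- read.table(tab_file, header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE)

# Workspace objects: save() then load()
# (loaded into a separate environment so nothing is overwritten)
rda_file <- tempfile(fileext = ".RData")
save(df, file = rda_file)
env <- new.env()
load(rda_file, envir = env)
df_loaded <- env$df

# Binary form: serialize() to a raw vector, then unserialize()
raw_bytes <- serialize(df, connection = NULL)
df_raw <- unserialize(raw_bytes)
```
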

Reading data files with read.table()

1. arguments

  file, header(logical), sep(a character string), colClasses(a character vector), nrows(integer), comment.char(a character string), skip(integer), stringsAsFactors(logical)

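2. default arguments/actions: comment.char, nrows, colClasses

  (1). comment.char: defaults to "#", so everything after a # on a line is ignored

  (2). nrows: not set by default; R reads to the end of the file

  (3). colClasses: not set by default; R infers the class of each column from the data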
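
A small sketch of these defaults at work. The file contents are invented, and stringsAsFactors = FALSE is passed explicitly because its default differs between R versions.

```r
# Write a tiny whitespace-separated file that starts with a comment line
tmp <- tempfile(fileext = ".txt")
writeLines(c("# this line is skipped because comment.char defaults to #",
             "id score label",
             "1 2.5 a",
             "2 3.5 b"), tmp)

# nrows and colClasses are left unset, so R reads every row
# and works out the column classes itself
d <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)
inferred <- sapply(d, class)
```
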

3. reading in larger dataset with read.table

  Telling R all these things directly makes R run faster and more efficiently.

  (1). Set comment.char = "" if there are no comments in the data file.

  (2). Set colClasses; this often makes reading about twice as fast.

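      # Read a small sample so R can infer the column classes
      initial <- read.table("foo.txt", nrows = 100)

      # Record the inferred class of each column
      classes <- sapply(initial, class)

      # Read the full file with the classes given up front
      data <- read.table("foo.txt", colClasses = classes)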

  (3). Roughly estimate the memory the dataset needs and compare it with the available RAM.

  (4). Set nrows, which helps with memory usage.

  (5). Know the system: OS, 32-bit/64-bit, current status (number of users, running programs, available space, etc.)
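
The rough memory estimate in step (3) can be sketched as below. The row and column counts are hypothetical, and the calculation assumes every column is numeric at 8 bytes per value; the real footprint while reading is typically somewhat larger than this raw figure.

```r
# Hypothetical dataset dimensions (assumptions for illustration only)
n_rows <- 1500000
n_cols <- 120
bytes_per_numeric <- 8   # a double-precision value takes 8 bytes

# Raw size of the data in bytes and in gigabytes (2^30 bytes)
total_bytes <- n_rows * n_cols * bytes_per_numeric
total_gb <- total_bytes / 2^30

# object.size() reports the size of an object already in memory,
# which helps sanity-check the estimate on a small sample
sample_size <- object.size(numeric(1000))
```
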

4. read.table() vs. read.csv()

  default sep: "" (any whitespace) for read.table() vs. "," for read.csv(); read.csv() also sets header = TRUE by default
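
A sketch of the difference on a small comma-separated file (contents invented for the example):

```r
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,alice", "2,bob"), tmp)

# read.csv() already assumes a header row and a comma separator
from_csv <- read.csv(tmp, stringsAsFactors = FALSE)

# read.table() needs both stated explicitly to give the same result
from_table <- read.table(tmp, header = TRUE, sep = ",",
                         stringsAsFactors = FALSE)

# With its defaults, read.table() splits on whitespace only,
# so each line ends up as a single field
whitespace_split <- read.table(tmp, stringsAsFactors = FALSE)
```
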

dput()/dget(), dump()/source() R objects

  dput()/dget(): single R object

  dump()/source(): single R object or multiple R objects
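
A minimal round-trip sketch for both pairs. The object names and temporary files are made up, and source() is pointed at a separate environment so the restored objects can be compared with the originals.

```r
# dput()/dget(): one object, written as R code and read back
y <- data.frame(a = 1, b = "a", stringsAsFactors = FALSE)
dput_file <- tempfile(fileext = ".R")
dput(y, file = dput_file)
y_restored <- dget(dput_file)

# dump()/source(): several objects, given by name
x <- "foo"
dump_file <- tempfile(fileext = ".R")
dump(c("x", "y"), file = dump_file)

# Re-create the objects in a fresh environment
env <- new.env()
source(dump_file, local = env)
restored_names <- sort(ls(env))
```
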

File Connections

  file() (a file), gzfile() (a gzip-compressed file), bzfile() (a bzip2-compressed file), url() (a web page)
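
A sketch of using a connection explicitly, here with gzfile() on a temporary file (the path and text are invented). url() works the same way but needs network access, so it is not shown.

```r
gz_path <- tempfile(fileext = ".gz")

# Open a gzip connection for writing, write two lines, then close it
con <- gzfile(gz_path, "w")
writeLines(c("first line", "second line"), con)
close(con)

# Open the same file for reading; decompression is handled by the connection
con <- gzfile(gz_path, "r")
lines_back <- readLines(con)
close(con)
```
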

readLines()/writeLines()

  readLines() returns a character vector and writeLines() takes one; each element is one line of the file
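
A short sketch of the pair on a temporary text file (file name and contents are made up):

```r
txt_path <- tempfile(fileext = ".txt")

# Each element of the character vector becomes one line of the file
writeLines(c("alpha", "beta", "gamma"), txt_path)

# Read everything back, or only the first n lines
all_lines <- readLines(txt_path)
first_two <- readLines(txt_path, n = 2)
```
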