http://wiki.stdout.org/rcookbook/Data%20input%20and%20output/

 

Loading data from a file

Problem

You want to load data from a file.

Solution

Delimited text files

The simplest way to import data is to save it as a text file with delimiters such as tabs or commas (CSV).

data <- read.csv("datafile.csv")
 
# Load a CSV file that doesn't have headers
data <- read.csv("datafile-noheader.csv", header=FALSE)

The function read.table() is a more general function which allows you to set the delimiter, whether or not there are headers, whether strings are set off with quotes, and more. See ?read.table for more information on the details.

data <- read.table("datafile-noheader.csv",
                   header=FALSE,
                   sep= ","         # use "\t" for tab-delimited files
                   )
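
As a self-contained illustration of those options, the sketch below writes a small tab-delimited file to a temporary location and reads it back with read.table(). The temporary file and the names tsv_file and tab_data are made up for this example.

```r
# Write a small tab-delimited file (no header row) to a temporary location
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("Currer\tBell\tF\t2",
             "Dr.\tSeuss\tM\t49"),
           tsv_file)

# Read it back: tab separator, no header row, keep strings as characters
tab_data <- read.table(tsv_file,
                       header = FALSE,
                       sep = "\t",
                       stringsAsFactors = FALSE)

# Without a header, the columns are named V1, V2, ...; assign real names
names(tab_data) <- c("First", "Last", "Sex", "Number")
print(tab_data)
```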

Loading a file with a file chooser

On some platforms, using file.choose() will open a file chooser dialog window. On others, it will simply prompt the user to type in a filename.

data <- read.csv(file.choose())

Treating strings as factors or characters

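In versions of R before 4.0.0, strings in the data are converted to factors by default; since R 4.0.0 the default is stringsAsFactors=FALSE. If you load the data below with read.csv in an older version, all the text columns will be treated as factors, even though it might make more sense to treat some of them as strings. To keep them as character strings regardless of version, use stringsAsFactors=FALSE: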

data <- read.csv("datafile.csv", stringsAsFactors=FALSE)
 
# You might have to convert some columns to factors
data$Sex <- factor(data$Sex)

Another alternative is to load them as factors (stringsAsFactors=TRUE, which was the default before R 4.0.0) and convert some columns to characters:

data <- read.csv("datafile.csv", stringsAsFactors=TRUE)
 
data$First <- as.character(data$First)
data$Last  <- as.character(data$Last)
 
# Another method: convert columns named "First" and "Last"
stringcols <- c("First","Last")
data[stringcols] <- lapply(data[stringcols], as.character)
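
A third possibility, sketched here as an alternative to the two approaches above, is to state the type of every column up front with the colClasses argument. The example reads the same sample data from an inline string so that it runs without any file; the names csv_text and typed are made up for illustration.

```r
# The sample data, inline (same contents as datafile.csv)
csv_text <- '"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21'

# Declare each column's type instead of converting after loading
typed <- read.csv(text = csv_text,
                  colClasses = c("character", "character", "factor", "integer"))

# Check the resulting column classes
sapply(typed, class)
```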

Loading a file from the Internet

Data can also be loaded from a URL. These (very long) URLs will load the files linked to below.

data <- read.csv("http://wiki.stdout.org/rcookbook/Data%20input%20and%20output/Loading%20data%20from%20a%20file/datafile.csv")
 
# Read in a CSV file without headers
data <- read.csv("http://wiki.stdout.org/rcookbook/Data%20input%20and%20output/Loading%20data%20from%20a%20file/datafile-noheader.csv", header=FALSE)
 
# Manually assign the header names
names(data) <- c("First","Last","Sex","Number")

The data files used above:

datafile.csv:

"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21

datafile-noheader.csv:

"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21

Fixed-width text files

Suppose your data has fixed-width columns, like this:

  First     Last  Sex Number
 Currer     Bell    F      2
    Dr.    Seuss    M     49
    ""   Student   NA     21

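One way to read it in is to simply use read.table(), which by default treats any run of whitespace as a single separator, so the extra spaces are ignored. (The strip.white=TRUE argument only has an effect when sep is specified.)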

read.table("clipboard", header=TRUE, strip.white=TRUE)

However, your data file may have columns containing spaces, or columns with no spaces separating them, like this, where the scores column represents six different measurements, each from 0 to 3.

subject  sex  scores
   N  1    M  113311
   NE 2    F  112231
   S  3    F  111221
   W  4    M  011002

In this case, you may need to use the read.fwf() function. If you read the column names from the file, it requires that they be separated with a delimiter like a single tab, space, or comma. If they are separated with multiple spaces, as in this example, you will have to assign the column names directly.

# Assign the column names manually
read.fwf("myfile.txt", 
         c(7,5,-2,1,1,1,1,1,1), # Width of the columns. -2 means drop those columns
         skip=1,                # Skip the first line (contains header here)
         col.names=c("subject","sex","s1","s2","s3","s4","s5","s6"),
         strip.white=TRUE)      # Strip out leading and trailing whitespace when reading each field
# subject sex s1 s2 s3 s4 s5 s6
#    N  1   M  1  1  3  3  1  1
#    NE 2   F  1  1  2  2  3  1
#    S  3   F  1  1  1  2  2  1
#    W  4   M  0  1  1  0  0  2
 
# If the first row looked like this:
# subject,sex,scores
# Then we could use header=TRUE:
read.fwf("myfile.txt", c(7,5,-2,1,1,1,1,1,1), header=TRUE, strip.white=TRUE)
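
To try the first call without creating myfile.txt by hand, the following sketch writes the sample data to a temporary file and reads it back. The names fwf_file and scores are made up for this example; the widths and column names are the ones used above.

```r
# Write the fixed-width sample data to a temporary file
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("subject  sex  scores",
             "   N  1    M  113311",
             "   NE 2    F  112231",
             "   S  3    F  111221",
             "   W  4    M  011002"),
           fwf_file)

# 7 chars for subject, 5 for sex, skip 2, then six 1-char scores
scores <- read.fwf(fwf_file,
                   widths = c(7, 5, -2, 1, 1, 1, 1, 1, 1),
                   skip = 1,
                   col.names = c("subject", "sex", "s1", "s2", "s3", "s4", "s5", "s6"),
                   strip.white = TRUE)

# Inspect the structure: one column per score
str(scores)
```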

Excel files

The read.xls function in the gdata package can read in Excel files.

library(gdata)
data <- read.xls("data.xls")

See http://cran.r-project.org/doc/manuals/R-data.html#Reading-Excel-spreadsheets.

SPSS data files

The read.spss function in the foreign package can read in SPSS files.

library(foreign)
data <- read.spss("data.sav", to.data.frame=TRUE)

 

 

Loading and storing data with the keyboard and clipboard

Problem

You want to enter data using input from the keyboard (not a file).

Solution

Data input

Suppose this is your data:

    size weight cost
  small      5    6
 medium      8   10
  large     11    9

Loading data from keyboard input or clipboard

One way to enter data from the keyboard is to read from standard input (stdin()).

# Cutting and pasting using read.table and stdin()
data <- read.table(stdin(), header=TRUE) 
# You will be prompted for input; copy and paste text here
 
# Or:
# data <- read.csv(stdin())

You can also load directly from the clipboard:

# First copy the data to the clipboard
data <- read.table('clipboard', header=TRUE)
 
# Or:
# data <- read.csv('clipboard')

Loading data in a script

The previous method can't be used to load data in a script file because the input must be typed (or pasted) after running the command.

To load the data in a script, use textConnection().

# Using read.table() and textConnection()
data <- read.table(header=TRUE, con <- textConnection('
    size weight cost
   small      5    6
  medium      8   10
   large     11    9
 '))
close(con)
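
A shorter variant, assuming your version of R supports the text argument of read.table(), passes the string directly so that no connection has to be opened or closed:

```r
# Using read.table() with the text argument instead of textConnection()
data <- read.table(header = TRUE, text = '
    size weight cost
   small      5    6
  medium      8   10
   large     11    9
')

print(data)
```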

For different data formats (e.g., comma-delimited, no headers, etc.), options to read.table() can be set. See the Loading data from a file section for more information.

Data output

By default, R prints row names. If you want to print the table in a format that can be copied and pasted, it may be useful to suppress them.

print(data, row.names=FALSE)
#    size weight cost
#  small      5    6
# medium      8   10
#  large     11    9

Writing data for copying and pasting, or to the clipboard

It is possible to write delimited data to the terminal (stdout()), so that it can be copied and pasted elsewhere. It can also be written directly to the clipboard.

write.csv(data, stdout(), row.names=FALSE)
# "size","weight","cost"
# "small",5,6
# "medium",8,10
# "large",11,9
 
# Write to the clipboard (does not work on Mac or Unix)
write.csv(data, 'clipboard', row.names=FALSE)

Output for loading in R

If the data has already been loaded into R, the data structure can be saved using dput(). The output from dput() is a command which will recreate the data structure. The advantage of this method is that it keeps any modifications to data types; for example, if one column consists of numbers and you have converted it to a factor, this method will preserve that type, whereas simply loading the text table (as shown above) will treat it as numeric.

# Suppose you have already loaded data
 
dput(data)
# This returns:
# structure(list(size = structure(c(3L, 2L, 1L), .Label = c("large", 
# "medium", "small"), class = "factor"), weight = c(5L, 8L, 11L
# ), cost = c(6L, 10L, 9L)), .Names = c("size", "weight", "cost"
# ), class = "data.frame", row.names = c(NA, -3L))
 
# Later, we can use the output from dput to recreate the data structure
newdata <- structure(list(size = structure(c(3L, 2L, 1L), .Label = c("large", 
  "medium", "small"), class = "factor"), weight = c(5L, 8L, 11L
  ), cost = c(6L, 10L, 9L)), .Names = c("size", "weight", "cost"
  ), class = "data.frame", row.names = c(NA, -3L))
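
Instead of copying the dput() output by hand, it can be written to a file and read back with dget(). The sketch below is one way to do that; the data frame sizes, the temporary file, and the name restored are made up for this example.

```r
# A small data frame with a factor whose levels are in a deliberate order
sizes <- data.frame(size = factor(c("small", "medium", "large"),
                                  levels = c("small", "medium", "large")),
                    weight = c(5L, 8L, 11L),
                    cost = c(6L, 10L, 9L))

# Write the dput() representation to a file ...
dput_file <- tempfile(fileext = ".R")
dput(sizes, file = dput_file)

# ... and recreate the object from that file later
restored <- dget(dput_file)

# The factor and its level order survive the round trip
levels(restored$size)
```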

 

 

Running a script

Problem

You want to run R code from a text file.

Solution

# First, go to the proper directory
setwd('/home/username/desktop/rcode')
 
source('analyze.r')

Note that if you want your script to produce text output, you must use the print() or cat() function.

x <- 1:10
 
# In a script, this will do nothing
x
 
# Use the print function:
print(x)
# [1]  1  2  3  4  5  6  7  8  9 10
 
# Simpler output: no [1] index prefix, no text wrapping, no trailing newline
cat(x)
# 1 2 3 4 5 6 7 8 9 10
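
If adding print() calls to the script is inconvenient, source() can also print the results itself when called with print.eval=TRUE. The sketch below demonstrates the difference using a two-line script written to a temporary file; script_file, silent_out, and printed_out are names made up for this example, and capture.output() is used only to make the difference visible.

```r
# Create a tiny script whose second line is a bare expression
script_file <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "x"), script_file)

# Default behaviour: the bare `x` in the script produces no output
silent_out <- capture.output(source(script_file))
length(silent_out)

# With print.eval = TRUE, visible results are printed as at the console
printed_out <- capture.output(source(script_file, print.eval = TRUE))
print(printed_out)
```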

 

 

Writing data to a file

Problem

You want to write data to a file.

Solution

Writing to a delimited text file

The easiest way to do this is to use write.csv(). By default, write.csv() includes row names, but these are usually unnecessary and may cause confusion.

# A sample data frame
data <- read.table(header=TRUE, con <- textConnection('
 subject sex size
       1   M    7
       2   F    NA
       3   F    9
       4   M   11
 '))
close(con)
 
# Write to a file, suppress row names
write.csv(data, "data.csv", row.names=FALSE)
 
# Same, except that instead of "NA", output blank cells
write.csv(data, "data.csv", row.names=FALSE, na="")
 
# Use tabs, suppress row names and column names
write.table(data, "data.csv", sep="\t", row.names=FALSE, col.names=FALSE) 
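
To see why the default row names can cause confusion, the sketch below writes the same data frame with and without them and reads both files back. The data frame people and the temporary file names are made up for this example.

```r
people <- data.frame(subject = 1:4,
                     sex = c("M", "F", "F", "M"),
                     size = c(7, NA, 9, 11))

with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(people, with_rn)                       # default: row names included
write.csv(people, without_rn, row.names = FALSE) # row names suppressed

# Reading the first file back yields an extra, unnamed column (called "X")
names(read.csv(with_rn))
# [1] "X"       "subject" "sex"     "size"

names(read.csv(without_rn))
# [1] "subject" "sex"     "size"
```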

Saving in R data format

Using write.csv() and write.table() will not preserve special attributes of the data structures, such as whether a column is a character type or factor, or the order of levels in factors. In order to do that, it should be written out in a special format for R.

# Save in a text format that can be easily loaded in R
dump("data", "data.Rdmpd")
# To load the data again: 
source("data.Rdmpd")
 
# Saving in R binary format which is more compact
save("data", file="data.RData")
# To load the data again:
load("data.RData")
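
The loss of factor level order can be checked directly. The sketch below compares a CSV round trip with a round trip through saveRDS() and readRDS(), an alternative to save() and load() that stores a single object and lets you choose the name it is assigned to on reading. The data frame sizes and the temporary files are made up for this example.

```r
sizes <- data.frame(size = factor(c("small", "medium", "large"),
                                  levels = c("small", "medium", "large")),
                    weight = c(5, 8, 11))

# Round trip through R's binary single-object format
rds_file <- tempfile(fileext = ".rds")
saveRDS(sizes, rds_file)
from_rds <- readRDS(rds_file)
levels(from_rds$size)
# [1] "small"  "medium" "large"

# Round trip through CSV: levels are rebuilt in alphabetical order
csv_file <- tempfile(fileext = ".csv")
write.csv(sizes, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file, stringsAsFactors = TRUE)
levels(from_csv$size)
# [1] "large"  "medium" "small"
```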

Saving in SPSS format
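
One possible approach, sketched here under the assumption that the foreign package is installed, is write.foreign(). It writes the data as a plain text file together with an SPSS syntax file that reads that text file into SPSS. The data frame survey and the file names below are placeholders.

```r
library(foreign)

# Sample data; factors are exported as numeric codes with value labels
survey <- data.frame(subject = 1:4,
                     sex = factor(c("M", "F", "F", "M")),
                     size = c(7, NA, 9, 11))

data_file <- file.path(tempdir(), "survey.txt")    # raw data
syntax_file <- file.path(tempdir(), "survey.sps")  # SPSS syntax that loads it

write.foreign(survey,
              datafile = data_file,
              codefile = syntax_file,
              package = "SPSS")
```

Running the generated syntax file in SPSS then loads the text file; the data itself is not stored in .sav format.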

 

 

Writing text and output from analyses to a file

Problem

You want to write output to a file.

Solution

The sink() function will redirect output to a file instead of to the R terminal. Note that if you use sink() in a script and it crashes before output is returned to the terminal, then you will not see any response to your commands. Call sink() without any arguments to return output to the terminal.

# Start writing to an output file
sink('analysis-output.txt')
 
set.seed(12345)
x <- rnorm(10, 10, 1)
y <- rnorm(10, 11, 1)
# Do some stuff here
cat(sprintf("x has %d elements:\n", length(x)))
print(x)
cat("y =", y, "\n")
 
cat("=============================\n")
cat("T-test between x and y\n")
cat("=============================\n")
print(t.test(x, y))  # print() is needed for the result to appear when run as a script
 
# Stop writing to the file
sink()
 
# Append to the file
sink('analysis-output.txt', append=TRUE)
cat("Some more stuff here...\n")
sink()

The contents of the output file:

x has 10 elements:
 [1] 10.585529 10.709466  9.890697  9.546503 10.605887  8.182044 10.630099
 [8]  9.723816  9.715840  9.080678
y = 10.88375 12.81731 11.37063 11.52022 10.24947 11.8169 10.11364 10.66842 12.12071 11.29872 
=============================
T-test between x and y
=============================
 
    Welch Two Sample t-test
 
data:  x and y 
t = -3.8326, df = 17.979, p-value = 0.001222
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -2.196802 -0.641042 
sample estimates:
mean of x mean of y 
 9.867056 11.285978 
 
Some more stuff here...
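
To guard against the situation described above, where an error leaves output redirected, the sink() call can be wrapped in a function that restores the output destination with on.exit(). This is a minimal sketch, not part of base R: with_sink, depth_before, and out_file are hypothetical names introduced for this example.

```r
# Hypothetical helper: evaluate `code` with output redirected to `path`,
# and always restore the previous output destination afterwards
with_sink <- function(path, code) {
  sink(path)
  on.exit(sink(), add = TRUE)  # runs even if `code` stops with an error
  force(code)                  # `code` is evaluated here, after sink() is active
}

depth_before <- sink.number()
out_file <- tempfile(fileext = ".txt")

# The code fails partway through; try() keeps the script running
result <- try(with_sink(out_file, {
  cat("partial output before the failure\n")
  stop("something went wrong")
}), silent = TRUE)

# Output is back on the terminal, and the partial output is in the file
sink.number() == depth_before
readLines(out_file)
```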