alex_bn_lee


【559】R Study Notes

Reference: R Tutorial - W3C School

Reference: R Tutorial - 菜鸟教程 (Runoob)

Reference: Fixing garbled Chinese characters in R programs


30 Aug, 2023

data.frame

(1) Create Data Frame

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   # Keep text columns as character, not factor (the default since R 4.0.0)
   stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)

(2) Get the Structure of the Data Frame

The structure of the data frame can be seen by using the str() function.

# Get the structure of the data frame.
str(emp.data)

# output

'data.frame':   5 obs. of  4 variables:
 $ emp_id    : int  1 2 3 4 5
 $ emp_name  : chr  "Rick" "Dan" "Michelle" "Ryan" ...
 $ salary    : num  623 515 611 729 843
 $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...

(3) Summary of Data in Data Frame

The statistical summary and nature of the data can be obtained by applying the summary() function.
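The summary() call itself is not shown in these notes. A minimal sketch, assuming the emp.data frame from section (1), rebuilt here so the snippet runs on its own:

```r
# Rebuild emp.data from section (1) so this snippet is self-contained
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# One summary per column: min/quartiles/mean/max for numeric and Date
# columns, Length/Class/Mode for character columns
print(summary(emp.data))

# summary() also works on a single column (a numeric vector here)
salary_summary <- summary(emp.data$salary)
print(salary_summary)
```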

(4) Extract Data from Data Frame

Extract specific columns from a data frame using their column names.

# Extract Specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)

# output

  emp.data.emp_name emp.data.salary
1              Rick          623.30
2               Dan          515.20
3          Michelle          611.00
4              Ryan          729.00
5              Gary          843.25
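Wrapping the columns in a new data.frame() works, but it renames them (emp.data.emp_name, emp.data.salary). A sketch of the more common idiom, indexing with a vector of column names, which keeps the original names (emp.data is rebuilt with three columns for brevity):

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Select columns by name; the original column names are kept
result <- emp.data[, c("emp_name", "salary")]
print(result)

# With a single column, drop = FALSE keeps a data frame instead of a vector
salary_df <- emp.data[, "salary", drop = FALSE]
print(salary_df)
```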

Extract the first two rows with all columns

# Extract first two rows.
result <- emp.data[1:2,]
print(result)

# output

  emp_id emp_name salary start_date
1      1     Rick  623.3 2012-01-01
2      2      Dan  515.2 2013-09-23

Extract the 3rd and 5th rows with the 2nd and 4th columns

# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)

# output

  emp_name start_date
3 Michelle 2014-11-15
5     Gary 2015-03-27

(5) Expand Data Frame

A data frame can be expanded by adding columns and rows.

Add Column

Assign a vector to a new column name with the $ operator.

# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

# output

  emp_id emp_name salary start_date       dept
1      1     Rick 623.30 2012-01-01         IT
2      2      Dan 515.20 2013-09-23 Operations
3      3 Michelle 611.00 2014-11-15         IT
4      4     Ryan 729.00 2014-05-11         HR
5      5     Gary 843.25 2015-03-27    Finance

Use the cbind() function to add new columns to a data frame.
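A minimal cbind() sketch; the bonus column and its values are hypothetical, not part of the original data:

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# cbind() dispatches to its data.frame method and appends the new
# column on the right; bonus is a hypothetical example column
emp.data <- cbind(emp.data, bonus = c(50, 40, 55, 60, 70))
print(emp.data)
```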

Add Row

To add rows permanently to an existing data frame, create the new rows with the same structure (column names and types) as the existing data frame, combine the two with the rbind() function, and assign the result.
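A sketch of that procedure, assuming the four columns of emp.data from section (1); the two new employees are hypothetical values:

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# New rows with the same column names and types (hypothetical employees)
emp.newdata <- data.frame(
  emp_id = 6:7,
  emp_name = c("Anna", "Ben"),
  salary = c(578.0, 722.5),
  start_date = as.Date(c("2013-05-21", "2013-07-30")),
  stringsAsFactors = FALSE
)

# rbind() matches columns by name and appends the new rows at the bottom;
# assigning the result is what makes the change permanent
emp.finaldata <- rbind(emp.data, emp.newdata)
print(emp.finaldata)
```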

(6) Access Items

We can use single brackets [ ], double brackets [[ ]] or $ to access columns from a data frame:

> emp.data[1]
  emp_id
1      1
2      2
3      3
4      4
5      5
> emp.data[1,]
  emp_id emp_name salary start_date dept
1      1     Rick  623.3 2012-01-01   IT
> emp.data[1,2]
[1] "Rick"
> emp.data[1,'emp_name']
[1] "Rick"
> emp.data['emp_name']
  emp_name
1     Rick
2      Dan
3 Michelle
4     Ryan
5     Gary
> emp.data[['emp_name']]
[1] "Rick"     "Dan"      "Michelle" "Ryan"     "Gary"    
> emp.data[[1]]
[1] 1 2 3 4 5
> emp.data$emp_id
[1] 1 2 3 4 5
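The transcript mixes access styles that return different kinds of objects. A quick check with class() makes the difference visible (emp.data is rebuilt with two columns for brevity):

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  stringsAsFactors = FALSE
)

# Single brackets with one index keep the data frame structure
print(class(emp.data[1]))                # "data.frame"
# Double brackets and $ extract the column itself as a vector
print(class(emp.data[[1]]))              # "integer"
print(class(emp.data$emp_name))          # "character"
# Matrix-style [row, col] with one column drops to a vector by default
print(class(emp.data[1:2, "emp_name"]))  # "character"
```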

(7) Remove Rows and Columns

Use negative indices (built with the c() function) to remove rows and columns from a data frame:

Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Remove the first row and column
Data_Frame_New <- Data_Frame[-c(1), -c(1)]

# Print the new data frame
Data_Frame_New
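Negative indices also accept several positions at once, and a column can be dropped by name with %in%. A sketch using the same Data_Frame:

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Drop rows 1 and 3, keep all columns
fewer_rows <- Data_Frame[-c(1, 3), ]
print(fewer_rows)

# Drop a column by name: keep every column whose name is not "Pulse"
no_pulse <- Data_Frame[, !(names(Data_Frame) %in% "Pulse")]
print(no_pulse)
```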

(8) Number of Rows and Columns

Use the dim() function to find the number of rows and columns in a data frame:

Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

dim(Data_Frame)

You can also use the ncol() function to find the number of columns and nrow() to find the number of rows: 

Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

ncol(Data_Frame)
nrow(Data_Frame)

Use the length() function to find the number of columns in a Data Frame (similar to ncol()): 

Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

length(Data_Frame)
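length() matches ncol() because a data frame is stored as a list whose elements are its equal-length columns. A small check on the same Data_Frame:

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# dim() returns c(rows, columns)
d <- dim(Data_Frame)
print(d)

# A data frame is a list of columns, so its length is the column count
print(is.list(Data_Frame))
print(length(Data_Frame) == ncol(Data_Frame))
```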

(9) Check if a variable is a data frame or not

We can check if a variable is a data frame or not using the class() function.
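A minimal sketch; is.data.frame() is shown alongside class() because it returns a single TRUE/FALSE, which is convenient in if conditions:

```r
emp.data <- data.frame(emp_id = 1:5,
                       salary = c(623.3, 515.2, 611.0, 729.0, 843.25))
print(class(emp.data))          # "data.frame"
print(is.data.frame(emp.data))  # TRUE

# A matrix is not a data frame
m <- matrix(1:6, nrow = 2)
print(class(m))                 # "matrix" "array" (R >= 4.0.0)
print(is.data.frame(m))         # FALSE
```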

(10) Accessing like a matrix

Data frames can be accessed like a matrix by providing indexes for row and column.

R also ships with ready-made datasets for practice; list them with the command library(help = "datasets"). The examples below use emp.data from the earlier sections.

A data frame can be examined using functions like str() and head().

> emp.data
  emp_id emp_name salary start_date       dept
1      1     Rick 623.30 2012-01-01         IT
2      2      Dan 515.20 2013-09-23 Operations
3      3 Michelle 611.00 2014-11-15         IT
4      4     Ryan 729.00 2014-05-11         HR
5      5     Gary 843.25 2015-03-27    Finance

> emp.data[emp.data$salary > 600]   # no comma: the logical vector selects columns, so emp_name is dropped
  emp_id salary start_date       dept
1      1 623.30 2012-01-01         IT
2      2 515.20 2013-09-23 Operations
3      3 611.00 2014-11-15         IT
4      4 729.00 2014-05-11         HR
5      5 843.25 2015-03-27    Finance

> emp.data[emp.data$salary > 600, ]
  emp_id emp_name salary start_date    dept
1      1     Rick 623.30 2012-01-01      IT
3      3 Michelle 611.00 2014-11-15      IT
4      4     Ryan 729.00 2014-05-11      HR
5      5     Gary 843.25 2015-03-27 Finance

> emp.data[emp.data$salary > 600 & emp.data$dept=='IT', ]
  emp_id emp_name salary start_date dept
1      1     Rick  623.3 2012-01-01   IT
3      3 Michelle  611.0 2014-11-15   IT

> head(emp.data, n=3)
  emp_id emp_name salary start_date       dept
1      1     Rick  623.3 2012-01-01         IT
2      2      Dan  515.2 2013-09-23 Operations
3      3 Michelle  611.0 2014-11-15         IT

> emp.data[2:3, ]
  emp_id emp_name salary start_date       dept
2      2      Dan  515.2 2013-09-23 Operations
3      3 Michelle  611.0 2014-11-15         IT

> emp.data[2:3, "salary"]
[1] 515.2 611.0
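The same matrix-style indexing works on the built-in datasets mentioned above. A sketch using iris, a data frame from R's datasets package (attached by default):

```r
# iris is a built-in data frame with 150 rows and 5 columns
str(iris)
print(head(iris, n = 3))

# Rows where Sepal.Length > 7.5, with two columns selected by name
big <- iris[iris$Sepal.Length > 7.5, c("Sepal.Length", "Species")]
print(big)

# A single cell by [row, column]; Species is a factor, so levels are printed too
print(iris[1, 5])
```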

Vectors

ref: R Vectors

A vector is simply a list of items that are of the same type.

To combine a list of items into a vector, use the c() function and separate the items with commas.

> a <- c(1, 2, 3, 4, 5)
> a
[1] 1 2 3 4 5
> a[c(-1)]
[1] 2 3 4 5
> a[c(-2)]
[1] 1 3 4 5
> a[c(-3)]
[1] 1 2 4 5
> rep(c(1,2,3), each=3)
[1] 1 1 1 2 2 2 3 3 3
> rep(c(1), each=10)
 [1] 1 1 1 1 1 1 1 1 1 1
> rep(1, each=10)
 [1] 1 1 1 1 1 1 1 1 1 1
> rep(c(1,2,3), times=3)
[1] 1 2 3 1 2 3 1 2 3
> rep(1, 10)
 [1] 1 1 1 1 1 1 1 1 1 1
> rep(c(1,2,3), times = c(5,2,1))
[1] 1 1 1 1 1 2 2 3
> seq(from = 0, to = 100, by = 20)
[1]   0  20  40  60  80 100
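Because a vector holds items of one type only, c() silently converts mixed inputs to the most general type among them (logical < integer < double < complex < character). A short check:

```r
# Logicals become 1 and 0 when mixed with numbers
num_logical <- c(1, TRUE, FALSE)
print(num_logical)

# One character item turns everything into character
mixed <- c(1, "a", TRUE)
print(mixed)

# Complex wins over numeric and logical
cplx <- c(3, 1, TRUE, 2+3i)
print(class(cplx))
```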

Lists

ref: R Lists 

A list in R can hold items of different data types. It is an ordered collection whose items can be changed.

To find out if a specified item is present in a list, use the %in% operator: 
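A minimal sketch with a small example list (the fruit names are illustrative):

```r
thislist <- list("apple", "banana", "cherry")

# %in% returns TRUE when the item matches an element of the list
print("apple" %in% thislist)   # TRUE
print("orange" %in% thislist)  # FALSE
```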

To add an item to the end of the list, use the append() function: 
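A minimal sketch; append() returns a new list, so the result has to be assigned back:

```r
thislist <- list("apple", "banana", "cherry")

# Append at the end
thislist <- append(thislist, "orange")
print(length(thislist))  # 4

# The after argument inserts at a given position instead of the end
thislist <- append(thislist, "lemon", after = 1)
print(unlist(thislist))
```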

You can loop through the list items by using a for loop:
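A minimal sketch; besides printing each item, the loop collects them into a character vector (visited is a hypothetical name used only to show the visiting order):

```r
thislist <- list("apple", "banana", "cherry")
visited <- character(0)

for (item in thislist) {
  print(item)
  # Record the order in which items are visited
  visited <- c(visited, item)
}
```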

There are several ways to join, or concatenate, two or more lists in R.

The most common way is to use the c() function, which concatenates the elements of the lists into a single list:
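A minimal sketch, contrasting c() with list(), which nests the inputs instead of flattening them:

```r
list1 <- list("a", "b", "c")
list2 <- list(1, 2, 3)

# c() concatenates the elements of both lists into one flat list
list3 <- c(list1, list2)
print(length(list3))   # 6

# list() wraps them instead, giving a list of 2 lists
nested <- list(list1, list2)
print(length(nested))  # 2
```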


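R vs Python syntax comparison

Assignment operators
R:
v1 <- c(3, 1, TRUE, 2+3i)    # standard assignment
v2 <<- c(3, 1, TRUE, 2+3i)   # super-assignment: assigns in an enclosing environment (global at top level)
v3 = c(3, 1, TRUE, 2+3i)     # = also assigns at top level
print(v1)
print(v2)
print(v3)
Python:
=

Colon operator: creates a sequence of consecutive numbers as a vector.
R:
v <- 2:8   # 2 3 4 5 6 7 8, both ends included
print(v)
Python:
v = range(2, 9)   # the end of range() is excluded, so 9 gives 2..8

%in% operator: tests whether an element belongs to a vector.
R:
v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)   # TRUE
print(v2 %in% t)   # FALSE
Python:
v1 in list1

if statement
R:
x <- 30L
if (is.integer(x)) {
   print("X is an Integer")
}

if...else statement
R:
x <- c("what", "is", "truth")
# %in% is case-sensitive, so this prints "Truth is not found"
if ("Truth" %in% x) {
   print("Truth is found")
} else {
   print("Truth is not found")
}

for loop
R:
v <- LETTERS[1:4]
for (i in v) {
   print(i)
}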
  • c(): creates a vector; indices start at 1
  • matrix(): creates a matrix from a vector
  • > vector = c(1:10)
    > vector
     [1]  1  2  3  4  5  6  7  8  9 10
    > m = matrix(vector, 5, 2)
    > m
         [,1] [,2]
    [1,]    1    6
    [2,]    2    7
    [3,]    3    8
    [4,]    4    9
    [5,]    5   10
    > length(m)
    [1] 10
    > m[1,1]
    [1] 1
    > m[2,2]
    [1] 7
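matrix(vector, 5, 2) fills the matrix column by column, which is why m[2,2] is 7 rather than 4. A sketch contrasting the default fill order with byrow = TRUE:

```r
v <- 1:10

# Default: fill column by column (column-major order)
m_col <- matrix(v, nrow = 5, ncol = 2)
# byrow = TRUE fills row by row instead
m_row <- matrix(v, nrow = 5, ncol = 2, byrow = TRUE)

print(m_col[2, 2])  # 7
print(m_row[2, 2])  # 4
print(dim(m_col))   # 5 2
```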

posted on 2021-05-13 19:59  McDelfino