Data Analysis: R Language Study Notes (4): Strings

Strings

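R provides a large number of functions for working with strings, which makes string handling convenient, and it also supports regular expressions.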

In R, a string literal must always be enclosed in quotes; either single or double quotes work.
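As a small sketch of the quoting rule (the variable names here are only illustrative), single and double quotes produce the same string, and a double quote inside a double-quoted string has to be escaped. Without quotes, R would treat the word as a variable name instead.

```r
# Single and double quotes create the same string.
a <- 'hello'
b <- "hello"
same <- identical(a, b)

# A double quote inside a double-quoted string must be escaped;
# wrapping the string in single quotes avoids the escaping.
q1 <- "He said \"hi\""
q2 <- 'He said "hi"'
```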

String operations:

> nchar("hello") # count the number of characters in the string
[1] 5
> month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"     
 [7] "July"      "August"    "September" "October"   "November"  "December" 
> nchar(month.name)
 [1] 7 8 5 5 3 4 4 6 9 7 8 8

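The difference between length and nchar: length returns the number of elements in a vector, while nchar returns the number of characters in each element.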

> nchar(month.name)
 [1] 7 8 5 5 3 4 4 6 9 7 8 8
> length(month.name)
[1] 12
# Next, a clearer example
> x<-"John"
> y<-c("Jim","Tony","kavry")
> x
[1] "John"
> y
[1] "Jim"   "Tony"  "kavry"
> nchar(x) # returns the number of characters in the string
[1] 4
> nchar(y) # for a vector, returns the number of characters in each element
[1] 3 4 5
> length(x) # a single string is a vector of length 1
[1] 1
> length(y) # returns the number of elements in the vector, here 3
[1] 3
[1] 3
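Because nchar is vectorized, it can be combined with other vector operations. The following is a minimal sketch with made-up sample data (the vector `words` is hypothetical, not from the notes above):

```r
# Hypothetical sample data.
words <- c("R", "string", "notes")

n_elements <- length(words)            # number of elements in the vector
n_chars    <- nchar(words)             # number of characters in each element
total      <- sum(nchar(words))        # characters across all elements
long_words <- words[nchar(words) > 3]  # keep elements longer than 3 characters
```

Here `length` answers "how many strings are there", and `nchar` answers "how long is each string".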

paste, the string concatenation function (the default separator is a single space):

> paste("吴军","浪潮","之巅",sep = '-')
[1] "吴军-浪潮-之巅"
> paste (c("贺知章","李白","杜甫"),"爱写诗")
[1] "贺知章 爱写诗" "李白 爱写诗"   "杜甫 爱写诗"
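A short sketch of the related options may help: `sep` goes between the arguments, `collapse` joins the elements of the result into one string, and `paste0` is paste with an empty separator. The vector `poets` is illustrative sample data.

```r
# Hypothetical sample data.
poets <- c("Li Bai", "Du Fu")

with_space <- paste("a", "b")                         # default sep is " "
no_space   <- paste0("a", "b")                        # same as sep = ""
labelled   <- paste(poets, "writes poems", sep = ": ") # vectorized over poets
one_string <- paste(poets, collapse = ", ")           # joins the elements together
```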

substr, for extracting substrings:

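This looked odd at first: starting from 0 seemed to give [0,3), while starting from 1 gives [1,3]. It turns out that R indexes from 1 and substr includes both the start and the stop position; a start of 0 is simply treated as 1, which is why the first two calls below return the same result. So there is no reason to write 0 as the start.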

> substr("碧玉妆成一树高",0,3)
[1] "碧玉妆"
> substr("碧玉妆成一树高",1,3)
[1] "碧玉妆"

> substr("碧玉妆成一树高",2,3) # positions start at 1, and both ends are inclusive
[1] "玉妆"
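The same indexing rules can be checked on an ASCII string. This is a minimal sketch; the string `s` is made up for illustration. It also shows `substring`, whose end position defaults to the end of the string, and the replacement form of `substr`.

```r
s <- "abcdefg"

middle    <- substr(s, 2, 3)                     # positions 2 to 3, both inclusive
from_zero <- substr(s, 0, 3)                     # a start of 0 behaves like 1
tail_part <- substring(s, 5)                     # from position 5 to the end
last_two  <- substr(s, nchar(s) - 1, nchar(s))   # the last two characters

# substr can also appear on the left-hand side to overwrite characters in place.
t <- s
substr(t, 1, 2) <- "XY"
```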

toupper converts to upper case, and tolower converts to lower case:

> month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"     
 [7] "July"      "August"    "September" "October"   "November"  "December" 
> tmp<-substr(month.name,1,3)
> tmp
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> toupper(tmp) # convert to upper case
 [1] "JAN" "FEB" "MAR" "APR" "MAY" "JUN" "JUL" "AUG" "SEP" "OCT" "NOV" "DEC"
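These two functions can be combined with substr and paste0 to capitalize only the first letter, without any regular expression. The following is one possible sketch (the names `abbr` and `capitalized` are illustrative):

```r
# Lower-case three-letter month abbreviations.
abbr <- tolower(substr(month.name, 1, 3))

# Upper-case the first character and append the unchanged rest.
# substr is vectorized, so nchar(abbr) supplies a stop position per element.
capitalized <- paste0(toupper(substr(abbr, 1, 1)),
                      substr(abbr, 2, nchar(abbr)))
```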

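gsub and sub, the replacement functions (sub replaces only the first match, gsub replaces every match). I have forgotten most of my regular expressions and need to brush up: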

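> gsub("^(\\w)","\\U\\1",tolower(tmp)) # ^ anchors the match at the start of the string; \w is shorthand for a word character (letter, digit or underscore); \1 refers back to the first capture group; without perl = TRUE, \U is not recognized as a case conversion, so a literal U is inserted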
 [1] "Ujan" "Ufeb" "Umar" "Uapr" "Umay" "Ujun" "Ujul" "Uaug" "Usep" "Uoct" "Unov"
[12] "Udec"
> gsub("^(\\w)","\\U\\1",tolower(tmp),perl = T) # same pattern; perl = T enables Perl-style regular expressions, where \U converts what follows it in the replacement (here the captured first letter) to upper case
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> gsub("^(\\w)","\\L\\1",tolower(tmp),perl = T) # \L converts the captured first letter to lower case instead; the input is already lower case, so nothing changes
 [1] "jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov" "dec"
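To make the pieces above concrete, here is a small sketch on made-up strings. It contrasts sub with gsub, reuses the `\\U` conversion (which needs `perl = TRUE`), and uses backreferences to reorder capture groups.

```r
fruit <- "banana"

first_only  <- sub("a", "A", fruit)    # only the first match is replaced
all_matches <- gsub("a", "A", fruit)   # every match is replaced

# \\b matches a word boundary, so the first letter of each word is captured
# and \\U upper-cases it (Perl-style regular expressions only).
title_case <- gsub("\\b(\\w)", "\\U\\1", "hello big world", perl = TRUE)

# \\1 and \\2 refer to the first and second capture group.
swapped <- sub("(\\w+) (\\w+)", "\\2 \\1", "hello world")
```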

grep searches a character vector for a pattern and returns the indices of the matching elements. Its signature is:

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)

With fixed = T the pattern is matched as a literal string; with fixed = F (the default) it is treated as a regular expression.

> x<-c("B","A+","ABCD")
> x
[1] "B"    "A+"   "ABCD"
> grep("A+",x,fixed = T)
[1] 2
> grep("A+",x,fixed = F)
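[1] 2 3 # in a regular expression, + means one or more repetitions of the preceding item, so the pattern also matches the A in "ABCD"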
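A few of the other arguments in the signature can be sketched on the same vector: `value = TRUE` returns the matching strings rather than their indices, `ignore.case = TRUE` ignores case, and escaping the `+` is an alternative to `fixed = TRUE`. The related function `grepl` returns a logical vector instead.

```r
x <- c("B", "A+", "ABCD")

matched_values <- grep("A+", x, value = TRUE)       # strings instead of indices
literal_flags  <- grepl("A+", x, fixed = TRUE)      # TRUE where "A+" occurs literally
escaped_plus   <- grep("A\\+", x)                   # \\+ matches a literal plus sign
any_case       <- grep("b", x, ignore.case = TRUE)  # matches "B" and "ABCD"
```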

match looks up whole strings and returns the position of the first exact match (regular expressions are not supported):

> match("ABCD",x)
[1] 3
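A brief sketch of how match behaves at the edges: a value that is not found yields NA, only whole elements count as a match, and `%in%` gives the same lookup as TRUE/FALSE.

```r
x <- c("B", "A+", "ABCD")

positions <- match(c("ABCD", "Z"), x)      # "Z" is not in x, so its position is NA
partial   <- "A" %in% x                    # FALSE: "A" is only part of some elements
exact     <- "A+" %in% x                   # TRUE: the whole element matches
first_hit <- match("B", c("B", "A", "B"))  # only the first position is returned
```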

strsplit splits strings. Note that the return value is a list, not a vector:

> path<-("/user/local/bin")
> path
[1] "/user/local/bin"
> strsplit(path,"/")
[[1]]
[1] ""      "user"  "local" "bin"

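The first argument, the text to be split, can itself be a vector. A plain vector could not hold the result in that case, because each input string yields its own vector of pieces, possibly of a different length: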

> strsplit(c(path,"i/hate/U"),"/")
[[1]]
[1] ""      "user"  "local" "bin"  

[[2]]
[1] "i"    "hate" "U"
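Working with the returned list usually means indexing it with `[[ ]]`. The following sketch shows that, drops the empty piece produced by a leading separator, and uses `fixed = TRUE` for a separator that would otherwise be read as a regular expression. The variable names are illustrative.

```r
paths  <- c("/user/local/bin", "i/hate/U")
pieces <- strsplit(paths, "/")      # a list with one character vector per input

first     <- pieces[[1]]            # pieces of the first path
non_empty <- first[first != ""]     # drop the "" caused by the leading "/"
counts    <- lengths(pieces)        # number of pieces per input string

# "." is a regex metacharacter, so split on it literally with fixed = TRUE.
dotted <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
```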

outer generates every combination of the elements of two vectors, which is their Cartesian product:

> face<-1:13
> suit<-c("spades","clubs","hearts","diamonds")
> face
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
> suit
[1] "spades"   "clubs"    "hearts"   "diamonds"
> outer(face,suit,FUN = paste)
      [,1]        [,2]       [,3]        [,4]         
 [1,] "1 spades"  "1 clubs"  "1 hearts"  "1 diamonds" 
 [2,] "2 spades"  "2 clubs"  "2 hearts"  "2 diamonds" 
 [3,] "3 spades"  "3 clubs"  "3 hearts"  "3 diamonds" 
 [4,] "4 spades"  "4 clubs"  "4 hearts"  "4 diamonds" 
 [5,] "5 spades"  "5 clubs"  "5 hearts"  "5 diamonds" 
 [6,] "6 spades"  "6 clubs"  "6 hearts"  "6 diamonds" 
 [7,] "7 spades"  "7 clubs"  "7 hearts"  "7 diamonds" 
 [8,] "8 spades"  "8 clubs"  "8 hearts"  "8 diamonds" 
 [9,] "9 spades"  "9 clubs"  "9 hearts"  "9 diamonds" 
[10,] "10 spades" "10 clubs" "10 hearts" "10 diamonds"
[11,] "11 spades" "11 clubs" "11 hearts" "11 diamonds"
[12,] "12 spades" "12 clubs" "12 hearts" "12 diamonds"
[13,] "13 spades" "13 clubs" "13 hearts" "13 diamonds"
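The result of outer is a matrix. As a minimal sketch, extra arguments such as `sep` are passed on to the function given in FUN, and `as.vector` flattens the matrix column by column into a plain vector (the names `grid` and `deck` are illustrative):

```r
face <- 1:13
suit <- c("spades", "clubs", "hearts", "diamonds")

# Extra arguments after FUN (here sep) are forwarded to paste.
grid <- outer(face, suit, FUN = paste, sep = " of ")

# Flatten column by column: all spades first, then clubs, and so on.
deck <- as.vector(grid)
```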

posted on 2023-11-29 10:08 szdbjooo
