Golang基础-Runes

rune与string

The rune type in Go is an alias for int32. Given this underlying int32 type, the rune type holds a signed 32-bit integer value. However, unlike an int32 type, the integer value stored in a rune type represents a single Unicode character.

myRune := '¿'
fmt.Printf("myRune type: %T\n", myRune)
// Output: myRune type: int32

fmt.Printf("myRune value: %v\n", myRune)
// Output: myRune value: 191

fmt.Printf("myRune Unicode character: %c\n", myRune)
// Output: myRune Unicode character: ¿

fmt.Printf("myRune Unicode code point: %U\n", myRune)
// Output: myRune Unicode code point: U+00BF

myRune := rune(0xbf)
myRune = 191
fmt.Printf("myRune Unicode character: %c\n", myRune)
// Output: myRune Unicode character: ¿

一个字符串是一个不可改变的字节序列。字符串可以包含任意的数据,包括byte值0,但是通常是用来包含人类可读的文本。文本字符串通常被解释为采用UTF8编码的Unicode码点 (rune)序列。
内置的len函数可以返回一个字符串中的字节数目(不是rune字符数目),索引操作s[i]返回第i 个字节的字节值,i必须满足0 ≤ i< len(s)条件约束。如果试图访问超出字符串索引范围的字节将会导致panic异常。
第i个字节并不一定是字符串的第i个字符,因为对于非ASCII字符的UTF8编码会要两个或多个 字节。

  • 一个字节8bit
  • int32是4个字节
  • UTF8每个文字的字节长度是变化的(1到4)
  • range遍历string的每个文字,而不是byte
  • 单引号是rune(一个字符,byte,int32),双引号是string(字符串)
myString := "❗hello"
for index, char := range myString {
  fmt.Printf("Index: %d\tCharacter: %c\t\tCode Point: %U\n", index, char, char)
}
// Output:
// Index: 0	Character: ❗		Code Point: U+2757
// Index: 3	Character: h		Code Point: U+0068
// Index: 4	Character: e		Code Point: U+0065
// Index: 5	Character: l		Code Point: U+006C
// Index: 6	Character: l		Code Point: U+006C
// Index: 7	Character: o		Code Point: U+006F

常用函数

UTF8解码

import "unicode/utf8"

for i := 0; i < len(s); {
    r, size := utf8.DecodeRuneInString(s[i:])
    fmt.Printf("%d\t%c\n", i, r)
    i += size
}

字符串的字节长度和文字长度

import "unicode/utf8"

myString := "❗hello"
stringLength := len(myString)
numberOfRunes := utf8.RuneCountInString(myString)

fmt.Printf("myString - Length: %d - Runes: %d\n", stringLength, numberOfRunes)
// Output: myString - Length: 8 - Runes: 6

类型转换

myRuneSlice := []rune{'e', 'x', 'e', 'r', 'c', 'i', 's', 'm'}
myString := string(myRuneSlice)
fmt.Println(myString)
// Output: exercism

myString := "exercism"
myRuneSlice := []rune(myString)
fmt.Println(myRuneSlice)
// Output: [101 120 101 114 99 105 115 109]
  • string是紧凑的不可改变到字节序列
  • 所谓到紧凑就是英文1字节,中文3字节等,紧凑排列
  • rune是int32的别名,把每个字符统一扩展到4字节
  • rune是单个字符,多个应使用rune slice,相当于int32的数组

Exercise

package logs

import "unicode/utf8"

// Application identifies the application emitting the given log.
func Application(log string) string {
	for _, c := range log {
		if c == '❗' {
			return "recommendation"
		} else if c == '🔍' {
			return "search"
		} else if c == '☀' {
			return "weather"
		}
	}
	return "default"
}

// Replace replaces all occurrences of old with new, returning the modified log
// to the caller.
func Replace(log string, oldRune, newRune rune) string {
	res := ""
	for _, c := range log {
		if c == oldRune {
			res += string(newRune)
		} else {
			res += string(c)
		}
	}
	return res
}

// WithinLimit determines whether or not the number of characters in log is
// within the limit.
func WithinLimit(log string, limit int) bool {
	return utf8.RuneCountInString(log) <= limit
}
posted @ 2023-02-20 11:47  roadwide  阅读(69)  评论(0)    收藏  举报