3.2 Rust: Storing UTF-8 Encoded Text with Strings
What Is a String?
The String type, which is provided by Rust’s standard library rather than coded into the core language, is a growable, mutable, owned, UTF-8 encoded string type. When Rustaceans refer to “strings” in Rust, they usually mean the String and the string slice &str types, not just one of those types. Although this section is largely about String, both types are used heavily in Rust’s standard library, and both String and string slices are UTF-8 encoded.
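The relationship between the two types can be illustrated with a minimal sketch. The function names `byte_len` and `string_and_str` are made up for this example and are not part of the standard library; the sketch only shows that a `&String` can be passed wherever a `&str` is expected, and that a `String` can grow.

```rust
/// Hypothetical helper: accepts any string slice.
/// A `&String` coerces to `&str` automatically at the call site.
pub fn byte_len(text: &str) -> usize {
    text.len() // length in bytes, not in characters
}

/// Hypothetical demo: builds an owned String from a string slice and grows it.
pub fn string_and_str() -> (String, usize) {
    let literal: &str = "hello"; // borrowed string slice
    let mut owned: String = literal.to_string(); // owned, growable copy
    owned.push_str(", world"); // String can grow; &str cannot
    let len = byte_len(&owned); // &String is accepted where &str is expected
    (owned, len)
}
```

Taking `&str` as the parameter type is a common choice because it accepts both string literals and borrowed `String` values.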
Rust’s standard library also includes a number of other string types, such as OsString, OsStr, CString, and CStr.
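As a rough illustration of two of these types, the sketch below creates a `CString` and an `OsString` from the same text. The function name `other_string_types` is an assumption made for this example; the sketch only touches a small part of what these types offer.

```rust
use std::ffi::{CString, OsString};

/// Hypothetical demo of two non-UTF-8-centric string types.
pub fn other_string_types() -> (usize, bool) {
    // CString: owned, NUL-terminated bytes, intended for passing to C APIs.
    // CString::new fails if the input contains an interior NUL byte.
    let c = CString::new("hello").expect("input has no interior NUL byte");
    let len_with_nul = c.as_bytes_with_nul().len(); // 5 bytes plus the trailing NUL

    // OsString: owned string in the platform's native representation.
    let os = OsString::from("hello");
    let is_valid_utf8 = os.to_str().is_some(); // Some(..) only for valid UTF-8

    (len_with_nul, is_valid_utf8)
}
```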
Creating a New String
pub fn create_string() {
    let _hello = String::from("السلام عليكم");
    println!("{}", _hello);
    let _hello = String::from("Dobrý den");
    let _hello = String::from("Hello");
    let _hello = String::from("שָׁלוֹם");
    let _hello = String::from("नमस्ते");
    let _hello = String::from("こんにちは");
    let _hello = String::from("안녕하세요");
    let _hello = String::from("你好");
    let _hello = String::from("Olá");
    let _hello = String::from("Здравствуйте");
    let _hello = String::from("Hola");
    // to_string() on a string literal also produces an owned String.
    let _data = "wa ka ka ".to_string();
    println!("{}", _data);
}
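The + operator takes ownership of its left-hand String and only borrows the right-hand one, so s1 cannot be used after s1 + &s2. The format! macro, by contrast, returns a new String with the formatted contents and doesn't take ownership of any of its parameters.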
pub fn add_str() {
    let s1 = String::from("s1 ");
    let s2 = String::from("s2 ");
    let s3 = s1 + &s2; // s1 is moved into s3; s2 is only borrowed
    println!("{}", s3);
    // println!("{}", s1); // would not compile: `+` took ownership of s1
}

pub fn fmt_str() {
    let s1 = String::from("2020");
    let s2 = String::from("08");
    let s3 = String::from("19");
    // format! doesn't take ownership of its parameters.
    let s = format!("{}-{}-{}", s1, s2, s3);
    println!("{}", s);
    println!("{}", s1); // s1 is still usable here
}
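Rust does not allow indexing into a string with a single integer to get a character, but a substring can be obtained by slicing with a byte range, such as &s[0..len].
The following is rejected at compile time (a type error rather than a syntax error):
let hello = "Здравствуйте";
let answer = &hello[0]; // error[E0277]: the type `str` cannot be indexed by `{integer}`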
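The following compiles but panics at runtime: each letter in hello occupies two bytes, so byte index 1 falls inside a character. Changing the range to hello[0..2] works.
pub fn slic_str() {
    let hello = "Здравствуйте";
    // Panics: byte index 1 is not a char boundary.
    let answer = &hello[0..1];
    println!("{}", answer);
}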
The following is correct, because "1" is a single-byte ASCII character and the range 0..1 ends on a char boundary:
pub fn slic_str() {
    let hello = "1";
    let answer = &hello[0..1];
    println!("{}", answer);
}
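Slicing with a byte range panics when either end falls inside a multi-byte character. One way to avoid the panic is `str::get`, which returns `None` for an invalid range. The sketch below assumes that a non-panicking lookup is what is wanted; `safe_prefix` and `first_char_len` are hypothetical helper names, not standard library functions.

```rust
/// Hypothetical helper: returns the first `end` bytes as a slice,
/// or None if `end` is out of range or not on a char boundary.
pub fn safe_prefix(text: &str, end: usize) -> Option<&str> {
    text.get(0..end)
}

/// Hypothetical helper: number of bytes used by the first character, if any.
pub fn first_char_len(text: &str) -> Option<usize> {
    text.chars().next().map(|c| c.len_utf8())
}
```

For "Здравствуйте", `safe_prefix(text, 1)` yields `None` and `safe_prefix(text, 2)` yields `Some("З")`, since the first letter takes two bytes. `str::is_char_boundary` can be used to check an index before slicing.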
Another point about UTF-8 is that there are actually three relevant ways to look at strings from Rust’s perspective: as bytes, scalar values, and grapheme clusters (the closest thing to what we would call letters).
If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is stored as a vector of u8 values that looks like this:
[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135]
That’s 18 bytes and is how computers ultimately store this data. If we look at them as Unicode scalar values, which are what Rust’s char type is, those bytes look like this:
['न', 'म', 'स', '्', 'त', 'े']
There are six char values here, but the fourth and sixth are not letters: they’re diacritics that don’t make sense on their own. Finally, if we look at them as grapheme clusters, we’d get what a person would call the four letters that make up the Hindi word:
["न", "म", "स्", "ते"]
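The first two views can be checked directly with the standard library. The sketch below uses a hypothetical helper, `byte_and_char_counts`, to compare the byte length with the number of Unicode scalar values. The standard library does not provide grapheme cluster segmentation, so that third view is left out here; it would require an external crate.

```rust
/// Hypothetical helper: returns (number of bytes, number of Unicode scalar values).
pub fn byte_and_char_counts(text: &str) -> (usize, usize) {
    let bytes = text.len(); // len() counts bytes
    let scalars = text.chars().count(); // chars() yields `char` values
    (bytes, scalars)
}
```

For the word “नमस्ते” this returns (18, 6), matching the 18 bytes and six char values listed above.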
A final reason Rust doesn’t allow us to index into a String to get a character is that indexing operations are expected to always take constant time (O(1)). But it isn’t possible to guarantee that performance with a String, because Rust would have to walk through the contents from the beginning to the index to determine how many valid characters there were.
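When the n-th character really is needed, it can be found by walking the string, which makes the linear cost explicit. This is a minimal sketch with two hypothetical helpers, `nth_char` and `byte_offset_of_char`; both scan from the start and are O(n) in the position requested.

```rust
/// Hypothetical helper: the n-th Unicode scalar value (0-based), found by
/// iterating from the beginning. This is O(n), not O(1).
pub fn nth_char(text: &str, n: usize) -> Option<char> {
    text.chars().nth(n)
}

/// Hypothetical helper: the byte offset at which the n-th char starts.
/// The offset is always a valid char boundary, so it is safe to slice at.
pub fn byte_offset_of_char(text: &str, n: usize) -> Option<usize> {
    text.char_indices().nth(n).map(|(offset, _)| offset)
}
```

Both return `None` when `n` is past the end of the string, so the caller has to handle that case instead of getting a panic.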
Methods for Iterating Over Strings
pub fn slice_str() {
    // chars() iterates over Unicode scalar values.
    for c in "नमस्ते".chars() {
        println!("{}", c);
    }
}
This prints six char values, one per line (the fourth and sixth are combining marks):
न
म
स
्
त
े
pub fn slice_strb() {
    // bytes() iterates over the raw UTF-8 bytes.
    for b in "नमस्ते".bytes() {
        println!("{}", b);
    }
}
Output (each value is printed on its own line; joined here for brevity):
224 164 168 224 164 174 224 164 184 224 165 141 224 164 164 224 165 135
