How to slice a string containing UTF-8 characters in Rust

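Question:

I'm writing a toy parser in Rust, and I want to handle UTF-8 characters in my string input. I know I need to use the chars method to get an iterator that yields UTF-8 characters correctly, but I want to slice the string using UTF-8 (character) indices. Is there a method I can use? I looked into SWC, but I can't understand how it handles UTF-8 strings, because its input API seems to require developers to work out the correct UTF-8 indices themselves.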

use swc_common::input::{StringInput, Input};
use swc_common::BytePos;
fn main() {
    let utf8_str = "中文字串";
    let mut input =  StringInput::new("中文字串", BytePos(0), BytePos(utf8_str.len().try_into().unwrap()));
    println!("{:?}", input.slice(BytePos(0), BytePos(3)));
    println!("{:?}", &utf8_str[0..3]);
    // Is there a function like slice(start, end) that takes character indices and returns a UTF-8 string?
}

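string rust utf-8

Hi @cafce25, I want to slice a UTF-8 string using UTF-8 (character) indices instead of byte indices; the example above uses byte indices, so I need to know whether the current character takes 8 bits or more. I'd like to be able to index the string with something like slice(0, 2) and get the first 2 UTF-8 characters, as if calling the iterator's next method twice.

2 upvotes Jmb 7/24/2023

That's not possible to do in O(1), because UTF-8 characters have variable size. You can use char_indices().nth(n) to get the byte index of the start of the nth character in O(n).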
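
As a minimal sketch of the char_indices().nth(n) suggestion above: the helper name byte_offset_of_char is hypothetical and not part of the standard library, and "character" here means a Rust char (a Unicode scalar value), not a grapheme cluster.

```rust
/// Hypothetical helper: byte offset at which the n-th character (0-based) starts.
/// Returns None if the string has fewer than n + 1 characters.
fn byte_offset_of_char(s: &str, n: usize) -> Option<usize> {
    // char_indices yields (byte_offset, char) pairs; nth walks past n items,
    // so the lookup is O(n) rather than O(1).
    s.char_indices().nth(n).map(|(i, _)| i)
}

fn main() {
    let s = "中文字串"; // each of these characters is encoded as 3 bytes
    println!("{:?}", byte_offset_of_char(s, 2)); // Some(6)
    println!("{:?}", byte_offset_of_char(s, 4)); // None: only 4 characters
}
```

The returned offset always falls on a character boundary, so it can be used directly as a bound in a byte-range slice such as &s[offset..].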

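Answer:

1 upvote cafce25 7/24/2023 #1

Slicing with character indices is not supported, and because the SliceIndex trait is sealed, you cannot implement it yourself. But you can use char_indices to compute the byte index that corresponds to each UTF-8 character: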

fn main() {
    let utf8_str = "中文字串";
    let start_char = 1;
    let end_char = 2;
    let mut indices = utf8_str.char_indices().map(|(i, _)| i);
    let start = indices.nth(start_char).unwrap();
    let end = indices.nth(end_char - start_char - 1).unwrap_or(utf8_str.len());
    println!("{:?}", &utf8_str[start..end]);
}

Output:

"文"

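The same idea can be wrapped in a reusable function. The following is only a sketch under stated assumptions: slice_chars is a hypothetical name, it treats the range as half-open (start..end) in units of char, and it returns None instead of panicking when the range is invalid. Unlike the snippet above, it also accepts start == end and a start equal to the character count.

```rust
/// Hypothetical helper: slice s by character indices (half-open range start..end).
/// Returns None if start > end or if either index is past the end of the string.
fn slice_chars(s: &str, start: usize, end: usize) -> Option<&str> {
    if start > end {
        return None;
    }
    // Byte offsets of every character start, followed by s.len() as the
    // one-past-the-end offset, so that end == character count is valid.
    let mut offsets = s
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()));
    let start_byte = offsets.nth(start)?;
    let end_byte = if end == start {
        start_byte // empty slice
    } else {
        // The iterator has already advanced past start, so step the remainder.
        offsets.nth(end - start - 1)?
    };
    Some(&s[start_byte..end_byte])
}

fn main() {
    let s = "中文字串";
    println!("{:?}", slice_chars(s, 1, 2)); // Some("文")
    println!("{:?}", slice_chars(s, 0, 2)); // Some("中文")
    println!("{:?}", slice_chars(s, 0, 5)); // None: out of range
}
```

Each call still walks the string from the beginning, so it is O(n) per slice. A parser that slices repeatedly would probably do better to track byte offsets as it iterates and slice with those directly.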