Asked by: Ian Andwati  Asked: 11/14/2023  Last edited by: Jmb, Ian Andwati  Modified: 11/14/2023  Viewed: 46 times
wc clone does not support multibyte characters
Question:
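I am working through a coding challenge that asks you to clone the Unix wc tool, and I am doing it in Rust. I am stuck on the final step: supporting the -m command-line option, which outputs the number of characters in the file. If the current locale does not support multibyte characters, -m should give the same result as the -c option.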
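The challenge's rule for -m ("same as -c when the locale has no multibyte support") can be sketched as follows. This is a simplification built on stated assumptions: the helper names `locale_is_utf8` and `char_count` are hypothetical, the locale is judged only by whether the effective `LC_ALL` / `LC_CTYPE` / `LANG` value mentions UTF-8 (using the POSIX precedence, where an empty value counts as unset), and invalid UTF-8 is decoded lossily, which the real wc may not do.

```rust
/// Hypothetical helper: is the character-type locale UTF-8?
/// Assumption: POSIX precedence LC_ALL > LC_CTYPE > LANG, empty values ignored,
/// and "UTF-8-ness" is decided by the locale name alone.
fn locale_is_utf8(lc_all: Option<&str>, lc_ctype: Option<&str>, lang: Option<&str>) -> bool {
    let effective = [lc_all, lc_ctype, lang]
        .iter()
        .filter_map(|v| *v) // drop unset variables
        .find(|v| !v.is_empty()) // an empty value counts as unset
        .unwrap_or("C"); // nothing set: the "C" locale, which is single-byte
    let lower = effective.to_ascii_lowercase();
    lower.contains("utf-8") || lower.contains("utf8")
}

/// Hypothetical helper: the -m count for some file contents.
/// UTF-8 locale: number of Unicode scalar values; otherwise: number of bytes (same as -c).
fn char_count(data: &[u8], utf8_locale: bool) -> usize {
    if utf8_locale {
        // Assumption: invalid byte sequences become U+FFFD; real wc may count them differently.
        String::from_utf8_lossy(data).chars().count()
    } else {
        data.len()
    }
}
```

In main, the flag could be computed from the environment with something like `locale_is_utf8(env::var("LC_ALL").ok().as_deref(), env::var("LC_CTYPE").ok().as_deref(), env::var("LANG").ok().as_deref())`.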
Here is the link to the challenge description: https://codingchallenges.fyi/challenges/challenge-wc/
My implementation of the option outputs a lower count than expected, possibly because of a locale issue. The expected output is 339292 test.txt, but I get 327900.
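Because 327900 is lower than 339292, it seems likely that characters are being dropped before they are counted, rather than being miscounted. Two standard-library behaviours do exactly that: `str::trim_end` removes trailing whitespace (including the final newline), and `BufRead::lines` strips each `\n` or `\r\n` terminator. The sketch below (all three function names are hypothetical) compares them on a small string; whether either one accounts for the exact gap on test.txt is not established here.

```rust
use std::io::{BufRead, BufReader};

/// Every char, newlines included (what `wc -m` reports in a UTF-8 locale).
fn all_chars(text: &str) -> usize {
    text.chars().count()
}

/// Count after trim_end(), as in the question's number_of_characters.
fn trimmed_chars(text: &str) -> usize {
    text.trim_end().chars().count()
}

/// Sum of per-line counts via BufRead::lines(), which strips "\n" and "\r\n".
fn per_line_chars(text: &str) -> usize {
    BufReader::new(text.as_bytes())
        .lines()
        .map(|line| line.expect("reading from memory cannot fail").chars().count())
        .sum()
}
```

For "ab\r\ncd\n" the three functions return 7, 6 and 4: trim_end drops the final newline, and lines() drops all three terminator characters.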
I have tried to make sense of this by reading about Unicode, but I still don't understand it: https://learn.microsoft.com/en-us/globalization/locale/locale and https://tonsky.me/blog/unicode/
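One way to make the Unicode reading concrete: in Rust, `str::len` counts UTF-8 bytes (what `wc -c` reports), while `chars().count()` counts Unicode scalar values, which as far as I understand is what GNU wc -m counts under a UTF-8 locale. Neither one counts user-perceived characters. An `e` followed by a combining acute accent displays as a single "é" but is two chars. Counting those (grapheme clusters) would need an external crate such as unicode-segmentation, which is not shown here. `counts` is a hypothetical helper.

```rust
/// Returns (UTF-8 byte count, Unicode scalar value count) for a string.
fn counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For example, `counts("\u{e9}")` (precomposed é) is (2, 1), `counts("e\u{301}")` (e plus combining accent) is (3, 2), and `counts("\u{1F44D}")` (a thumbs-up emoji) is (4, 1).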
The test file is available at https://github.com/andwati/wc-rs/blob/main/test.txt
Here is my implementation. I am very new to Rust, so the code may not be idiomatic.
use std::env;
use std::fs::File;
use std::io::prelude::*;
use std::io::{self, BufReader};

fn number_of_bytes(file_path: &str) -> io::Result<()> {
    let f = File::open(file_path)?;
    let mut reader = BufReader::new(f);
    let mut buffer = Vec::new();
    // read the whole file
    reader.read_to_end(&mut buffer)?;
    let total_bytes = buffer.len();
    println!("{} {}", total_bytes, file_path);
    Ok(())
}

fn number_of_lines(file_path: &str) -> io::Result<()> {
    let f = File::open(file_path)?;
    let reader = BufReader::new(f);
    let line_count = reader.lines().count();
    println!("{} {}", line_count, file_path);
    Ok(())
}

fn number_of_words(file_path: &str) {
    let f = File::open(file_path).expect("Error opening the file");
    let reader = BufReader::new(f);
    let mut word_count: u32 = 0;
    for line in reader.lines() {
        let curr: String = line.expect("Error reading content of the file");
        // let words: Vec<&str> = curr.split(" ").collect();
        let words: Vec<&str> = curr.split_whitespace().collect();
        let filtered_words: Vec<&str> = words.into_iter().filter(|word| word.len() > 0).collect();
        word_count += filtered_words.len() as u32;
    }
    println!("{}", word_count);
}
fn number_of_characters(file_path: &str) {
    let mut file = File::open(file_path).unwrap();
    let mut s = String::new();
    file.read_to_string(&mut s).unwrap();
    // trim_end() drops trailing whitespace, including the final newline, before counting
    print!("{}", s.trim_end().chars().count());
}
fn main() {
    let args: Vec<String> = env::args().collect();
    // Expect `wc-tool <option> <filepath>`; check the length before indexing args[2]
    if args.len() < 3 {
        eprintln!("Usage: wc-tool [-c|-l|-w|-m] <filepath>");
        std::process::exit(1);
    }
    let file_path = &args[2];
    if args[1] == "-c" {
        number_of_bytes(file_path).unwrap();
    } else if args[1] == "-l" {
        number_of_lines(file_path).unwrap();
    } else if args[1] == "-w" {
        number_of_words(file_path);
    } else if args[1] == "-m" {
        number_of_characters(file_path);
    } else {
        eprintln!("Usage: wc-tool [-c|-l|-w|-m] <filepath>");
        std::process::exit(1);
    }
}
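The functions above read the whole file into memory. As an alternative sketch (not from the question or the answer), characters in valid UTF-8 can be counted while streaming, because every character has exactly one leading byte and every continuation byte has the bit pattern 10xxxxxx. `count_utf8_chars` is a hypothetical name, and for input that is not valid UTF-8 its result will not necessarily match wc -m.

```rust
use std::io::{self, Read};

/// Count UTF-8 characters in a stream without building a String.
/// Assumes valid UTF-8: counts every byte that is not a continuation byte (10xxxxxx).
fn count_utf8_chars<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buf = [0u8; 8192];
    let mut count = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        };
        count += buf[..n].iter().filter(|&&b| (b & 0xC0) != 0x80).count();
    }
    Ok(count)
}
```

Inside a function returning io::Result<()>, it could be called as `println!("{} {}", count_utf8_chars(File::open(file_path)?)?, file_path);`. Since it counts bytes one at a time, a multibyte character split across two 8 KiB reads is still counted exactly once.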
Answer:
0 votes
Ian Andwati
11/14/2023
#1
I was able to get an accurate count with the following function, which no longer calls trim_end() before counting:
fn number_of_characters(file_path: &str) {
    let mut file = File::open(file_path).unwrap();
    let mut s = String::new();
    file.read_to_string(&mut s).unwrap();
    // count every char, trailing newline included, as wc -m does
    print!("{}", s.chars().count());
}
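This function gets the right count because it counts every char, including the trailing whitespace that trim_end() was removing. A variant sketch keeps that logic but follows the error-returning style of number_of_bytes. The name `number_of_characters_from` and the generic reader parameter are my own assumptions, chosen so the function can also be tested on in-memory bytes.

```rust
use std::io::{self, Read};

/// Same counting as the answer (no trimming), but generic over any reader
/// and returning errors instead of panicking.
fn number_of_characters_from<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut s = String::new();
    // Fails with ErrorKind::InvalidData if the input is not valid UTF-8.
    reader.read_to_string(&mut s)?;
    Ok(s.chars().count())
}
```

It would be called as `number_of_characters_from(File::open(file_path)?)?`. On a file that is not valid UTF-8 it returns an error, whereas the unwrap() calls in the answer would panic.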