Asked by: Hack-R Asked: 10/2/2017 Updated: 12/11/2019 Views: 369
Write UTF-8 Chinese characters from R to a CSV on Windows
Q:
I am able to scrape and display Chinese characters like this:
pacman::p_load(rvest, stringr)
Sys.setlocale("LC_CTYPE", locale = "Chinese")
# Collect the story links from the index page
cheese <- read_html("http://www.kekenet.com/read/story/cheese/")
links <-
  cheese %>%
  html_node(".box") %>%
  html_nodes("li") %>%
  html_nodes("a")
links <- links[grepl("read", links)]
url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
links <- str_extract(links, url_pattern)
# fixed = TRUE so the dot in ".shtml" is matched literally, not as a regex wildcard
links <- links[grepl(".shtml", links, fixed = TRUE)]
# Empty result frame; one row per English/Chinese sentence pair
text <- data.frame(english = character(), chinese = character(), stringsAsFactors = FALSE)
for (u in links) {
  cheese <- read_html(u)
  # English sentences, with double quotes stripped
  tmp0 <-
    cheese %>%
    html_nodes(xpath = '//*[@class="qh_en"]') %>%
    html_text()
  tmp0 <- gsub('\"', "", tmp0, fixed = TRUE)
  # Chinese sentences, with double quotes stripped
  tmp1 <-
    cheese %>%
    html_nodes(xpath = '//*[@class="qh_zg"]') %>%
    html_text()
  tmp1 <- gsub('\"', "", tmp1, fixed = TRUE)
  # Keep the page only if the two languages line up one-to-one
  if (length(tmp0) == length(tmp1)) {
    tmp <- data.frame(english = tmp0, chinese = tmp1, stringsAsFactors = FALSE)
    text <- rbind(text, tmp)
  }
}
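However, even though I set the file encoding, when I save it to a CSV like this:
write.csv(text, "english_chinese_test.csv", fileEncoding = "UTF-8")
the resulting file contains values like the following instead of Chinese characters: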
2à ̈âà ̈±¤o¥à o1à ̈Â·ï¿ â€²Ë‰Ã ̈ˉ′3ˉo¤§¤°ooÃ ï¿ ̈3 ̈é©©©̈§1|à ̈a·“±©̈3ˉà ̈§ˉ1à ̈a·±aooà ̈ˉ′à ̈â§1ˉaooé奔¡Â°Ã© |-¤à ̈oà £Á°Ã oo ̈à 1à ̈1é©©̈·ˉ1Ï©¿¡aoo·± ̈â¿Â±Ë‰Ã ̈ˉà ̈ˉ′o1À²o±Ã ̈Â§Â±Ã ï¿ o- ̈′oà £1oo°¥é©©©©©©©aoï¿©¥Q1/41·o 'ˉÔë‰â€²Î1/4ˉo1à ′o±1ooâ¤Çé©̈â·± ̈ï¿ oo
" y¬äº†è¿™ç•ªè ̄ï1/4Œæ‰€æœ‰úºéƒ1/2想到了自己皓生æ ́»ï1/4Œå§å¶å®®‰é™̤ ̧‹æ¥ã€' to'ƒï1/4Œæ°è¥¿å¡æ ̧...了æ ̧...to—“yï1/4Œæ‰”ç ́了to®é™ï1/4Œå§å¶å¥1/2åƒéƒ1/2åœ ̈è°ˆè ºè‡ªå·±çš“å·¥ä1/2œï1/4Œä1/2†æ ̃ ̃ ̄æˆ'å¬åˆ®°äº†è®¿™ä ̧ªæ•...事以åŽï1/4Œå ́想到ú†æˆ'çš“ä ̧ªúºç”Ÿæ ́»ã€'æˆ'觉å3/4—æˆ'ç›oh‰®çš“æƒ...å†μï1/4Œæˆ'çš“y®...³ç³»ï1/4Œå°±åƒä ̧€ä ̧ª'旧奶é...ª'ï1/4Œä ̧Šé¢é•¿æ»¡äº†éœ‰èŒã€' æŸ ̄ç'žç¬'出oh°æ¥ï1/4Œè¡ ̈äºúèμžåŒï1/4šæˆ'也æ ̃ ̄ã€'æˆ'也è ̧æˆ'çŽ°åœ ̈最è ̄¥é‡‡å–è¡ŒŠ ̈çš“oh±æ ̃ ̄è®®©ä ̧€æ μä ̧æ®”‰å¿“çš”å...³ç³»y°1/2y«è¿‡yŽ»â€' a®‰æ°æ‹‰åé³é“ï1/4šæˆ'ä ̧åŒæ”ä1/2 çš“è§'ç'¹ï1/4Œä¹Ÿè ̧è¿ä ̧ª'旧奶é...ª'åªæ ̃ ̄ä ̧€ç§æ—§çš”è¡Œä ̧ºæ–¹å1/4ã€'æˆ'们需è¦æ“3/4å1/4ƒçš”yªæ ̃ ̃ ̃ ̧ºæ–¹æ–¹æ �å1/4•èμ·è¿ç状å†μ皓旧皔行ä ̧ºæ–¹å1/4ï1/4Œè€Œä ̧æ ̃ ̄è¿ä ̧ª'奶é©...ª'â€'访™™™™æˆ'们æ‰ä1/4šæœæ› ́å¥1/2çš“æ€ç” ́å'Œè¡Œä ̧ºæ–¹å1/4è1/2¬å ̃â€'
A: No answers yet
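The garbled sample in the question looks consistent with a valid UTF-8 file being displayed as Windows-1252, for example by Excel, which tends to assume a legacy code page when a CSV has no byte order mark (BOM). Assuming that is the cause, one workaround to try is to write a UTF-8 BOM ahead of the CSV data. The following is a minimal base R sketch, not a confirmed fix: `write_csv_utf8_bom` is a hypothetical helper name, the demo data frame is illustrative, and the approach assumes the session locale can represent the characters being written (as with the `Sys.setlocale` call in the question).

```r
# Hypothetical helper: write a data frame as UTF-8 CSV with a leading BOM.
write_csv_utf8_bom <- function(df, path) {
  # Write the three BOM bytes (EF BB BF) as raw data; this creates the file.
  writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), path)
  # Reopen in append mode so the CSV text follows the BOM, encoded as UTF-8.
  con <- file(path, open = "a", encoding = "UTF-8")
  on.exit(close(con))
  write.csv(df, con, row.names = FALSE)
  invisible(path)
}

# Usage with a small stand-in for the scraped data (values are illustrative).
demo <- data.frame(english = "cheese", chinese = "\u5976\u916a",
                   stringsAsFactors = FALSE)
out <- file.path(tempdir(), "english_chinese_test.csv")
write_csv_utf8_bom(demo, out)

# Read the file back, letting R strip the BOM, to check the round trip.
back <- read.csv(out, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
```

If adding a package is acceptable, `readr::write_excel_csv()` is an existing alternative that also writes a UTF-8 BOM.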