R text mining example fails on iconv

Asked by: elbillaf · Asked: 3/30/2023 · Modified: 3/30/2023 · Views: 20

Question:

I'm using R 4.2.2 on Windows, and I'm trying to work through the text mining example here: https://medium.com/@SAPCAI/text-clustering-with-r-an-introduction-for-data-scientists-c406e7454e76

It fails on this line:

corpus.cleaned <- tm::tm_map(corpus, function(x) iconv(x, to='UTF-8-MAC', sub='byte')) 

with the error message:

Error in iconv(x, to = "UTF-8-MAC", sub = "byte") : unsupported conversion from '' to 'UTF-8-MAC' in codepage 65001

I can't just skip this step, because doing so breaks later code in the example:

tdm <- tm::DocumentTermMatrix(corpus.cleaned)
Error in tolower(txt) : invalid input 'RT @MelindaBeckWSJ: ACA has expand coverag to millions, but what that means--and what it has cost--vari widely. We tell 10 stories-141�' in 'utf8towcs'
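The `utf8towcs` error means at least one document still contains bytes that are not valid UTF-8. A minimal diagnostic sketch, assuming the raw tweet texts are available as a character vector (`texts` below is a hypothetical stand-in, not the tutorial's data), uses base R's `validUTF8()` to list the offending elements:

```r
# Hypothetical stand-in for the tutorial's tweet texts; the second element
# contains a stray 0x96 byte (a Windows-1252 dash), which is invalid UTF-8.
texts <- c("ACA has expanded coverage to millions",
           "vary widely\x96 we tell 10 stories")

# validUTF8() (base R) returns FALSE for strings with invalid UTF-8 bytes.
bad_idx <- which(!validUTF8(texts))
print(bad_idx)  # prints [1] 2
```

Looking at `texts[bad_idx]` then shows exactly which tweets need cleaning before `DocumentTermMatrix()` is called.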

How do I fix this UTF-8-MAC problem?

Tags: r, utf-8, tm, corpus

Comments

1 upvote · JosefZ · 3/30/2023
"utf-8-mac" is the UTF-8 variant used by the OS X file system. On Windows, use to='UTF-8'.
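A minimal sketch of the comment's suggestion, assuming the texts are UTF-8 (on R 4.2+ for Windows the native encoding is UTF-8, codepage 65001, as the error message shows). `clean_utf8()` is a hypothetical helper name, not part of tm or base R. With `sub = "byte"`, each invalid byte is replaced by its hex code in angle brackets, so the result is valid UTF-8 and `tolower()` no longer fails:

```r
# Hypothetical helper: re-encode to plain "UTF-8" (supported everywhere),
# instead of the macOS-only "UTF-8-MAC" target used in the tutorial.
# from = "UTF-8" is an assumption; from = "" (native) is equivalent on
# Windows with R >= 4.2, where the native encoding is UTF-8.
clean_utf8 <- function(x) {
  # Invalid input bytes are replaced with "<xx>" instead of causing an NA.
  iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")
}

# Example string with a stray 0x96 byte that is not valid UTF-8.
bad  <- "vary widely\x96 we tell 10 stories"
good <- clean_utf8(bad)

print(good)             # the 0x96 byte now appears as "<96>"
print(validUTF8(good))  # TRUE: safe to pass to tolower()
print(tolower(good))
```

In the tutorial's pipeline, the same function could presumably be passed to `tm::tm_map()` in place of the `UTF-8-MAC` call, wrapped in `tm::content_transformer()` if the corpus is a `VCorpus`; that wiring is an untested assumption here.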

Answers: no answers yet.