While converting a character variable into an integer, there is a message saying: NAs introduced by coercion. How to avoid this error?

Asked by: Sri Sreshtan  Asked: 5/15/2020  Updated: 5/15/2020  Views: 350

Q:

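I tried to use the as.integer function to convert a character variable into an integer variable. However, when I run the code, the output contains only NA values. The code is as follows: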

library(tidyverse)
coal_data <- read.csv("http://594442.youcanlearnit.net/coal.csv", skip = 2)
coal_data %>% glimpse()
colnames(coal_data)[1] <- "region"
coal_long <- gather(coal_data, 'year', 'coal_consumption', -region)
coal_long %>% glimpse()
coal_long %>% separate(year, into = c("x", "year"), sep = "X")%>%
    select(-x)%>% glimpse()
class(coal_long$year)
coal_long$year <- as.integer(coal_long$year)

The output is as follows:

coal_long %>% glimpse()

Rows: 6,960
Columns: 3
$ region           <fct> "North America", "Bermuda", "Canada", "Greenland", "Mexico",...
$ year             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ coal_consumption <chr> "16.45179", "0", "0.96156", "0.00005", "0.10239", "0", "15.3...

The expected output is the year as an integer. Thank you in advance for looking into this.

Tags: r dplyr type-conversion integer na


A:

1 upvote  Jeff Bezos  5/15/2020  #1

Before converting to integer, you need to remove the letters from coal_long$year. Try something like this.

coal_long$year # X1980 X1981 X1982 X1983, etc.
as.integer(str_remove(coal_long$year, "X"))

Here is a more general approach that extracts the digits from the string before converting.

as.integer(str_extract(coal_long$year, "\\d+"))
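
For anyone who wants to see why the NAs appear, here is a minimal sketch in base R. The vector year_raw is a made-up stand-in for coal_long$year, and sub() is used as a base R substitute for str_remove(), not the function used in this answer. Note that "NAs introduced by coercion" is a warning rather than an error: as.integer() cannot parse a string such as "X1980", so it returns NA for that element.

```r
# Toy stand-in for coal_long$year (hypothetical values for illustration).
year_raw <- c("X1980", "X1981", "X1982")

# Direct conversion: "X1980" is not a valid number, so every element
# becomes NA. suppressWarnings() only hides the coercion warning here.
direct <- suppressWarnings(as.integer(year_raw))

# Base R alternative to str_remove(): drop a leading "X", then convert.
year_int <- as.integer(sub("^X", "", year_raw))

print(direct)
print(year_int)
print(class(year_int))
```

Anchoring the pattern with "^" removes the X only at the start of the string, which is a little stricter than removing the first X found anywhere.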

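Comments

0 upvotes  Sri Sreshtan  5/15/2020
Thank you for the code. It helped me get the year. However, the class of the coal_long$year variable is still character. It has not been changed to integer.
0 upvotes  Jeff Bezos  5/16/2020
Did you reassign coal_long$year? When I run class(as.integer(str_remove(coal_long$year, "X"))), I get integer.
0 upvotes  Sri Sreshtan  5/16/2020
Yes sir, it worked after reassigning coal_long$year. Thank you.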
2 upvotes  thorepet  5/15/2020  #2

After removing the X from the year column, you need to reassign coal_long.

coal_long <- coal_long %>% 
  separate(year, into = c("x", "year"), sep = "X") %>% 
  select(-x) %>% 
  glimpse()

coal_long$year <- as.integer(coal_long$year)

coal_long %>% glimpse()

Rows: 6,960
Columns: 3
$ region           <fct> "North America", "Bermuda", "Canada", "Greenland", "Mexico", "Saint Pierre and Miquelon", "United States", "Cent…
$ year             <int> 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980…
$ coal_consumption <chr> "16.45179", "0", "0.96156", "0.00005", "0.10239", "0", "15.38779", "0.42011", "0", "0", "0.03476", "--", "0", "0…
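
The reassignment point can be shown on a small example. This is only a sketch: coal_toy is a made-up two-row data frame, and base R's transform() stands in for the separate() pipeline above. Like the dplyr and tidyr verbs, transform() returns a modified copy and leaves the original object alone unless the result is assigned.

```r
# Hypothetical toy data, not the real coal data set.
coal_toy <- data.frame(
  region = c("Canada", "Mexico"),
  year = c("X1980", "X1981"),
  stringsAsFactors = FALSE
)

# Without assignment the modified copy is printed and then discarded.
transform(coal_toy, year = sub("^X", "", year))
unchanged <- coal_toy$year  # still carries the "X" prefix

# Assigning the result back is what makes the change stick.
coal_toy <- transform(coal_toy, year = as.integer(sub("^X", "", year)))

print(unchanged)
print(coal_toy$year)
print(class(coal_toy$year))
```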
3 upvotes  Chuck P  5/15/2020  #3

You might as well make coal_consumption a double while you're at it...

library(tidyverse)

coal_data <- read.csv("http://594442.youcanlearnit.net/coal.csv", skip = 2, na.strings = "--")

colnames(coal_data)[1] <- "region"
coal_long <- gather(coal_data, 'year', 'coal_consumption', -region)
coal_long %>% glimpse()
#> Rows: 6,960
#> Columns: 3
#> $ region           <chr> "North America", "Bermuda", "Canada", "Greenland", "…
#> $ year             <chr> "X1980", "X1980", "X1980", "X1980", "X1980", "X1980"…
#> $ coal_consumption <dbl> 16.45179, 0.00000, 0.96156, 0.00005, 0.10239, 0.0000…
coal_long <- coal_long %>% separate(year, into = c("x", "year"), sep = "X") %>%
  select(-x) %>% glimpse()
#> Rows: 6,960
#> Columns: 3
#> $ region           <chr> "North America", "Bermuda", "Canada", "Greenland", "…
#> $ year             <chr> "1980", "1980", "1980", "1980", "1980", "1980", "198…
#> $ coal_consumption <dbl> 16.45179, 0.00000, 0.96156, 0.00005, 0.10239, 0.0000…
class(coal_long$year)
#> [1] "character"
coal_long$year <- as.integer(str_remove(coal_long$year, "X"))
glimpse(coal_long)
#> Rows: 6,960
#> Columns: 3
#> $ region           <chr> "North America", "Bermuda", "Canada", "Greenland", "…
#> $ year             <int> 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980…
#> $ coal_consumption <dbl> 16.45179, 0.00000, 0.96156, 0.00005, 0.10239, 0.0000…
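
To see what na.strings = "--" changes without downloading the file, here is a small self-contained sketch. The inline CSV text is invented for illustration, and it assumes the real file has bare year numbers as column headers, which would explain the X prefix: with its default check.names = TRUE, read.csv() turns a header such as 1980 into the valid R name X1980.

```r
# Made-up CSV snippet; "--" marks a missing value as in the real data.
csv_text <- "region,1980,1981\nCanada,0.96156,1.01\nAruba,--,0"

# Default import: the "--" entry keeps the whole 1980 column from being
# parsed as numbers (character in R >= 4.0, factor in older versions).
with_default <- read.csv(text = csv_text)

# Declaring "--" as a missing-value marker lets the column parse as double.
with_na <- read.csv(text = csv_text, na.strings = "--")

print(names(with_na))
print(class(with_default$X1980))
print(class(with_na$X1980))
print(with_na$X1980)
```

Handling the missing-value marker at import time avoids a second round of coercion warnings when coal_consumption is converted later.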

Comments

1 upvote  Sri Sreshtan  5/15/2020
Thank you very much for the code. It worked.
0 upvotes  Chuck P  5/15/2020
No problem, happy to help.