Asked by: Marco  Asked: 2/22/2023  Updated: 2/22/2023  Views: 43
How to bin a continuous variable while keeping its numeric feature in R?
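Q:
I would like to bin a continuous variable while keeping it numeric. There are several options to discretize or categorize a continuous numeric variable into a factor, like so: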
data(mtcars)
library(tidyverse)
mtcars <- mtcars %>% mutate(mpg_binned = cut_width(mpg, 2, closed = "right", boundary = 10))
as_tibble(mtcars %>% select(mpg, mpg_binned))
# A tibble: 32 × 2
     mpg mpg_binned
   <dbl> <fct>
 1  21   (20,22]
 2  21   (20,22]
 3  22.8 (22,24]
 4  21.4 (20,22]
 5  18.7 (18,20]
 6  18.1 (18,20]
 7  14.3 (14,16]
 8  24.4 (24,26]
 9  22.8 (22,24]
10  19.2 (18,20]
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
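But I want to keep working with numeric values for various plots and calculations. So I would like to convert each original value to the center of its interval. The first observation stays 21, because that is the midpoint of (20,22]. Rounding does not work, because the value 14.3 in row 7 should become 15 (the midpoint of (14,16]).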
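To make the target concrete, the desired mapping can be sketched with plain arithmetic. This is only an illustration, assuming the same bin width of 2 and boundary of 10 as in the cut_width() call above, and right-closed bins; the helper name bin_mid is hypothetical and not part of any package.

```r
# Hypothetical helper: midpoint of the right-closed bin (lo, lo + width]
# that contains x, for bins aligned to `boundary`.
bin_mid <- function(x, width = 2, boundary = 10) {
  # Number of bin widths from `boundary` up to the bin's upper edge
  k <- ceiling((x - boundary) / width)
  # Upper edge minus half a width gives the bin center
  boundary + width * (k - 0.5)
}

x <- c(21, 22.8, 14.3, 18.7)
bin_mid(x)  # 21 23 15 19
round(x)    # 21 23 14 19 (plain rounding misses 14.3)
```

One caveat of this sketch: a value sitting exactly on the lowest break is handled differently by cut-style functions, which typically include it in the first bin, so the two approaches can disagree at that single edge.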
A:
2 upvotes
Miff
2/22/2023
#1
You can split the mpg_binned column into its numeric parts and take their mean, like so:
mtcars$mid <- sapply(stringr::str_extract_all(mtcars$mpg_binned,"[0-9]+"),
function(x){mean(as.numeric(x))})
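The pattern [0-9]+ matches runs of digits only, so it relies on the breaks being non-negative integers, as they are here. For labels with decimal or negative bounds, a slightly broader pattern may be needed. Below is a minimal sketch in base R (regmatches() with gregexpr() instead of stringr); the helper name label_mid is hypothetical, and it assumes labels in the usual "(a,b]" form without scientific notation.

```r
# Hypothetical helper: midpoint of interval labels such as "(14,16]"
# or "[-1.5,0.5)". Assumes plain decimal notation in the labels.
label_mid <- function(labels) {
  labels <- as.character(labels)  # factors are converted to their labels
  # Optional minus sign, digits, optional decimal part
  pattern <- "-?[0-9]+(\\.[0-9]+)?"
  bounds <- regmatches(labels, gregexpr(pattern, labels))
  # Average the two bounds found in each label
  vapply(bounds, function(b) mean(as.numeric(b)), numeric(1))
}

label_mid(c("(14,16]", "(20,22]", "[-1.5,0.5)"))  # 15 21 -0.5
```

Applied to the data above, this would be used as mtcars$mid <- label_mid(mtcars$mpg_binned).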
Comments
0 upvotes
Marco
2/22/2023
I was looking for a more direct tidy approach, but this workaround does the job.
1 upvote
Darren Tsai
2/22/2023
@Marco This approach can be rewritten in {tidyverse} style: mtcars %>% mutate(mid = map_dbl(str_extract_all(mpg_binned,"[0-9]+"), ~ mean(as.numeric(.x))))
2 upvotes
Darren Tsai
2/22/2023
#2
You can extract the lower and upper bounds from mpg_binned with tidyr::extract() and average them.
library(tidyverse)
mtcars %>%
extract(mpg_binned, c("low", "up"), "(\\d+),(\\d+)", remove = FALSE, convert = TRUE) %>%
mutate(mid = (low + up) / 2)
# # A tibble: 32 × 4
#   mpg_binned   low    up   mid
#   <fct>      <int> <int> <dbl>
# 1 (20,22]       20    22    21
# 2 (20,22]       20    22    21
# 3 (22,24]       22    24    23
# 4 (20,22]       20    22    21
# 5 (18,20]       18    20    19
# 6 (18,20]       18    20    19
# # … with 26 more rows