Asked by: Xavier Villà Aguilar  Asked: 9/14/2023  Updated: 9/14/2023  Views: 26
mean() not working with functions as arguments within a pipe
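Q:
I am a beginner in R. I am doing some data-manipulation exercises with dplyr and ran into something I don't quite understand.
I am working on an exercise that uses the "titanic" training dataset from the tidyverse. The exercise is about imputing, for every passenger observation whose age is NA, the mean age of passengers of the same sex.
The code snippet that works is this: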
#' Exercise 6.5
#' Use the case_when function to create a new column in the titanic dataset called imputed_age_of_passenger.
#' In this column we should have, wherever the value of the sex_of_passenger is “male” and age of passenger value is
#' missing the imputed value should be the mean age_of_passenger of only the male passengers. wherever the value
#' sex_of_passenger is “female” and age of passenger value is missing the imputed value should be imputed with the mean
#' age_of_passenger of only the female passengers. Otherwise, take the value of the age_of_passenger.
mean_age_male <- titanic %>%
  filter(sex_of_passenger == "male") %>%
  pull(age_of_passenger) %>%
  mean(na.rm = TRUE)
mean_age_female <- titanic %>%
  filter(sex_of_passenger == "female") %>%
  pull(age_of_passenger) %>%
  mean(na.rm = TRUE)
titanic <- titanic %>%
  mutate(imputed_age_of_passenger = case_when(
    sex_of_passenger == "male" & is.na(age_of_passenger) ~ mean_age_male,
    sex_of_passenger == "female" & is.na(age_of_passenger) ~ mean_age_female,
    TRUE ~ age_of_passenger))
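I was trying to make the code smoother and shorter. I see no point in keeping two helper variables in memory for the per-sex means when I only need them to create the new column in the dataset. So I tried to do everything in a single pipe, as follows: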
titanic <- titanic %>%
mutate(imputed_age_of_passenger = case_when(
sex_of_passenger == "male" & is.na(age_of_passenger) ~ mean(filter(sex_of_passenger == "male")$age_of_passenger, na.rm = TRUE),
sex_of_passenger == "female" & is.na(age_of_passenger) ~ mean(filter(sex_of_passenger == "female")$age_of_passenger, na.rm = TRUE)
TRUE ~ age_of_passenger))
However, I get the following error:
Error: unexpected numeric constant in: " sex_of_passenger == "female" & is.na(age_of_passenger) ~ mean(filter(sex_of_passenger == "female")$age_of_passenger, na.rm = TRUE) TRUE"
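Likewise, I tried to simplify the definition of the two helper variables by combining pull() and mean() in a single line instead of as a sequence in the pipe, as follows: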
mean_age_male <- titanic %>%
filter(sex_of_passenger == "male") %>%
mean(pull(age_of_passenger), na.rm = TRUE)
However, although the above does run, it stores NA in the mean_age_male variable and shows the following warning:
Warning message:
In mean.default(., na.rm = TRUE) :
argument is not numeric or logical: returning NA
Can someone tell me why neither of the snippets above works as expected? Thanks in advance!
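A:
For the first snippet, I think the problem is the way you try to filter the data while assigning the values.
You can try something like df["name of the column to average"][df["name of the filter column"] == "filter value"]: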
titanic <- titanic %>%
  mutate(imputed_age_of_passenger = case_when(
    sex_of_passenger == "male" & is.na(age_of_passenger) ~ mean(titanic["age_of_passenger"][titanic["sex_of_passenger"] == "male"], na.rm = TRUE),
    sex_of_passenger == "female" & is.na(age_of_passenger) ~ mean(titanic["age_of_passenger"][titanic["sex_of_passenger"] == "female"], na.rm = TRUE),
    TRUE ~ age_of_passenger))
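Inside mutate(), a column name already refers to the whole column vector, so the per-sex mean can also be taken with ordinary logical subsetting, without calling filter() (which expects a data frame as its first argument). The following is a minimal sketch under that reading, not the asker's or answerer's exact code; `titanic_demo` is a made-up stand-in for the real dataset that reuses the column names from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for the real titanic dataset.
titanic_demo <- tibble(
  sex_of_passenger = c("male", "male", "male", "female", "female", "female"),
  age_of_passenger = c(20, 40, NA, 10, 30, NA)
)

titanic_demo <- titanic_demo %>%
  mutate(imputed_age_of_passenger = case_when(
    # age_of_passenger[...] subsets the full column vector by sex
    sex_of_passenger == "male" & is.na(age_of_passenger) ~
      mean(age_of_passenger[sex_of_passenger == "male"], na.rm = TRUE),
    sex_of_passenger == "female" & is.na(age_of_passenger) ~
      mean(age_of_passenger[sex_of_passenger == "female"], na.rm = TRUE),
    # every branch, including the one above, must end with a comma
    TRUE ~ age_of_passenger
  ))

print(titanic_demo)
```

The single-value means are recycled by case_when() to the rows that match each condition, so no helper variables are left in the workspace.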
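If you want to avoid saving the mean ages as variables, you can compute the mean age per group and then assign the values you need, as follows: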
titanic %>%
  group_by(sex_of_passenger) %>%
  summarise(Avg_Age = mean(age_of_passenger, na.rm = TRUE))
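The same grouping can feed mutate() directly, because mean() is then evaluated once per sex_of_passenger group and the result lands on the rows of that group. A sketch of that variant, assuming a made-up `titanic_demo` stand-in for the real data; if_else() is swapped in for case_when() here since only two outcomes remain.

```r
library(dplyr)

# Hypothetical toy data standing in for the real titanic dataset.
titanic_demo <- tibble(
  sex_of_passenger = c("male", "male", "male", "female", "female", "female"),
  age_of_passenger = c(20, 40, NA, 10, 30, NA)
)

titanic_demo <- titanic_demo %>%
  group_by(sex_of_passenger) %>%
  mutate(imputed_age_of_passenger = if_else(
    is.na(age_of_passenger),
    mean(age_of_passenger, na.rm = TRUE),  # mean within the current group
    age_of_passenger
  )) %>%
  ungroup()  # drop the grouping so later steps see a plain tibble

print(titanic_demo)
```

This keeps everything in one pipe and would also cover additional group values without extra branches.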
This gives you two mean-age values, one per sex.
You can then assign the values manually, as follows:
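titanic["imputed_age_of_passenger"] <- titanic["age_of_passenger"]  # start from the observed ages
titanic["imputed_age_of_passenger"][is.na(titanic["age_of_passenger"]) & titanic["sex_of_passenger"] == "male"] <- male_avg_age  # placeholder: Avg_Age from the "male" row
titanic["imputed_age_of_passenger"][is.na(titanic["age_of_passenger"]) & titanic["sex_of_passenger"] == "female"] <- female_avg_age  # placeholder: Avg_Age from the "female" row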
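The second snippet from the question returns NA for a different reason: `%>%` inserts its left-hand side as the first argument of the next call, so `mean(pull(age_of_passenger), na.rm = TRUE)` is effectively run as `mean(<filtered data frame>, pull(age_of_passenger), na.rm = TRUE)`, and mean() cannot average a data frame. One way to keep the nested form is to wrap the call in braces and use the magrittr placeholder `.` explicitly. A sketch on a made-up `titanic_demo` stand-in:

```r
library(dplyr)

# Hypothetical toy data standing in for the real titanic dataset.
titanic_demo <- tibble(
  sex_of_passenger = c("male", "male", "male", "female", "female", "female"),
  age_of_passenger = c(20, 40, NA, 10, 30, NA)
)

males <- titanic_demo %>% filter(sex_of_passenger == "male")

# What the piped call amounts to: mean() receives a data frame, warns,
# and returns NA (the warning is suppressed here to keep the output clean).
bad <- suppressWarnings(mean(males, na.rm = TRUE))

# Braces stop the pipe from inserting the data frame as the first argument;
# the placeholder `.` then passes it to pull() only.
mean_age_male <- titanic_demo %>%
  filter(sex_of_passenger == "male") %>%
  { mean(pull(., age_of_passenger), na.rm = TRUE) }

print(bad)
print(mean_age_male)
```

The original step-by-step chain (`pull(age_of_passenger) %>% mean(na.rm = TRUE)`) is arguably easier to read; the braces are only needed when the calls must be nested.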
Comments:
Error: unexpected numeric constant in: " sex_of_passenger == "female" & is.na(age_of_passenger) ~ mean(titanic["age_of_passenger"][titanic["sex_of_passenger"] == "female"], na.rm = TRUE) TRUE"
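The message quoted here is a parse error rather than a problem with mean(): the quoted expression ends in `na.rm = TRUE) TRUE`, with no comma between the second branch and the following `TRUE ~ ...` branch, so R stops at the second `TRUE`, which its parser tokenises as a numeric constant. A minimal reproduction with base R's parse(), using a made-up one-line example and assuming an English locale for the message text:

```r
# Made-up minimal example: two arguments with the separating comma missing.
broken <- "c(mean(1:3) TRUE)"
fixed  <- "c(mean(1:3), TRUE)"

# parse() only checks syntax; the error is caught and its message kept.
msg <- tryCatch(
  {
    parse(text = broken)
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With the comma restored the text parses and evaluates normally.
result <- eval(parse(text = fixed))
print(result)
```

Adding the comma at the end of the "female" branch should therefore clear this error in both the question's single-pipe attempt and the bracket-indexing variant.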