Asked by: AnnePauline  Asked: 5/6/2022  Last edited by: AnnePauline  Updated: 9/15/2022  Views: 41
R incongruity when copying a column in R with ifelse
Question:
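After loading a large number of xlsx sheets from several workbooks, I want to double-check how tidy and clean the source data are. I created one data frame that reads the sheets with as.numeric (at the risk of NAs introduced by coercion) and another that reads the data as-is (at the risk of non-numeric values). The idea is to compare the two, find out whether there may be import problems, and see where corrections are needed in the source xlsx workbooks.
When computing the totals in the "non-numeric" data frame, I want to work around the error if the data are dirty (i.e. non-numeric).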
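One way to run this kind of double-check is to coerce a copy of the as-is data to numeric and flag the cells that had a value but turned into NA. A minimal base-R sketch; the data frame raw and its contents are made up, and the single header row in the sheet is an assumption (the letter lookup also only covers the first 26 columns):

```r
# Hypothetical as-is import: every cell kept as text
raw <- data.frame(ColA = as.character(1:5),
                  ColB = c("1", "2", "a", NA, "5"),
                  stringsAsFactors = FALSE)

# Numeric copy of the same data; text that is not a number becomes NA
# (suppressWarnings() hides the "NAs introduced by coercion" warnings)
num <- as.data.frame(lapply(raw, function(col) suppressWarnings(as.numeric(col))))

# A cell is dirty if it had a value in raw but is NA after coercion;
# cells that were already empty (NA) in raw are not flagged
dirty <- !is.na(raw) & is.na(num)

# Row/column indices of the dirty cells
idx <- which(dirty, arr.ind = TRUE)

# Matching Excel cell references, assuming one header row in the sheet
# and at most 26 columns (LETTERS only runs from A to Z)
cells <- paste0(LETTERS[idx[, "col"]], idx[, "row"] + 1)
cells  # "B4"
```

The reference list points straight at the cells to fix in the source workbook, while genuinely empty cells are left alone.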
library(dplyr)  # select() and %>% come from dplyr
df <- data.frame(cbind(c(1:10),c(1,2,"a","b",5,6,7,8,9,10)))
colnames(df) <- c("ColA","ColB")
df <- df[order(df$ColB),]
# If the values are not numeric,
# I want to create a column with the formula to use in Excel
# to do the sum in the xlsx (it should start at row + 1,
# because row 1 holds the headers in the xlsx). So I just want to put a string.
# Doing it as below, I always get the same single string
# "=SUM(A2:B2)" for column Total.NonNumeric1, which is not what I want.
df$Total.NonNumeric1 <- ifelse(sum(as.numeric(lapply(select(df, ColA:ColB),is.character)))>0,
paste("=SUM(A",1 + seq.int(nrow(df)),":B",1 + seq.int(nrow(df)),")",sep = ""),
select(df, ColA:ColB) %>% rowSums(na.rm = TRUE))
df
ColA ColB Total.NonNumeric1
1 1 1 =SUM(A2:B2)
10 10 10 =SUM(A2:B2)
5 5 2 =SUM(A2:B2)
2 2 5 =SUM(A2:B2)
9 9 6 =SUM(A2:B2)
7 7 7 =SUM(A2:B2)
8 8 8 =SUM(A2:B2)
6 6 9 =SUM(A2:B2)
3 3 a =SUM(A2:B2)
4 4 b =SUM(A2:B2)
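The repeated "=SUM(A2:B2)" follows from how ifelse() sizes its result: the output has the same length as the test argument, and sum(...) > 0 is a single TRUE. A small sketch with made-up values (test_scalar, yes_vals, no_vals and df_demo are illustrative names):

```r
# ifelse() returns a result with the same length as its first argument (test)
test_scalar <- TRUE                      # one value, like sum(...) > 0
yes_vals <- c("=SUM(A2:B2)", "=SUM(A3:B3)", "=SUM(A4:B4)")
no_vals  <- c(2, 4, 6)

res <- ifelse(test_scalar, yes_vals, no_vals)
res          # "=SUM(A2:B2)", only the first element of yes_vals
length(res)  # 1

# Assigning that length-1 result to a column recycles it down every row
df_demo <- data.frame(id = 1:3)
df_demo$total <- res
df_demo$total  # "=SUM(A2:B2)" "=SUM(A2:B2)" "=SUM(A2:B2)"

# With one test value per row, ifelse() picks element by element;
# mixing character and numeric branches gives a character result
row_test <- c(FALSE, TRUE, FALSE)
ifelse(row_test, yes_vals, no_vals)  # "2" "=SUM(A3:B3)" "6"
```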
# I tried to turn it around and first create a column with the string
# for each row; this works without ifelse.
df<- within(df,row_num <- paste("=SUM(A",1 + seq.int(nrow(df)),":B",1 + seq.int(nrow(df)),")",sep = ""))
# as you can see, row_num is sequential
df
ColA ColB Total.NonNumeric1 row_num
1 1 1 =SUM(A2:B2) =SUM(A2:B2)
10 10 10 =SUM(A2:B2) =SUM(A3:B3)
5 5 2 =SUM(A2:B2) =SUM(A4:B4)
2 2 5 =SUM(A2:B2) =SUM(A5:B5)
9 9 6 =SUM(A2:B2) =SUM(A6:B6)
7 7 7 =SUM(A2:B2) =SUM(A7:B7)
8 8 8 =SUM(A2:B2) =SUM(A8:B8)
6 6 9 =SUM(A2:B2) =SUM(A9:B9)
3 3 a =SUM(A2:B2) =SUM(A10:B10)
4 4 b =SUM(A2:B2) =SUM(A11:B11)
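The per-row formula strings in row_num can also be built with sprintf(). A sketch, where the helper name sum_formulas() and its parameters (column letters, number of header rows) are hypothetical:

```r
# Hypothetical helper: one "=SUM(Xn:Yn)" string per data row.
# header_rows = 1L assumes a single header row, so data row i sits on sheet row i + 1.
sum_formulas <- function(n_rows, first_col = "A", last_col = "B", header_rows = 1L) {
  sheet_rows <- seq_len(n_rows) + header_rows
  sprintf("=SUM(%s%d:%s%d)", first_col, sheet_rows, last_col, sheet_rows)
}

sum_formulas(3)
# "=SUM(A2:B2)" "=SUM(A3:B3)" "=SUM(A4:B4)"
```

seq_len() is used here instead of seq.int(nrow(df)) because seq_len(0) is empty, whereas seq.int(0) returns 1 0; with an empty data frame the helper then returns no strings at all.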
# But if I try to use it to fill column Total.NonNumeric2
# (which should be replaced with the string
# only if the data are not numeric),
# it goes back to the same "=SUM(A2:B2)"
# and keeps neither the sequence nor the original string.
df$Total.NonNumeric2 <- ifelse(sum(as.numeric(lapply(select(df, ColA:ColB),is.character)))>0,
df$row_num,
select(df, ColA:ColB) %>% rowSums(na.rm = TRUE))
df
ColA ColB Total.NonNumeric1 row_num Total.NonNumeric2
1 1 1 =SUM(A2:B2) =SUM(A2:B2) =SUM(A2:B2)
10 10 10 =SUM(A2:B2) =SUM(A3:B3) =SUM(A2:B2)
5 5 2 =SUM(A2:B2) =SUM(A4:B4) =SUM(A2:B2)
2 2 5 =SUM(A2:B2) =SUM(A5:B5) =SUM(A2:B2)
9 9 6 =SUM(A2:B2) =SUM(A6:B6) =SUM(A2:B2)
7 7 7 =SUM(A2:B2) =SUM(A7:B7) =SUM(A2:B2)
8 8 8 =SUM(A2:B2) =SUM(A8:B8) =SUM(A2:B2)
6 6 9 =SUM(A2:B2) =SUM(A9:B9) =SUM(A2:B2)
3 3 a =SUM(A2:B2) =SUM(A10:B10) =SUM(A2:B2)
4 4 b =SUM(A2:B2) =SUM(A11:B11) =SUM(A2:B2)
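I need to insert a column with the formula string only when the sum cannot be computed because of characters; otherwise it should be the sum of the defined columns. Above all, though, I want to understand where my mistake is. Thanks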
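What the ifelse() call needs here is one TRUE/FALSE per row instead of a single value for the whole data frame. A sketch of such a per-row test, assuming the raw columns are character as in the question; df_raw and is_non_numeric() are illustrative names:

```r
df_raw <- data.frame(ColA = as.character(1:4),
                     ColB = c("1", "a", "3", "b"),
                     stringsAsFactors = FALSE)

# TRUE for a cell that holds text which cannot be read as a number
is_non_numeric <- function(x) !is.na(x) & is.na(suppressWarnings(as.numeric(x)))

# Logical matrix (rows x columns); do.call(cbind, ...) keeps it a matrix
# even when the data frame has a single row
bad_cells <- do.call(cbind, lapply(df_raw, is_non_numeric))

# One logical per row: does the row contain at least one non-numeric cell?
row_has_text <- rowSums(bad_cells) > 0
row_has_text  # FALSE TRUE FALSE TRUE
```

Passing a per-row vector like row_has_text as the first argument of ifelse() makes it choose between the formula string and the numeric sum row by row.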
Answer:
1 vote
PNMNS
5/6/2022
#1
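As the comments say, you start with all columns as character, but even if you didn't, any column mixing numbers and strings would be character. Your test, sum(...) > 0, is a single TRUE, and ifelse() returns a result of the same length as its test, so you get only the first element of the yes vector, "=SUM(A2:B2)", which is then recycled down the whole column. In this case it is better to do the ifelse element by element rather than on the whole column: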
# temporary variable y, so each row's sum is computed only once
df$Total.NonNumeric2 <- with(df, {
y <- as.numeric(ColA) + as.numeric(ColB)
# optionally force character: y <- as.character(y)
ifelse(is.na(y), row_num, y)
})
df["Total.NonNumeric2"]
Total.NonNumeric2
1 2
10 20
2 4
5 10
6 12
7 14
8 16
9 18
3 =SUM(A10:B10)
4 =SUM(A11:B11)
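I don't know how fragile this will be with the data in your spreadsheets, but it should work with plain numbers and letters. Also, if all values are numeric, the result will be a numeric column. I'm not sure what the rest of your code will do with that, but for robustness you can force y to character.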
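Following that suggestion, a sketch that forces the totals to character, so the column has the same type whether or not any formula appears, and that takes the columns to sum from a vector of names. The data and the value_cols selection are assumptions for illustration, and row_num assumes the data frame is written to the sheet in its current row order under a single header row:

```r
df <- data.frame(ColA = c("1", "10", "3", "4"),
                 ColB = c("1", "10", "a", "b"),
                 stringsAsFactors = FALSE)

# Sheet row of each data row (row 1 of the sheet holds the headers);
# the A:B range must be adjusted if value_cols spans other columns
sheet_rows <- seq_len(nrow(df)) + 1L
df$row_num <- sprintf("=SUM(A%d:B%d)", sheet_rows, sheet_rows)

# Columns to add up; with many columns this can be any vector of names
value_cols <- c("ColA", "ColB")

# Numeric copy of the selected columns; non-numeric text becomes NA
nums <- as.data.frame(lapply(df[value_cols],
                             function(x) suppressWarnings(as.numeric(x))))

# Without na.rm, a row containing any NA gets an NA total
y <- unname(rowSums(nums))

# Formula where the sum is impossible, otherwise the sum as a string
df$Total.NonNumeric2 <- ifelse(is.na(y), df$row_num, as.character(y))
df$Total.NonNumeric2  # "2" "20" "=SUM(A4:B4)" "=SUM(A5:B5)"
```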
Comments:
1 vote
AnnePauline
5/6/2022
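Thanks. Doing the ifelse per element instead of on the whole column helped a lot. I had to make some changes for the full dataset, because I don't have 2 columns but 250 columns to pick out of 260, and I couldn't run the whole process on the complete dataset. So I split the 250 columns off into a data frame x; applied as.numeric to them; to avoid getting 0 as the total when a row is all NA, I created a column Total.NonNumeric2 with x$Total.NonNumeric2 <- ifelse(apply(is.na(x),1,all),NA,rowSums(x,na.rm=TRUE)); then retrieved this x$Total.NonNumeric2 as y and used your formula ifelse(is.na(y), row_num, y).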
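The all-NA guard described in this comment can be tried on a small example. In this sketch the data frame x and the formula strings are made up for illustration, and every selected column is assumed to be numeric already:

```r
# Hypothetical numeric columns after as.numeric(); NA marks unreadable cells
x <- data.frame(c1 = c(1, NA, NA),
                c2 = c(2, 3, NA))
row_num <- sprintf("=SUM(A%d:B%d)", 2:4, 2:4)

# rowSums(na.rm = TRUE) would give 0 for an all-NA row, so return NA there instead
x$Total.NonNumeric2 <- ifelse(apply(is.na(x), 1, all), NA, rowSums(x, na.rm = TRUE))

# Rows whose total is NA get the Excel formula string, the others keep their sum
y <- x$Total.NonNumeric2
result <- ifelse(is.na(y), row_num, y)
unname(result)  # "3" "3" "=SUM(A4:B4)"
```

With this rule, a row where only some cells are NA (the second row here) gets the sum of its readable cells rather than the formula.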
Comments (fragments): lapply, list, as.numeric(lapply, sapply, character, cbind(), data.frame(c(1:10),c(1,2,"a","b",5,6,7,8,9,10)), cbind
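The functions named in these comments behave as follows; a small sketch with made-up data (m, d and the is_chr_* names are illustrative):

```r
# cbind() of a number vector and a mixed vector builds a matrix, and a matrix
# holds a single type, so every value, including 1:3, becomes character
m <- cbind(c(1:3), c(1, "a", 3))
typeof(m)           # "character"

# data.frame() without cbind() keeps each column's own type
d <- data.frame(ColA = 1:3, ColB = c(1, "a", 3), stringsAsFactors = FALSE)
sapply(d, class)    # ColA: "integer", ColB: "character"

# lapply() always returns a list; sapply() simplifies it to a named logical vector
is_chr_list <- lapply(d, is.character)
is_chr_vec  <- sapply(d, is.character)
class(is_chr_list)  # "list"
sum(is_chr_vec)     # 1 (only ColB is character)
```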