R incongruity when copying a column in R with ifelse

Asked by: AnnePauline · Asked: 5/6/2022 · Last edited by: AnnePauline · Updated: 9/15/2022 · Views: 41

Q:

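After loading a large number of xlsx sheets from several workbooks, I want to double-check how tidy and clean the data source is. I created one data frame that reads the sheets with as.numeric (at the risk of NAs being introduced by coercion) and another data frame that reads the data as is (at the risk of getting non-numeric values). The goal is to compare the two, find out whether there are import problems, and see where corrections are needed in the source xlsx workbooks.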

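When computing the row totals in the "non-numeric" data frame, I want to work around the error that occurs when the data are dirty (non-numeric).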

library(dplyr)  # provides select() and %>% used below

df <- data.frame(cbind(c(1:10),c(1,2,"a","b",5,6,7,8,9,10)))
colnames(df) <- c("ColA","ColB")
df <- df[order(df$ColB),]

# if the values are not numerical,  
# I want to create a column with the formula to use in excel 
# to do the sum in xlsx (it should start with row+1 
# as there are the headers in xlsx). So I want just to put a string.

# Doing the below I always have the same unique string 
# for column NonNumeric1  "=SUM(A2:B2)", it is not what I want.

df$Total.NonNumeric1 <-   ifelse(sum(as.numeric(lapply(select(df, ColA:ColB),is.character)))>0, 
                                 paste("=SUM(A",1 + seq.int(nrow(df)),":B",1 + seq.int(nrow(df)),")",sep = ""), 
                                 select(df, ColA:ColB) %>% rowSums(na.rm = TRUE))

df
   ColA ColB Total.NonNumeric1
1     1    1       =SUM(A2:B2)
10   10   10       =SUM(A2:B2)
5     5    2       =SUM(A2:B2)
2     2    5       =SUM(A2:B2)
9     9    6       =SUM(A2:B2)
7     7    7       =SUM(A2:B2)
8     8    8       =SUM(A2:B2)
6     6    9       =SUM(A2:B2)
3     3    a       =SUM(A2:B2)
4     4    b       =SUM(A2:B2)


# I tried to turn around and create the column with the string 
# for each row, this worked without the ifelse.

df<- within(df,row_num <- paste("=SUM(A",1 + seq.int(nrow(df)),":B",1 + seq.int(nrow(df)),")",sep = ""))

# as you can see row_num is sequential
df
   ColA ColB Total.NonNumeric1       row_num
1     1    1       =SUM(A2:B2)   =SUM(A2:B2)
10   10   10       =SUM(A2:B2)   =SUM(A3:B3)
5     5    2       =SUM(A2:B2)   =SUM(A4:B4)
2     2    5       =SUM(A2:B2)   =SUM(A5:B5)
9     9    6       =SUM(A2:B2)   =SUM(A6:B6)
7     7    7       =SUM(A2:B2)   =SUM(A7:B7)
8     8    8       =SUM(A2:B2)   =SUM(A8:B8)
6     6    9       =SUM(A2:B2)   =SUM(A9:B9)
3     3    a       =SUM(A2:B2) =SUM(A10:B10)
4     4    b       =SUM(A2:B2) =SUM(A11:B11)

# but if I try to use it to fill the column Total.NonNumeric2
# (which should be replaced with the string
# only if the data are not numerical),
# it goes back to the same "=SUM(A2:B2)"
# and does not keep the sequence or the original string.

df$Total.NonNumeric2 <-   ifelse(sum(as.numeric(lapply(select(df, ColA:ColB),is.character)))>0,
                          df$row_num, 
                          select(df, ColA:ColB) %>% rowSums(na.rm = TRUE))

df
   ColA ColB Total.NonNumeric1       row_num Total.NonNumeric2
1     1    1       =SUM(A2:B2)   =SUM(A2:B2)       =SUM(A2:B2)
10   10   10       =SUM(A2:B2)   =SUM(A3:B3)       =SUM(A2:B2)
5     5    2       =SUM(A2:B2)   =SUM(A4:B4)       =SUM(A2:B2)
2     2    5       =SUM(A2:B2)   =SUM(A5:B5)       =SUM(A2:B2)
9     9    6       =SUM(A2:B2)   =SUM(A6:B6)       =SUM(A2:B2)
7     7    7       =SUM(A2:B2)   =SUM(A7:B7)       =SUM(A2:B2)
8     8    8       =SUM(A2:B2)   =SUM(A8:B8)       =SUM(A2:B2)
6     6    9       =SUM(A2:B2)   =SUM(A9:B9)       =SUM(A2:B2)
3     3    a       =SUM(A2:B2) =SUM(A10:B10)       =SUM(A2:B2)
4     4    b       =SUM(A2:B2) =SUM(A11:B11)       =SUM(A2:B2)

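I need the column to contain the formula string only when the sum cannot be computed because of characters; otherwise it should be the sum of the defined columns. Above all, I would like to understand where my mistake is. Thanks.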

Tags: r string dataframe if-statement seq

Comments

0 upvotes Gregor Thomas 5/6/2022
I haven't run your code, but lapply returns a list, so as.numeric(lapply(...)) looks suspicious. Maybe try sapply instead?
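0 upvotes Gregor Thomas 5/6/2022
Also, you're testing whether any column is of class character, but because of the way you create your sample data with cbind(), all columns will be of class character. Maybe try data.frame(c(1:10),c(1,2,"a","b",5,6,7,8,9,10)) for the sample data, so that everything is not coerced to a matrix first by cbind.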
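
Both points from the comments can be checked with a small base-R sketch. The names m, df_bound, df_direct, and is_chr are illustrative and do not appear in the original post; stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# cbind() on a mix of numbers and strings builds a character matrix first,
# so every column of the resulting data frame is character.
m <- cbind(c(1:10), c(1, 2, "a", "b", 5, 6, 7, 8, 9, 10))
df_bound <- data.frame(m, stringsAsFactors = FALSE)
sapply(df_bound, class)

# Passing the vectors straight to data.frame() keeps the first column integer.
df_direct <- data.frame(ColA = c(1:10),
                        ColB = c(1, 2, "a", "b", 5, 6, 7, 8, 9, 10),
                        stringsAsFactors = FALSE)
sapply(df_direct, class)

# lapply() returns a list; sapply() simplifies to a named logical vector,
# which can be summed directly to count the character columns.
is_chr <- sapply(df_direct, is.character)
sum(is_chr)
```

Note that either way the result is a single number for the whole data frame, not one value per row.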

A:

1 upvote PNMNS 5/6/2022 #1

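As the comments say, you start with all columns being character, but even if you didn't, any column with mixed numbers and strings would be character. In this case it is better to apply ifelse element by element rather than to the column as a whole: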

# temp var y to avoid adding twice
df$Total.NonNumeric2 <- with(df, {
  y <- as.numeric(ColA) + as.numeric(ColB)
  # y <- as.character(y)

  ifelse(is.na(y), row_num, y)
})
df["Total.NonNumeric2"]
   Total.NonNumeric2
1                  2
10                20
2                  4
5                 10
6                 12
7                 14
8                 16
9                 18
3      =SUM(A10:B10)
4      =SUM(A11:B11)
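
As a hedged illustration of why the question's version repeated "=SUM(A2:B2)" in every row: ifelse() returns a result with the same length as its test, so a single TRUE or FALSE for the whole data frame yields a single value, which is then recycled when assigned to the column. The vectors formulas and y below are made up for this sketch.

```r
formulas <- paste0("=SUM(A", 2:4, ":B", 2:4, ")")

# A length-1 test gives a length-1 result: only the first element of `yes`.
whole_column <- ifelse(TRUE, formulas, 0)
whole_column   # the single string "=SUM(A2:B2)"

# A test with one entry per row picks the matching element of `yes` or `no`.
y <- c(2, NA, 6)
per_row <- ifelse(is.na(y), formulas, y)
per_row        # "2", "=SUM(A3:B3)", "6"
```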

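I don't know how fragile this is with respect to the data in your spreadsheets, but it should work with plain numbers and letters. Also, if all values are numeric, the result will be a numeric column. I'm not sure what the rest of your code does, but for robustness you could coerce y to character.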
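
A minimal sketch of that last point, using made-up values for y and row_num: when no row needs a formula, the plain version returns a numeric vector, while coercing the "no" branch keeps the column type stable.

```r
y <- c(2, 4, 6)  # every row summed cleanly, so no NA
row_num <- paste0("=SUM(A", 2:4, ":B", 2:4, ")")

# Without coercion the result type depends on the data.
as_is <- ifelse(is.na(y), row_num, y)

# Coercing y makes the result character whether or not a formula is used.
coerced <- ifelse(is.na(y), row_num, as.character(y))

class(as_is)
class(coerced)
```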

Comments

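1 upvote AnnePauline 5/6/2022
Thanks. Applying ifelse per element rather than to the whole column helped a lot. I had to adapt it for the full dataset, because I don't have 2 columns but 250 columns to select out of 260, and I could not run the whole process on the complete dataset. So I separated the 250 columns into a data frame x and converted them with as.numeric. To avoid getting 0 as the total when everything is NA, I created a column Total.NonNumeric2 with x$Total.NonNumeric2 <- ifelse(apply(is.na(x),1,all),NA,rowSums(x,na.rm=TRUE)). Then I retrieved x$Total.NonNumeric2 as y and used your formula ifelse(is.na(y), row_num, y).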
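
The workflow described in this comment might look roughly like the sketch below. The three-column data frame and the cols vector are hypothetical stand-ins for the 250 selected columns, and x is kept as a numeric matrix here rather than a data frame for brevity.

```r
df <- data.frame(ColA = c("1", "2", "x", NA),
                 ColB = c("1", "a", "y", NA),
                 ColC = c("3", "4", "5", NA),
                 stringsAsFactors = FALSE)
cols <- c("ColA", "ColB", "ColC")  # hypothetical selection of columns

# Numeric copy of the selected columns; non-numeric entries become NA.
x <- suppressWarnings(sapply(df[cols], as.numeric))

# Row total, but NA (not 0) when every selected value in the row is NA.
y <- ifelse(apply(is.na(x), 1, all), NA, rowSums(x, na.rm = TRUE))

# Excel rows start at 2 because row 1 holds the headers.
excel_row <- 1 + seq_len(nrow(df))
row_num <- paste0("=SUM(A", excel_row, ":C", excel_row, ")")

df$Total.NonNumeric2 <- ifelse(is.na(y), row_num, y)
df$Total.NonNumeric2
```

With na.rm = TRUE, a row that mixes numbers and text gets the partial sum of its numeric cells; only rows with no numeric value at all receive the formula string.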