Count occurrences of value in multiple columns with duplicates

Asked by: KArrow'sBest Asked: 3/26/2022 Last edited by: KArrow'sBest Updated: 3/26/2022 Views: 426

Q:

My question is very similar to: R: Count occurrences of value in multiple columns

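However, the solution proposed there does not work for me: a value can appear twice in the same row, and I only want to count the rows in which it appears. I have worked out a solution, but it seems too long: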

> library(data.table)
> toy_data = data.table(from=c("A","A","A","C","E","E"), to=c("B","C","A","D","F","E"))
> toy_data
   from to
1:    A  B
2:    A  C
3:    A  A
4:    C  D
5:    E  F
6:    E  E
> #get a table with intra-link count
> A = data.table(table(unlist(toy_data[from==to,from ])))
> A
   V1 N
1:  A 1
2:  E 1
> #get a table with total count
> B = data.table(table(unlist(toy_data[,c(from,to)])))
> B
   V1 N
1:  A 4
2:  B 1
3:  C 2
4:  D 1
5:  E 3
6:  F 1
> 
> # concatenate changing sign
> table = rbind(B,A[,.(V1,-N)],use.names=FALSE)
> # groupby and subtract
> table[,sum(N),by=V1]
   V1 V1
1:  A  3
2:  B  1
3:  C  2
4:  D  1
5:  E  2
6:  F  1

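Is there a function that would do this in fewer lines? I thought that, as in Python, I could concatenate from and to and then use match(), but I couldn't find the right syntax.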

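EDIT: I know that A=length(toy_data[from=="A"|to=="A",from]) would work, but I would like to avoid looping over the various values "A","B"... (and I don't know how to format the output that way).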
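To make the target quantity explicit, here is a minimal base R sketch of the per-value row count described in the edit above. It uses plain vectors instead of a data.table, and the names vals and row_counts are illustrative, not from the question. It still iterates over the values (through vapply), so it only pins down the expected result and is not the loop-free solution being asked for.

```r
# Toy data from the question, as plain vectors (base R only)
from <- c("A", "A", "A", "C", "E", "E")
to   <- c("B", "C", "A", "D", "F", "E")

# All distinct values that appear in either column
vals <- sort(unique(c(from, to)))

# For each value, count the rows where it appears in from OR to.
# A row such as A -> A contributes a single TRUE, so it is counted once.
row_counts <- vapply(vals, function(v) sum(from == v | to == v), integer(1))
row_counts
```

The result is a named integer vector with one entry per distinct value, which the answers below reproduce with table() or .N.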

r data.table data-manipulation

A:

2 upvotes ThomasIsCoding 3/26/2022 #1

You can try the code below:

> toy_data[, to := replace(to, from == to, NA)][, data.frame(table(unlist(.SD)))]
  Var1 Freq
1    A    3
2    B    1
3    C    2
4    D    1
5    E    2
6    F    1

# or, with dplyr
library(dplyr)

toy_data %>%
    mutate(to = replace(to, from == to, NA)) %>%
    unlist() %>%
    table() %>%
    as.data.frame()

which gives

  . Freq
1 A    3
2 B    1
3 C    2
4 D    1
5 E    2
6 F    1

2 upvotes akrun 3/26/2022 #2

Using data.table:

library(data.table)
toy_data[from == to, to := NA][, .(to = na.omit(c(from, to)))][, .N, to]

Comments

2 upvotes ThomasIsCoding 3/26/2022
Really like this part: [from == to, to := NA]
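One thing worth keeping in mind with the [from == to, to := NA] idiom: := modifies the data.table by reference, so toy_data itself ends up with NA in to. The sketch below is one possible way to keep the source intact by working on a copy(). The names tmp, stacked, res and value are illustrative and not part of the answer, and keyby is used here only to get a sorted result.

```r
library(data.table)

toy_data <- data.table(from = c("A", "A", "A", "C", "E", "E"),
                       to   = c("B", "C", "A", "D", "F", "E"))

# copy() makes a deep copy, so the := below does not touch toy_data
tmp <- copy(toy_data)
tmp[from == to, to := NA]

# Stack both columns into one, drop the NA placeholders, count per value
stacked <- tmp[, .(value = c(from, to))][!is.na(value)]
res <- stacked[, .N, keyby = value]
res
```

After this runs, toy_data$to still holds the original values, while res has one row per distinct value with its row count in N.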

2 upvotes langtang 3/26/2022 #3

Following akrun's suggestion of to:=NA, you can wrap the result in table(unlist()) and convert it to a data.table:

data.table(table(unlist(toy_data[from==to, to:=NA, from])))

2 upvotes TimTeaFan 3/26/2022 #4

You can subset the to vector:

data.table(table(unlist(toy_data[,c(from,to[to!=from])])))

   V1 N
1:  A 3
2:  B 1
3:  C 2
4:  D 1
5:  E 2
6:  F 1

Comments

1 upvote KArrow'sBest 3/28/2022
The most readable, and it doesn't change the source data.
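The answers in this thread all rely on there being exactly two columns. If more columns are involved, one possible generalization is to reshape to long format and keep a single (row, value) pair per row before counting. The sketch below is an assumption-laden illustration, not something proposed in the thread: the three-column data, the via column, and the names dt, row_id, long and res are all made up.

```r
library(data.table)

# Hypothetical three-column variant of the toy data (the via column is made up)
dt <- data.table(from = c("A", "A", "C"),
                 via  = c("A", "B", "C"),
                 to   = c("B", "A", "C"))

# Tag each row so duplicates can be detected within a row after reshaping
dt[, row_id := .I]
long <- melt(dt, id.vars = "row_id", value.name = "value")

# Keep one (row, value) pair per row, then count rows per value
res <- unique(long[, .(row_id, value)])[, .N, keyby = value]
res
```

Here a value repeated within a row (for example C in the third row) contributes only once to its count, which is the same rule the question asks for in the two-column case.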