Match data.table and matrix and compute new column

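Question:

I have two data.tables, "commuters" and "distance". distance is a huge distance matrix with row and column indices. commuters has the columns "home" and "destination".

commuters (1,646,044 rows):

home	destination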
2	2
1	2
3	3
1	4
1	4

distance (1187 rows):

	1	2	3	4
1	238.23	453	263	1
2	21.2	1	238.23	238.23
3	577.98	238.23	4362	2443.22
4	234.12	987.98	89.93	12.21

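I want to add a new column "distance" to commuters. Wherever the value in "home" matches a row index of distance and the value in "destination" matches a column index of distance, the corresponding matrix value should go into this new column. The problem is that "home" often contains the same value several times (which is why commuters has more rows than distance). R keeps giving me an error saying that the rows in commuters do not match the rows in distance. However, I want to keep the duplicates (as in the table below, where "1 | 4" appears twice).

Expected output:

home	destination	distance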
2	2	1
1	2	453
3	3	4362
1	4	1
1	4	1

What I tried:

commuters$distance <- distance[cbind(commuters$home, commuters$destination)]

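I get this error: i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please report to data.table issue tracker if you'd like this, or add your comments to FR #657.

How can I solve this?

Tags: indexing, data.table, match

Comment: The error occurs because distance is not a matrix as you claim, but a data.table. Turn it into a data.frame or a matrix and the problem goes away. For the record, in this case (and many others) it would be much clearer if you provided sample data using dput(head(distance)): the output of dput is unambiguous, including classes and attributes, whereas given the Stack interface and your statement that it is a matrix, the problem is not reproducible. Only when I played with other possible classes did I find the error you describe.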
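The diagnosis in the comment above can be checked directly. The following is a minimal sketch, assuming a small hypothetical stand-in for distance (two rows and two columns taken from the question's sample values); it only shows how to inspect the class and what the conversion changes.

```r
library(data.table)

# Hypothetical stand-in for the asker's object: a data.table, not a matrix
distance <- data.table(`1` = c(238.23, 21.2), `2` = c(453, 1))

class(distance)      # "data.table" "data.frame"
is.matrix(distance)  # FALSE, so two-column matrix indexing is not available

# Converting gives a plain numeric matrix (column names are kept)
m <- as.matrix(distance)
is.matrix(m)         # TRUE

# Each row of the index matrix is one (row, column) pair
m[cbind(c(2, 1), c(2, 2))]  # 1 453
```

Running dput(head(distance)) on the real object would print the same class information in a form that can be pasted into a question.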

Answers:

1 vote Dean MacGregor 10/12/2023 #1

I would do this as a melt and a merge.

Setup

library(data.table)

dist <- data.table(home = 0:4,
                   `1` = c(1, 238.23, 21.2, 577.98, 234.12),
                   `2` = c(2, 453, 1, 238.23, 987.98),
                   `3` = c(3, 263, 238.23, 4362, 89.93),
                   `4` = c(4, 1, 238.23, 2443.22, 12.21))

comm <- data.table(home=c(2,1,3,1,1), destination=c(2,2,3,4,4))

Finally

merge(
  comm, 
  melt(dist, id.vars='home',
       value.name = 'distance', 
       variable.factor = FALSE, 
       variable.name='destination')[, 
          destination:=as.numeric(destination)], 
  by=c('destination', 'home')
)

   destination home distance
1:           2    1      453
2:           2    2        1
3:           3    3     4362
4:           4    1        1
5:           4    1        1
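Note that merge() returns the rows sorted by the key columns rather than in the original order of comm. If the original row order matters, one possible variant (a sketch, not part of the answer above) is a data.table update join, which adds the column to comm by reference. The data below is rebuilt from the question's sample with integer keys, and the name long is only illustrative.

```r
library(data.table)

dist <- data.table(home = 1:4,
                   `1` = c(238.23, 21.2, 577.98, 234.12),
                   `2` = c(453, 1, 238.23, 987.98),
                   `3` = c(263, 238.23, 4362, 89.93),
                   `4` = c(1, 238.23, 2443.22, 12.21))
comm <- data.table(home = c(2L, 1L, 3L, 1L, 1L),
                   destination = c(2L, 2L, 3L, 4L, 4L))

# Reshape to one row per (home, destination) pair
long <- melt(dist, id.vars = "home", variable.name = "destination",
             value.name = "distance", variable.factor = FALSE)
long[, destination := as.integer(destination)]

# Update join: every comm row matching a pair in long receives that pair's
# distance; duplicated pairs in comm are all filled, and row order is kept
comm[long, on = .(home, destination), distance := i.distance]
comm$distance  # 1 453 4362 1 1
```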

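0 votes r2evans 10/12/2023 #2

No need to do anything fancy (no need to melt): the base "[" accessor for matrices accepts a 2-column matrix that specifies row/column indices. Your error, however, is because your "matrix" (as you call it) is actually a data.table. If it is a data.frame or a matrix, it works: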

as.data.table(distance)[cbind(commuters$home, commuters$destination)]
# Error in `[.data.table`(as.data.table(distance), cbind(commuters$home,  : 
#   i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please report to data.table issue tracker if you'd like this, or add your comments to FR #657.
as.data.frame(distance)[cbind(commuters$home, commuters$destination)]
# [1]    1  453 4362    1    1
as.matrix(distance)[cbind(commuters$home, commuters$destination)]
# [1]    1  453 4362    1    1

So if commuters really is a data.table, we can do

commuters[, dist := as.matrix(distance)[cbind(home, destination)] ]
#     home destination  dist
#    <int>       <int> <num>
# 1:     2           2     1
# 2:     1           2   453
# 3:     3           3  4362
# 4:     1           4     1
# 5:     1           4     1

Data

commuters <- data.table::as.data.table(structure(list(home = c(2L, 1L, 3L, 1L, 1L), destination = c(2L, 2L, 3L, 4L, 4L)), row.names = c(NA, -5L), class = c("data.table", "data.frame")))
# the matrix version of distance ...
# to reproduce the error, convert this to a data.frame or data.table
distance <- structure(c(238.23, 21.2, 577.98, 234.12, 453, 1, 238.23, 987.98, 263, 238.23, 4362, 89.93, 1, 238.23, 2443.22, 12.21), dim = c(4L, 4L), dimnames = list(NULL, c("1", "2", "3", "4")))

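Check for 0 values in commuters, because R does not work with 0 indexes. For example, with the data I have, run commuters[1,home:=0L], and then as.matrix(distance)[cbind(commuters$home, commuters$destination)] produces a vector of length 4 instead of length 5. Perhaps your distance indices are 0-based instead of 1-based?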
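The silent shortening described above is easy to reproduce in base R. This is a small sketch using the sample values from the question; the names idx, res and bad are only illustrative.

```r
# Sample distance matrix from the question, filled column by column
distance <- matrix(c(238.23, 21.2, 577.98, 234.12,
                     453, 1, 238.23, 987.98,
                     263, 238.23, 4362, 89.93,
                     1, 238.23, 2443.22, 12.21), nrow = 4)

home        <- c(0L, 1L, 3L, 1L, 1L)   # first value deliberately set to 0
destination <- c(2L, 2L, 3L, 4L, 4L)

idx <- cbind(home, destination)
res <- distance[idx]
length(res)  # 4: the index row containing 0 is dropped without a warning

# Locate index rows that would be dropped before assigning a new column
bad <- rowSums(idx < 1L) > 0
which(bad)   # 1
```

Because the result is shorter than the input, assigning it back as a column would fail or recycle, so a check like this before the assignment can save some confusion.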

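0 votes r2evans 10/12/2023

Also, to be clear: given that commuters has integer columns, this is position-based row/column indexing, not indexing based on any row/column names that distance may have. If you need to use the names of the dimensions instead (where perhaps distance has one named 0), then you need to convert the integers to strings before cbind-ing them.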
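A sketch of the name-based variant, assuming (hypothetically) that distance carries zone labels "0" to "3" as both row and column names. A two-column character matrix is matched against the dimnames rather than against positions.

```r
# Assumption: zones are labelled "0".."3" in the dimnames of distance
labels <- as.character(0:3)
distance <- matrix(c(238.23, 21.2, 577.98, 234.12,
                     453, 1, 238.23, 987.98,
                     263, 238.23, 4362, 89.93,
                     1, 238.23, 2443.22, 12.21),
                   nrow = 4, dimnames = list(labels, labels))

home        <- c(1L, 0L, 2L, 0L, 0L)   # zone labels, not positions
destination <- c(1L, 1L, 2L, 3L, 3L)

# Convert to character first, then cbind, so lookup goes through the names
idx <- cbind(as.character(home), as.character(destination))
distance[idx]  # 1 453 4362 1 1
```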

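0 votes Ann 10/12/2023

Indeed, I do have 0s in the "home" and "destination" columns. But if I want to change them so the indices match, I need to shift every value in those columns by +1. Can you tell me how to do that? Many thanks!

0 votes r2evans 10/12/2023

I'm not sure what you're asking. Do you mean as.matrix(distance)[cbind(1+commuters$home, 1+commuters$destination)] or as.matrix(distance)[1+cbind(commuters$home, commuters$destination)]? Or do you mean only for the rows that contain 0?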
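For what it is worth, the first two expressions in the comment above give the same result, since adding 1 to the bound index matrix shifts every element. A minimal sketch, assuming the first row and column of distance correspond to zone 0 (the names home0 and dest0 are hypothetical):

```r
distance <- matrix(c(238.23, 21.2, 577.98, 234.12,
                     453, 1, 238.23, 987.98,
                     263, 238.23, 4362, 89.93,
                     1, 238.23, 2443.22, 12.21), nrow = 4)

# 0-based zone ids, as described by the asker
home0 <- c(1L, 0L, 2L, 0L, 0L)
dest0 <- c(1L, 1L, 2L, 3L, 3L)

a <- distance[cbind(1L + home0, 1L + dest0)]  # shift each column, then bind
b <- distance[1L + cbind(home0, dest0)]       # bind, then shift the matrix
identical(a, b)  # TRUE
a                # 1 453 4362 1 1
```

Shifting only inside the index leaves the stored ids in commuters unchanged, which may be preferable if they are used elsewhere.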
