Asked by: Maria Asked: 10/20/2023 Updated: 10/20/2023 Views: 40
Multiply columns from two different datasets by matching values
Q:
I have two datasets that share the same columns.
First:
structure(list(geo = c("Alemanya", "Alemanya", "Espanya", "Espanya"
), time = structure(c(1688169600, 1690848000, 1009843200, 1012521600
), tzone = "UTC", class = c("POSIXct", "POSIXt")), C10 = c(95.9,
102.6, 84.1, 82.1), C11 = c(114.1, 109.2, 89.8, 88.6), C12 = c(71.6,
69.3, NA, NA), C13 = c(81.7, 81.6, 211.7, 207), C14 = c(90.2,
72.1, 267.9, 284.1), C15 = c(109, 102.9, 274.6, 281.8), C16 = c(85.8,
81.7, 216, 214.9), C17 = c(80.3, 82.1, 99, 94.3), C18 = c(57.1,
63, 134.3, 129.3), C19 = c(86.4, 94, 81.5, 72.4), C20 = c(79.2,
80.3, 90.5, 90.1), C21 = c(119.1, 119.1, 72.4, 71.6), C22 = c(88.9,
88.7, 113.4, 119), C23 = c(90.8, 86.3, 229.5, 231.7), C24 = c(81.3,
79.2, 117.1, 118.2), C25 = c(93.6, 95.7, 143, 151.6), C26 = c(120.9,
127.2, 167, 170.8), C27 = c(103.6, 107.7, 132.4, 131.6), C28 = c(90.9,
87.7, 111.1, 112.7), C29 = c(75.1, 70.5, 112, 114.9), C30 = c(127.3,
128.5, 155.7, 154.7), C31 = c(66.8, 76.5, 256.8, 257.9), C32 = c(108.7,
101.2, 112.5, 115), C33 = c(106.8, 105.9, 105.4, 88), D35 = c(63.5,
57, 115.2, 95.5), E36 = c(NA_real_, NA_real_, NA_real_, NA_real_
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
Second:
structure(list(geo = c("Espanya", "Alemanya"), C10 = c(0.783964803992383,
1.5), C11 = c(0.216035196007617, 2), C12 = c(NA, 0.8), C13 = c(NA,
NA), C14 = c(0.495717344753747, 0.03), C15 = c(0.504282655246253,
0.636363636363636), C16 = c(0.195470986004089, 0.74896779521057
), C17 = c(0.600537503053995, 0.25103220478943), C18 = c(0.399462496946005,
0.200188708916496), C19 = c(NA, NA), C20 = c(0.06181, 0.06181
), C21 = c(0.03649, 0.03649), C22 = c(0.04545, 0.04545), C23 = c(0.03712,
0.495717344753747), C24 = c(0.303462321792261, 0.504282655246253
), C25 = c(0.696537678207739, 0.195470986004089), C26 = c(0.27279792746114,
0.600537503053995), C27 = c(0.72720207253886, 0.399462496946005
), C28 = c(0.04592, 0.002), C29 = c(0.74896779521057, 0.1), C30 = c(0.25103220478943,
0.4), C31 = c(0.200188708916496, 0.303462321792261), C32 = c(0.173297688315773,
0.696537678207739), C33 = c(0.431042616763642, 0.27279792746114
), D35 = c(0.16484, 0.72720207253886), E36 = c(0.02858, 0.06)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -2L))
My goal is to create a new dataset with new columns, for example column A = First$C10 * Second$C10 + First$C11 * Second$C11, column B = First$C12 * Second$C12 + First$C13 * Second$C13, and so on. Each equation must match on geo between the First and Second datasets.
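To make the target concrete, here is a minimal worked example of column A for a single row, using the values posted above for the first "Alemanya" row. The variable names are made up for illustration only.

```r
# Values copied from the posted data for geo == "Alemanya", first row.
first_c10  <- 95.9   # First$C10
first_c11  <- 114.1  # First$C11
second_c10 <- 1.5    # Second$C10 for "Alemanya"
second_c11 <- 2      # Second$C11 for "Alemanya"

# Column A for that row: the sum of the two geo-matched products.
A <- first_c10 * second_c10 + first_c11 * second_c11
A  # 372.05
```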
A:
0 votes
scoa
10/20/2023
#1
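You can pivot, join and filter. Because the join creates a very large intermediate dataset, this only works for data frames of reasonable length.
You will have to deal with those NA values first, though, because they will prevent you from multiplying.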
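library(dplyr)
library(tidyr)

df1 %>%
  # long format: one row per geo, time and column name
  pivot_longer(matches("\\w\\d+")) %>%
  # joined on geo only, so every df1 column meets every df2 column
  left_join(df2 %>%
              pivot_longer(-geo, names_to = "name2", values_to = "value2"),
            by = "geo") %>%
  # keep only the pairs whose column names agree
  filter(name == name2) %>%
  mutate(A = value * value2) %>%
  group_by(geo, time) %>%
  summarise(A = sum(A, na.rm = TRUE))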
Result:
geo | time | A |
---|---|---|
Alemanya | 2023-07-01 | 1058.735 |
Alemanya | 2023-08-01 | 1046.644 |
Espanya | 2002-01-01 | 1074.218 |
Espanya | 2002-02-01 | 1077.979 |
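If one total per pair of columns is wanted (A from C10 and C11, B from C12 and C13, and so on) rather than a single sum, the same pivot idea can be extended with a lookup table that assigns each column to a pair. The following is only a sketch under assumptions: the toy data is made up, the `pairs` table and the `weight`, `group` and `total` names are hypothetical, and columns are assumed to pair up in their existing order. Joining on both geo and the column name also avoids the large intermediate table.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the two datasets (values are made up for illustration).
df1 <- tibble(
  geo  = c("Alemanya", "Alemanya", "Espanya"),
  time = as.Date(c("2023-07-01", "2023-08-01", "2002-01-01")),
  C10  = c(1, 2, 3),
  C11  = c(4, 5, 6),
  C12  = c(7, 8, NA),
  C13  = c(1, 1, 1)
)
df2 <- tibble(
  geo = c("Espanya", "Alemanya"),
  C10 = c(0.5, 2),
  C11 = c(0.5, 1),
  C12 = c(NA, 1),
  C13 = c(2, 3)
)

value_cols <- setdiff(names(df1), c("geo", "time"))

# Hypothetical pairing: columns 1-2 go to "A", columns 3-4 to "B", ...
pairs <- tibble(
  name  = value_cols,
  group = LETTERS[ceiling(seq_along(value_cols) / 2)]
)

result <- df1 %>%
  pivot_longer(all_of(value_cols)) %>%
  # match on geo AND column name, so only same-named columns meet
  inner_join(
    pivot_longer(df2, -geo, values_to = "weight"),
    by = c("geo", "name")
  ) %>%
  inner_join(pairs, by = "name") %>%
  group_by(geo, time, group) %>%
  # NA products are dropped here; that is a choice, not a requirement
  summarise(total = sum(value * weight, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = group, values_from = total)

result
```

With `na.rm = TRUE`, a pair in which one product is NA still gets a total from the other product; drop that argument if such pairs should be NA instead.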
Comments
0 votes
Maria
10/25/2023
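This isn't quite what I need, because in this case you are adding up all the value2 values (regardless of the column name2) instead of multiplying value by value2 for each name2.
0 votes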
scoa
10/25/2023
I only multiply when name == name2 (so C11 from df1 with C11 from df2, not C11 from df1 with C12 from df2).
0 votes
PGSA
10/20/2023
#2
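I decided to do this in a slightly unusual way, because I'm exploring uses for functions like pmap:
library(dplyr)
library(purrr)
First, join the datasets using "geo" as the ID: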
df3 <- left_join(df1, df2, by = c("geo" = "geo"))
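Then do the sums. From the post, I assume this is pairwise multiplication and addition, using only the columns that start with C. We need to get rid of the NAs: I treat them as 0, but this has implications, so think about how you want to handle it.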
df3[is.na(df3)] <- 0
pwalk(.l = list(newcol = LETTERS[1:12],
col1 = seq(from = 3, to = 25,by = 2),
col2 = seq(from = 4, to = 26,by = 2),
col3 = seq(from = 29, to = 51,by = 2),
col4 = seq(from = 30, to = 52,by = 2)),
.f = \(newcol, col1, col2, col3, col4) {
df3[newcol] <<- df3[col1] * df3[col3] + df3[col2] * df3[col4]
})
Giving:
> df3[55:ncol(df3)]
# A tibble: 4 × 12
A B C D E F G H I J K L
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 372. 57.3 72.1 84.4 11.4 9.24 49.1 59.3 114. 7.69 71.2 105.
2 372. 55.4 67.6 81.8 12.6 9.31 46.8 58.6 119. 7.23 74.6 99.4
3 85.3 0 271. 102. 53.6 8.24 13.7 135. 142. 89.0 90.5 64.9
4 83.5 0 283. 98.6 51.7 8.18 14.0 141. 142. 91.2 90.5 57.9
Note that, among other things, this uses the <<- operator, which I'm fairly sure is always a bad choice for someone with my level of experience.
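One possible way to avoid `<<-` is to have the function return the new columns and attach them afterwards, and to refer to columns by name rather than by position. The sketch below uses base R's `Map()` in place of `pwalk()`. It is not the approach from the answer above: the toy data is made up, and the helper names `x`, `w`, `left` and `right` are hypothetical. It assumes an even number of value columns that pair up in order, and it keeps the same choice of treating NA as 0.

```r
# Toy stand-ins for the two datasets (values are made up for illustration).
df1 <- data.frame(
  geo = c("Alemanya", "Alemanya", "Espanya"),
  C10 = c(1, 2, 3),
  C11 = c(4, 5, 6),
  C12 = c(7, 8, NA),
  C13 = c(1, 1, 1)
)
df2 <- data.frame(
  geo = c("Espanya", "Alemanya"),
  C10 = c(0.5, 2),
  C11 = c(0.5, 1),
  C12 = c(NA, 1),
  C13 = c(2, 3)
)

value_cols <- setdiff(names(df1), "geo")

# Look up the matching df2 row for every df1 row; this keeps df1's row order.
w <- df2[match(df1$geo, df2$geo), value_cols]
x <- df1[value_cols]

# Treat NA as 0 (an assumption with consequences: an all-NA pair becomes 0).
x[is.na(x)] <- 0
w[is.na(w)] <- 0

left  <- value_cols[c(TRUE, FALSE)]  # C10, C12, ...
right <- value_cols[c(FALSE, TRUE)]  # C11, C13, ...

# Map() returns the new columns as a list, so nothing is assigned by side effect.
new_cols <- Map(function(a, b) x[[a]] * w[[a]] + x[[b]] * w[[b]], left, right)
names(new_cols) <- LETTERS[seq_along(new_cols)]

result <- cbind(df1["geo"], as.data.frame(new_cols))
result
```

Because the columns are selected by name, adding or reordering columns in the joined data does not silently shift which columns get multiplied.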
Comments