Multiply columns from two different datasets by matching values

Asked by: Maria  Asked: 10/20/2023  Updated: 10/20/2023  Views: 40

Q:

I have two datasets that share the same columns.

First:

structure(list(geo = c("Alemanya", "Alemanya", "Espanya", "Espanya"
), time = structure(c(1688169600, 1690848000, 1009843200, 1012521600
), tzone = "UTC", class = c("POSIXct", "POSIXt")), C10 = c(95.9, 
102.6, 84.1, 82.1), C11 = c(114.1, 109.2, 89.8, 88.6), C12 = c(71.6, 
69.3, NA, NA), C13 = c(81.7, 81.6, 211.7, 207), C14 = c(90.2, 
72.1, 267.9, 284.1), C15 = c(109, 102.9, 274.6, 281.8), C16 = c(85.8, 
81.7, 216, 214.9), C17 = c(80.3, 82.1, 99, 94.3), C18 = c(57.1, 
63, 134.3, 129.3), C19 = c(86.4, 94, 81.5, 72.4), C20 = c(79.2, 
80.3, 90.5, 90.1), C21 = c(119.1, 119.1, 72.4, 71.6), C22 = c(88.9, 
88.7, 113.4, 119), C23 = c(90.8, 86.3, 229.5, 231.7), C24 = c(81.3, 
79.2, 117.1, 118.2), C25 = c(93.6, 95.7, 143, 151.6), C26 = c(120.9, 
127.2, 167, 170.8), C27 = c(103.6, 107.7, 132.4, 131.6), C28 = c(90.9, 
87.7, 111.1, 112.7), C29 = c(75.1, 70.5, 112, 114.9), C30 = c(127.3, 
128.5, 155.7, 154.7), C31 = c(66.8, 76.5, 256.8, 257.9), C32 = c(108.7, 
101.2, 112.5, 115), C33 = c(106.8, 105.9, 105.4, 88), D35 = c(63.5, 
57, 115.2, 95.5), E36 = c(NA_real_, NA_real_, NA_real_, NA_real_
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))

Second:

structure(list(geo = c("Espanya", "Alemanya"), C10 = c(0.783964803992383, 
1.5), C11 = c(0.216035196007617, 2), C12 = c(NA, 0.8), C13 = c(NA, 
NA), C14 = c(0.495717344753747, 0.03), C15 = c(0.504282655246253, 
0.636363636363636), C16 = c(0.195470986004089, 0.74896779521057
), C17 = c(0.600537503053995, 0.25103220478943), C18 = c(0.399462496946005, 
0.200188708916496), C19 = c(NA, NA), C20 = c(0.06181, 0.06181
), C21 = c(0.03649, 0.03649), C22 = c(0.04545, 0.04545), C23 = c(0.03712, 
0.495717344753747), C24 = c(0.303462321792261, 0.504282655246253
), C25 = c(0.696537678207739, 0.195470986004089), C26 = c(0.27279792746114, 
0.600537503053995), C27 = c(0.72720207253886, 0.399462496946005
), C28 = c(0.04592, 0.002), C29 = c(0.74896779521057, 0.1), C30 = c(0.25103220478943, 
0.4), C31 = c(0.200188708916496, 0.303462321792261), C32 = c(0.173297688315773, 
0.696537678207739), C33 = c(0.431042616763642, 0.27279792746114
), D35 = c(0.16484, 0.72720207253886), E36 = c(0.02858, 0.06)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -2L))

My goal is to create a new dataset with new columns, for example column A = First$C10 * Second$C10 + First$C11 * Second$C11, column B = First$C12 * Second$C12 + First$C13 * Second$C13, and so on. Each equation must match on geo between the First and Second datasets.

r match


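A:

0 votes scoa 10/20/2023 #1

You can pivot, join, and filter. Because the join creates a very large intermediate dataset, this only works for data frames of reasonable length.

You will have to deal with those NA values first, though, because they prevent the multiplication.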

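library(dplyr)
library(tidyr)

df1 %>% 
  # one row per (geo, time, column name)
  pivot_longer(matches("\\w\\d+")) %>% 
  # joining on geo alone pairs every df1 column with every df2 column
  left_join(df2 %>% 
              pivot_longer(-geo, names_to = "name2", values_to = "value2"),
            by = "geo") %>% 
  filter(name == name2) %>%   # keep only same-named column pairs
  mutate(A = value * value2) %>% 
  group_by(geo, time) %>% 
  summarise(A = sum(A, na.rm = TRUE))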

Result:

geo time A
Alemanya 2023-07-01 1058.735
Alemanya 2023-08-01 1046.644
Espanya 2002-01-01 1074.218
Espanya 2002-02-01 1077.979

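Comments

0 votes Maria 10/25/2023
This is not exactly what I need, because in this case you are summing over all the value2 values (regardless of the column in name2) instead of multiplying value by value2 per name2.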
0 votes scoa 10/25/2023
I only multiply when name == name2 (so C11 from df1 with C11 from df2, not C11 from df1 with C12 from df2).
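
For the per-pair output the question describes (A from C10 and C11, B from C12 and C13, and so on), the same pivot idea could be extended with a lookup table that assigns each source column to an output column. What follows is only a sketch on made-up toy values: the `pairs` table, the `weight` column name and all the numbers are assumptions for illustration, not part of the original data. Joining on both `geo` and `name` also avoids building the large intermediate table.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values), same shape as the question's two tables
first <- tibble(
  geo  = c("Alemanya", "Espanya"),
  time = as.Date(c("2023-07-01", "2002-01-01")),
  C10 = c(10, 20), C11 = c(1, 2), C12 = c(3, NA), C13 = c(5, 6)
)
second <- tibble(
  geo = c("Espanya", "Alemanya"),
  C10 = c(0.5, 2), C11 = c(1, 3), C12 = c(NA, 1), C13 = c(2, NA)
)

# Hypothetical lookup: which source column feeds which output column
pairs <- tibble(
  name  = c("C10", "C11", "C12", "C13"),
  group = c("A", "A", "B", "B")
)

result <- first %>%
  pivot_longer(-c(geo, time)) %>%                       # columns: geo, time, name, value
  inner_join(pivot_longer(second, -geo, values_to = "weight"),
             by = c("geo", "name")) %>%                 # match on geo AND column name
  inner_join(pairs, by = "name") %>%                    # attach the output column label
  group_by(geo, time, group) %>%
  summarise(total = sum(value * weight, na.rm = TRUE),  # NA products are dropped
            .groups = "drop") %>%
  pivot_wider(names_from = group, values_from = total)  # one column per pair: A, B
```

Here `na.rm = TRUE` silently drops any product that involves an NA, which is a choice rather than a requirement; drop that argument if a missing input should make the whole pair NA.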
0 votes PGSA 10/20/2023 #2

I decided to do this in a particular way because I'm exploring the use of functions such as pmap:

library(dplyr)
library(purrr)  # provides pwalk()

First, join the datasets using "geo" as the ID:

df3 <- left_join(df1, df2, by = c("geo" = "geo"))

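Then do the sums. From the post I'm assuming this is pairwise multiplication and addition, using only the columns that start with C. We need to get rid of the NAs: I treat them as 0, but this has consequences, so think about how you want to handle it.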

df3[is.na(df3)] <- 0

pwalk(.l = list(newcol = LETTERS[1:12],
               col1 = seq(from = 3, to = 25,by = 2),
               col2 = seq(from = 4, to = 26,by = 2),
               col3 = seq(from = 29, to = 51,by = 2),
               col4 = seq(from = 30, to = 52,by = 2)),
     .f = \(newcol, col1, col2, col3, col4) {
       df3[newcol] <<- df3[col1] * df3[col3] + df3[col2] * df3[col4]
     })

Gives:

>   df3[55:ncol(df3)]     
# A tibble: 4 × 12
      A     B     C     D     E     F     G     H     I     J     K     L
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 372.   57.3  72.1  84.4  11.4  9.24  49.1  59.3  114.  7.69  71.2 105. 
2 372.   55.4  67.6  81.8  12.6  9.31  46.8  58.6  119.  7.23  74.6  99.4
3  85.3   0   271.  102.   53.6  8.24  13.7 135.   142. 89.0   90.5  64.9
4  83.5   0   283.   98.6  51.7  8.18  14.0 141.   142. 91.2   90.5  57.9
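
The effect of treating NA as 0 can be seen in column B above, which is 0 for both Espanya rows. A minimal base R sketch of the two options for that pair (C12 and C13), using values copied from the question's data; the helper name `zero_fill` is made up for illustration:

```r
# Pair B inputs (C12, C13), taken from the question's two tables
first_es  <- c(C12 = NA,   C13 = 211.7)  # Espanya, first row of the first table
second_es <- c(C12 = NA,   C13 = NA)     # Espanya row of the second table
first_de  <- c(C12 = 71.6, C13 = 81.7)   # Alemanya, first row of the first table
second_de <- c(C12 = 0.8,  C13 = NA)     # Alemanya row of the second table

# Hypothetical helper: replace missing values with 0
zero_fill <- function(x) replace(x, is.na(x), 0)

# Option 1: NA treated as 0, as df3[is.na(df3)] <- 0 does
b_es_zero <- sum(zero_fill(first_es) * zero_fill(second_es))
b_de_zero <- sum(zero_fill(first_de) * zero_fill(second_de))

# Option 2: NA propagates, so any missing input makes the whole pair NA
b_es_na <- sum(first_es * second_es)
b_de_na <- sum(first_de * second_de)
```

With zero-filling, a pair whose inputs are entirely missing becomes 0, which looks the same as a genuinely computed zero. Letting NA propagate keeps that distinction visible.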

Note that, among other things, this uses the <<- operator, which I'm fairly sure is always a poor choice for someone with my level of experience.
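
One possible way to avoid `<<-` is to compute all the new columns in a single vectorised step and bind them on afterwards. The sketch below assumes a small joined table with made-up values and the same `.x` / `.y` column layout that `left_join()` produces; the column positions are specific to this toy table, not to the `df3` above.

```r
# Toy joined table (hypothetical values): four "first" columns, then their weights
df3 <- data.frame(
  geo   = c("Alemanya", "Espanya"),
  C10.x = c(10, 20), C11.x = c(1, 2),
  C12.x = c(3, 4),   C13.x = c(5, 6),
  C10.y = c(2, 0.5), C11.y = c(3, 1),
  C12.y = c(1, 0),   C13.y = c(0, 2)
)

col1 <- seq(from = 2, to = 4, by = 2)  # positions of C10.x, C12.x
col2 <- col1 + 1                       # positions of C11.x, C13.x
col3 <- col1 + 4                       # positions of C10.y, C12.y
col4 <- col2 + 4                       # positions of C11.y, C13.y

# Data-frame arithmetic is element-wise and positional, so each result
# column is one pair: first * weight + next first * next weight
new_cols <- df3[col1] * df3[col3] + df3[col2] * df3[col4]
names(new_cols) <- LETTERS[seq_along(col1)]  # "A", "B"

df3 <- cbind(df3, new_cols)
```

Because nothing is assigned from inside a function, there is no need to reach into the enclosing environment, and the result does not depend on a variable named `df3` existing there.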