How to use R to match and index data from an Excel table into a different Excel table?

Asked by: TFT  Asked: 9/11/2023  Updated: 9/11/2023  Views: 35

Q:

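I have two different Excel spreadsheets (an input file and an output file). Instead of using the MATCH and INDEX functions directly in Excel, I would like to use an R script to look up values in a matrix in the input file and store them in the correct cells of the output file.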

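Both my input file and my output file consist of a matrix (columns and rows). However, the data structure in the output is different, so a simple transposition does not work here. In addition, my output file may contain countries that are not listed in my input file, and vice versa. I therefore need a two-dimensional lookup to find the values in the input file and store them accordingly in the output file.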

Suppose this is my input file (fictional numbers for illustration): [image]

This is my output file, which has a fixed data structure that I do not want to change: [image]

This is my R code:

#clearing workspace
rm(list=ls())

# Load required libraries
library(readxl)     # provides read_xlsx(), which is used below
library(openxlsx)
library(tidyverse)  # attaches dplyr, tidyr, stringr and readr

# get username 
username <- Sys.getenv("USER")

# Load input and output Excel files
input_file <- paste0("/Users/", username, "/Downloads/input_file.xlsx", collapse = "")
output_file <- paste0("/Users/", username, "/Desktop/output_file.xlsx", collapse = "")

# Load the input and output matrices
input_matrix <- read_xlsx(input_file, sheet = "KLICKHERE")
output_matrix <- read_xlsx(output_file, sheet = "ENTERHERE")

input_matrix |>
  mutate(quarter = as.Date(quarter)) |>
  pivot_longer(-quarter, names_to = "country", values_to = "value") |>
  mutate(country = ifelse(str_detect(country, "UK"), "UK", str_to_title(country))) |>
  pivot_wider(names_from = "quarter", values_from = "value", values_fn = ~paste0(.x, collapse = ",")) |>
  write_csv("output.csv")

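However, this code only transposes the matrix in my input file, instead of looking up values in two dimensions and storing them in the correct cells of the output file. If no value is available in the input file, the corresponding cell in the output file should be left empty. How can I do this in R? Thank you for your support.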

> dput(input_matrix)
structure(list(quarter = structure(c(1640995200, 1648771200, 
1656633600, 1664582400, 1672531200, 1680307200, 1688169600, 1696118400, 
1704067200, 1711929600, 1719792000, 1727740800, 1735689600, 1743465600, 
1751328000, 1759276800, 1767225600, 1775001600, 1782864000, 1790812800, 
1798761600, 1806537600, 1814400000, 1822348800), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), USA = c(3.2, 1.2617029893181, 2.60440314593473, 
0.205747170894448, 2.99742239259666, 0.454981287908458, 0.812500920203167, 
3.53979030628357, 2.203045423758, 0.054471200265702, 2.92803826928382, 
0.718964340034683, 1.60951470750129, 5.07871970749977, 5.69403126006479, 
1.22925310502368, 3.66396581660635, 2.37878419177338, 2.29467033332622, 
5.03595630837856, 2.25374064291613, 1.69444882698869, 4.16205429572283, 
4.50132478373478), Switzerland = c(4, 2.38038947850657, 5.47668679859636, 
5.91361388434538, 4.77394394868853, 0.51390066344242, 5.01921886848812, 
2.50248783131655, 4.01832050488102, 5.41622706832583, 5.30149956216031, 
3.16778787833323, 2.199973116468, 5.01366343788224, 4.29923192879718, 
4.74615956273584, 1.28422990972834, 0.284477581237545, 2.08538425170424, 
0.463401565316672, 5.19591972413863, 1.48139690105528, 3.72116283773825, 
2.88215533537597), China = c(-1, 3.35979319893751, 3.41085866605616, 
0.560088392935827, 5.13880709708747, 4.12321867925324, 0.678575131657537, 
5.05445686032681, 1.91810878862458, 1.3819304062605, 0.80241487254838, 
5.88840619656107, 1.4643177661779, 1.30971606465739, 5.27065656469845, 
2.59430512488464, 2.43626303990699, 2.6781401256743, 2.92798363758913, 
3.82250194049481, 3.53273150832144, 2.88313585242345, 2.2629948322944, 
1.45945340574197)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-24L))
> 
> dput(output_matrix)
structure(list(...1 = c("Portugal", "USA", "South Africa", "Switzerland", 
"China"), `44562` = c(NA, NA, NA, NA, NA), `44927` = c(NA, NA, 
NA, NA, NA), `45017` = c(NA, NA, NA, NA, NA), `45383` = c(NA, 
NA, NA, NA, NA), `45474` = c(NA, NA, NA, NA, NA), `45658` = c(NA, 
NA, NA, NA, NA), `45748` = c(NA, NA, NA, NA, NA), `45839` = c(NA, 
NA, NA, NA, NA), `45931` = c(NA, NA, NA, NA, NA), `46388` = c(NA, 
NA, NA, NA, NA), `46478` = c(NA, NA, NA, NA, NA), `46569` = c(NA, 
NA, NA, NA, NA), `46661` = c(NA, NA, NA, NA, NA)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L))
> 
Tags: r, excel, dataframe, match, vlookup


A:

0 upvotes  r2evans  9/11/2023  #1

This looks like a straightforward double pivot:

library(dplyr)
library(tidyr)
input_matrix %>%
  mutate(quarter = as.Date(quarter)) %>%
  pivot_longer(cols = -quarter, names_to = "country") %>%
  pivot_wider(id_cols = country, names_from = "quarter", values_from = "value")
# # A tibble: 3 × 25
#   country    `2022-01-01` `2022-04-01` `2022-07-01` `2022-10-01` `2023-01-01` `2023-04-01` `2023-07-01` `2023-10-01`
#   <chr>             <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
# 1 USA                 3.2         1.26         2.60        0.206         3.00        0.455        0.813         3.54
# 2 Switzerla…          4           2.38         5.48        5.91          4.77        0.514        5.02          2.50
# 3 China              -1           3.36         3.41        0.560         5.14        4.12         0.679         5.05
# # ℹ 16 more variables: `2024-01-01` <dbl>, `2024-04-01` <dbl>, `2024-07-01` <dbl>, `2024-10-01` <dbl>,
# #   `2025-01-01` <dbl>, `2025-04-01` <dbl>, `2025-07-01` <dbl>, `2025-10-01` <dbl>, `2026-01-01` <dbl>,
# #   `2026-04-01` <dbl>, `2026-07-01` <dbl>, `2026-10-01` <dbl>, `2027-01-01` <dbl>, `2027-04-01` <dbl>,
# #   `2027-07-01` <dbl>, `2027-10-01` <dbl>

If you need the names to look like Excel date numbers (e.g., 44562), we can refer to "How to convert Excel date format to proper date in R" and convert using the origin "1899-12-30":

input_matrix %>%
  mutate(quarter = as.Date(quarter) - as.Date("1899-12-30")) %>%
  pivot_longer(cols = -quarter, names_to = "country") %>%
  pivot_wider(id_cols = country, names_from = "quarter", values_from = "value")
# # A tibble: 3 × 25
#   country    `44562` `44652` `44743` `44835` `44927` `45017` `45108` `45200` `45292` `45383` `45474` `45566` `45658`
#   <chr>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
# 1 USA            3.2    1.26    2.60   0.206    3.00   0.455   0.813    3.54    2.20  0.0545   2.93    0.719    1.61
# 2 Switzerla…     4      2.38    5.48   5.91     4.77   0.514   5.02     2.50    4.02  5.42     5.30    3.17     2.20
# 3 China         -1      3.36    3.41   0.560    5.14   4.12    0.679    5.05    1.92  1.38     0.802   5.89     1.46
# # ℹ 11 more variables: `45748` <dbl>, `45839` <dbl>, `45931` <dbl>, `46023` <dbl>, `46113` <dbl>, `46204` <dbl>,
# #   `46296` <dbl>, `46388` <dbl>, `46478` <dbl>, `46569` <dbl>, `46661` <dbl>
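
As a quick check of the "1899-12-30" origin used above, here is a minimal base-R sketch that converts in both directions. The names excel_origin, serials, dates and back are illustrative only, and the origin assumes the workbook uses Excel's default 1900 date system; a workbook saved with the 1904 date system would need a different origin.

```r
# Origin for Excel's 1900 date system (assumption: the workbook uses it).
excel_origin <- as.Date("1899-12-30")

# Column names such as those in output_matrix are serial numbers stored as text.
serials <- c("44562", "44927", "45017")

# Serial number -> Date
dates <- as.Date(as.integer(serials), origin = excel_origin)
print(dates)

# Date -> serial number (a plain integer, usable as a lookup key)
back <- as.integer(as.Date("2022-01-01") - excel_origin)
print(back)
```

Converting the input quarters to integers this way gives keys that can be compared directly with the output column names.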

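Comments

0 upvotes  TFT  9/12/2023
Thank you for your reply and the code suggestion. Do you happen to know whether there is a way to store the values in the correct cells of the Excel output file, rather than just transposing the input matrix?
0 upvotes  r2evans  9/12/2023
I guess I don't understand. Your output_matrix is a simple transposition, and I thought that was what you wanted. This does it programmatically, though perhaps t(..) and some data fixes could do the same thing (with a few assumptions here).
0 upvotes  TFT  9/14/2023
My output_matrix happens to be a simple transposition of my input_matrix, but in general that is not the case. That is why I need a two-way lookup.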
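
The two-way lookup asked for in the last comment can be sketched in base R with match(), which plays the role of Excel's MATCH, while matrix indexing plays the role of INDEX. This is only a sketch under stated assumptions, not the answerer's method: the helper fill_by_lookup() is hypothetical, the toy data frames stand in for the question's input_matrix and output_matrix, the first column of the output holds country names, and the remaining output column names are Excel serial dates (1900 date system).

```r
# Toy stand-ins for the question's data (values are illustrative only).
input_matrix <- data.frame(
  quarter     = as.POSIXct(c("2022-01-01", "2022-04-01", "2023-01-01"), tz = "UTC"),
  USA         = c(3.2, 1.26, 3.00),
  Switzerland = c(4, 2.38, 4.77),
  China       = c(-1, 3.36, 5.14)
)
output_matrix <- data.frame(
  country = c("Portugal", "USA", "South Africa", "Switzerland", "China"),
  `44562` = NA_real_, `44927` = NA_real_, `45017` = NA_real_,
  check.names = FALSE
)

# Hypothetical helper: fill `output` in place of its NA cells, keeping its layout.
fill_by_lookup <- function(input, output, origin = "1899-12-30") {
  # Excel serial number of every input row (days since the origin).
  in_serial <- as.integer(as.Date(input[[1]]) - as.Date(origin))
  lookup <- as.matrix(input[-1])   # rows: quarters, columns: countries

  # MATCH step: position of each output date / country in the input (NA if absent).
  rows <- match(as.integer(names(output)[-1]), in_serial)
  cols <- match(output[[1]], colnames(lookup))

  # INDEX step: an NA index yields NA, so cells without a match stay empty.
  for (j in seq_along(rows)) {
    output[[j + 1]] <- unname(lookup[rows[j], cols])
  }
  output
}

filled <- fill_by_lookup(input_matrix, output_matrix)
print(filled)

# Optional, not run here: write the values into the existing workbook with
# openxlsx, assuming the question's `output_file` path and "ENTERHERE" sheet,
# with the header in row 1 and the country names in column 1.
# wb <- openxlsx::loadWorkbook(output_file)
# openxlsx::writeData(wb, sheet = "ENTERHERE", x = filled[-1],
#                     startCol = 2, startRow = 2, colNames = FALSE)
# openxlsx::saveWorkbook(wb, output_file, overwrite = TRUE)
```

Because the lookup is driven by the output's own row and column labels, the row and column order of the output file is preserved, and countries or dates missing from the input (Portugal, South Africa and 45017 in the toy data) simply remain NA. With writeData()'s default settings, NA values are written as empty cells.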