Comparing two columns from different files using R

Question:

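I need help matching two documents on two columns: for each pair of values in document 1, loop through document 2 to find matches and output the one with the smallest difference (the best match). For example, given the data below, I want the output to show the matching value from document 2.

File 1

| Column A | Column B |
|-------- |-------- |
| 0.12 | 23.45 |
| 0.13 | 25.65 |

File 2

| Column A | Column B |
|-------- |-------- |
| 0.12 | 23.45 |
| 0.12 | 23.46 |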

Output

| Column A | Column B | Match from doc2 |
|-------- |-------- |-------- |
| Cell 1 | Cell 2 | 0.12 |
| Cell 3 | Cell 4 | |

To do this, I wrote the following script.

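library(dplyr)  # provides %>% and filter() used below
# r1 and r2 are the two input data frames; how they are loaded is not shown
r1
r2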
matched_hold = list()
# Loop through rows in r1
for (row in r1$rowid) {
 zval = r1$RTB[r1$rowid == row]
 mval = r1$MZB[r1$rowid == row]
 # Check if zval and mval are numeric
 if (is.numeric(zval) && is.numeric(mval)) {
  r2_copy = r2 %>%
   filter((RTA < zval + 0.05 & RTA > zval - 0.05) &
        (MZA < mval + 0.01 & MZA > mval - 0.01))
  r2_copy$RTB = zval
  r2_copy$MZB = mval
  matched_hold[[row]] = r2_copy
 }
}
# Combine the matched data frames
matched_df = do.call('rbind', matched_hold)

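However, the bug in this script is that it does not return the best match. For example, if there are two matches, one with a difference of 0.5 and the other with a difference of 0.1, it picks the 0.5 match instead of the 0.1 one. Could you help me modify it so that the result is the best match under the criteria given above?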

Tags: r, matching

# Looping over row indices is safer than looping over an ID column
matched_hold = list()
for (row_df1 in seq_len(nrow(df1))) {

  # Select the values of the current row
  rta_val = df1$RTA[row_df1]
  mza_val = df1$MZA[row_df1]

  # As you wanted, only consider rows where both values are numeric (and not NA)
  if (is.numeric(rta_val) && is.numeric(mza_val) && !is.na(rta_val + mza_val)) {

    # Instead of a nested loop, compute the differences to every df2 row at once
    # (vectorized). The differences must be absolute!
    diff_per_row = abs(df2$RTB - rta_val) + abs(df2$MZB - mza_val)

    # The most similar df2 row is the one with the smallest total difference
    row_df2 = which.min(diff_per_row)

    # Store the index of that df2 row as the list value
    matched_hold[[row_df1]] = row_df2

  }
}
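
Note that the loop above always returns the nearest row, even when it lies outside the ±0.05 / ±0.01 windows used in the question. If you want both, here is a minimal sketch that keeps only candidates inside the windows and then takes the closest one. The function name `best_match`, the arguments `rt_tol` and `mz_tol`, and the `match_row` column are made up for illustration; the column names follow the loop above. The sample data reuses the tables from the question, assuming column A plays the role of RT and column B the role of MZ. Dividing each difference by its tolerance is one possible choice of score, not the only one.

```r
# Hypothetical helper: for each row of df1, return the closest df2 row that
# falls inside both tolerance windows, or NA when no df2 row qualifies.
best_match <- function(df1, df2, rt_tol = 0.05, mz_tol = 0.01) {
  idx <- vapply(seq_len(nrow(df1)), function(i) {
    rt_diff <- abs(df2$RTB - df1$RTA[i])
    mz_diff <- abs(df2$MZB - df1$MZA[i])

    # A candidate must be strictly inside both windows, as in the question
    ok <- !is.na(rt_diff) & !is.na(mz_diff) &
      rt_diff < rt_tol & mz_diff < mz_tol
    if (!any(ok)) return(NA_integer_)

    # Scale each difference by its tolerance so neither column dominates
    score <- rt_diff / rt_tol + mz_diff / mz_tol
    score[!ok] <- Inf
    which.min(score)
  }, integer(1))

  # Attach the index and values of the matched df2 row (NA when unmatched)
  cbind(df1, match_row = idx, RTB = df2$RTB[idx], MZB = df2$MZB[idx])
}

# Sample data taken from the tables in the question (column mapping assumed)
df1 <- data.frame(RTA = c(0.12, 0.13), MZA = c(23.45, 25.65))
df2 <- data.frame(RTB = c(0.12, 0.12), MZB = c(23.45, 23.46))

res <- best_match(df1, df2)
print(res)
```

With this data the first row of `df1` is matched to the first row of `df2`, and the second row gets `NA` because no `df2` row is within 0.01 of 25.65.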
