Removing duplicates based on 3-4 columns (dplyr)

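Question:

I realize this question may have been asked before, but I am struggling to correctly remove the duplicates in my df. I used the approach recommended here, but it did not remove all of the duplicates.

# Install packages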

#Loading packages
library(tidyverse)
library(readxl)
library(writexl)
library(stringr)
library(textclean)
library(lubridate)

Here is my data:

dput(df[1:10,c(1,2,3,4,5,6,7)])

Data output:

structure(list(username = c("Engineeer", "ftpofmpo", "sagood",
"ishtarsg", "Ohayo!", "Engineeer"), post = c("Engineers are si ginnas who recently graduated from Universities. No one stays as an Engineer like forever.\nEngineering is harder than Business but more fulfilling in the long run.\nEngineer > Manager > Director > Chief Technology Officer > Chief Executive Officer\n\tzero to sixty times",
"\n\t\n\t\t\n\t\t\t\n\t\t\t\tEngineeer said:\n\t\t\t\n\t\t\n\t\n\t\n\t\t\n\t\t\n\t\t\tThen pick up Engineering. Its harder but more fulfilling in the long run. No one stays as an Engineer like forever.\nEngineer > Manager > Director > Chief Technical Officer > Chief Executive\n\t\t\n\t\tClick to expand...\n\t\n\nhave you seen the past list of president scholars?\nif minister salary pegg to engineer pay jialat liao... check out lky statement on y salary must be high",
"i thought engineering ish dominated by ceca?????", "Always opt to be a priest.",
"after CEO beome mayor then minister?", "\n\t\n\t\t\n\t\t\t\n\t\t\t\tsagood said:\n\t\t\t\n\t\t\n\t\n\t\n\t\t\n\t\t\n\t\t\ti thought engineering ish dominated by ceca?????\n\t\t\n\t\tClick to expand...\n\t\nIf you fret Engineering its fine. Donate these good paying jobs to CECAs."
), date = structure(c(1622851200, 1622851200, 1622851200, 1622851200,
1622851200, 1622851200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), user_status = c("Supremacy Member", "Banned", "Member",
"Arch-Supremacy Member", "Great Supremacy Member", "Supremacy Member"
), treatment_implementation = c(0, 0, 0, 0, 0, 0), month_year = c(2021.41666666667,
2021.41666666667, 2021.41666666667, 2021.41666666667, 2021.41666666667,
2021.41666666667), id = c(255, 296, 747, 389, 634, 255)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))

To remove duplicate rows, I drop them based on the following three columns:

# Drop duplicate observations
df <-
df %>%
filter(duplicated(cbind(username, post, date)))

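After running the code above, I still see duplicate rows when I inspect the data manually. Moreover, when I run the same code again after the first attempt, it keeps removing more rows. That confuses me, because I expected all duplicate rows to be removed in a single pass (that is, by running the code only once).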
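
The behaviour described above is what one would expect if the filter condition is inverted: duplicated() returns TRUE for the second and later copies of a row, so filter(duplicated(...)) keeps only those copies and discards everything else, and each rerun shrinks the result further. The sketch below illustrates this on a small made-up tibble (toy, with hypothetical values, not the real data) and shows two common ways to keep one row per combination instead. Note that in the sample printed above, the two "Engineeer" rows have different post values, so neither approach would treat them as duplicates on username, post and date.

```r
library(dplyr)

# Hypothetical toy data: rows 1, 4 and 5 agree on username, post and date.
toy <- tibble(
  username = c("a", "b", "c", "a", "a"),
  post     = c("x", "y", "z", "x", "x"),
  date     = as.Date("2021-06-05"),
  id       = 1:5
)

# Form used in the question: duplicated() flags only the repeat copies,
# so filter() KEEPS the repeats and drops the first occurrences.
kept_dupes <- toy %>%
  filter(duplicated(cbind(username, post, date)))

# Running it again on its own output removes yet more rows.
kept_again <- kept_dupes %>%
  filter(duplicated(cbind(username, post, date)))

# Option 1: negate the test to keep the first copy of each combination.
deduped_a <- toy %>%
  filter(!duplicated(cbind(username, post, date)))

# Option 2: distinct() on the key columns; .keep_all = TRUE keeps the
# remaining columns from the first row of each combination.
deduped_b <- toy %>%
  distinct(username, post, date, .keep_all = TRUE)

print(kept_dupes)
print(deduped_a)
print(deduped_b)
```

Either option is a one-pass operation: rerunning it on its own output should leave the row count unchanged, which is a quick way to check that the deduplication is doing what was intended.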

Tags: r dplyr tidyr lubridate stringr