Asked by: Payam Rahmani · Asked: 11/27/2022 · Last edited by: TarJae, Payam Rahmani · Updated: 11/29/2022 · Views: 154
R: How can I check that the values in one column always have the same value in another column?
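Q:
I am a beginner in R and need to learn how to write the code for this. As you can see in my data frame, I want to check whether eggs in the commodity column has the same unit in all rows.
Data frame: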
df <- structure(list(commodity = c("eggs", "lentils (green)", "oil (vegetable)",
"rice", "sugar (white)", "eggs", "lentils (green)", "oil (vegetable)",
"rice", "sugar (white)", "eggs"), unit = c("1.8 kg", "900 g",
"810 g", "kg", "kg", "1.8 kg", "900 g", "810 g", "kg", "kg",
"1.8 kg")), class = "data.frame", row.names = c(NA, -11L))
         commodity   unit
1             eggs 1.8 kg
2  lentils (green)  900 g
3  oil (vegetable)  810 g
4             rice     kg
5    sugar (white)     kg
6             eggs 1.8 kg
7  lentils (green)  900 g
8  oil (vegetable)  810 g
9             rice     kg
10   sugar (white)     kg
11            eggs 1.8 kg
I don't know how to do this.
A:
1 upvote
TarJae
11/27/2022
#1
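One approach could be:
First create a column that keeps only the letters from unit, then use distinct():
library(dplyr)
df %>%
  # drop everything that is not a letter, e.g. "1.8 kg" becomes "kg"
  mutate(unit1 = gsub("[^a-zA-Z]", "", unit)) %>%
  distinct(unit1)
  unit1
1    kg
2     g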
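The same idea can be pushed one step further to answer the per-commodity question. The following is only a sketch, not part of the original answer: it keeps commodity next to the extracted unit and counts the distinct units per commodity. The names unit1, n_units and res are illustrative, and the data frame is restated so the snippet runs on its own.

```r
library(dplyr)

# Data from the question, restated so the example is self-contained
df <- data.frame(
  commodity = c("eggs", "lentils (green)", "oil (vegetable)", "rice",
                "sugar (white)", "eggs", "lentils (green)",
                "oil (vegetable)", "rice", "sugar (white)", "eggs"),
  unit = c("1.8 kg", "900 g", "810 g", "kg", "kg", "1.8 kg", "900 g",
           "810 g", "kg", "kg", "1.8 kg")
)

res <- df %>%
  mutate(unit1 = gsub("[^a-zA-Z]", "", unit)) %>%  # keep letters only
  distinct(commodity, unit1) %>%                   # one row per commodity/unit pair
  count(commodity, name = "n_units")               # distinct units per commodity

res
```

A commodity whose n_units is 1 has the same unit in every row. Note that this compares only the letters of the unit; if the quantity should count as well (so that "1.8 kg" and "kg" differ), distinct(commodity, unit) could be used instead.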
Comments
0 upvotes
Payam Rahmani
11/28/2022
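Thanks for your answer, but I want to know whether, for example, eggs has the same unit in all rows, and likewise for the other items.
1 upvote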
akrun
11/27/2022
#2
In base R, we can use
length(unique(trimws(df$unit, whitespace = "[0-9.]+\\s+"))) == 1
[1] FALSE
To check only a subset of the elements:
with(df, length(unique(trimws(unit[grepl("eggs", commodity)],
whitespace = "[0-9.]+\\s+"))) == 1)
[1] TRUE
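The subset check above can also be run for every commodity at once without leaving base R. This is a minimal sketch under the assumption that a unit is an optional leading number followed by the unit name; it uses sub() and tapply() rather than the trimws() call shown in the answer, and unit_name and same_unit are illustrative names.

```r
# Data from the question, restated so the example is self-contained
df <- data.frame(
  commodity = c("eggs", "lentils (green)", "oil (vegetable)", "rice",
                "sugar (white)", "eggs", "lentils (green)",
                "oil (vegetable)", "rice", "sugar (white)", "eggs"),
  unit = c("1.8 kg", "900 g", "810 g", "kg", "kg", "1.8 kg", "900 g",
           "810 g", "kg", "kg", "1.8 kg")
)

# Strip a leading quantity such as "1.8 " or "900 " so only the unit name remains
unit_name <- sub("^[0-9.]+\\s+", "", df$unit)

# For each commodity: TRUE when all of its rows share a single unit name
same_unit <- tapply(unit_name, df$commodity,
                    function(u) length(unique(u)) == 1)

same_unit
```

The result is a named logical vector with one entry per commodity, so all(same_unit) gives a single yes/no answer for the whole data frame.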
If we want to check all the elements:
library(dplyr)
library(stringr)
df %>%
group_by(item = str_extract(commodity, "^\\w+(?=\\s*)")) %>%
summarise(isUnitSame = n_distinct(str_extract(unit, "[a-z]+$"))==1)
-output
# A tibble: 5 × 2
  item    isUnitSame
  <chr>   <lgl>
1 eggs    TRUE
2 lentils TRUE
3 oil     TRUE
4 rice    TRUE
5 sugar   TRUE
Comments
0 upvotes
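Because every commodity in the sample data is consistent, the output never shows a FALSE. To see what a failing case might look like, the sketch below appends one hypothetical row (not in the original data) that gives eggs a conflicting unit, and adds an illustrative units column listing what was found. The names df_bad, check and units are assumptions made for this example.

```r
library(dplyr)
library(stringr)

# Data from the question, restated so the example is self-contained
df <- data.frame(
  commodity = c("eggs", "lentils (green)", "oil (vegetable)", "rice",
                "sugar (white)", "eggs", "lentils (green)",
                "oil (vegetable)", "rice", "sugar (white)", "eggs"),
  unit = c("1.8 kg", "900 g", "810 g", "kg", "kg", "1.8 kg", "900 g",
           "810 g", "kg", "kg", "1.8 kg")
)

# Hypothetical extra row, added only to demonstrate a mismatch
df_bad <- rbind(df, data.frame(commodity = "eggs", unit = "30 pcs"))

check <- df_bad %>%
  group_by(item = str_extract(commodity, "^\\w+")) %>%
  summarise(
    # TRUE when the group has exactly one distinct unit name
    isUnitSame = n_distinct(str_extract(unit, "[a-z]+$")) == 1,
    # the distinct unit names found in the group, for inspection
    units = paste(unique(str_extract(unit, "[a-z]+$")), collapse = ", ")
  )

check
```

With the extra row, eggs is reported as FALSE and its units column shows both unit names, while the other items remain TRUE.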
Payam Rahmani
11/28/2022
Thanks for your answer, but I want to know whether, for example, eggs has the same unit in all rows, and likewise for the other items.