How to look at a certain value/category of a variable according to another one (Large dataset)

Question:

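I am writing this question based on another problem I am having in RStudio. I have a very large dataset with movement data (ACC) of birds, with multiple rows for each individual (each row represents a timestamp). I need to see how many individuals I have in a given territory. The problem is that each individual has many rows, so simple functions such as table or summary return the number of rows assigned to a territory rather than the number of individuals. What I want is a simple way to find out which individuals belong to each territory.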

This is what I have done so far:

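My data frame has a lot of rows, but only about 50 individuals (each with multiple rows).
I have about 15 territories in total, and every row carries a territory ID (repeated).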

I tried using table:

table(df$territory_id) %>% sort(decreasing = TRUE) %>% head

This gave me the output:

ter1  ter2  ter3  ter4  ter5  ter6 
275034 207746 232739 165260 162103 259644

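This is the number of rows per territory ID. Because I want to know how many different individuals there are in a territory, I subset each territory into a separate object and made a table of it: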

t <- filter(df, territory_id == "ter1")

Then:

table(t$individualID)

This gives me the output I want. However, I need to repeat the process for every territory.

Is there an easier way to do this? I only have 15 territories, but with more of them it would take a lot of time to repeat this step for each one.

Tags: r, dplyr, count, subset, summary

Answer:

> head(df)
  bird_id territory_id           timestamp
1       1            I 2023-03-05 03:57:14
2       1            D 2023-01-01 21:06:37
3       1            G 2023-03-01 07:23:02
4       1            A 2023-02-23 01:09:48
5       1            B 2023-03-29 22:41:45
6       1            G 2023-01-29 03:29:01

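While it is clear to me that you want to analyse your dataset, I am not sure what exactly you want to do. So here are a few things you might want, and how to do them.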

# 1. get the number of birds you have seen at any point in each territory
df |>
  distinct(territory_id, bird_id) |>
  count(territory_id)

# 2. count the number of rows in your dataset for each territory
count(df, territory_id)

# 3. count the number of rows in your dataset for each territory and bird
count(df, territory_id, bird_id)
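
To make the difference between the three calls concrete, here is a minimal sketch on a made-up data frame. The `toy` data and the result names are assumptions for illustration only, not the asker's data; the calls themselves are the same `distinct()` and `count()` used above.

```r
library(dplyr)

# Hypothetical toy data: 3 birds, 2 territories, several rows per bird
toy <- data.frame(
  bird_id      = c(1, 1, 1, 2, 2, 3),
  territory_id = c("A", "A", "B", "A", "A", "A")
)

# 1. distinct birds per territory (A: 3 birds, B: 1 bird)
birds_per_territory <- toy |>
  distinct(territory_id, bird_id) |>
  count(territory_id)

# 2. rows per territory (A: 5 rows, B: 1 row)
rows_per_territory <- count(toy, territory_id)

# 3. rows per territory and bird (one output row per pair)
rows_per_pair <- count(toy, territory_id, bird_id)

print(birds_per_territory)
print(rows_per_territory)
print(rows_per_pair)
```

Only the first result ignores how many timestamps each bird has, which is why it answers "how many individuals per territory" while the other two count rows.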

1 upvote, Skyk, 10/25/2023, #2

Yes! That is what I wanted to know! Thank you very much! Basically, I looked at the first piece of code you provided:

df |>
  distinct(territory_id, bird_id) |>
  count(territory_id)

It returned the following:

  territory_id     n
  <chr>        <int>
  1 GR002            2
  2 GR009            1
  3 GR011            1

etc.

But here I wanted to know which individualID belongs to each territory, so I used:

df |>
  distinct(territory_id, bird_id) |>
  count(territory_id, bird_id)

It returned:

  territory_id bird_id                      n
  <chr>        <chr>                    <int>
  1 GR002        individual1 (eobs 5860)          1
  2 GR002        individual2 (eobs 5861)          1
  3 GR009        individual3 (eobs 6483)          1

This gives me what I wanted. So I just needed to use the count function... Thank you!
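
Because `distinct()` already leaves one row per territory/bird pair, the `count(territory_id, bird_id)` step after it will always report `n = 1`; the useful part is the list of pairs itself. One possible alternative, sketched here on hypothetical toy data (the `toy` frame and the `n_birds` and `birds` column names are assumptions, not part of the original thread), puts the number of individuals and their IDs on a single row per territory:

```r
library(dplyr)

# Hypothetical toy data with several rows per individual
toy <- data.frame(
  bird_id = c("individual1", "individual1", "individual2",
              "individual3", "individual3"),
  territory_id = c("GR002", "GR002", "GR002", "GR009", "GR009")
)

per_territory <- toy |>
  group_by(territory_id) |>
  summarise(
    # number of different individuals seen in the territory
    n_birds = n_distinct(bird_id),
    # their IDs, collapsed into one comma-separated string
    birds = paste(sort(unique(bird_id)), collapse = ", ")
  )

print(per_territory)
```

With about 50 individuals and 15 territories the collapsed string stays readable; for many more individuals per territory, the plain `distinct(df, territory_id, bird_id)` listing is probably easier to scan.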
