Transform trip movement data frame into residence counts summary data frame in R

Asked by: November2Juliet Asked: 10/31/2023 Updated: 10/31/2023 Views: 57

Q:

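I am trying to find out how many herons are at a lake at a given time, and how that total changes over time. Unfortunately, the data set I have is organized not by lake, or even by heron tag ID, but by migration trip number. I want to know how many birds are at each lake during a given time range. How can I wrangle the start data set to get to the end data set? I tried building a time-series data frame as an intermediate processing step, but the time-series data frames I found in other Stack Overflow posts were not quite what I was looking for. Below are the example data frames and my attempt.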

# Start data frame
#  Great blue heron (Ardea herodias) flights migration data
bird_flights = data.frame(species = c("Ardea herodias", "Ardea herodias", "Ardea herodias", "Ardea herodias", "Ardea herodias", "Ardea herodias", "Ardea herodias","Ardea herodias", "Ardea herodias"),
                          subspecies = c("herodias", "herodias", "wardi", "herodias", "herodias", "herodias", "herodias", "herodias", "herodias"),
                          trip_number = c(1:9),
                          tag_id = c(4001, 4042, 5959, 4001, 4001, 4001, 4042, 4042, 4042),
                          departure_date   = c("10/15/2021", "10/10/2021", "10/2/2021", "11/20/2021", "3/1/2022", "3/20/2022", "11/20/2021", "3/3/2022", "3/20/2022"),
                          departure_lake = c("Seneca", "Seneca", "Otter Bay", "Marion", "Bee Haven Bay", "Marion", "Marion", "Bee Haven Bay", "Marion"),
                          arrival_date = c("10/30/2021", "10/29/2021", "10/12/2021", "12/5/2021", "3/15/2022", "4/5/2022", "12/5/2021", "3/17/2022", "4/6/2022"),
                          arrival_lake = c("Marion", "Marion", "Bee Haven Bay", "Bee Haven Bay", "Marion", "Seneca", "Bee Haven Bay", "Marion", "Cayuga"))

# Goal: End data frame
# Lake Populations and Residence Times for The Migratory Year (October - May) 
# Note: I did not write lines for Marion Lake or Bee Haven Bay, but this shows the data structure with three example lakes.
lake_counts = data.frame(lake_name = c("Seneca", "Seneca", "Seneca", "Seneca", "Cayuga", "Cayuga", "Otter Bay", "Otter Bay"),
                         start_time = c("10/1/2021", "10/10/2021", "10/15/2021", "10/20/2022", "10/1/2021", "4/6/2022", "10/1/2021", "10/2/2021"),
                         end_time = c("10/10/2021", "10/15/2021", "10/20/2022", "5/31/2023", "4/6/2022", "5/31/2023", "10/2/2021", "5/31/2022"),
                         count = c(2, 1, 0, 1, 0, 1, 1, 0))

# Attempted Data Processing

library(tidyverse)
library(lubridate)

# Add day-of-year columns
# Parse the m/d/Y strings into Dates first; yday() cannot read them as plain character
df1 = bird_flights %>%
        mutate(arrival_date = lubridate::mdy(arrival_date),
               departure_date = lubridate::mdy(departure_date),
               day_of_year_arrive = lubridate::yday(arrival_date),
               day_of_year_depart = lubridate::yday(departure_date))


# Middle Data Frame: group data by arrival lake and arrival day-of-year, then count the distinct heron tag_ids
# Problem: Doesn't take into account departure for residence times
df2 = df1 %>% 
   dplyr::group_by(arrival_lake, day_of_year_arrive) %>%
   dplyr::summarize(n = dplyr::n_distinct(tag_id), .groups = "drop")

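Note: I assume this data set covers one migration season (October 1 to May 31). If a bird has no previous trips, I assume it was at the lake at the start of the season. If a bird has no subsequent trips, I assume it stays at the lake until the end of the season.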

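We follow three herons. Two herons (tag IDs 4001 and 4042) start their migration at Seneca Lake in New York. Both stop over at Marion Lake and then winter at Bee Haven Bay in Florida. In spring, heron 4001 returns to Seneca Lake, while heron 4042 heads to Cayuga Lake, New York. Heron 5959 starts the winter at Otter Bay in Florida, then spends the rest of the season at Bee Haven Bay. It never migrates north.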

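I would like my final answer to show how the herons start at Seneca Lake, slowly leave during migration, and eventually return in spring. Meanwhile, the number of birds at Bee Haven Bay slowly increases over the winter, and those birds begin leaving in spring. In real life I have many more northern and southern lakes, but these two will work for our example.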
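
One possible way to get from the trip table to per-lake counts, sketched under the assumptions stated in the note above: treat every arrival as a +1 event at the arrival lake and every departure as a -1 event at the departure lake, add a +1 at the season start for each bird's first departure lake, then take a running sum per lake. The names `season_start`, `season_end`, `trips`, and `events` are illustrative and not part of the original data, and the season end is assumed to be 5/31/2022. Birds in transit are counted at no lake.

```r
library(dplyr)
library(lubridate)

# Trip data from the question, reduced to the columns needed here
bird_flights <- data.frame(
  tag_id         = c(4001, 4042, 5959, 4001, 4001, 4001, 4042, 4042, 4042),
  departure_date = c("10/15/2021", "10/10/2021", "10/2/2021", "11/20/2021", "3/1/2022",
                     "3/20/2022", "11/20/2021", "3/3/2022", "3/20/2022"),
  departure_lake = c("Seneca", "Seneca", "Otter Bay", "Marion", "Bee Haven Bay",
                     "Marion", "Marion", "Bee Haven Bay", "Marion"),
  arrival_date   = c("10/30/2021", "10/29/2021", "10/12/2021", "12/5/2021", "3/15/2022",
                     "4/5/2022", "12/5/2021", "3/17/2022", "4/6/2022"),
  arrival_lake   = c("Marion", "Marion", "Bee Haven Bay", "Bee Haven Bay", "Marion",
                     "Seneca", "Bee Haven Bay", "Marion", "Cayuga")
)

# Assumed season boundaries (October 1 to May 31)
season_start <- mdy("10/1/2021")
season_end   <- mdy("5/31/2022")

# Parse the m/d/Y strings into Date objects
trips <- bird_flights %>%
  mutate(departure_date = mdy(departure_date),
         arrival_date   = mdy(arrival_date))

events <- bind_rows(
  # Each arrival adds one bird to the arrival lake
  trips %>% transmute(lake_name = arrival_lake, date = arrival_date, change = 1),
  # Each departure removes one bird from the departure lake
  trips %>% transmute(lake_name = departure_lake, date = departure_date, change = -1),
  # Assumption: before its first trip, a bird sits at its first departure lake
  trips %>%
    group_by(tag_id) %>%
    slice_min(departure_date, n = 1, with_ties = FALSE) %>%
    ungroup() %>%
    transmute(lake_name = departure_lake, date = season_start, change = 1)
)

lake_counts <- events %>%
  # Give every lake a row at the season start so its series begins there
  bind_rows(distinct(events, lake_name) %>% mutate(date = season_start, change = 0)) %>%
  group_by(lake_name, start_time = date) %>%
  summarise(change = sum(change), .groups = "drop_last") %>%
  arrange(start_time, .by_group = TRUE) %>%
  # Running total per lake; each interval lasts until the next event at that lake
  mutate(count    = cumsum(change),
         end_time = lead(start_time, default = season_end)) %>%
  ungroup() %>%
  select(lake_name, start_time, end_time, count)

print(lake_counts, n = Inf)
```

A bird with no later trips never generates a -1 event, so it stays counted at its last arrival lake until `season_end`, which is how the "stays until the end of the season" assumption is handled here. For Seneca the counts come out as 2, 1, 0, 1, in line with the `count` column of the goal data frame; the interval boundaries follow the trip dates and the assumed season end.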

r dplyr time-series data-wrangling

A: No answers yet