Create new variable in R based on certain values 2 years in a row

Question:

I am trying to use the UCDP Battle-Related Deaths Dataset, called BattleDeaths_v22_1_conf, from https://ucdp.uu.se/downloads/ (see UCDP Battle-Related Deaths Dataset version 23.1).

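I want to create a new variable or dataset that contains only the countries with at least 1,000 battle-related deaths in each of two consecutive years, and only from 2008 onward. However, I end up with an object that has no observations.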

I used the dataset's "country" variable (location_inc) and the battle deaths variable (bd_best).

So far, I have done this in R:

library(dplyr)

filtered_data <- subset(BattleDeaths_v22_1_conf, bd_best >= 1000 & year >= 2008)

filtered_data <- filtered_data %>%
     arrange(location_inc, year) %>%
     group_by(location_inc) %>%
     mutate(sum_deaths_two_years = lag(bd_best) + bd_best)

So far, so good.

final_data <- filtered_data %>%
      group_by(location_inc) %>%
      filter(all(sum_deaths_two_years >= 2000))

Now I end up with a data frame with 0 observations. However, I can see in the original dataset that some observations meet my criteria.
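A plausible explanation, sketched on a small hypothetical data set (`toy` is made up for illustration, it is not UCDP data): because the rows below 1,000 deaths are removed before `lag()` runs, `lag()` pairs each row with the previous *remaining* row, which need not be the previous calendar year, and the first row of every country gets `NA`. `all()` over a vector that contains `NA` and no `FALSE` returns `NA`, and `filter()` drops a group whose condition is `NA`, so every country disappears.

```r
library(dplyr)

# Hypothetical toy data (not UCDP data):
# country A has 1,000+ deaths in 2010 and 2011 (consecutive years),
# country B only in 2010 and 2012 (not consecutive)
toy <- tibble::tibble(
  location_inc = c("A", "A", "A", "B", "B", "B"),
  year         = c(2010, 2011, 2012, 2010, 2011, 2012),
  bd_best      = c(1500, 1200, 300, 1100, 50, 1300)
)

# Same steps as in the question (stray parenthesis removed)
filtered_toy <- subset(toy, bd_best >= 1000 & year >= 2008) %>%
  arrange(location_inc, year) %>%
  group_by(location_inc) %>%
  mutate(sum_deaths_two_years = lag(bd_best) + bd_best)

# The first row of each country has sum_deaths_two_years = NA, and
# B's second remaining row adds up 2010 and 2012, which are not consecutive
check <- filtered_toy %>%
  summarise(all_ok = all(sum_deaths_two_years >= 2000))
print(check)  # all_ok is NA for both A and B

# filter() treats NA as FALSE, so no country survives
final_toy <- filtered_toy %>%
  filter(all(sum_deaths_two_years >= 2000))
nrow(final_toy)  # 0
```

Even without the `NA`s, country B would pass the 2,000 threshold although its qualifying years are not consecutive, so the threshold check has to be done on the unfiltered rows and against the neighbouring calendar year.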

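Ideally, I would also like a separate variable that includes a country if it has 1,000 battle-related deaths in one year and at least 25 battle-related deaths in the previous or the following year. But I don't know how to do this in R; any help would be appreciated.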
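The second variable could be sketched as follows, assuming that a year missing for a country had fewer than 25 battle-related deaths (it is filled in as 0 here). `tidyr::complete()` first adds the missing calendar years, so that `lag()` and `lead()` really refer to the previous and following years; then a row is flagged when it has at least 1,000 deaths and an adjacent year has at least 25. The `toy` data and the `flag_1000_25` column name are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not UCDP data); country B has no row for 2011
toy <- tibble::tibble(
  location_inc = c("A", "A", "A", "B", "B", "B"),
  year         = c(2010, 2011, 2012, 2010, 2012, 2013),
  bd_best      = c(30, 1500, 10, 40, 1200, 10)
)

flagged <- toy %>%
  filter(year >= 2008) %>%
  group_by(location_inc) %>%
  # add the missing calendar years; assume 0 deaths where a year is absent
  complete(year = full_seq(year, 1), fill = list(bd_best = 0)) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(
    # at least 1,000 deaths this year and at least 25 in an adjacent year
    flag_1000_25 = bd_best >= 1000 &
      (lag(bd_best, default = 0) >= 25 | lead(bd_best, default = 0) >= 25)
  ) %>%
  ungroup()

# Country-years meeting the criterion (only A in 2011 here)
filter(flagged, flag_1000_25)
```

Without the `complete()` step, `lag()` would compare B's 2012 row with its 2010 row (40 deaths) and flag it, even though 2011 is missing.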

Tags: r, filter

Answer:

library(dplyr)

# Data ------------------------------
example_df <- tibble::tribble(
  ~location_inc,  ~year, ~bd_best,
  "Iraq",  2009L,    1036L,
  "Iraq",  2010L,     989L,
  "Iraq",  2011L,     864L,
  "Iraq",  2012L,     565L,
  "Iraq",  2013L,    1870L, # Desired
  "Iraq",  2014L,   13761L, # Desired
  "Iraq",  2015L,   10981L, # Desired
  "Iraq",  2016L,    9775L, # Desired
  "Iraq",  2017L,   10025L, # Desired
  "Iraq",  2018L,     866L,
  "Iraq",  2019L,     498L,
  "Iraq",  2020L,     671L,
  "Iraq",  2021L,     707L,
  "Iraq",  2022L,     335L,
  "Sudan", 2009L,     353L,
  "Sudan", 2010L,    1010L, # Desired
  "Sudan", 2011L,    1404L, # Desired
  "Sudan", 2012L,    1173L, # Desired
  "Sudan", 2013L,     594L,
  "Sudan", 2014L,     856L,
  "Sudan", 2015L,    1264L, # Desired
  "Sudan", 2016L,    1309L, # Desired
  "Sudan", 2017L,     160L,
  "Sudan", 2018L,     243L,
  "Sudan", 2020L,      45L,
  "Sudan", 2021L,      31L,
  "Sudan", 2022L,      47L)

# Code ------------------------------
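# Rows must be sorted by year within each country for lag()/lead() to work
example_df <- filter(
  example_df, 
  .by = location_inc,  # per-operation grouping; needs dplyr >= 1.1.0
  bd_best >= 1000,     # this year has at least 1,000 deaths ...
  # ... and so does the previous or next row (default = -1 makes edge rows fail)
  lag(bd_best, default = -1) >= 1000 | lead(bd_best, default = -1) >= 1000)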

# Outcome ---------------------------
example_df

# A tibble: 10 × 3
   location_inc  year bd_best
   <chr>        <int>   <int>
 1 Iraq          2013    1870
 2 Iraq          2014   13761
 3 Iraq          2015   10981
 4 Iraq          2016    9775
 5 Iraq          2017   10025
 6 Sudan         2010    1010
 7 Sudan         2011    1404
 8 Sudan         2012    1173
 9 Sudan         2015    1264
10 Sudan         2016    1309

Source: https://ucdp.uu.se/downloads/brd/ucdp-brd-dyadic-231-xlsx.zip
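`lag()` and `lead()` look at the previous and next *row*, not the previous and next calendar year, so a gap in the data (the example has no Sudan row for 2019) could pair non-consecutive years. A sketch that keeps the answer's idea but also compares the `year` values, on hypothetical `toy` rows:

```r
library(dplyr)

# Hypothetical data: country X has 1,000+ deaths in 2010 and 2012 but no 2011 row;
# country Y has 1,000+ deaths in 2015 and 2016
toy <- tibble::tibble(
  location_inc = c("X", "X", "Y", "Y"),
  year         = c(2010L, 2012L, 2015L, 2016L),
  bd_best      = c(1200L, 1500L, 1100L, 2000L)
)

consecutive <- toy %>%
  arrange(location_inc, year) %>%
  group_by(location_inc) %>%
  filter(
    bd_best >= 1000,
    # previous row is the previous calendar year and also has 1,000+ deaths ...
    coalesce(lag(year) == year - 1L & lag(bd_best) >= 1000, FALSE) |
      # ... or the next row is the next calendar year and also has 1,000+ deaths
      coalesce(lead(year) == year + 1L & lead(bd_best) >= 1000, FALSE)
  ) %>%
  ungroup()

consecutive  # only Y's 2015 and 2016 rows remain
```

`coalesce(..., FALSE)` plays the same role as `default = -1` in the answer: the first and last row of each country have no neighbour, and the resulting `NA` is turned into `FALSE`. The answer's original condition would keep both X rows.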

1 upvote LE__Visionnaire 11/16/2023 #2

Another solution, using the "data.table" and "magrittr" packages, that gives you the desired output:

library(data.table)
library(magrittr)

# DT: your data (the Excel sheet), already read into R as a data frame

# Convert it to a data.table, keep the relevant columns and rename them
DT %>% setDT() %>% .[, .(Country = location_inc, year, Deaths = bd_best)] %>%

  # filter using the desired conditions
  .[Deaths >= 1000 & year >= 2008] %>%

  # for each row, check whether the country also appears in the following year
  .[, next_year := (year + 1) %in% year, by = Country] %>%

  # keep the rows whose following year also qualifies, dropping the helper column
  .[next_year == TRUE, !"next_year"] %>%

  # select the countries meeting this condition
  .[, .(Country = unique(Country))] %>%

  # display the output
  .[]

If you want to keep all three columns (country, deaths and year), omit the line that selects only the country column.
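Dropping that line keeps only the *first* year of each consecutive pair, because the filter only asks whether `year + 1` is also present. A sketch that keeps every qualifying country-year (both years of each pair) while staying in data.table, on a small hypothetical table that already uses the answer's renamed columns:

```r
library(data.table)

# Hypothetical data with the renamed columns Country, year and Deaths
DT <- data.table(
  Country = c("X", "X", "X", "Y", "Y"),
  year    = c(2010, 2011, 2013, 2015, 2017),
  Deaths  = c(1200, 1500, 1100, 1300, 1400)
)

res <- DT[Deaths >= 1000 & year >= 2008][
  # within each country, keep a row if the previous or the next calendar year
  # is also among that country's qualifying years
  , .SD[(year - 1) %in% year | (year + 1) %in% year], by = Country]

res  # X 2010 and X 2011, with all three columns
```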
