检查序列中的值是否严格增加,而不是 NA 的

Check if values in a series are strictly increasing, other than NA's

提问人:Hack-R 提问时间:10/13/2015 最后编辑:Hack-R 更新时间:10/13/2015 访问量:1295

问:

我有数据需要使用一个函数进行清理,该函数将

  • 检查值列 () 中的元素是否是,如果是,请将子集中的所有后续值替换为 (其中 subsequent 由 定义),数据是子集,如下面的函数所示grad_rate_yr_valNANAgrad_rate_yr_num
  • 检查序列中是否有任何后续值减少(即非严格增加),如果是,则将所有后续值替换为NA

我已经设法在下面的函数中完成了这些要点中的第一个。但是,我可以在第二个要点上获得一些帮助。

这是我的功能:

 testit <- function(all.long){
  for(i in unique(all.long$program_nm)){
    for(ii in 1:8){
      for(iii in unique(all.long$cohort_year[all.long$program_nm == i & all.long$grad_rate_yr_num  == ii]))
      {
        if(is.na(all.long[all.long$program_nm == i & all.long$grad_rate_yr_num == ii
                          & all.long$cohort_year == iii, "grad_rate_yr_val"]))
        {
          n <- ii + 1
          all.long[all.long$program_nm == i & all.long$grad_rate_yr_num  >= n
                   & all.long$cohort_year == iii, "grad_rate_yr_val"]        <- NA
        }
      }
    }
  }
  return(all.long)
}

下面是我的数据的子集,以防你想用它来测试你的解决方案:

> dput(data)
structure(list(cohort_year = c(2000L, 2000L, 2000L, 2000L, 2000L, 
2000L, 2000L, 2000L, 2000L, 2001L, 2001L, 2001L, 2001L, 2001L, 
2001L, 2001L, 2001L, 2001L, 2002L, 2002L, 2002L, 2002L, 2002L, 
2002L, 2002L, 2002L, 2002L, 2003L, 2003L, 2003L, 2003L, 2003L, 
2003L, 2003L, 2003L, 2003L, 2004L, 2004L, 2004L, 2004L, 2004L, 
2004L, 2004L, 2004L, 2004L, 2005L, 2005L, 2005L, 2005L, 2005L, 
2005L, 2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L, 2006L, 
2006L, 2006L, 2006L, 2006L, 2007L, 2007L, 2007L, 2007L, 2007L, 
2007L, 2007L, 2007L, 2007L, 2008L, 2008L, 2008L, 2008L, 2008L, 
2008L, 2008L, 2008L, 2008L, 2009L, 2009L, 2009L, 2009L, 2009L, 
2009L, 2009L, 2009L, 2009L, 2010L, 2010L, 2010L, 2010L, 2010L, 
2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L, 2011L, 
2011L, 2011L, 2011L, 2011L, 2012L, 2012L, 2012L, 2012L, 2012L, 
2012L, 2012L, 2012L, 2012L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L), program_nm = structure(c(8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), .Label = c("BS Child", 
"BS Criminal Justice", "BS Health Studies", "BS Human Services", 
"BS Nursing", "BS Psychology", "BS Public Health", "BSBA", "other"
), class = "factor"), Term_Type = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Q", 
"S"), class = "factor"), Degree_Level = structure(c(3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Doctoral", 
"Masters", "Undergraduate", "BS"), class = "factor"), certificate_flag = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    cohort_size_yr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
    5L, 5L, 110L, 110L, 110L, 110L, 110L, 110L, 110L, 110L, 110L, 
    363L, 363L, 363L, 363L, 363L, 363L, 363L, 363L, 363L, 738L, 
    738L, 738L, 738L, 738L, 738L, 738L, 738L, 738L, 602L, 602L, 
    602L, 602L, 602L, 602L, 602L, 602L, 602L, 606L, 606L, 606L, 
    606L, 606L, 606L, 606L, 606L, 606L, 793L, 793L, 793L, 793L, 
    793L, 793L, 793L, 793L, 793L, 1047L, 1047L, 1047L, 1047L, 
    1047L, 1047L, 1047L, 1047L, 1047L, 1542L, 1542L, 1542L, 1542L, 
    1542L, 1542L, 1542L, 1542L, 1542L, 999L, 999L, 999L, 999L, 
    999L, 999L, 999L, 999L, 999L, 977L, 977L, 977L, 977L, 977L, 
    977L, 977L, 977L, 977L, 756L, 756L, 756L, 756L, 756L, 756L, 
    756L, 756L, 756L, 968L, 968L, 968L, 968L, 968L, 968L, 968L, 
    968L, 968L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 
    175L), grad_rate_yr_num = c("1", "2", "3", "4", "5", "6", 
    "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9", 
    "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", 
    "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", 
    "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9", 
    "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", 
    "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", 
    "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9", 
    "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", 
    "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", 
    "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9", 
    "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", 
    "4", "5", "6", "7", "8", "9"), grad_rate_yr_val = c(0.338054164461162, 
    0.387455851221823, 0.424682338644358, 0.427085491165484, 
    0.43164094786971, 0.419998451137316, 0.404690563376752, 0.39455471043309, 
    0.471121387897879, 0.308925838858349, 0.35832752561901, 0.395554013041545, 
    0.397957165562672, 0.402512622266898, 0.390870125534503, 
    0.37556223777394, 0.365426384830278, 0.441993062295066, 0.279675200277845, 
    0.329076887038507, 0.366303374461042, 0.368706526982168, 
    0.373261983686394, 0.361619486954, 0.346311599193436, 0.336175746249774, 
    0.412742423714563, 0.247336159010819, 0.29673784577148, 0.333964333194016, 
    0.336367485715142, 0.340922942419368, 0.329280445686973, 
    0.31397255792641, 0.303836704982748, 0.380403382447537, 0.210471537569466, 
    0.259873224330127, 0.297099711752662, 0.299502864273789, 
    0.304058320978014, 0.29241582424562, 0.277107936485056, 0.266972083541394, 
    0.343538761006183, 0.169876370308747, 0.219278057069408, 
    0.256504544491943, 0.25890769701307, 0.263463153717295, 0.251820656984901, 
    0.236512769224337, 0.226376916280676, 0.302943593745464, 
    0.144906685947195, 0.194308372707856, 0.231534860130391, 
    0.233938012651518, 0.238493469355743, 0.226850972623349, 
    0.211543084862785, 0.201407231919124, 0.277973909383912, 
    0.115656047366698, 0.165057734127359, 0.202284221549894, 
    0.204687374071021, 0.209242830775247, 0.197600334042852, 
    0.182292446282289, 0.172156593338627, 0.248723270803415, 
    0.0808095900571362, 0.130211276817797, 0.167437764240332, 
    0.169840916761459, 0.174396373465685, 0.16275387673329, 0.147445988972727, 
    0.137310136029065, 0.213876813493853, 0.0439143903713685, 
    0.0933160771320296, 0.130542564554565, 0.132945717075691, 
    0.137501173779917, 0.125858677047523, 0.110550789286959, 
    0.100414936343297, 0.176981613808086, -0.000350166219887441, 
    0.0490515205407736, 0.0862780079633087, 0.0886811604844353, 
    0.0932366171886609, 0.0815941204562667, 0.066286232695703, 
    0.0561503797520412, 0.13271705721683, -0.0128745051020512, 
    0.0365271816586099, 0.073753669081145, 0.0761568216022716, 
    0.0807122783064972, 0.0690697815741029, 0.0537618938135392, 
    0.0436260408698775, 0.120192718334666, -0.0413301093276067, 
    0.00807157743305436, 0.0452980648555894, 0.047701217376716, 
    0.0522566740809416, 0.0406141773485474, 0.0253062895879837, 
    0.0151704366443219, 0.0917371141091104, -0.0637006429133595, 
    -0.0142989561526984, 0.0229275312698367, 0.0253306837909633, 
    0.0298861404951888, 0.0182436437627946, 0.0029357560022309, 
    -0.00720009694143085, 0.0693665805233576, -0.0993115563334487, 
    -0.0499098695727876, -0.0126833821502525, -0.0102802296291259, 
    -0.00572477292490037, -0.0173672696572946, -0.0326751574178583, 
    -0.0428110103615201, 0.0337556671032684, -0.104191334110341, 
    -0.0547896473496804, -0.0175631599271453, -0.0151600074060187, 
    -0.0106045507017931, -0.0222470474341873, -0.0375549351947511, 
    -0.0476907881384128, 0.0288758893263757), grad_sum_yr_num = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", 
    "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", 
    "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", 
    "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", 
    "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", 
    "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", 
    "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9", "1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "1", "2", "3", "4", 
    "5", "6", "7", "8", "9", "1", "2", "3", "4", "5", "6", "7", 
    "8", "9", "1", "2", "3", "4", "5", "6", "7", "8", "9"), grad_sum_yr_val = c(0, 
    0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 
    1, 3, 4, 4, 4, 4, 4, 0, 19, 43, 47, 51, 53, 53, 54, 54, 7, 
    46, 110, 129, 143, 147, 147, 96.9108663255262, 150, 8, 79, 
    199, 269, 285, 290, 295, 281, 300, 5, 62, 175, 230, 245, 
    249, 218, 219, 254, 6, 58, 167, 212, 228, 231, 229, 109, 
    235, 0, 35, 106, 158, 180, 189, 86, 0, 190, 1, 25, 91, 148, 
    183, 89, 0, 0, 193, 2, 30, 112, 162, 79, 0, 0, 0, 187, 3, 
    40, 108, 67, 0, 0, 0, 0, 135, 6, 40, 42, 0, 0, 0, 0, 0, 78, 
    3, 16, 0, 0, 0, 0, 0, 0, 21, 5, 0, 0, 0, 0, 0, 0, 0, 5, 0, 
    0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("cohort_year", "program_nm", 
"Term_Type", "Degree_Level", "certificate_flag", "cohort_size_yr", 
"grad_rate_yr_num", "grad_rate_yr_val", "grad_sum_yr_num", "grad_sum_yr_val"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
"21", "22", "23", "24", "25", "26", "27", "28", "30", "32", "34", 
"36", "38", "40", "42", "44", "46", "48", "50", "52", "54", "56", 
"58", "60", "62", "64", "66", "68", "70", "72", "74", "76", "78", 
"80", "82", "83", "84", "85", "86", "87", "88", "89", "90", "91", 
"92", "93", "94", "95", "96", "97", "98", "99", "103", "108", 
"113", "118", "123", "128", "133", "138", "143", "155", "166", 
"177", "188", "199", "210", "221", "232", "243", "258", "273", 
"288", "303", "318", "333", "348", "363", "378", "396", "414", 
"432", "450", "468", "486", "504", "522", "540", "559", "578", 
"597", "616", "635", "654", "673", "692", "711", "731", "751", 
"771", "791", "811", "831", "851", "871", "891", "911", "932", 
"953", "974", "995", "1016", "1037", "1058", "1079", "1099", 
"1119", "1139", "1159", "1179", "1199", "1219", "1239", "1259"
), class = "data.frame")
r 函数

评论

0赞 Frank 10/13/2015
这看起来很相关:stackoverflow.com/q/13093912/1191259
0赞 Shawn Mehan 10/13/2015
澄清一下,您希望第二个值是单调的还是严格单调递减的?我读到你在你的规范中都说了。

答:

2赞 Frank 10/13/2015 #1

你说第一颗子弹被处理掉了,所以我只看第二颗:

检查序列中是否有任何后续值减少(即非严格增加),如果是,则将所有后续值替换为 NA

x 减少

# example data
set.seed(1)
x = sample(10) 
# 3  4  5  7  2  8  9  6 10  1

replace(x, seq_along(x) >= which(x < cummax(x))[1], NA)
# 3  4  5  7 NA NA NA NA NA NA

which(x < cummax(x))是序列递减的位置。答案几乎完全借鉴@flodel。如果这真的是答案,那么这个问题可能应该作为骗局关闭。


X 不会严格增加

需要进行一些修改才能达到“不严格增加”,例如

# new example
set.seed(1)
x = sample(c(1:3,1:3,1:3))
# 3 3 2 3 2 1 2 1 1

r <- rank(x) + sort(runif(length(x)), decreasing=TRUE)
x <- replace(x, seq_along(x) >= which(r < cummax(r))[1], NA)
x
# 3 NA NA NA NA NA NA NA NA

这个想法取自 @Arun

评论

1赞 Hack-R 10/13/2015
谢谢,好的,让我试着看看我是否可以将其集成到函数中并对其进行测试
0赞 Frank 10/13/2015
@Hack-R 感谢您的编辑。如果你想修改,标准的方式是或也许,尽管我发现后者很难阅读。xx[ seq_along(x) >= which(r < cummax(r))[1] ] <- NAis.na(x) <- seq_along(x) >= which(r < cummax(r))[1]