Median with hash table in SAS 9.4

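Question:

I want to use a hash table to compute the median of 100 variables by group.

I found the code below, which computes the median of Invoice in sashelp.cars, but how can I adapt it to compute the median by Make and Model, for example?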

data percentiles;
    keep percentile Invoice;
    format percentile percent5.;

    dcl hash ptiles (dataset: "sashelp.cars(where=(Invoice gt 0))", multidata: "Y", ordered: "A");
    ptiles.definekey("Invoice");
    ptiles.definedone();

    declare hiter iterP ("ptiles");

    array _ptiles(6) _temporary_ (.5 .05 .1 .25 .75 .95);
    call sortn(of _ptiles(*));

    num_items = ptiles.num_items;

    do i = 1 to dim(_ptiles);
        percentile = _ptiles(i);
        do while (Counter lt percentile*num_items);
            Counter + 1;
            iterP.next();
        end;
        output;
    end;
    stop;
    set sashelp.cars;
run;

In my real data, I want to compute the median of 100 variables. I currently use proc univariate for this, but it takes too long (>12 hours).

Tags: sas, hashtable, hashcode, median

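Answer:

Rather than using a hash table to compute the percentiles of each value, consider using proc hpsummary or proc means to compute the medians, with the qmethod=p2 option to greatly improve efficiency (link). hpsummary is designed for big data, but means is also multithreaded, so try both and see which performs best. Here is the performance on a 10M-row table with 100 variables, using hpsummary with qmethod=p2: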

data have;
    array var[100];
    do i = 1 to 10000000;
        do j = 1 to 100;
            var[j] = rand('normal');
        end;
        output;
    end;
run;

proc hpsummary data=have qmethod=p2;
    var var:;
    output out=want
        median=;
run;

On a machine with 16GB of RAM, this took 1 minute 5 seconds using 4 threads.

NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 1 observations and 102 variables.
NOTE: PROCEDURE HPSUMMARY used (Total process time):
      real time           1:05.69
      cpu time            4:05.92

You can also use qmethod=p2 with proc means:

proc means data=have qmethod=p2 noprint;
    var var:;
    output out=want
        median=;
run;

NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 1 observations and 102 variables.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           46.33 seconds
      cpu time            3:05.50
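
For the by-group case in the question (median by Make and Model), the same procedure can take a class statement. The following is a minimal sketch on sashelp.cars, not a tuned solution: the output data set name medians_by_group is made up for illustration, and Invoice and MSRP stand in for the 100 real variables. Note that qmethod=p2 gives an approximate quantile; leaving it out falls back to the default order-statistics method, which is exact but needs more memory.

```sas
/* Sketch: one median per Make/Model combination.            */
/* medians_by_group is a hypothetical output data set name.  */
proc means data=sashelp.cars qmethod=p2 nway noprint;
    class Make Model;          /* grouping variables; no prior sort needed   */
    var Invoice MSRP;          /* replace with the real analysis variables   */
    output out=medians_by_group(drop=_type_ _freq_)
        median=;               /* medians keep the analysis variable names   */
run;
```

The nway option keeps only the rows for the full Make*Model combination, so the output has one row per group rather than also including per-Make, per-Model, and overall summaries. With 100 variables, a name-prefix list such as var var: on the var statement works the same way as in the examples above.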
