How to groupby and generate summary table for each course unit using python and pandas?

Question:

I am analyzing survey data and need to generate a summary table of the results for each course unit.

My survey data looks like this:

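CourseCode	Unit	Q1contentxxx	Q2contentyyy	Q3contentsss	Q4contentddd	Q5contentfff	Q6contentggg
3300	1	StronglyAgree	Agree	Neutral	StronglyAgree	Agree	Neutral
3300	2	StronglyDisagree	Neutral	Disagree	StronglyDisagree	Neutral	Disagree
3200	2	Agree	Disagree	Neutral	Agree	Disagree	Neutral
3100	1	Agree	Disagree	Neutral	Agree	Disagree	Neutral
3300	2	Disagree	Neutral	Disagree	Disagree	Neutral	Disagree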

The answers contain only five options: StronglyAgree; Agree; Neutral; Disagree; StronglyDisagree
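For anyone who wants to reproduce the answers, the sample above can be loaded roughly as follows. This is only a sketch: the column names and answer labels are taken from the code in the answers, and the names `rows`, `LEVELS` and `unexpected` are made up for illustration.

```python
import pandas as pd

# Sample survey rows copied from the table in the question.
rows = [
    (3300, 1, "StronglyAgree", "Agree", "Neutral", "StronglyAgree", "Agree", "Neutral"),
    (3300, 2, "StronglyDisagree", "Neutral", "Disagree", "StronglyDisagree", "Neutral", "Disagree"),
    (3200, 2, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3100, 1, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3300, 2, "Disagree", "Neutral", "Disagree", "Disagree", "Neutral", "Disagree"),
]
columns = [
    "CourseCode", "Unit",
    "Q1contentxxx", "Q2contentyyy", "Q3contentsss",
    "Q4contentddd", "Q5contentfff", "Q6contentggg",
]
df = pd.DataFrame(rows, columns=columns)

# The five allowed answers, in the order wanted for the summary columns.
LEVELS = ["StronglyAgree", "Agree", "Neutral", "Disagree", "StronglyDisagree"]

# Sanity check: collect any answer that is not one of the five options.
answers = df.drop(columns=["CourseCode", "Unit"])
unexpected = set(answers.to_numpy().ravel()) - set(LEVELS)
print(unexpected)

# Number of responses per course unit (the denominator of each percentage).
print(df.groupby(["CourseCode", "Unit"]).size())
```

Course unit 3300-2 is the only one with two responses here, which is why its percentages come out as 50% or 100%.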

I want to generate a summary table for each course unit. For example, for course unit 3300-2:

Question	StronglyAgree	Agree	Neutral	Disagree	StronglyDisagree
Q1contentxxx	0%	0%	0%	50%	50%
Q2contentyyy	0%	0%	100%	0%	0%
Q3contentsss	0%	0%	0%	100%	0%
Q4contentddd	0%	0%	0%	50%	50%
Q5contentfff	0%	0%	100%	0%	0%
Q6contentggg	0%	0%	0%	100%	0%

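I was thinking I should use a "while loop" to iterate and print what I need, but I am stuck here, and I wonder whether there is a better way to group the results to show what I want.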

    i=1;
    while i < 6:
      print(df.loc[i, 'CourseCode':'Unit']);
      print(df.where(...
      i++;

This is an example of what I had in mind. There must be a better approach, so thanks in advance for your help!

python pandas dataframe group-by

target = "3300-2"

tmp = df.set_index(["CourseCode", "Unit"])
tmp.index = map("{0[0]}-{0[1]}".format, tmp.index)

# cols = np.unique(tmp) or if order matters, you can hardcode the values
cols = ["StronglyAgree", "Agree", "Neutral", "Disagree", "StronglyDisagree"]

out = (
    tmp.loc[[target]].stack().str.get_dummies()
        .groupby(level=1).agg(lambda s: (s.sum())/s.size*100)
        .reindex(cols, axis=1, fill_value=0)
        .reset_index(names=["Question"])
)

Output:

print(out)

       Question  StronglyAgree  Agree  Neutral  Disagree  StronglyDisagree
0  Q1contentxxx              0      0      0.0      50.0              50.0
1  Q2contentyyy              0      0    100.0       0.0               0.0
2  Q3contentsss              0      0      0.0     100.0               0.0
3  Q4contentddd              0      0      0.0      50.0              50.0
4  Q5contentfff              0      0    100.0       0.0               0.0
5  Q6contentggg              0      0      0.0     100.0               0.0
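The snippet above handles a single `target`. Since the question asks for a table per course unit, the same `get_dummies` idea can be wrapped in a loop over `groupby`. A minimal sketch, assuming the sample data from the question; the helper `summarize` and the dict `tables` are names invented here, not part of the answer:

```python
import pandas as pd

rows = [
    (3300, 1, "StronglyAgree", "Agree", "Neutral", "StronglyAgree", "Agree", "Neutral"),
    (3300, 2, "StronglyDisagree", "Neutral", "Disagree", "StronglyDisagree", "Neutral", "Disagree"),
    (3200, 2, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3100, 1, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3300, 2, "Disagree", "Neutral", "Disagree", "Disagree", "Neutral", "Disagree"),
]
columns = ["CourseCode", "Unit", "Q1contentxxx", "Q2contentyyy", "Q3contentsss",
           "Q4contentddd", "Q5contentfff", "Q6contentggg"]
df = pd.DataFrame(rows, columns=columns)

COLS = ["StronglyAgree", "Agree", "Neutral", "Disagree", "StronglyDisagree"]


def summarize(group: pd.DataFrame) -> pd.DataFrame:
    """Percentage of each answer per question for one course unit (hypothetical helper)."""
    answers = group.drop(columns=["CourseCode", "Unit"])
    # One row per (respondent, question), one 0/1 indicator column per answer seen.
    dummies = answers.stack().str.get_dummies()
    # Level 1 of the stacked index is the question name; the mean of 0/1 is the share.
    pct = dummies.groupby(level=1).mean() * 100
    # Add any answer option that never occurred in this unit and fix the column order.
    return pct.reindex(COLS, axis=1, fill_value=0).rename_axis("Question")


# One summary table per (CourseCode, Unit) pair, keyed like "3300-2".
tables = {
    f"{code}-{unit}": summarize(group)
    for (code, unit), group in df.groupby(["CourseCode", "Unit"])
}

for name, table in tables.items():
    print(f"Course unit {name}")
    print(table, end="\n\n")
```

Keeping the tables in a dict makes it easy to print them, write each one to its own file, or look one up by its "code-unit" label.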

	Agree	Disagree	Neutral	StronglyAgree	StronglyDisagree
(3100, 1, 'Q1contentxxx')	1	0	0	0	0
(3100, 1, 'Q2contentyyy')	0	1	0	0	0
(3100, 1, 'Q3contentsss')	0	0	1	0	0
(3100, 1, 'Q4contentddd')	1	0	0	0	0
(3100, 1, 'Q5contentfff')	0	1	0	0	0
(3100, 1, 'Q6contentggg')	0	0	1	0	0
(3200, 2, 'Q1contentxxx')	1	0	0	0	0
(3200, 2, 'Q2contentyyy')	0	1	0	0	0
(3200, 2, 'Q3contentsss')	0	0	1	0	0
(3200, 2, 'Q4contentddd')	1	0	0	0	0
(3200, 2, 'Q5contentfff')	0	1	0	0	0
(3200, 2, 'Q6contentggg')	0	0	1	0	0
(3300, 1, 'Q1contentxxx')	0	0	0	1	0
(3300, 1, 'Q2contentyyy')	1	0	0	0	0
(3300, 1, 'Q3contentsss')	0	0	1	0	0
(3300, 1, 'Q4contentddd')	0	0	0	1	0
(3300, 1, 'Q5contentfff')	1	0	0	0	0
(3300, 1, 'Q6contentggg')	0	0	1	0	0
(3300, 2, 'Q1contentxxx')	0	0.5	0	0	0.5
(3300, 2, 'Q2contentyyy')	0	0	1	0	0
(3300, 2, 'Q3contentsss')	0	1	0	0	0
(3300, 2, 'Q4contentddd')	0	0.5	0	0	0.5
(3300, 2, 'Q5contentfff')	0	0	1	0	0
(3300, 2, 'Q6contentggg')	0	1	0	0	0
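The code that produced the table of fractions above is not shown on this page. One way to get a table of that shape (an assumption, not necessarily what was originally used) is to stack the answers, turn them into indicator columns, and average within each (CourseCode, Unit, question) group:

```python
import pandas as pd

rows = [
    (3300, 1, "StronglyAgree", "Agree", "Neutral", "StronglyAgree", "Agree", "Neutral"),
    (3300, 2, "StronglyDisagree", "Neutral", "Disagree", "StronglyDisagree", "Neutral", "Disagree"),
    (3200, 2, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3100, 1, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3300, 2, "Disagree", "Neutral", "Disagree", "Disagree", "Neutral", "Disagree"),
]
columns = ["CourseCode", "Unit", "Q1contentxxx", "Q2contentyyy", "Q3contentsss",
           "Q4contentddd", "Q5contentfff", "Q6contentggg"]
df = pd.DataFrame(rows, columns=columns)

fractions = (
    df.set_index(["CourseCode", "Unit"])
    .stack()                    # index: (CourseCode, Unit, question), value: answer
    .str.get_dummies()          # one 0/1 column per answer, sorted alphabetically
    .groupby(level=[0, 1, 2])   # all responses for one question in one course unit
    .mean()                     # share of respondents who picked each answer
)
print(fractions)
```

pandas prints the result with a three-level index rather than tuples in the first column, but the numbers are fractions between 0 and 1 as in the table above; multiply by 100 for percentages.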

0 votes Andrej Kesely 11/18/2023 #3

Another solution:

out = (
    df.set_index(["CourseCode", "Unit"])
    .stack()
    .reset_index()
    .rename(columns={0: "mood"})
)
out["Course"] = out.pop("CourseCode").astype(str) + "-" + out.pop("Unit").astype(str)

out = (
    out.assign(tmp=1)
    .pivot_table(
        index=["level_2", "Course"],
        columns="mood",
        values="tmp",
        aggfunc=len,
        margins=True,
        fill_value=0,
    )
    .rename_axis(index=[None, None], columns=None)
)
out = out.div(out.iloc[:, -1], axis=0) * 100

print(out)

Prints:

                     Agree    Disagree     Neutral  StronglyAgree  StronglyDisagree    All
Q1contentxxx 3100-1  100.0    0.000000    0.000000       0.000000          0.000000  100.0
             3200-2  100.0    0.000000    0.000000       0.000000          0.000000  100.0
             3300-1    0.0    0.000000    0.000000     100.000000          0.000000  100.0
             3300-2    0.0   50.000000    0.000000       0.000000         50.000000  100.0
Q2contentyyy 3100-1    0.0  100.000000    0.000000       0.000000          0.000000  100.0
             3200-2    0.0  100.000000    0.000000       0.000000          0.000000  100.0
             3300-1  100.0    0.000000    0.000000       0.000000          0.000000  100.0
             3300-2    0.0    0.000000  100.000000       0.000000          0.000000  100.0
Q3contentsss 3100-1    0.0    0.000000  100.000000       0.000000          0.000000  100.0
             3200-2    0.0    0.000000  100.000000       0.000000          0.000000  100.0
             3300-1    0.0    0.000000  100.000000       0.000000          0.000000  100.0
             3300-2    0.0  100.000000    0.000000       0.000000          0.000000  100.0
Q4contentddd 3100-1  100.0    0.000000    0.000000       0.000000          0.000000  100.0
             3200-2  100.0    0.000000    0.000000       0.000000          0.000000  100.0
             3300-1    0.0    0.000000    0.000000     100.000000          0.000000  100.0
             3300-2    0.0   50.000000    0.000000       0.000000         50.000000  100.0
Q5contentfff 3100-1    0.0  100.000000    0.000000       0.000000          0.000000  100.0
             3200-2    0.0  100.000000    0.000000       0.000000          0.000000  100.0
             3300-1  100.0    0.000000    0.000000       0.000000          0.000000  100.0
             3300-2    0.0    0.000000  100.000000       0.000000          0.000000  100.0
Q6contentggg 3100-1    0.0    0.000000  100.000000       0.000000          0.000000  100.0
             3200-2    0.0    0.000000  100.000000       0.000000          0.000000  100.0
             3300-1    0.0    0.000000  100.000000       0.000000          0.000000  100.0
             3300-2    0.0  100.000000    0.000000       0.000000          0.000000  100.0
All                   20.0   33.333333   33.333333       6.666667          6.666667  100.0
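A result like this holds every course unit in one frame, so the table for a single unit can be pulled out with a cross-section on the course level. The sketch below is not the `pivot_table` call above: it swaps in `pd.crosstab` with `normalize="index"` to build an equivalent percentage table without the `All` margins, and the names `long`, `pct` and `unit` are invented for illustration.

```python
import pandas as pd

rows = [
    (3300, 1, "StronglyAgree", "Agree", "Neutral", "StronglyAgree", "Agree", "Neutral"),
    (3300, 2, "StronglyDisagree", "Neutral", "Disagree", "StronglyDisagree", "Neutral", "Disagree"),
    (3200, 2, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3100, 1, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3300, 2, "Disagree", "Neutral", "Disagree", "Disagree", "Neutral", "Disagree"),
]
columns = ["CourseCode", "Unit", "Q1contentxxx", "Q2contentyyy", "Q3contentsss",
           "Q4contentddd", "Q5contentfff", "Q6contentggg"]
df = pd.DataFrame(rows, columns=columns)

COLS = ["StronglyAgree", "Agree", "Neutral", "Disagree", "StronglyDisagree"]

# Long format: one row per (course unit, question, answer).
long = df.melt(id_vars=["CourseCode", "Unit"], var_name="Question", value_name="mood")
long["Course"] = long["CourseCode"].astype(str) + "-" + long["Unit"].astype(str)

# Row-normalised counts: each (Question, Course) row sums to 100.
pct = pd.crosstab([long["Question"], long["Course"]], long["mood"], normalize="index") * 100

# Cross-section for one course unit, with the columns in the desired order.
unit = pct.xs("3300-2", level="Course").reindex(columns=COLS, fill_value=0.0)
print(unit)
```

The `reindex` call both orders the answer columns and adds any option that never occurs in the data.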

0 votes Panda Kim 11/18/2023 #4

Step 1

Filter by 3300-2 and melt

target = '3300-2'
cond = df['CourseCode'].astype('str').add('-').add(df['Unit'].astype('str')).eq(target)
tmp = df[cond].melt(['CourseCode', 'Unit'])

tmp:

CourseCode  Unit    variable    value
0   3300    2   Q1contentxxx    StronglyDisagree
1   3300    2   Q1contentxxx    Disagree
2   3300    2   Q2contentyyy    Neutral
3   3300    2   Q2contentyyy    Neutral
4   3300    2   Q3contentsss    Disagree
5   3300    2   Q3contentsss    Disagree
6   3300    2   Q4contentddd    StronglyDisagree
7   3300    2   Q4contentddd    Disagree
8   3300    2   Q5contentfff    Neutral
9   3300    2   Q5contentfff    Neutral
10  3300    2   Q6contentggg    Disagree
11  3300    2   Q6contentggg    Disagree

Step 2

Aggregate with pd.crosstab

cols = ['StronglyAgree', 'Agree', 'Neutral', 'Disagree', 'StronglyDisagree']
tmp2 = pd.crosstab(tmp['variable'], tmp['value']).reindex(cols, axis=1, fill_value=0)

#change to percentage
out = tmp2.div(tmp2.sum(axis=1), axis=0).mul(100).astype('int').astype('str').add('%')\
          .rename_axis('Question').rename_axis('', axis=1).reset_index()

out:
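The rendered output is missing from this copy of the page. To see it, the two steps can be run end to end on the sample data from the question. This is a sketch that only adds the `import` and the sample `df` to the answer's code:

```python
import pandas as pd

rows = [
    (3300, 1, "StronglyAgree", "Agree", "Neutral", "StronglyAgree", "Agree", "Neutral"),
    (3300, 2, "StronglyDisagree", "Neutral", "Disagree", "StronglyDisagree", "Neutral", "Disagree"),
    (3200, 2, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3100, 1, "Agree", "Disagree", "Neutral", "Agree", "Disagree", "Neutral"),
    (3300, 2, "Disagree", "Neutral", "Disagree", "Disagree", "Neutral", "Disagree"),
]
columns = ["CourseCode", "Unit", "Q1contentxxx", "Q2contentyyy", "Q3contentsss",
           "Q4contentddd", "Q5contentfff", "Q6contentggg"]
df = pd.DataFrame(rows, columns=columns)

# Step 1: keep only the rows of the target course unit and reshape to long format.
target = "3300-2"
cond = (df["CourseCode"].astype(str) + "-" + df["Unit"].astype(str)).eq(target)
tmp = df[cond].melt(["CourseCode", "Unit"])

# Step 2: count answers per question, then convert the counts to percentage strings.
cols = ["StronglyAgree", "Agree", "Neutral", "Disagree", "StronglyDisagree"]
tmp2 = pd.crosstab(tmp["variable"], tmp["value"]).reindex(cols, axis=1, fill_value=0)
out = (
    tmp2.div(tmp2.sum(axis=1), axis=0)  # row-wise share of each answer
    .mul(100)
    .astype(int)                        # truncates, e.g. 33.3 becomes 33
    .astype(str)
    .add("%")
    .rename_axis("Question")
    .rename_axis("", axis=1)
    .reset_index()
)
print(out.to_string(index=False))
```

For the sample data the values match the table requested in the question, for example Q1contentxxx is 50% Disagree and 50% StronglyDisagree. Note that `astype(int)` truncates rather than rounds, which matters once the shares are not whole numbers.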
