Filtering empty elements in a nested list in pandas dataframe

Question:

I have a list inside a pandas DataFrame that I want to filter. For example, I have data like this:

{
    "examples": [
        {
            "website": "info",
            "df": [
                {
                    "Question": "What?",
                    "Answers": []
                },
                {
                    "Question": "how?",
                    "Answers": []
                },
                {
                    "Question": "Why?",
                    "Answers": []
                }
            ],
            "whitelisted_url": true,
            "exResponse": {
                "pb_sentence": "",
                "solution_sentence": "",
                "why_sentence": ""
            }
        },
        {
            "website": "info2",
            "df": [
                {
                    "Question": "What?",
                    "Answers": ["example answer1"]
                },
                {
                    "Question": "how?",
                    "Answers": ["example answer1"]
                },
                {
                    "Question": "Why?",
                    "Answers": []
                }
            ],
            "whitelisted_url": true,
            "exResponse": {
                "pb_sentence": ""
            }
        }
    ]
}
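All of the snippets here assume that data is the already-parsed Python dict, not a JSON string. A minimal sketch, using a shortened copy of the sample (where the JSON text comes from, a file or an HTTP response, is an assumption), that loads it and checks what a plain json_normalize call on "examples" produces: nested dicts such as exResponse become dotted columns, but the nested "df" lists are left untouched, one list per cell.

```python
import json

import pandas as pd

# Shortened copy of the sample above (one question per site).
raw = """
{"examples": [
  {"website": "info",
   "df": [{"Question": "What?", "Answers": []}],
   "whitelisted_url": true,
   "exResponse": {"pb_sentence": ""}},
  {"website": "info2",
   "df": [{"Question": "What?", "Answers": ["example answer1"]}],
   "whitelisted_url": true,
   "exResponse": {"pb_sentence": ""}}
]}
"""

data = json.loads(raw)  # JSON true becomes Python True
top = pd.json_normalize(data["examples"])

# Nested dicts are flattened into dotted column names...
print(top.columns.tolist())
# ...but each "df" cell is still a plain Python list of dicts.
print(type(top.loc[0, "df"]).__name__)  # list
```

Any later step that treats those list cells as dicts has to unpack them first.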

My filter function:

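import numpy as np
import pandas as pd

def filter(data, name):
    # Flatten the top-level "examples" records next to the raw data
    resp = pd.concat([pd.DataFrame(data),
                      pd.json_normalize(data['examples'])],
                     axis=1)

    # Try to flatten the nested "df" lists the same way
    resp = pd.concat([pd.DataFrame(resp),
                      pd.json_normalize(resp['df'])],
                     axis=1)

    # Drop rows whose pb_sentence is an empty string
    # (assigned back rather than inplace=True on a column selection)
    resp['exResponse.pb_sentence'] = resp['exResponse.pb_sentence'].replace('', np.nan)
    resp.dropna(subset=['exResponse.pb_sentence'], inplace=True)

    resp.drop(resp[resp['df.Answers'].apply(len) == 0].index, inplace=True)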

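I want to remove the empty "Answers" elements from this DataFrame. I have already filtered out the empty "problem_summary" values (the exResponse.pb_sentence column) with the following code: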

    resp['exResponse.pb_sentence'] = resp['exResponse.pb_sentence'].replace('', np.nan)
    resp.dropna(subset=['exResponse.pb_sentence'], inplace=True)
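An alternative sketch that keeps the same rows with a boolean mask instead of replace plus dropna. The frame below is hypothetical; only the column name comes from the question. Calling replace(..., inplace=True) on a single column selection is chained assignment, which recent pandas versions warn about and which may not update resp when copy-on-write is enabled, so here the filtered result is simply assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical frame using the column name from the question.
resp = pd.DataFrame({
    "website": ["info", "info2", "info3"],
    "exResponse.pb_sentence": ["", "some problem", np.nan],
})

col = "exResponse.pb_sentence"
# Keep rows whose sentence is present AND not an empty string.
# NaN != "" evaluates to True, so notna() is needed to drop missing values too.
mask = resp[col].notna() & resp[col].ne("")
resp = resp[mask]
print(resp["website"].tolist())  # ['info2']
```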

How can I do the same for the "Answers" elements?
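One way to handle "Answers" is to remove the empty entries from the nested lists before pandas is involved at all, so json_normalize never sees them. A minimal sketch on a shortened copy of the sample data; note that it modifies data in place.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
data = {
    "examples": [
        {"website": "info",
         "df": [{"Question": "What?", "Answers": []},
                {"Question": "how?", "Answers": []}],
         "exResponse": {"pb_sentence": ""}},
        {"website": "info2",
         "df": [{"Question": "What?", "Answers": ["example answer1"]},
                {"Question": "Why?", "Answers": []}],
         "exResponse": {"pb_sentence": ""}},
    ]
}

# Keep only question entries whose "Answers" list is non-empty
# (an empty list is falsy, and so is a missing key via .get()).
for example in data["examples"]:
    example["df"] = [q for q in example["df"] if q.get("Answers")]

# One row per remaining question, with the parent "website" repeated.
rows = pd.json_normalize(data["examples"], record_path="df", meta=["website"])
print(rows[["website", "Question"]].values.tolist())  # [['info2', 'What?']]
```

A site whose list becomes empty ("info" here) simply contributes no rows.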

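I don't actually expect a particular output. The following part of my code throws the error "AttributeError: 'list' object has no attribute 'keys'". I think this is caused by the empty Answers arrays, so I want to remove those entries.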

    resp.rename(
        columns={0: 'Challenge', 1: 'Solution', 2: 'Importance'}, inplace=True)
    # challenge deserializing
    resp = pd.concat([pd.DataFrame(df_resp),
                      pd.json_normalize(resp['Challenge'])],
                     axis=1)
    resp = pd.concat([pd.DataFrame(resp),
                      pd.json_normalize(resp['Answers'])],
                     axis=1)

The failing line:

     29 resp = pd.concat([pd.DataFrame(resp),
---> 30                      pd.json_normalize(resp['Answers'])],
     31                     axis=1)
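The exact message depends on the pandas version, but a likely trigger is handing json_normalize a column whose cells are lists rather than dicts (resp['df'] holds lists of question dicts, and Answers holds lists of strings). One way around it, sketched here on a hypothetical frame shaped like resp after the first flattening step, is to explode the list column so each cell holds a single dict, and only then call json_normalize.

```python
import pandas as pd

# Hypothetical frame shaped like resp after the first json_normalize call:
# each "df" cell is a list of {"Question", "Answers"} dicts.
resp = pd.DataFrame({
    "website": ["info", "info2"],
    "df": [
        [{"Question": "What?", "Answers": []}],
        [{"Question": "What?", "Answers": ["example answer1"]},
         {"Question": "Why?", "Answers": []}],
    ],
})

# One row per question dict, with a fresh 0..n-1 index.
exploded = resp.explode("df", ignore_index=True)
# Every "df" cell is now a single dict, which json_normalize can flatten.
questions = pd.json_normalize(exploded["df"].tolist())
flat = pd.concat([exploded.drop(columns="df"), questions], axis=1)

# Keep only rows whose Answers list is non-empty.
flat = flat[flat["Answers"].str.len() > 0]
print(flat[["website", "Question"]].values.tolist())  # [['info2', 'What?']]
```

Note that explode turns an empty "df" list into a NaN cell rather than removing the row.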

Answer:

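import numpy as np
import pandas as pd

# One row per {"Question", "Answers"} record, with the parent fields repeated
df = pd.json_normalize(
    data=data["examples"],
    meta=["website", "whitelisted_url", "exResponse"],
    record_path=["df"]
)
# Expand the exResponse dicts into their own columns
df = df.join(pd.DataFrame(df.pop("exResponse").tolist()))
# Drop rows whose Answers list is empty
df = df[df["Answers"].map(len) > 0]
# Treat empty strings as missing, then drop rows without a pb_sentence
df = df.replace("", np.nan).dropna(subset=["pb_sentence"], how="all")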
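A quick check of this answer against a shortened copy of the question's data, recording the row count after each step. Because every pb_sentence in the posted sample is an empty string, the final replace/dropna step removes all remaining rows; rows with a non-empty pb_sentence would survive it.

```python
import numpy as np
import pandas as pd

# Shortened copy of the question's sample (two questions per site),
# with the exResponse values exactly as posted.
data = {
    "examples": [
        {"website": "info",
         "df": [{"Question": "What?", "Answers": []},
                {"Question": "how?", "Answers": []}],
         "whitelisted_url": True,
         "exResponse": {"pb_sentence": "", "solution_sentence": "",
                        "why_sentence": ""}},
        {"website": "info2",
         "df": [{"Question": "What?", "Answers": ["example answer1"]},
                {"Question": "Why?", "Answers": []}],
         "whitelisted_url": True,
         "exResponse": {"pb_sentence": ""}},
    ]
}

df = pd.json_normalize(
    data=data["examples"],
    meta=["website", "whitelisted_url", "exResponse"],
    record_path=["df"],
)
df = df.join(pd.DataFrame(df.pop("exResponse").tolist()))
rows_total = len(df)  # one row per question

df = df[df["Answers"].map(len) > 0]
kept = df[["website", "Question"]].values.tolist()

df = df.replace("", np.nan).dropna(subset=["pb_sentence"], how="all")
print(rows_total, kept, len(df))  # 4 [['info2', 'What?']] 0
```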
