Pandas: Create one function to read JSON, then another function to create a DataFrame

Asked by: Jay Cheng  Asked: 2/6/2021  Updated: 2/6/2021  Views: 49

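Q:

I want to create one function to fetch data from an API, and then another function to build and clean the corresponding DataFrame for later use.

The first function is shown below and works fine: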

def get_data():

    print('start download the 1st set')
    confirm_details = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fenhanced_sur_covid_19_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the 1st set')

    print('start download the 2nd set')
    latest_situ = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Flatest_situation_of_reported_cases_covid_19_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the 2nd set')

    print('start download the final set')
    residential = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fbuilding_list_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the final set')

get_data()

The second function is below, but it gives me an error: "NameError: name 'confirm_details' is not defined":

def clean_confirm_df():
    confirm_df = pd.read_json(io.StringIO(confirm_details.decode('utf-8')))
    confirm_df.columns = confirm_df.columns.str.replace(" ", "_" )
    confirm_df.columns = confirm_df.columns.str.replace('/', "_")
    confirm_df.columns = confirm_df.columns.str.replace("*", "")
    confirm_df.columns = confirm_df.columns.str.strip()
    confirm_df['Report_date'] = pd.to_datetime(confirm_df['Report_date'], dayfirst=True)
    confirm_df.rename(columns = {'Confirmed_probable': 'Confirmed'}, inplace = True)
    confirm_df = confirm_df.drop(['Name_of_hospital_admitted', 'Date_of_onset'], axis = 1)
    confirm_df['HK_Non-HK_resident'] = confirm_df['HK_Non-HK_resident'].str.upper()
    confirm_df.head()
    
clean_confirm_df()

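I looked at the first function and can see that 'confirm_details' is defined there. I also checked that the code that builds the corresponding DataFrames (confirm_df, latest_situ_df and residential_df) works fine when it is run on its own.

I'm teaching myself Python and pandas, so I'd appreciate any advice on how to change my code to make it work.

Thanks.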

Tags: python pandas function dataframe

Comments

1 upvote  Rob Raymond 2/6/2021
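It's all about scope: these variables are defined in the scope of get_data() and are not global variables. You mention functions, but you have not really defined functions, because nothing is returned. I'd suggest returning a dict with references to the JSON from get_data() and passing it as an argument to clean_df(). It's worth doing some more online learning on the basics of Python programming.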
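0 upvotes  Jay Cheng 2/6/2021
Thanks @RobRaymond, it works. Yes, I agree with your suggestion that I should do more online learning. Sometimes it is hard to piece things together after watching a few demos on YouTube. I don't understand your comment that "you mention functions, but you have not defined functions". I thought a function is defined whenever I use def. Thanks for your help and have a nice day.
0 upvotes  Rob Raymond 2/6/2021
It's a bit old school; I have used many programming languages. I like the definition that a function returns something, whereas a subroutine just does something and returns nothing. Still, I do think these concepts are useful to know.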

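A:

0 upvotes  Rob Raymond 2/6/2021 #1

As per the comments: structure your code so that you are aware of the scope of your variables. You were assuming everything is global, which would be a very bad thing...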
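
To see the scoping issue in isolation, here is a minimal sketch that needs no network access. The function names download and download_and_return and the sample bytes are made up for illustration; they are not part of the original code.

```python
def download():
    # `payload` is a local name: it exists only while download() is running.
    payload = b'[{"Case no.": 1}]'


def download_and_return():
    payload = b'[{"Case no.": 1}]'
    return payload  # hand the value back to the caller


nothing = download()  # no return statement, so the call evaluates to None

try:
    payload  # never defined at module level, so this raises NameError
    name_error_raised = False
except NameError:
    name_error_raised = True

result = download_and_return()  # the caller now holds the returned bytes
print(nothing, name_error_raised, result)
```

The restructured code that follows applies the same idea to the three downloads: get_data() returns a dict, and clean_confirm_df() receives that dict as a parameter instead of relying on a global name.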

import io

import pandas as pd
import requests


def get_data():
    ret = {}

    print('start download the 1st set')
    ret["confirm_details"] = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fenhanced_sur_covid_19_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the 1st set')

    print('start download the 2nd set')
    ret["latest_situ"] = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Flatest_situation_of_reported_cases_covid_19_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the 2nd set')

    print('start download the final set')
    ret["residential"] = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fbuilding_list_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the final set')

    return ret

def clean_confirm_df(data):
    confirm_df = pd.read_json(io.StringIO(data["confirm_details"].decode('utf-8')))
    confirm_df.columns = confirm_df.columns.str.replace(" ", "_" )
    confirm_df.columns = confirm_df.columns.str.replace('/', "_")
    confirm_df.columns = confirm_df.columns.str.replace("*", "")
    confirm_df.columns = confirm_df.columns.str.strip()
    confirm_df['Report_date'] = pd.to_datetime(confirm_df['Report_date'], dayfirst=True)
    confirm_df.rename(columns = {'Confirmed_probable': 'Confirmed'}, inplace = True)
    confirm_df = confirm_df.drop(['Name_of_hospital_admitted', 'Date_of_onset'], axis = 1)
    confirm_df['HK_Non-HK_resident'] = confirm_df['HK_Non-HK_resident'].str.upper()
    return confirm_df

mydata = get_data()
df = clean_confirm_df(mydata)
print(df.head().to_markdown())  # to_markdown() needs the optional tabulate package

Output:

start download the 1st set
complete download the 1st set
start download the 2nd set
complete download the 2nd set
start download the final set
complete download the final set
|    |   Case_no. | Report_date         | Gender   |   Age | Hospitalised_Discharged_Deceased   | HK_Non-HK_resident   | Case_classification   | Confirmed   |
|---:|-----------:|:--------------------|:---------|------:|:-----------------------------------|:---------------------|:----------------------|:------------|
|  0 |          1 | 2020-01-23 00:00:00 | M        |    39 | Discharged                         | NON-HK RESIDENT      | Imported case         | Confirmed   |
|  1 |          2 | 2020-01-23 00:00:00 | M        |    56 | Discharged                         | HK RESIDENT          | Imported case         | Confirmed   |
|  2 |          3 | 2020-01-24 00:00:00 | F        |    62 | Discharged                         | NON-HK RESIDENT      | Imported case         | Confirmed   |
|  3 |          4 | 2020-01-24 00:00:00 | F        |    62 | Discharged                         | NON-HK RESIDENT      | Imported case         | Confirmed   |
|  4 |          5 | 2020-01-24 00:00:00 | M        |    63 | Discharged                         | NON-HK RESIDENT      | Imported case         | Confirmed   |
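
If more datasets are added later, the three near-identical download blocks could be folded into a loop. This is only a sketch under assumptions: the urls and fetch parameters, the fake_fetch stand-in and the example.invalid URLs are hypothetical and not part of the code above; they exist so the example runs offline.

```python
from typing import Callable, Dict


def get_data(urls: Dict[str, str], fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Download every URL in `urls` and return the raw bodies keyed by name."""
    ret = {}
    for name, url in urls.items():
        print(f"start download: {name}")
        ret[name] = fetch(url)
        print(f"complete download: {name}")
    return ret


def fake_fetch(url: str) -> bytes:
    # Hypothetical stand-in for a real HTTP call; returns a tiny JSON document.
    return f'[{{"source": "{url}"}}]'.encode("utf-8")


# Placeholder URLs, NOT the real API endpoints.
urls = {
    "confirm_details": "https://example.invalid/confirm_details",
    "latest_situ": "https://example.invalid/latest_situ",
    "residential": "https://example.invalid/residential",
}

mydata = get_data(urls, fake_fetch)
print(sorted(mydata))
```

With requests installed, passing `lambda url: requests.get(url).content` as fetch would give the same dict-of-bytes shape that clean_confirm_df(data) expects.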