TypeError: unhashable type: 'Series' - can't work out how to pass a one column df as a non-series object if even possible?

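Asked by: Orla Rooney  Asked: 10/27/2023  Updated: 10/31/2023  Views: 49

Q:

So I have a dataframe and have written a function to add values in a new column based on a set of conditions.

The code involves two DataFrames.

The first is merged_df, the df I'm trying to add the new column to. For context, it has the following properties: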

merged_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 889 entries, 0 to 888
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   UNIQUE_ID        889 non-null    object
 1   REGISTERED_NAME  889 non-null    object
 2   EMAIL            889 non-null    object
 3   DBS_CHECK_DATE   889 non-null    object
 4   EXPIRY_DATE      889 non-null    object
 5   UNIQUE_ID        889 non-null    object
 6   Status           889 non-null    object
 7   Action           889 non-null    object
dtypes: object(8)

Then there is a single-column DataFrame whose contents are needed for one of the specified conditions:

SAP_only_EAs.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64 entries, 0 to 63
Data columns (total 1 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   UNIQUE_ID  64 non-null     object
dtypes: object(1)

The function is as follows:

import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta

def current_month():
    return datetime.datetime.now().strftime("%m/%Y")

def get_current_date():
    return datetime.datetime.now().strftime("%d/%m/%Y")

def three_month_ahead():
    current = datetime.datetime.now()
    three_month = current + relativedelta(months=3)
    return three_month.strftime("%m/%Y")

def next_month_expiry():
    current = datetime.datetime.now()
    nextmonth = current + relativedelta(months=1)
    return nextmonth.strftime("%m/%Y")

def year_ahead():
    current = datetime.datetime.now()
    year_on = current + relativedelta(months = 12)
    return year_on.strftime("%d/%m/%Y")

#sap_only_eas_set = set(SAP_only_EAs['UNIQUE_ID'].tolist())

def action_col(expiry_date, Status, uniqueID):
    three_months_ahead = pd.to_datetime(three_month_ahead(), format='%m/%Y')
    next_month = pd.to_datetime(next_month_expiry(), format='%m/%Y')
    current_date_today = pd.to_datetime(get_current_date(), format='%d/%m/%Y')
    year_on = pd.to_datetime(year_ahead(), format='%d/%m/%Y')

    expiry_date = pd.to_datetime(expiry_date, format='%d/%m/%Y')  # Convert the expiry_date to datetime

    if f"{expiry_date.year}-{expiry_date.month:02}" == f"{three_months_ahead.year}-{three_months_ahead.month:02}":
        return 'Send 3 month request'
    elif expiry_date.month == next_month.month and expiry_date.year == next_month.year:
        return 'Send 1 month reminder'
    elif expiry_date < current_date_today and Status == "Not Suspended":
        return 'DBS expired: Suspend & update iAdmin notes'
    elif (year_on < expiry_date < current_date_today) and Status == "Suspended":
        return 'No action needed - correct suspensions in place'
    elif uniqueID in SAP_only_EAs['UNIQUE_ID']:
        return 'No action needed - SAP only assessor'
    elif (expiry_date < year_on) and Status == "Suspended":
        return 'DBS expired for over a year – look at whether account closure is appropriate'
    else:
        return 'No action required – valid DBS check'


merged_df['Action'] = merged_df.apply(lambda row: action_col(row['EXPIRY_DATE'], row['Status'], row['UNIQUE_ID']), axis=1)

The error I'm getting comes from this part of the conditions:

    elif uniqueID in SAP_only_EAs['UNIQUE_ID']:
        return 'No action needed - SAP only assessor'

When I comment it out the function runs fine, but with it included I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
c:\Users\orla_quidos\Code\DBS project\STATUS section of the scheduled job python script.ipynb Cell 9 line 5
     48     else:
     49         return 'No action required – valid DBS check'
---> 51 merged_df['Action'] = merged_df.apply(lambda row: action_col(row['EXPIRY_DATE'], row['Status'], row['UNIQUE_ID']), axis=1)

File c:\Users\orla_quidos\anaconda3\lib\site-packages\pandas\core\frame.py:9568, in DataFrame.apply(self, func, axis, raw, result_type, args, **kwargs)
   9557 from pandas.core.apply import frame_apply
   9559 op = frame_apply(
   9560     self,
   9561     func=func,
   (...)
   9566     kwargs=kwargs,
   9567 )
-> 9568 return op.apply().__finalize__(self, method="apply")

File c:\Users\orla_quidos\anaconda3\lib\site-packages\pandas\core\apply.py:764, in FrameApply.apply(self)
    761 elif self.raw:
    762     return self.apply_raw()
--> 764 return self.apply_standard()

File c:\Users\orla_quidos\anaconda3\lib\site-packages\pandas\core\apply.py:891, in FrameApply.apply_standard(self)
    890 def apply_standard(self):
--> 891     results, res_index = self.apply_series_generator()
...
--> 371     hash(key)
    372     try:
    373         key = ensure_python_int(key)

TypeError: unhashable type: 'Series'

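Any ideas much appreciated. SAP_only_EAs is just one column of output from a SQL query converted to a dataframe, and I can't work out how to change its format so the function will accept it. I tried converting it to a list (commented out here) and many other things, to no avail?!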

TIA!:)
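
A side note on the failing condition: the `in` operator on a pandas Series tests the index labels, not the stored values, so converting the column to a different container changes what the membership test means. The sketch below uses invented IDs in a hypothetical stand-in for SAP_only_EAs to show the difference; it is an illustration, not the poster's data.

```python
import pandas as pd

# Hypothetical stand-in for SAP_only_EAs; the IDs are invented for illustration.
sap_only = pd.DataFrame({"UNIQUE_ID": ["EA001", "EA002", "EA003"]})
ids = sap_only["UNIQUE_ID"]

# `in` on a Series looks at the index labels (0, 1, 2 here), not the values.
in_index = 0 in ids             # True: 0 is an index label
in_series = "EA002" in ids      # False: the values are never searched

# Testing against the underlying values, or a set built from them, checks the IDs.
in_values = "EA002" in ids.values
id_set = set(ids)
in_set = "EA002" in id_set
```

Building the set once outside the function keeps each row's lookup cheap, which is roughly what the commented-out line in the question was aiming for.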

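python pandas typeerror series

Comments

1 upvote mozway 10/27/2023
Please provide a minimal reproducible example of the data and the expected output. You don't need apply for what you're trying to do. Most likely numpy.select.
0 upvotes Péter Szilvási 10/27/2023
I don't see the initialization of uniqueId, but it seems SAP_only_EAs['UNIQUE_ID'] is being treated like a dictionary type. You can check for a key in a dictionary, but keys are hashed. To search by key, the value being looked up must be hashable. Since a Series is not hashable, the program throws the error.
0 upvotes Matthias Huschle 10/27/2023
You have UNIQUE_ID twice in merged_df. So the uniqueID argument is probably not a scalar.
0 upvotes Orla Rooney 10/30/2023
@MatthiasHuschle good spot - thanks mate!
0 upvotes Orla Rooney 10/30/2023
@mozway thank you so much, this is exactly what I didn't know I needed!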
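
To illustrate the comment about UNIQUE_ID appearing twice: when a column label is duplicated, selecting that label from a row returns a Series rather than a scalar, and a Series cannot be hashed for the membership test. The sketch below uses a hypothetical one-row frame with invented values; dropping duplicated labels via `columns.duplicated()` is one possible fix, not necessarily what the poster did.

```python
import pandas as pd

# Hypothetical frame with the UNIQUE_ID label repeated, as in merged_df.info().
df = pd.DataFrame(
    [["EA001", "Suspended", "EA001"]],
    columns=["UNIQUE_ID", "Status", "UNIQUE_ID"],
)

row = df.iloc[0]
value = row["UNIQUE_ID"]              # two matching labels, so this is a Series
is_series = isinstance(value, pd.Series)

lookup = pd.Series(["EA001", "EA002"], name="UNIQUE_ID")
try:
    value in lookup                   # the index lookup tries to hash `value`
    error = None
except TypeError as exc:
    error = exc                       # a Series is not hashable

# One possible fix: keep only the first occurrence of each column label.
deduped = df.loc[:, ~df.columns.duplicated()]
scalar = deduped.iloc[0]["UNIQUE_ID"]  # now a plain string
```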

A:

0 upvotes Orla Rooney 10/30/2023 #1
import pandas as pd
import numpy as np
from datetime import datetime

# Reference dates, computed once instead of once per row
current_date_today = datetime.today()
three_month_ahead = current_date_today + pd.DateOffset(months=3)
next_month = current_date_today + pd.DateOffset(months=1)
year = current_date_today + pd.DateOffset(months=12)

# Parse the expiry strings (dd/mm/yyyy) into datetimes
expiry_date = pd.to_datetime(merged_df['EXPIRY_DATE'], format='%d/%m/%Y')

# Conditions are evaluated in order; the first one that matches a row wins
CONDLIST = [
    (expiry_date.dt.year == three_month_ahead.year) & (expiry_date.dt.month == three_month_ahead.month),
    (expiry_date.dt.month == next_month.month) & (expiry_date.dt.year == next_month.year),
    (expiry_date < current_date_today) & (merged_df['Status'] == "Not Suspended"),
    # NOTE: `year` is 12 months ahead of today, so this condition can never be
    # True as written (carried over unchanged from the original function)
    (year < expiry_date) & (expiry_date < current_date_today) & (merged_df['Status'] == "Suspended"),
    # Assumes UNIQUE_ID now appears only once in merged_df
    merged_df['UNIQUE_ID'].isin(SAP_only_EAs['UNIQUE_ID']),
    (expiry_date < year) & (merged_df['Status'] == "Suspended"),
]

# One label per condition, in the same order as CONDLIST
CHOICELIST = [
    "Send 3 month request",
    "Send 1 month reminder",
    "DBS expired: Suspend & update iAdmin notes",
    "No action needed - correct suspensions in place",
    "No action needed - SAP only assessor",
    "DBS expired for over a year – look at whether account closure is appropriate",
]

merged_df['Action'] = np.select(CONDLIST, CHOICELIST, default="No action required – valid DBS check")

Thanks to the guy who suggested I use np.select - my new-found love <3 - got it to work :)
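
For anyone who wants to try the pattern on something small, here is a minimal sketch of np.select combined with isin. The frames, IDs, dates, and shortened labels are all invented for illustration, and `today` is pinned to a fixed date so the result is reproducible. It shows that when a row satisfies more than one condition, the earliest condition in the list decides the label.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names mirror the post, values are invented.
df = pd.DataFrame({
    "UNIQUE_ID": ["EA001", "EA002", "EA003", "EA004"],
    "Status": ["Not Suspended", "Suspended", "Not Suspended", "Not Suspended"],
    "EXPIRY_DATE": ["01/01/2020", "01/01/2020", "01/01/2020", "01/01/2030"],
})
sap_only = pd.DataFrame({"UNIQUE_ID": ["EA002", "EA003"]})

today = pd.Timestamp("2023-10-30")  # fixed date, an assumption for this sketch
expiry = pd.to_datetime(df["EXPIRY_DATE"], format="%d/%m/%Y")

# Boolean masks, one per rule, evaluated over the whole column at once.
conditions = [
    (expiry < today) & (df["Status"] == "Not Suspended"),
    df["UNIQUE_ID"].isin(sap_only["UNIQUE_ID"]),
]
choices = ["expired", "sap only"]

# EA003 matches both masks; np.select takes the first matching choice.
df["Action"] = np.select(conditions, choices, default="valid")
```

Because the order of the conditions sets the priority, moving the isin mask earlier in the list would change which label EA003 gets.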