Cleaning DataFrame: TypeError: '<' not supported between instances of 'str' and 'int' [duplicate]

Asked by: Rayed | Asked: 11/14/2023 | Last edited by: petezurich, Rayed | Updated: 11/14/2023 | Views: 60

Q:

import pandas as pd
df = pd.read_csv('/content/data_mining/merged_dataset.csv')

#Cleaning data frame

#Delete rows where year is less than 1900:
for x in df.index:
  if df.loc[x, 'startYear'] < 1900:
    df.drop(x, inplace = True)

#Delete rows that are not movies:
for x in df.index:
  if df.loc[x, 'titleType'] != 'movie':
    df.drop(x, inplace = True)

#Delete all adult films:
for x in df.index:
  if df.loc[x, 'isAdult'] == '1':
    df.drop(x, inplace = True)


print(df.to_string())

TypeError                                 Traceback (most recent call last)
<ipython-input-23-672b76e68b2d> in <cell line: 4>()
      3 #Delete rows where year is less than 1900:
      4 for x in df.index:
----> 5   if df.loc[x, 'startYear'] < 1900:
      6     df.drop(x, inplace = True)
      7 
TypeError: '<' not supported between instances of 'str' and 'int'

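In my dataset, I am trying to delete every row whose year is before 1900. The entire 'startYear' column looks numeric, but I cannot compare it with the number 1900: the error says the 'startYear' values are strings, not integers like 1900.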
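
A quick way to confirm this diagnosis is to inspect the column's dtype and list the entries that do not parse as numbers. The snippet below is only a sketch: the small DataFrame stands in for merged_dataset.csv, and the "\N" placeholder is an assumed example of a non-numeric entry, not something taken from the real file.

```python
import pandas as pd

# Hypothetical sample standing in for merged_dataset.csv; the "\\N" placeholder
# is an assumption about what a non-numeric entry might look like.
df = pd.DataFrame({"startYear": ["1894", "1925", "\\N", "2001"]})

# A column read as text shows up as "object" (or a string dtype in newer
# pandas versions), even when most of its values look like numbers.
print(df["startYear"].dtype)

# Attempt a numeric conversion; values that cannot be parsed become NaN.
as_numbers = pd.to_numeric(df["startYear"], errors="coerce")

# The entries that failed to parse are the ones keeping the column non-numeric.
bad_values = df.loc[as_numbers.isna(), "startYear"].unique().tolist()
print(bad_values)
```

If `bad_values` comes back empty on the real data, the column is probably just numeric text, and a plain conversion should be enough.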

python pandas dataframe jupyter-notebook

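Comments

1 upvote | Panda Kim | 11/14/2023
Wouldn't the problem be solved by changing the startyear column to int? What exactly is the question? Are you asking how to change it to int?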

A:

0 upvotes | Heinz Siahaan | 11/14/2023 | #1

That is because the startYear column holds strings. Convert it to a numeric type first, then filter the DataFrame. Example code:

# Convert 'startYear' column to numeric (non-convertible values become NaN)
df['startYear'] = pd.to_numeric(df['startYear'], errors='coerce')

# Filter the DataFrame based on the condition (remove year < 1900)
df = df[df['startYear'] >= 1900]

print(df)
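
The same idea can be extended to all three filters from the question by combining boolean masks instead of dropping rows one at a time. This is a minimal sketch under stated assumptions: the sample rows are hypothetical, and comparing isAdult as text assumes the flag is stored as 0/1 integers or as the strings '0'/'1'. If pandas read isAdult as integers, the question's comparison with the string '1' would never match, which is why the column is cast to text here.

```python
import pandas as pd

# Hypothetical rows standing in for merged_dataset.csv.
df = pd.DataFrame({
    "startYear": ["1894", "1925", "\\N", "2001", "1999"],
    "titleType": ["movie", "movie", "movie", "short", "movie"],
    "isAdult": ["0", "0", "0", "0", "1"],
})

# Unparseable years become NaN, and NaN >= 1900 is False,
# so rows without a usable year are dropped as well.
year = pd.to_numeric(df["startYear"], errors="coerce")

# Compare isAdult as text so the check works for both int and str columns.
keep = (
    (year >= 1900)
    & (df["titleType"] == "movie")
    & (df["isAdult"].astype(str) != "1")
)

# Copy the selection so later assignments do not touch the original frame.
cleaned = df[keep].copy()

# Every kept row has a valid year, so the cast to int cannot hit NaN.
cleaned["startYear"] = year[keep].astype(int)
print(cleaned)
```

Building one mask and indexing once avoids the per-row `drop` calls, which tend to be slow on large frames.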