Asked by: Rayed  Asked: 11/14/2023  Last edited by: petezurich, Rayed  Updated: 11/14/2023  Views: 60
Cleaning DataFrame: TypeError: '<' not supported between instances of 'str' and 'int' [duplicate]
Q:
import pandas as pd
df = pd.read_csv('/content/data_mining/merged_dataset.csv')
#Cleaning data frame
#Delete rows where year is less than 1900:
for x in df.index:
    if df.loc[x, 'startYear'] < 1900:
        df.drop(x, inplace = True)
#Delete rows that are not movies:
for x in df.index:
    if df.loc[x, 'titleType'] != 'movie':
        df.drop(x, inplace = True)
#Delete all adult films:
for x in df.index:
    if df.loc[x, 'isAdult'] == '1':
        df.drop(x, inplace = True)
print(df.to_string())
TypeError                                 Traceback (most recent call last)
<ipython-input-23-672b76e68b2d> in <cell line: 4>()
      3 #Delete rows where year is less than 1900:
      4 for x in df.index:
----> 5     if df.loc[x, 'startYear'] < 1900:
      6         df.drop(x, inplace = True)
      7
TypeError: '<' not supported between instances of 'str' and 'int'
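In my dataset, I am trying to delete every row with a year before 1900. The entire 'startYear' column consists of numbers, but I cannot compare it with the number 1900, because the error says that 'startYear' is apparently a string rather than an integer like the value 1900.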
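One way to check what the error is reporting is to inspect the column's dtype and list the entries that do not parse as numbers. The sketch below is only an illustration: the inline CSV text and the 'unknown' placeholder are hypothetical stand-ins, since the real contents of merged_dataset.csv are not shown in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for merged_dataset.csv; the 'unknown' entry is an assumption.
csv_text = "startYear,titleType\n1894,short\n1995,movie\nunknown,movie\n"
sample = pd.read_csv(io.StringIO(csv_text))

# A single non-numeric entry makes pandas read the whole column as strings,
# so the dtype is not numeric (object, or a string dtype in newer pandas).
print(sample["startYear"].dtype)

# List the distinct entries that cannot be parsed as numbers.
parsed = pd.to_numeric(sample["startYear"], errors="coerce")
bad_values = sample.loc[parsed.isna(), "startYear"].unique().tolist()
print(bad_values)
```

If the list is non-empty, those entries are the likely reason the column was not read as integers.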
A:
0 votes
Heinz Siahaan
11/14/2023
#1
That is because the startYear column is a string. You can convert it to int first and then filter the DataFrame. Sample code is shown below:
# Convert 'startYear' column to numeric (values that cannot be converted become NaN)
df['startYear'] = pd.to_numeric(df['startYear'], errors='coerce')
# Filter the DataFrame based on the condition (remove year < 1900; NaN rows are removed as well)
df = df[df['startYear'] >= 1900]
print(df)
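The same idea can be extended to all three filters from the question by combining boolean masks instead of looping over df.index. This is a minimal sketch under stated assumptions: the sample rows are hypothetical, and it assumes isAdult may also have been read as strings, so it is coerced to numbers as well.

```python
import pandas as pd

# Hypothetical sample rows standing in for merged_dataset.csv.
df = pd.DataFrame({
    "startYear": ["1894", "1995", "unknown", "2001", "2010"],
    "titleType": ["movie", "movie", "movie", "tvSeries", "movie"],
    "isAdult": ["0", "0", "0", "0", "1"],
})

# Coerce to numbers; entries that cannot be parsed become NaN.
start_year = pd.to_numeric(df["startYear"], errors="coerce")
is_adult = pd.to_numeric(df["isAdult"], errors="coerce")

# Keep rows that satisfy all three conditions from the question.
# NaN years fail the >= comparison, so those rows are dropped.
keep = (start_year >= 1900) & (df["titleType"] == "movie") & (is_adult != 1)

# Store the year as an integer in the filtered result.
cleaned = df.loc[keep].assign(startYear=start_year[keep].astype(int))
print(cleaned)
```

Filtering with a mask avoids calling df.drop once per row, which tends to be slow on large frames. Note that a row whose isAdult value cannot be parsed is kept by this sketch, because NaN != 1 evaluates to True.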