Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Question:

I want to filter my DataFrame with an `or` condition to keep rows whose values in a particular column are outside the range [-0.25, 0.25]. I tried:

df = df[(df['col'] < -0.25) or (df['col'] > 0.25)]

But I get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

python pandas dataframe boolean filtering

126 votes MaxU - stand with Ukraine 4/29/2016

Use `|` instead of `or`.

7 votes ColinMac 12/29/2018

Here's a workaround: `abs(result['var']) > 0.25`
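A minimal sketch of that workaround applied to the question's filter, assuming a numeric column named `col` (the sample data is made up): `Series.abs()` gives the element-wise absolute value, so a single comparison replaces the two-sided `or`.

```python
import pandas as pd

# Made-up sample data; 'col' stands in for the question's column
df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.2, 0.3]})

# |x| > 0.25 selects the same rows as (x < -0.25) | (x > 0.25)
outside = df[df["col"].abs() > 0.25]
print(outside["col"].tolist())  # [-0.5, 0.3]
```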

7 votes cs95 3/9/2019

Related: Logical operators for Boolean indexing in Pandas

3 votes AstroFloyd 2/3/2021

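I got the same error message when using the standard `max()` function. Replacing it with `numpy.maximum()` for the element-wise maximum between two values solved my problem.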
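To illustrate that comment with made-up data: the built-in `max()` compares its arguments with `>` and then asks for the truth value of the result, which here is a whole Series, so it raises the same error. `numpy.maximum()` instead works element by element.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 5, 3])
b = pd.Series([4, 2, 6])

try:
    max(a, b)  # built-in max needs bool(b > a), which is ambiguous
except ValueError as exc:
    print("max() failed:", exc)

# Element-wise maximum of the two Series
m = np.maximum(a, b)
print(m.tolist())  # [4, 5, 6]
```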

Answers:

1112 votes MSeifert 4/29/2016 #1

The `or` and `and` Python statements require truth values. For pandas, these are considered ambiguous, so you should use "bitwise" `|` (or) or `&` (and) operations:

df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]

These are overloaded for these kinds of data structures to yield the element-wise `or` or `and`.
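A small self-contained sketch of the fix applied to the question, with made-up data in a column named `col`: the `or` version raises, while `|` combines the two Boolean Series element by element.

```python
import pandas as pd

df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.2, 0.3]})

try:
    df[(df["col"] < -0.25) or (df["col"] > 0.25)]  # `or` calls bool() on a Series
except ValueError as exc:
    print("or failed:", exc)

# `|` is element-wise: True wherever either condition holds
mask = (df["col"] < -0.25) | (df["col"] > 0.25)
print(mask.tolist())             # [True, False, False, False, True]
print(df[mask]["col"].tolist())  # [-0.5, 0.3]
```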

Just to add some more explanation to this statement:

The exception is thrown when you want to get the `bool` of a `pandas.Series`:

>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

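What you hit was a place where the operator implicitly converted the operands to `bool` (you used `or`, but it also happens for `and`, `if` and `while`):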

>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

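Besides these four statements, there are several Python functions that hide some `bool` calls (like `any`, `all`, `filter`, ...). These are normally not problematic with `pandas.Series`, but for completeness I wanted to mention them.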
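To make that concrete (made-up data): the built-ins `any`, `all` and `filter` iterate over the Series and test each scalar element on its own, which is unambiguous, whereas `bool()` on the whole Series raises.

```python
import pandas as pd

s = pd.Series([0, 1, 2])

print(any(s))                 # True: at least one element is truthy
print(all(s))                 # False: the element 0 is falsy
print(list(filter(None, s)))  # keeps only the truthy elements 1 and 2

try:
    bool(s)  # truth value of the whole Series is ambiguous
except ValueError as exc:
    print("bool() failed:", exc)
```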

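In your case, the exception isn't really helpful, because it doesn't mention the right alternatives. For `and` and `or`, if you want element-wise comparisons, you can use: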

`numpy.logical_or`:

>>> import numpy as np
>>> np.logical_or(x, y)

or simply the `|` operator:

>>> x | y

`numpy.logical_and`:
```
>>> np.logical_and(x, y)
```
or simply the `&` operator:
```
>>> x & y
```

If you're using the operators, make sure you set your parentheses correctly because of operator precedence.

There are several logical NumPy functions which should work on `pandas.Series`.

The alternatives mentioned in the exception are more suited if you encountered it when doing `if` or `while`. I'll briefly explain each of these:

If you want to check whether your Series is empty:
```
>>> x = pd.Series([])
>>> x.empty
True
>>> x = pd.Series([1])
>>> x.empty
False
```
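Python normally interprets the `len`gth of containers (like `list`, `tuple`, ...) as the truth value if they have no explicit Boolean interpretation. So if you want the Python-like check, you could do `if x.size` or `if not x.empty` instead of `if x`.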
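A minimal sketch of such a Python-like check, wrapped in a hypothetical helper `is_nonempty` (the name is made up for this example): it treats a Series the way Python treats a list, where only emptiness decides the truth value.

```python
import pandas as pd

def is_nonempty(s: pd.Series) -> bool:
    """Hypothetical helper: mimic `if some_list:` for a Series."""
    return not s.empty  # equivalently: s.size > 0

print(is_nonempty(pd.Series([], dtype=float)))  # False
print(is_nonempty(pd.Series([0])))              # True, even though the only value is 0
```

Unlike `.any()`, this looks only at the length: a Series holding a single `0` still counts as non-empty, just as `[0]` is truthy in plain Python.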

If your `Series` contains one and only one Boolean value:

>>> x = pd.Series([100])
>>> (x > 50).bool()
True
>>> (x < 50).bool()
False

If you want to check the first and only item of your Series (like `.bool()`, but it works even for non-Boolean contents):
```
>>> x = pd.Series([100])
>>> x.item()
100
```

If you want to check whether all or any items are non-zero, non-empty or not False:

>>> x = pd.Series([0, 1, 2])
>>> x.all()   # Because one element is zero
False
>>> x.any()   # because one (or more) elements are non-zero
True

63 votes Alexander 4/29/2016 #2

For Boolean logic, use `&` and `|`.

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))

>>> df

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

To see what is happening, note that each comparison produces a column of Booleans, e.g.,

df.C > 0.25

0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool

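When you have multiple criteria, you get multiple columns back. This is why the join logic is ambiguous. Using `and` or `or` asks for the truth value of each whole column, so you first need to reduce that column to a single Boolean value, for example by checking whether any or all of the values in each column are True.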

# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()

True

# All values in either column are True?
(df.C > 0.25).all() or (df.C < -0.25).all()

False

A convoluted way to achieve the same thing is to zip all of these columns together and perform the appropriate logic.

>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

For more details, refer to Boolean Indexing in the docs.

15 votes Cảnh Toàn Nguyễn 1/19/2017 #3

Alternatively, you could use the `operator` module. More detailed information is in the Python docs:

import operator
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
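Building on the `operator` approach, here is a sketch (a swapped-in technique, not part of the answer itself) that folds an arbitrary list of conditions with `functools.reduce`. The seeded data matches the example above.

```python
import functools
import operator

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Any number of Boolean Series can be combined with operator.or_ (i.e. `|`)
conditions = [df.C > 0.25, df.C < -0.25]
mask = functools.reduce(operator.or_, conditions)

print(df.loc[mask].index.tolist())  # [0, 1, 3, 4]
```

Swapping `operator.or_` for `operator.and_` would require all conditions to hold instead.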

0 votes Dmitri K. 2/13/2023

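The `operator` module helped me solve a Jinja problem. Jinja does not accept the `&` operator, and a pandas query cannot access Jinja variables, but `.loc` with `operator` works! Thanks!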

5 votes bli 11/2/2017 #4

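This excellent answer explains very well what is happening and gives a solution. I would like to add another solution that might be suitable in similar cases: using the `query` method: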

df = df.query("(col > 0.25) or (col < -0.25)")

See also Indexing and selecting data.

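(Some tests with a DataFrame I'm currently working with suggest that this method is a bit slower than using the bitwise operators on Series of Booleans: 2 ms vs. 870 µs.)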
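A sketch of two `query` features that may help here (sample data and variable names are made up): the query string accepts `or` and `and` directly, and ordinary Python variables can be referenced with an `@` prefix instead of formatting numbers into the string by hand.

```python
import pandas as pd

df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.2, 0.3]})

# Ordinary Python variables, referenced inside the query string with "@"
low, high = -0.25, 0.25

result = df.query("col < @low or col > @high")
print(result["col"].tolist())  # [-0.5, 0.3]
```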

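A warning: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named `WT_38hph_IP_2`, `WT_38hph_input_2` and `log2(WT_38hph_IP_2/WT_38hph_input_2)` and wanted to perform the following query: `"(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"`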

I obtained the following cascade of exceptions:

KeyError: 'log2'
UndefinedVariableError: name 'log2' is not defined
ValueError: "log2" is not a supported function

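I guess this happened because the query parser was trying to make something out of the first two columns instead of identifying the expression as the name of the third column.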

A possible workaround is proposed here.
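A possible alternative, assuming pandas 1.0 or later (where `query()` accepts backtick-quoted column names containing special characters): quote the awkward column name with backticks. The data below is made up to mirror the column names in this answer.

```python
import pandas as pd

df = pd.DataFrame({
    "WT_38hph_IP_2": [10, 30, 50],
    "WT_38hph_input_2": [5, 10, 40],
})
# A column whose *name* looks like a Python expression
df["log2(WT_38hph_IP_2/WT_38hph_input_2)"] = [1.0, 1.58, 0.32]

# Backticks make the parser treat the whole name as one column reference
result = df.query(
    "(`log2(WT_38hph_IP_2/WT_38hph_input_2)` > 1) and (WT_38hph_IP_2 > 20)"
)
print(result.index.tolist())  # [1]
```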

128 votes Nipun 9/12/2019 #5

Pandas uses the bitwise operators `&` and `|`. Also, each condition should be wrapped inside `( )`.

This works:

data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]

But the same query without parentheses does not:

data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]

0 votes Peter Mortensen 2/3/2023

Why should each condition be wrapped inside `( )`? Please respond by editing (changing) your answer, not here in comments (without "Edit:", "Update:", or similar; the answer should appear as if it was written today).

0 votes Peter Mortensen 2/3/2023

This answer may provide the reason.

0 votes iretex 5/11/2020 #6

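I ran into the same error and was stuck on it with a PySpark DataFrame for a few days. I was able to resolve it by filling NA values with 0, since I was comparing the integer values of two fields.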

1 vote Hemanth Kollipara 7/16/2020 #7

In pandas you need to use the bitwise operators `|` instead of `or` and `&` instead of `and`. You can't simply use Python's Boolean statements.

For more complex filtering, create a `mask` and apply it to the DataFrame.
Put all of your conditions in the mask and apply it. Suppose:

mask = (df["col1"] >= df["col2"]) & (df["col1"] <= df["col2"])
df_new = df[mask]

0 votes satinder singh 10/9/2020 #8

A small thing that wasted my time:

Put the conditions (e.g. comparisons using `==` or `!=`) in parentheses. Failing to do so also raises this exception.

This will work:

df[(some condition) conditional operator (some conditions)]

This will not:

df[some condition conditional-operator some condition]

1 vote Muhammad Yasirroni 10/24/2020 #9

I'll try to give a benchmark of the three most common ways (also mentioned above):

from timeit import repeat

setup = """
import numpy as np;
import random;
x = np.linspace(0,100);
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
"""
stmts = 'x[(x > lb) * (x <= ub)]', 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'

for _ in range(3):
    for stmt in stmts:
        t = min(repeat(stmt, setup, number=100_000))
        print('%.4f' % t, stmt)
    print()

Results:

0.4808 x[(x > lb) * (x <= ub)]
0.4726 x[(x > lb) & (x <= ub)]
0.4904 x[np.logical_and(x > lb, x <= ub)]

0.4725 x[(x > lb) * (x <= ub)]
0.4806 x[(x > lb) & (x <= ub)]
0.5002 x[np.logical_and(x > lb, x <= ub)]

0.4781 x[(x > lb) * (x <= ub)]
0.4336 x[(x > lb) & (x <= ub)]
0.4974 x[np.logical_and(x > lb, x <= ub)]

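However, `*` is not supported for pandas Series, and NumPy arrays are much faster than pandas DataFrames: per call, the DataFrame version below is several hundred times slower (note `number=100` here versus `number=100_000` above):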

from timeit import repeat

setup = """
import numpy as np;
import random;
import pandas as pd;
x = pd.DataFrame(np.linspace(0,100));
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
"""
stmts = 'x[(x > lb) & (x <= ub)]', 'x[np.logical_and(x > lb, x <= ub)]'

for _ in range(3):
    for stmt in stmts:
        t = min(repeat(stmt, setup, number=100))
        print('%.4f' % t, stmt)
    print()

Results:

0.1964 x[(x > lb) & (x <= ub)]
0.1992 x[np.logical_and(x > lb, x <= ub)]

0.2018 x[(x > lb) & (x <= ub)]
0.1838 x[np.logical_and(x > lb, x <= ub)]

0.1871 x[(x > lb) & (x <= ub)]
0.1883 x[np.logical_and(x > lb, x <= ub)]

Note: adding the line `x = x.to_numpy()` costs about 20 µs.

For those who prefer `%timeit`:

import numpy as np
import pandas as pd
import random
lb, ub = np.sort([random.random() * 100, random.random() * 100]).tolist()
lb, ub
x = pd.DataFrame(np.linspace(0,100))

def asterik(x):
    x = x.to_numpy()
    return x[(x > lb) * (x <= ub)]

def and_symbol(x):
    x = x.to_numpy()
    return x[(x > lb) & (x <= ub)]

def numpy_logical(x):
    x = x.to_numpy()
    return x[np.logical_and(x > lb, x <= ub)]

for i in range(3):
    %timeit asterik(x)
    %timeit and_symbol(x)
    %timeit numpy_logical(x)
    print('\n')

Results:

23 µs ± 3.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
35.6 µs ± 9.53 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
31.3 µs ± 8.9 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


21.4 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.9 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.7 µs ± 500 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


25.1 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
36.8 µs ± 18.3 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
28.2 µs ± 5.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

3 votes Mehdi Rostami 7/5/2021 #10

I got an error with this command:

if df != '':
    pass

But it worked when I changed it to this:

if df is not '':
    pass

0 votes Peter Mortensen 2/3/2023

This is interesting, but it may be accidental. What is the explanation?
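One likely explanation, sketched with a made-up frame: `df != ''` is an element-wise comparison that returns a whole DataFrame of Booleans, so `if` cannot decide on it. `df is not ''` only tests object identity, and a DataFrame is never the same object as the string `''`, so that condition is always true and does not look at the contents at all (Python 3.8+ also emits a `SyntaxWarning` for `is not` with a literal). If the intent was to check the cells, reduce the element-wise result explicitly:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", ""], "b": ["y", "z"]})

comparison = df != ""              # element-wise: a DataFrame of True/False
print(type(comparison).__name__)  # DataFrame

# Reduce explicitly, depending on what the `if` is supposed to mean
no_empty_strings = bool(comparison.all().all())  # every cell non-empty?
some_non_empty = bool(comparison.any().any())    # at least one cell non-empty?
print(no_empty_strings, some_non_empty)          # False True
```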

4 votes Humza Sami 9/20/2021 #11

If you have more than one value:

df['col'].all()

If it is just one value:

df['col'].item()
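A short sketch of the difference, with made-up data: `.all()` reduces any number of values to one Boolean, while `.item()` only succeeds when the Series holds exactly one element and raises `ValueError` otherwise.

```python
import pandas as pd

many = pd.Series([True, True, False])
one = pd.Series([True])

print(many.all())  # False: three values reduced to one
print(one.item())  # True: the single element itself

try:
    many.item()  # more than one element
except ValueError as exc:
    print("item() failed:", exc)
```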

15 votes Ynjxsjmh 4/21/2022 #12

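This is a very common issue for beginners when combining multiple conditions in pandas. Generally speaking, there are two possible conditions that cause this error: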

Condition 1: Python operator precedence

The Boolean indexing section of the pandas documentation page "Indexing and selecting data" explains this:

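Another common operation is the use of boolean vectors to filter the data. The operators are: `|` for `or`, `&` for `and`, and `~` for `not`. These must be grouped by using parentheses.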

By default, Python will evaluate an expression such as `df['A'] > 2 & df['B'] < 3` as `df['A'] > (2 & df['B']) < 3`, while the desired evaluation order is `(df['A'] > 2) & (df['B'] < 3)`.
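Why that ends in this particular error: Python treats `a > b < c` as a chained comparison, i.e. `(a > b) and (b < c)`, so the unparenthesized expression hides an implicit `and`. Here is a sketch with made-up integer columns (with float columns, `2 & df['B']` may fail earlier with a different exception):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 2, 4]})

try:
    # Parsed as df['A'] > (2 & df['B']) < 3, a chained comparison, i.e.
    # (df['A'] > (2 & df['B'])) and ((2 & df['B']) < 3)
    df[df["A"] > 2 & df["B"] < 3]
except ValueError as exc:
    print("unparenthesized:", exc)

# Parenthesized: each comparison is evaluated first, then combined with &
mask = (df["A"] > 2) & (df["B"] < 3)
print(mask.tolist())  # [False, True, False]
```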

# Wrong
df['col'] < -0.25 | df['col'] > 0.25

# Right
(df['col'] < -0.25) | (df['col'] > 0.25)

There are some possible ways to get rid of the parentheses; I will cover them later.

Condition 2: Improper operator/statement

As explained in the previous quotation, you need to use `|` for `or`, `&` for `and`, and `~` for `not`.

# Wrong
(df['col'] < -0.25) or (df['col'] > 0.25)

# Right
(df['col'] < -0.25) | (df['col'] > 0.25)

Another possible situation is that you are using a Boolean Series in an `if` statement.

# Wrong
if pd.Series([True, False]):
    pass

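The Python `if` statement expects a Boolean-like expression, not a pandas Series. You should use `pandas.Series.any` or one of the methods listed in the error message to reduce the Series to a single value, depending on what you need.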

For example:

# Right
if df['col'].eq(0).all():
    # If you want all column values equal to zero
    print('do something')

# Right
if df['col'].eq(0).any():
    # If you want at least one column value equal to zero
    print('do something')

Let's talk about ways to avoid the parentheses in the first situation.

Use pandas mathematical functions

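Pandas defines a lot of mathematical functions, including comparisons, as follows:
- `pandas.Series.lt()` for less than;
- `pandas.Series.gt()` for greater than;
- `pandas.Series.le()` for less than or equal;
- `pandas.Series.ge()` for greater than or equal;
- `pandas.Series.ne()` for not equal;
- `pandas.Series.eq()` for equal;
As a result, you can use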
```
df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]

# is equal to

df = df[df['col'].lt(-0.25) | df['col'].gt(0.25)]
```
Use `pandas.Series.between()`

If you want to select rows between two values, you can use `pandas.Series.between`:
- `df['col'].between(left, right)` is equal to
  `(left <= df['col']) & (df['col'] <= right)`;
- `df['col'].between(left, right, inclusive='left')` is equal to
  `(left <= df['col']) & (df['col'] < right)`;
- `df['col'].between(left, right, inclusive='right')` is equal to
  `(left < df['col']) & (df['col'] <= right)`;
- `df['col'].between(left, right, inclusive='neither')` is equal to
  `(left < df['col']) & (df['col'] < right)`;
```
df = df[(df['col'] > -0.25) & (df['col'] < 0.25)]

# is equal to

df = df[df['col'].between(-0.25, 0.25, inclusive='neither')]
```
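Because the question asks for values *outside* [-0.25, 0.25], here is a sketch (data made up) that negates `between` with `~`. With the default inclusive bounds, the complement is exactly `(col < -0.25) | (col > 0.25)`.

```python
import pandas as pd

df = pd.DataFrame({"col": [-0.5, -0.25, 0.0, 0.25, 0.3]})

# Rows outside the closed interval [-0.25, 0.25]
outside = df[~df["col"].between(-0.25, 0.25)]
expected = df[(df["col"] < -0.25) | (df["col"] > 0.25)]

print(outside["col"].tolist())   # [-0.5, 0.3]
print(outside.equals(expected))  # True
```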
Use `pandas.DataFrame.query()`

The documentation cited above has a section, The query() Method, that explains this well.

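`pandas.DataFrame.query()` lets you select rows of a DataFrame with a condition string. Within the query string, you can use both the bitwise operators (`&` and `|`) and their Boolean cousins (`and` and `or`). You can also omit the parentheses, but I don't recommend it for readability reasons.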
```
df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]

# is equal to

df = df.query('col < -0.25 or col > 0.25')
```
Use `pandas.DataFrame.eval()`

`pandas.DataFrame.eval()` evaluates a string describing operations on DataFrame columns, so we can use it to build our multiple conditions. The syntax is the same as for `pandas.DataFrame.query()`.
```
df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]

# is equal to

df = df[df.eval('col < -0.25 or col > 0.25')]
```
`pandas.DataFrame.query()` and `pandas.DataFrame.eval()` can do much more than I describe here. I recommend reading their documentation and experimenting with them.

1 vote Gautam 10/17/2022 #13

I faced the same issue while working with a pandas DataFrame.

I used `numpy.logical_and`:

Here I am trying to select the rows where `person_id` is `41d7853` and `degree_type` is not `Certification`.

Like below:

display(df_degrees.loc[np.logical_and(df_degrees['person_id'] == '41d7853' , df_degrees['degree_type'] !='Certification')])

If I try to write the code like this:

display(df_degrees.loc[df_degrees['person_id'] == '41d7853' and df_degrees['degree_type'] !='Certification'])

we get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I used `numpy.logical_and` instead, and it worked for me.

0 votes smriti 4/21/2023 #14

In my case, a value type mismatch raised this error. Make sure both sides of the comparison operator are elements of the same data type.
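An illustration of that advice, assuming the column arrived as strings (for example from a CSV read without type hints; the data here is made up): convert it with `pd.to_numeric` before comparing, so both sides of each comparison are numbers.

```python
import pandas as pd

df = pd.DataFrame({"col": ["-0.5", "0.1", "0.3"]})  # numbers stored as strings

# Convert to a numeric dtype first; errors="coerce" turns unparseable values into NaN
df["col"] = pd.to_numeric(df["col"], errors="coerce")

mask = (df["col"] < -0.25) | (df["col"] > 0.25)
print(df[mask]["col"].tolist())  # [-0.5, 0.3]
```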
