Asked by: JL Peyret Asked: 6/1/2023 Updated: 6/1/2023 Views: 677
How do I apply and/or boolean logic on Polars .when conditionals?
Q:
Let's start with my dataframe. It has 2 columns, `tgt` and `src`. When `src` is not null and `tgt` is `"?"`, I want to set `tgt=src`.
┌─────┬──────┐
│ tgt ┆ src │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪══════╡
│ a ┆ !a │
│ ? ┆ b │
│ ? ┆ null │
└─────┴──────┘
That should then give the following, aliased as `newtgt`:
┌─────┬──────┬────────┐
│ tgt ┆ src ┆ newtgt │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════╪══════╪════════╡
│ a ┆ !a ┆ a │
│ ? ┆ b ┆ b │
│ ? ┆ null ┆ ? │
└─────┴──────┴────────┘
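I can check for not null, and I can check for `== "?"`. How do I combine them? I tried `and`, `&`, and `&&`, but none of them worked.
What I have so far, including the error messages: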
import polars as pl

df = pl.from_dict(
    dict(tgt=["a","?","?"],src=["!a","b",None])
)
print("\ndf before:\n",df)

df2 = df.with_columns(
    pl.when(pl.col("src").is_not_null())
    .then(pl.col("src"))
    .otherwise(pl.col("tgt"))
    .alias("newtgt")
)
print("\ndf2 check if src not null:\n",df2)

df2 = df.with_columns(
    pl.when(pl.col("tgt") == "?")
    .then(pl.col("src"))
    .otherwise(pl.col("tgt"))
    .alias("newtgt")
)
print("\ndf2 if check tgt already known:\n",df2)

try:
    print("\n\ncheck both with `and`: ")
    df2 = df.with_columns(
        pl.when(pl.col("tgt") == "?" and pl.col("src").is_not_null())
        .then(pl.col("src"))
        .otherwise(pl.col("tgt"))
        .alias("newtgt")
    )
except (ValueError,) as e:
    print("\nnot happy with `and`:\n ", e)

try:
    print("\n\ncheck both with `&`: ")
    df2 = df.with_columns(
        pl.when(pl.col("tgt") == "?" & pl.col("src").is_not_null())
        .then(pl.col("src"))
        .otherwise(pl.col("tgt"))
        .alias("newtgt")
    )
except (pl.exceptions.InvalidOperationError,) as e:
    print("\nnot happy with `&`:\n ", e)
Output:
df before:
shape: (3, 2)
┌─────┬──────┐
│ tgt ┆ src │
│ --- ┆ --- │
│ str ┆ str │
╞═════╪══════╡
│ a ┆ !a │
│ ? ┆ b │
│ ? ┆ null │
└─────┴──────┘
df2 check if src not null:
shape: (3, 3)
┌─────┬──────┬────────┐
│ tgt ┆ src ┆ newtgt │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════╪══════╪════════╡
│ a ┆ !a ┆ !a │
│ ? ┆ b ┆ b │
│ ? ┆ null ┆ ? │
└─────┴──────┴────────┘
df2 if check tgt already known:
shape: (3, 3)
┌─────┬──────┬────────┐
│ tgt ┆ src ┆ newtgt │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════╪══════╪════════╡
│ a ┆ !a ┆ a │
│ ? ┆ b ┆ b │
│ ? ┆ null ┆ null │
└─────┴──────┴────────┘
check both with `and`:
not happy with `and`:
Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to logically combine Expr, not 'and'/'or', and use 'x.is_in([y,z])' instead of 'x in [y,z]' to check membership.
check both with `&`:
not happy with `&`:
`bitand` operation not supported for dtype `str`
A:
1 upvote
Wayoshi
6/1/2023
#1
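You need to wrap each part of a compound and/or expression in parentheses to avoid that error. As the `polars` error message hints, you also need `&` rather than `and`: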
df.with_columns(
    pl.when((pl.col("tgt") == "?") & (pl.col("src").is_not_null()))
    .then(pl.col("src"))
    .otherwise(pl.col("tgt"))
    .alias("newtgt")
)
shape: (3, 3)
┌─────┬──────┬────────┐
│ tgt ┆ src ┆ newtgt │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str │
╞═════╪══════╪════════╡
│ a ┆ !a ┆ a │
│ ? ┆ b ┆ b │
│ ? ┆ null ┆ ? │
└─────┴──────┴────────┘
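Another equivalent option is `pl.all(expr1, expr2, ...)` (more recent Polars releases spell this horizontal form `pl.all_horizontal(expr1, expr2, ...)`).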
Comments
1 upvote
Dean MacGregor
6/1/2023
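A nitpick: it isn't polars that needs the parentheses, it's Python. Python's order of operations puts the `&` operator ahead of `==`, so if you type `pl.col("tgt") == "?" & pl.col("src").is_not_null()`, the way Python interprets it is `pl.col("tgt") == ("?" & pl.col("src").is_not_null())`.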
0 upvotes
Dean MacGregor
6/1/2023
For example, if you do `2==2 & 3==3` you get False, but `(2==2) & (3==3)` gives True. In the first case, Python first evaluates `2&3` to 2 and then checks `2==2==3`, which is of course False.
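The precedence point can be checked with plain Python, without polars. This is a small sketch using only the standard library; the names `col` and `other` in the parsed string are placeholders, and the string is only parsed, never evaluated.

```python
import ast

# `&` binds tighter than `==`, so the first line is the chained
# comparison 2 == (2 & 3) == 3, i.e. 2 == 2 == 3.
unparenthesized = 2 == 2 & 3 == 3
parenthesized = (2 == 2) & (3 == 3)
print(unparenthesized, parenthesized)  # False True

# Ask the parser how it groups an expression shaped like the failing one.
tree = ast.parse('col == "?" & other', mode="eval")
comparison = tree.body

# The top-level node is the `==` comparison ...
print(type(comparison).__name__)  # Compare
# ... and its right-hand side is the whole `"?" & other` operation.
right_side = comparison.comparators[0]
print(type(right_side).__name__, type(right_side.op).__name__)  # BinOp BitAnd
```

That grouping is why polars reported a `bitand` on a `str`: the string `"?"` was the left operand of `&`.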