Asked by: Oriol Gilabert López · Asked: 11/16/2023 · Last edited by: Oriol Gilabert López · Updated: 11/17/2023 · Views: 45
Error when using SnowPark and fuzzywuzzy in Anaconda
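Q:
I'm working on a project where I need to use SnowPark in Anaconda and incorporate the functionality of the fuzzywuzzy package from this channel: https://repo.anaconda.com/pkgs/snowflake/.
For context, I loaded a fictitious table from Snowflake with the following code: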
df = snowflake_session.sql(query_sql)
The table has the following format:
----------------------
| "col1" | "col2" |
----------------------
| 1111111 | 1111112 |
| 2222222 | 2222222 |
| 3333333 | 1333243 |
----------------------
Note: the type of df is snowflake.snowpark.dataframe.DataFrame.
The first thing I did was install the package in my Anaconda virtual environment:
!conda install --name snowflake_env -c https://repo.anaconda.com/pkgs/snowflake fuzzywuzzy
Now I import the fuzzywuzzy library and use it in my function, which I define as follows:
import fuzzywuzzy as fuzz
@udf(name="fuzzy", is_permanent=False, replace=True, packages=['fuzzywuzzy'])
def fuzzy(x: int, y:int) -> int:
    return fuzz.ratio(x, y)
I apply the function to the columns of the DataFrame:
df.select("col1", "col2", fuzzy("col2", "col2")).show()
However, when I run this code, I get the following error:
SnowparkSQLException: (1304): 01b05970-0303-0e9b-0000-77590c4bc49a: 100357 (P0000): Python Interpreter Error:
Traceback (most recent call last):
File "_udf_code.py", line 37, in compute
File "_udf_code.py", line 26, in wrapper
File "C:\Users\es_oriol\AppData\Local\Temp\ipykernel_1234\660708773.py", line 5, in fuzzy
NameError: name 'fuzzywuzzy' is not defined
in function FUZZY with handler compute
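This leads me to believe there is a problem with the fuzzywuzzy import inside my fuzzy function. Is that right? Does anyone have an idea of how I can solve this?
Thanks in advance for any help!
Thanks!
Oriol
I have tried reinstalling the package, registering it in Snowflake, changing the Anaconda virtual environment, etc.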
A:
The following steps worked for me.
I created a virtual environment with conda as follows:
conda create --name snowflake_env --override-channels -c https://repo.anaconda.com/pkgs/snowflake python=3.9 snowflake-snowpark-python fuzzywuzzy
The list of installed packages is:
$ pip list
Package Version
-------------------------- ------------
asn1crypto 1.5.1
Brotli 1.0.9
certifi 2023.7.22
cffi 1.15.1
charset-normalizer 2.0.4
cloudpickle 2.0.0
cryptography 41.0.3
filelock 3.9.0
fuzzywuzzy 0.18.0
idna 3.4
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
numpy 1.26.0
oscrypto 1.3.0
packaging 23.1
pip 23.3
platformdirs 3.8.1
pyarrow 10.0.1
pycparser 2.21
pycryptodomex 3.15.0
PyJWT 2.4.0
pyOpenSSL 23.2.0
PySocks 1.7.1
python-Levenshtein 0.12.2
pytz 2023.3.post1
PyYAML 6.0.1
requests 2.31.0
setuptools 68.0.0
snowflake-connector-python 3.2.0
snowflake-snowpark-python 1.9.0
sortedcontainers 2.4.0
tomlkit 0.11.1
typing_extensions 4.7.1
urllib3 1.26.18
wheel 0.41.2
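To confirm from within Python that the active environment really contains the packages listed above, a small check with the standard library's importlib.metadata may help. This is only a minimal sketch: `installed_version` is a hypothetical helper introduced here, and the package names are simply taken from the list above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names copied from the pip list above.
for name in ("fuzzywuzzy", "snowflake-snowpark-python", "python-Levenshtein"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this in the same interpreter that Jupyter or the script uses shows whether that interpreter sees the conda environment at all.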
I created a small Python script that uses the same UDF as yours:
$ cat stackoverflow.py
from snowflake.snowpark import Session
from snowflake.snowpark import Table
from snowflake.snowpark.functions import col
from snowflake.snowpark.functions import udf
from fuzzywuzzy import fuzz
connection_parameters = {
    "account": "XXXX",
    "user": "XXXX",
    "password": "XXXX",
    "role": "XXXX",
    "warehouse": "XXXX",
    "database": "XXXX",
    "schema": "public"
}
session = Session.builder.configs(connection_parameters).create()
df = session.sql("SELECT * FROM test_fuzz")
#df = session.table('test_fuzz')
@udf(name="fuzzy", is_permanent=False, replace=True, packages=['fuzzywuzzy'])
def fuzzy(x: int, y: int) -> int:
    return fuzz.ratio(x, y)
df.select("col1", "col2", fuzzy("col2", "col2")).show()
session.close()
I ran it:
$ python stackoverflow.py
---------------------------------------------------
|"COL1" |"COL2" |"FUZZY(""COL2"", ""COL2"")" |
---------------------------------------------------
|1111111 |1111112 |100 |
|2222222 |2222222 |100 |
|3333333 |1333243 |100 |
---------------------------------------------------
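Every row scores 100 here most likely because the call passes "col2" for both arguments, so each value is compared with itself. As a rough local sketch of how a col1-versus-col2 comparison would differ, the snippet below uses the standard library's difflib as a stand-in. It is not fuzzywuzzy's scorer, so the exact numbers may differ, and `similarity` is a hypothetical helper. Values are cast to strings because the matching is character-based.

```python
from difflib import SequenceMatcher


def similarity(left, right) -> int:
    """Return a 0-100 similarity score for two values compared as strings."""
    a, b = str(left), str(right)
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Rows copied from the example table.
rows = [(1111111, 1111112), (2222222, 2222222), (3333333, 1333243)]
for col1, col2 in rows:
    # Third column: col1 vs col2. Fourth column: col2 vs itself.
    print(col1, col2, similarity(col1, col2), similarity(col2, col2))
```

Comparing a column with itself always gives 100, while differing values score lower, so a real test of the UDF would probably pass "col1" and "col2".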
Now, when I check the query history in the Snowflake UI, I can see that a temporary function was created, and it contains the following section:
# The following comment contains the source code generated by snowpark-python for explanatory purposes.
# import fuzzywuzzy.fuzz as fuzz
# @udf(name="fuzzy", is_permanent=False, replace=True, packages=['fuzzywuzzy'])
# def fuzzy(x: int, y:int) -> int:
# return fuzz.ratio(x, y)
#
# func = fuzzy
In your case, the generated code is missing the import part:
# import fuzzywuzzy.fuzz as fuzz
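The only explanation I can think of is something related to your local environment. By the way, I tested this on Ubuntu 22.04 and Windows 10 with exactly the same steps, and it worked fine for me on both.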
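One visible difference between the two scripts is the import line: the question uses `import fuzzywuzzy as fuzz`, while this script uses `from fuzzywuzzy import fuzz`. The first binds the name to the package, the second to the submodule that defines `ratio`. The sketch below reproduces that distinction with a throwaway package so it runs without fuzzywuzzy installed. `demo_fuzzy_pkg` and its toy `ratio` are hypothetical stand-ins, and this may or may not be the whole cause of the NameError above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package shaped like fuzzywuzzy: a package directory
# containing a submodule named `fuzz`. The name is hypothetical.
pkg_root = Path(tempfile.mkdtemp())
pkg_dir = pkg_root / "demo_fuzzy_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")  # the package itself defines no ratio()
(pkg_dir / "fuzz.py").write_text(
    "def ratio(a, b):\n"
    "    return 100 if a == b else 0\n"  # toy scorer, not a real fuzzy match
)
sys.path.insert(0, str(pkg_root))
importlib.invalidate_caches()

# Question-style import: the alias points at the package.
import demo_fuzzy_pkg as fuzz_pkg
package_has_ratio = hasattr(fuzz_pkg, "ratio")

# Answer-style import: the name points at the submodule with ratio().
from demo_fuzzy_pkg import fuzz
submodule_has_ratio = hasattr(fuzz, "ratio")

print(package_has_ratio, submodule_has_ratio)  # prints: False True
```

With the package-level alias, a call such as `fuzz_pkg.ratio(...)` raises AttributeError, which is a separate problem from the NameError but would still need fixing.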
Comments
from fuzzywuzzy import fuzz

>>> import fuzzywuzzy as fuzz
>>> fuzz.ratio('123', '1234')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'fuzzywuzzy' has no attribute 'ratio'
>>>

from fuzzywuzzy import fuzz
@udf(name="fuzzy", is_permanent=False, replace=True, packages=['fuzzywuzzy'])
def fuzzy(x: int, y: int) -> int:
    return fuzz.ratio(x, y)

SnowparkSQLException: (1304): blah blah blah
  File "C:\Users\es_oriol\AppData\Local\Temp\ipykernel_1234\660708773.py", line 5, in fuzzy
NameError: name 'fuzzywuzzy' is not defined
in function FUZZY with handler compute

def fuzzy(x: int, y: int) -> int:
    from fuzzywuzzy import fuzz
    return fuzz.ratio(x, y)

@udf(name="fuzzy", is_permanent=False, replace=True, packages=['fuzzywuzzy'])
def fuzzy(x: int, y: int) -> int:
    from fuzzywuzzy import fuzz
    return fuzz.ratio(x, y)