Asked by: moltke_colombia Asked: 10/30/2023 Last edited by: Savirmoltke_colombia Updated: 10/31/2023 Views: 68
Airflow: Package installation of rpy2 to execute RScripts in Airflow
Q:
Requirement: I need to be able to install the rpy2 library, because the code to be orchestrated with Airflow uses it extensively.
Current Dockerfile:
FROM ubuntu:latest
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential r-base r-base-core r-cran-randomforest python3.6 python3-pip python3-setuptools python3-dev&& \
rm -r /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==49.6.0
RUN python3 -m pip install rpy2
RUN Rscript -e "install.packages('data.table')"
COPY . /app
Problem: I am running into errors about required libraries that do not come from the code itself.
Error:
[6/8] RUN python3 -m pip install rpy2:
1.176 Collecting rpy2
1.304 Downloading rpy2-3.5.14.tar.gz (219 kB)
1.422 Installing build dependencies: started
4.186 Installing build dependencies: finished with status 'done'
4.187 Getting requirements to build wheel: started
4.225 Getting requirements to build wheel: finished with status 'error'
4.225 ERROR: Command errored out with exit status 1:
4.225 command: /usr/bin/python3 /usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpff4u1mul
4.225 cwd: /tmp/pip-install-12iwr626/rpy2
4.225 Complete output (31 lines):
4.225 Traceback (most recent call last):
4.225 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py", line 257, in <module>
4.225 main()
4.225 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py", line 240, in main
4.225 json_out['return_val'] = hook(**hook_input['kwargs'])
4.225 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py", line 85, in get_requires_for_build_wheel
4.225 backend = _build_backend()
4.225 File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pep517/_in_process.py", line 63, in _build_backend
4.225 obj = import_module(mod_path)
4.225 File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
4.225 return _bootstrap._gcd_import(name[level:], package, level)
4.225 File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
4.225 File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
4.225 File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
4.225 File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
4.225 File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
4.225 File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
4.225 File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
4.225 File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
4.225 File "<frozen importlib._bootstrap_external>", line 883, in exec_module
4.225 File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
4.225 File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 10, in <module>
4.225 import distutils.core
4.225 File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
4.225 File "<frozen importlib._bootstrap>", line 1002, in _find_and_load_unlocked
4.225 File "<frozen importlib._bootstrap>", line 945, in _find_spec
4.225 File "/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py", line 72, in find_spec
4.225 return self.get_distutils_spec()
4.225 File "/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py", line 77, in get_distutils_spec
4.225 class DistutilsLoader(importlib.util.abc.Loader):
4.225 AttributeError: module 'importlib.util' has no attribute 'abc'
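A:
Errors like these usually come down to different package versions fighting each other. For example: a package removes a method or moves some functions in its latest release, and another package that depends on it doesn't know about those changes (yet).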
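For example: package A uses the method .do_something of package B, but the developers of package B renamed it to .do_something_better. If you have the latest version of B but an older version of A that doesn't know about the rename yet... well... it crashes (as you've seen).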
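The rename scenario above can be reproduced in a few lines. This is only an illustrative sketch: PackageBOld, PackageBNew and package_a_entry_point are hypothetical stand-ins, not real packages.

```python
# Hypothetical stand-ins for "package B" before and after the rename.
class PackageBOld:
    def do_something(self):
        return "done"


class PackageBNew:
    # The maintainers renamed do_something to do_something_better.
    def do_something_better(self):
        return "done, but better"


def package_a_entry_point(b):
    # "Package A" was written against the old API and still calls the old name.
    return b.do_something()


def try_combination(b):
    """Return package A's result, or a short description of the crash."""
    try:
        return package_a_entry_point(b)
    except AttributeError as exc:
        return f"crash: {exc}"


if __name__ == "__main__":
    print(try_combination(PackageBOld()))  # old A with old B
    print(try_combination(PackageBNew()))  # old A with new B
```

The second call fails with an AttributeError, which is the same kind of exception that ends the traceback in the question.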
This seems to happen a lot with Python 3.10 and setuptools.
TL;DR: You are looking at a very common (and annoying) versioning issue.
That said, this Dockerfile builds successfully:
FROM ubuntu:latest
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
r-base r-base-core r-cran-randomforest \
libinput-dev libgbm-dev liblzma-dev libbz2-dev libicu-dev libblas-dev liblapack-dev \
python3.6 python3-pip python3-setuptools python3-dev && \
rm -r /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install --upgrade pip wheel "setuptools>51"
RUN python3 -m pip install rpy2
RUN Rscript -e "install.packages('data.table')"
COPY . /app
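Note that a bunch of -dev packages are needed, and that I let pip, wheel and setuptools be looser about their versions. Also, since I don't have your requirements.txt file, I had to leave it empty.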
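One detail to watch when putting a version constraint such as setuptools>51 on a RUN line: in a POSIX shell, an unquoted > starts an output redirect, so the constraint only reaches pip if the requirement is quoted. The sketch below shows the difference without installing anything. It assumes a POSIX shell is available, and it uses echo as a stand-in for pip3; run_in_scratch_dir is a helper name made up for this illustration.

```python
import os
import subprocess
import tempfile


def run_in_scratch_dir(command):
    """Run a shell command in an empty temp dir; return (stdout, files created)."""
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(
            command,
            shell=True,          # let the shell parse the line, as a RUN step would
            cwd=scratch,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip(), sorted(os.listdir(scratch))


if __name__ == "__main__":
    # echo stands in for pip3, so nothing is actually installed.
    print(run_in_scratch_dir("echo install setuptools>51"))
    print(run_in_scratch_dir('echo install "setuptools>51"'))
```

In the unquoted case the shell sends the command's output to a file named 51 and the command only sees the argument setuptools. In the quoted case the whole requirement is passed through as one argument.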
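However: you are pulling the Ubuntu :latest image. As of October 2023, that means installing Ubuntu 22.04 (codename "Jammy Jellyfish"). The default Python 3 in that image should be 3.10, but you seem to be installing Python 3.6. That can lead to problems: if you run something like apt-get install some_python_package, you may end up with Python 3.6 on your system but a build of some_python_package meant for Python 3.10, which is not great.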
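A quick way to see which interpreter python3 actually resolves to inside the image is to ask it directly. The following is a minimal sketch; interpreter_matches and describe are illustrative names, and the expected version (3, 6) is taken from the question's Dockerfile.

```python
import sys


def interpreter_matches(expected, actual=None):
    """Return True if the interpreter's (major, minor) equals `expected`."""
    actual = tuple(sys.version_info[:2]) if actual is None else tuple(actual)
    return actual == tuple(expected)


def describe(expected):
    """Build a one-line report comparing the running interpreter to `expected`."""
    found = "{}.{}".format(*sys.version_info[:2])
    wanted = "{}.{}".format(*expected)
    if interpreter_matches(expected):
        return f"python3 is {found}, as expected"
    return f"python3 is {found}, but {wanted} was expected"


if __name__ == "__main__":
    # The question's Dockerfile asks for Python 3.6.
    print(describe((3, 6)))
```

Running this with python3 in the built image shows whether the interpreter that pip installs into is the one you meant to get.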
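If you would rather use Python 3.6, may I suggest basing your Dockerfile on one of the Python Docker images?
For example, python:3.6.14-bullseye, which is based on Debian (not Ubuntu) but includes tweaks and environment variables intended to provide a safe environment (or "ecosystem") for Python 3.6: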
FROM python:3.6.14-bullseye
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
r-base r-base-core r-cran-randomforest \
libinput-dev libgbm-dev liblzma-dev libbz2-dev libicu-dev libblas-dev liblapack-dev \
&& rm -r /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install --upgrade pip wheel setuptools
RUN python3 -m pip install rpy2
RUN Rscript -e "install.packages('data.table')"
COPY . /app
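Once an image builds, it may be worth checking that rpy2 can actually start R before wiring it into an Airflow task. Below is a small smoke test, offered as a sketch: rpy2_smoke_test is a made-up helper name, and it relies only on rpy2.robjects.r to evaluate one R expression.

```python
import importlib.util


def rpy2_smoke_test():
    """Return a short status string describing whether rpy2 can reach R."""
    if importlib.util.find_spec("rpy2") is None:
        return "rpy2 is not installed"
    try:
        # Importing rpy2.robjects starts the embedded R interpreter.
        import rpy2.robjects as robjects

        r_version = robjects.r("R.version.string")[0]
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        return f"rpy2 is installed but could not start R: {exc}"
    return f"rpy2 OK, embedded {r_version}"


if __name__ == "__main__":
    print(rpy2_smoke_test())
```

Running the script inside the container reports a missing rpy2 or a broken R setup directly, instead of as a failure in the first scheduled task.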
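There are many more Python Docker images, with slightly different features and contents. You may want to take a look at this article to see which one best fits your needs.
Pinning the image to a specific version rather than to :latest has another benefit: if (for instance) the Ubuntu Docker image maintainers decide to update the meaning of "latest" from the current 22.04 to (say) 24.04, you won't be caught out by an unexpected full operating system upgrade.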
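To spot unpinned base images in a project, a few lines of Python are enough. This is a rough sketch, not a full Dockerfile parser: floating_base_images is a hypothetical helper, it looks only at FROM lines, and it will also flag multi-stage references such as FROM builder, because they carry no tag.

```python
def floating_base_images(dockerfile_text):
    """Return image references in FROM lines that are untagged or use :latest."""
    flagged = []
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip optional flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        if image.lower() == "scratch" or "@" in image:
            continue  # scratch has no tag; a digest is already pinned
        # Look for the tag in the last path component only, so a registry
        # port (localhost:5000/ubuntu) is not mistaken for a tag.
        last_component = image.rsplit("/", 1)[-1]
        tag = last_component.split(":", 1)[1] if ":" in last_component else None
        if tag is None or tag == "latest":
            flagged.append(image)
    return flagged


if __name__ == "__main__":
    print(floating_base_images("FROM ubuntu:latest\nWORKDIR /app"))
    print(floating_base_images("FROM python:3.6.14-bullseye\nWORKDIR /app"))
```

The first call reports ubuntu:latest, while the pinned python:3.6.14-bullseye reference passes without being flagged.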
Comment:
ld: can not find lbz2
-dev
-dev
apt-get install...
Comment:
setuptools
50.0.2
49.6.0