How do ShareableLists actually work when multiprocessing in Python?

Question:

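I want to process a list of 10,000 to 20,000 dictionaries in parallel, and I used multiprocessing for this. The problem is that each process used 2-4 GB of RAM. After some research I found the ShareableList object, which I thought would solve this by letting the processes share the same resource instead of each copying gigabytes of data for itself.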
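One detail relevant to a list of dictionaries: a ShareableList can only hold int, float, bool, str, bytes and None, so dictionaries cannot be stored in it directly. The following is a minimal sketch of one possible workaround, assuming the dictionaries are JSON-serializable. The `records` data is hypothetical and only stands in for the real articles.

```python
import json
from multiprocessing import shared_memory

# Hypothetical stand-in for the real list of dictionaries.
records = [{"id": i, "title": f"article {i}"} for i in range(5)]

# Store each dictionary as a JSON string, since dicts are not a supported item type.
sl = shared_memory.ShareableList([json.dumps(r) for r in records])
try:
    # Reading an item returns the string, which is decoded back into a dict.
    restored = [json.loads(item) for item in sl]
finally:
    sl.shm.close()   # detach this handle
    sl.shm.unlink()  # free the shared block
```

Each read still decodes a fresh dict in the reading process, so this only avoids duplicating the stored strings, not the decoded objects.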

I have been testing with the following code:

## main.ipynb
# Test shared mem
from multiprocessing import Pool
from tqdm.notebook import tqdm
from multiprocessing.managers import SharedMemoryManager
from utils import mult

with SharedMemoryManager() as smm:
    sl = smm.ShareableList(range(2000))
    with Pool(processes=3) as p:
        with tqdm(total=len(sl), desc="Going through articles") as pbar:
            for res in p.imap_unordered(mult, sl, chunksize=100):
                pbar.update()
total_result = sum(sl)
total_result

## utils.py
def mult(x):
    return x**2

I expected each process to need only about 100 MB, but after some testing each process still takes 2-3 GB, even when using ShareableList.
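For reference, here is a minimal sketch of how a ShareableList is meant to be shared: another handle attaches to the existing block by name instead of receiving a copy. In the test code above, `imap_unordered(mult, sl)` iterates the list in the parent and sends each integer to the workers, so the workers never touch the shared block. The helper `square_at` is a hypothetical name, and it is called in the same process here only to keep the sketch self-contained.

```python
from multiprocessing import shared_memory

def square_at(name, index):
    """What a worker could do: attach to the block by name and read one item."""
    attached = shared_memory.ShareableList(name=name)  # attaches, does not copy
    try:
        return attached[index] ** 2
    finally:
        attached.shm.close()  # detach this handle; the block stays alive

sl = shared_memory.ShareableList(range(10))
try:
    squares = [square_at(sl.shm.name, i) for i in range(len(sl))]

    # A write through a second handle is visible through the first one,
    # because both handles point at the same memory.
    other = shared_memory.ShareableList(name=sl.shm.name)
    other[0] = 99
    first_item = sl[0]
    other.shm.close()
finally:
    sl.shm.close()
    sl.shm.unlink()  # free the block once nobody needs it
```

With a Pool, the same idea would mean passing `sl.shm.name` and an index to the worker function rather than iterating `sl` itself. Whether this lowers the per-process figures measured here is not something the sketch demonstrates.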

Here are the corresponding numbers. When running this code in VSCode, I get the following peak RAM usage (in GB):

Idle: 7
1 process: 10
2 processes: 12.7
3 processes: 15.1

I also tried the alternative solution from this answer, with the same result.

Note: I am importing the mult function because multiprocessing in a notebook only works this way. If mult is defined in the same cell, the processes never seem to start.

PS: Using Python 3.11.0 in a venv, in an ipynb file in VSCode.

python-3.x jupyter-notebook multiprocessing shared-memory
