Asked by: Mika Bell  Asked: 10/17/2023  Last edited by: Mika Bell  Updated: 10/31/2023  Views: 36
How to turn a dask scalar of type boolean into a boolean expression
Question:
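I have a long piece of code with many data-manipulation steps. At the end I want to get a boolean result by comparing two dask Series. This is the last part of my code: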
scores_for_test_data[f"Healthy Sample Score {exclude_name}"] = np.power(
    (pairwise_test_excluded_sample - healthy_sample_mean_ratio), 2)
scores_for_test_data[f"Lung Sample Score {exclude_name}"] = np.power(
    (pairwise_test_excluded_sample - lung_sample_mean_ratio), 2)
scores_for_test_data[f"Sum Healthy Sample Score {exclude_name}"] = scores_for_test_data[
    f"Healthy Sample Score {exclude_name}"].sum()
scores_for_test_data[f"Sum Lung Sample Score {exclude_name}"] = scores_for_test_data[
    f"Lung Sample Score {exclude_name}"].sum()
res = scores_for_test_data[f"Sum Healthy Sample Score {exclude_name}"][0] < \
    scores_for_test_data[f"Sum Lung Sample Score {exclude_name}"][0]
if res.any().compute():
    check += 1
(All variables are dask objects.)
My problem is that I'm working with a very large dataset, so when execution reaches compute() the workers run out of memory. res should be True/False, but its type is "dask.dataframe.core.Series".
If possible, how can I get a plain boolean out of this comparison?
Both scores_for_test_data[f"Sum Healthy Sample Score {exclude_name}"] and scores_for_test_data[f"Sum Lung Sample Score {exclude_name}"] are dask Series that hold the same value in every row, for example:
0 2375.861075
1 2375.861075
2 2375.861075
3 2375.861075
4 2375.861075
5 2375.861075
That is why I take only the first row of each Series in the comparison.
This is the output I get:
2023-10-22 14:12:56,258 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 5.70 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:56,716 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker. Process memory: 6.40 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:56,985 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 5.75 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,109 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 5.65 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,139 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 5.69 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,329 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 5.72 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,329 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 5.74 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,500 - distributed.worker.memory - WARNING - Worker is at 81% memory usage. Pausing worker. Process memory: 6.47 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,505 - distributed.nanny.memory - WARNING - Worker tcp://127.0.0.1:55007 (pid=14100) exceeded 95% memory budget. Restarting...
2023-10-22 14:12:57,613 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker. Process memory: 6.39 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,681 - distributed.worker.memory - WARNING - Worker is at 81% memory usage. Pausing worker. Process memory: 6.49 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,686 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 5.69 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,784 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 5.78 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,872 - distributed.worker.memory - WARNING - Worker is at 81% memory usage. Pausing worker. Process memory: 6.52 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:57,877 - distributed.worker.memory - WARNING - Worker is at 81% memory usage. Pausing worker. Process memory: 6.47 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:58,034 - distributed.nanny - WARNING - Restarting worker
2023-10-22 14:12:58,218 - distributed.worker.memory - WARNING - Worker is at 81% memory usage. Pausing worker. Process memory: 6.48 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:58,276 - distributed.nanny.memory - WARNING - Worker tcp://127.0.0.1:54993 (pid=3008) exceeded 95% memory budget. Restarting...
2023-10-22 14:12:58,357 - distributed.worker.memory - WARNING - Worker is at 81% memory usage. Pausing worker. Process memory: 6.52 GiB -- Worker memory limit: 7.98 GiB
2023-10-22 14:12:58,405 - distributed.nanny.memory - WARNING - Worker tcp://127.0.0.1:54986 (pid=9212) exceeded 95% memory budget. Restarting...
2023-10-22 14:12:58,443 - distributed.nanny.memory - WARNING - Worker tcp://127.0.0.1:55004 (pid=4788) exceeded 95% memory budget. Restarting...
2023-10-22 14:12:58,568 - distributed.nanny.memory - WARNING - Worker tcp://127.0.0.1:55001 (pid=7332) exceeded 95% memory budget. Restarting...
2023-10-22 14:12:58,682 - distributed.nanny.memory - WARNING - Worker tcp://127.0.0.1:54989 (pid=12900) exceeded 95% memory budget. Restarting...
2023-10-22 14:12:58,956 - distributed.nanny.memory - WARNING - Worker tcp://127.0.0.1:54992 (pid=8280) exceeded 95% memory budget. Restarting...
2023-10-22 14:12:58,983 - distributed.nanny - WARNING - Restarting worker
2023-10-22 14:12:59,139 - distributed.nanny.memory - WARNING - Worker tcp://127.0.0.1:54998 (pid=3868) exceeded 95% memory budget. Restarting...
2023-10-22 14:12:59,239 - distributed.nanny - WARNING - Restarting worker
2023-10-22 14:12:59,348 - distributed.nanny - WARNING - Restarting worker
2023-10-22 14:12:59,515 - distributed.nanny - WARNING - Restarting worker
2023-10-22 14:12:59,718 - distributed.nanny - WARNING - Restarting worker
2023-10-22 14:13:00,020 - distributed.nanny - WARNING - Restarting worker
2023-10-22 14:13:00,363 - distributed.nanny - WARNING - Restarting worker
2023-10-22 14:13:19,717 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 5.78 GiB -- Worker memory limit: 7.98 GiB
Could someone please help?
Answers: none yet