How to access each element of a nested list for naming output in snakemake?

Asked by: recurso Asked: 10/30/2022 Updated: 10/30/2022 Views: 79

Q:

This is similar to the question: Snakemake: using Python nested list comprehension for conditional analysis

I have the following:

RUN_ID = ["run1", "run2"]

SAMPLES = [["A", "B", "C"], ["D","E","F"]]
rule all:
    input:
        summary = expand("foo/{run}/{sample}/outs/{sample}_report.html", run=RUN_ID, sample=SAMPLES)

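Problem 1: Each run in RUN_ID should be associated only with the corresponding samples in SAMPLES (based on index). So run1 is paired only with A, B, C, and run2 only with D, E, F.

Problem 2: The name of each output file should reflect this index-based pairing. Currently, I am struggling to pair each element of each nested list in SAMPLES with the matching element of RUN_ID.

Based on the above, I want the following output: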

"foo/run1/A/outs/A_report.html"
"foo/run1/B/outs/B_report.html"
"foo/run1/C/outs/C_report.html"

"foo/run2/D/outs/D_report.html"
"foo/run2/E/outs/E_report.html"
"foo/run2/F/outs/F_report.html"

Initially I was getting this:

"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run1/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

"foo/run2/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

I overcame the unwanted pairings by using zip in the expand function:

summary= expand(["foo/{run}/{sample}/outs/{sample}_report.html", "foo/{run}/{sample}/outs/{sample}_report.html"], zip, run=RUN_ID, sample=SAMPLES)

This leaves me with the desired pairing between RUN_ID and SAMPLES:

"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"

"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

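However, as shown above, each nested list is passed into the output path as a whole, rather than each element of each nested list. I can get what I want by splitting SAMPLES into two separate lists, but I would like a more elegant and automated approach.

I'm also not set on using nested lists; any insight into a fix or a better approach would be appreciated. Thanks!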
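To illustrate why the whole inner list ends up in the path, here is a minimal sketch in plain Python. It only approximates the behaviour with zip and str.format and is not snakemake's expand itself; the names pattern, pairs, and paths are illustrative assumptions.

```python
RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D", "E", "F"]]

pattern = "foo/{run}/{sample}/outs/{sample}_report.html"

# zip pairs the lists by index, but each "sample" it yields is still
# a whole inner list, e.g. ("run1", ["A", "B", "C"]).
pairs = list(zip(RUN_ID, SAMPLES))

# Formatting a list into a string inserts the list's string form,
# so the brackets and quotes end up inside the path.
paths = [pattern.format(run=run, sample=sample) for run, sample in pairs]

for path in paths:
    print(path)
```

The pairing by index is already correct at this point; what is missing is a second loop over the elements of each inner list.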

python list-comprehension nested-lists snakemake

A:

2 votes SultanOrazbayev 10/30/2022 #1

expand is a convenience utility; for more complex cases, it is often faster to generate the desired list directly with Python:

RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D","E","F"]]

# Pair each run with its own sample list by index,
# then loop over the samples that belong to that run.
desired_files = []
for run, samples in zip(RUN_ID, SAMPLES):
    for sample in samples:
        desired_files.append(f"foo/{run}/{sample}/outs/{sample}_report.html")

rule all:
    input: desired_files
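
Since the question mentions list comprehensions and is open to alternatives to nested lists, the same list can also be sketched as a nested comprehension. The dict RUN_SAMPLES and the name desired_files_from_dict below are hypothetical and not part of the original question or answer; this is plain Python that would sit at the top of a Snakefile, before rule all.

```python
RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D", "E", "F"]]

# Nested comprehension: the outer loop pairs each run with its own
# sample list by index (zip); the inner loop visits only that run's samples.
desired_files = [
    f"foo/{run}/{sample}/outs/{sample}_report.html"
    for run, samples in zip(RUN_ID, SAMPLES)
    for sample in samples
]

# Hypothetical alternative layout: a dict keyed by run ID makes the
# run-to-samples pairing explicit instead of relying on list positions.
RUN_SAMPLES = {"run1": ["A", "B", "C"], "run2": ["D", "E", "F"]}
desired_files_from_dict = [
    f"foo/{run}/{sample}/outs/{sample}_report.html"
    for run, samples in RUN_SAMPLES.items()
    for sample in samples
]

for path in desired_files:
    print(path)
```

Either list could then be passed to input in rule all. The dict form may be easier to keep in sync when runs or samples are added, because a run and its samples are defined together.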