Snakemake: use different wildcards in a rule

Question:

I am trying to write a Snakemake rule that takes my fastq files as input and produces one .sam file per fastq file as output.

I have a file like this:

FILE              TYPE    SM    LB    ID    PU        PL
xfgh.fastq.gz     Single  IND1  IND1  IND1  Platform  Illumina
IND2.fastq.gz     Single  IND2  IND2  IND2  Platform  Illumina
zfgv.fastq.gz     Single  IND3  IND3  IND3  Platform  Illumina
IND4_P1.fastq.gz  Single  IND4  IND4  IND4  Platform  Illumina

So I did something like this.
I open my data frame with pandas:

pd.read_csv("info_file.txt")

I store the SM and ID columns of the file.

I create my rules:

rule all:
    input:
        sam_file = expand("ALIGNEMENT/{sm}/{id}.sam", sm = info_df["SM"], id = info_df["ID"])

rule alignement:
    input:
          fastq_files = "PATH/TO/{fastq}"
    output:
          sam_file = "ALIGNEMENT/{sm}/{id}.sam"

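I know the input and output need to share the same wildcards, but is there a way to take the input from the "FILE" column of my file.txt and write the output to a path like "ALIGNEMENT/{sm}/{id}.sam", where {sm} and {id} come from the SM and ID columns of my file.txt?

I would also like the rule to run once per file.

If someone can help me, thank you.

python-3.x wildcard snakemake fastq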

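Update: although @dariober has provided a valid answer that solves the problem, I maintain that there is no need to use Snakemake in this case. All the benefits Snakemake offers are lost, and unnecessary complexity is added. As an alternative, I offer a simple single-rule pipeline with no nasty lambdas.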

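rule keep_it_simple_stupid:
    run:  # `run:` takes inline Python; `script:` expects a path to a script file
        with open("info_file.txt") as f:
            next(f)  # skip the header line
            for row in f:
                # Columns: FILE, TYPE, SM, LB, ID, PU, PL (tab-separated)
                filename, _, sm, _, id, _, _ = row.rstrip("\n").split("\t")
                shell(f"do whatever you want with {filename}, {sm}, and {id}")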

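The body of this rule is plain, bare-bones Python. If it were part of a larger pipeline that uses Snakemake more idiomatically, wrapping it in a Snakemake rule might make sense. If not, Snakemake is the wrong tool for the job and should be avoided.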
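
To illustrate the point that the loop needs nothing from Snakemake, here is a minimal sketch of the same idea as a standalone Python script. It is an illustration under assumptions, not part of the original answer: `my_aligner` is a hypothetical placeholder command, the `PATH/TO/` prefix is taken from the question, and the sample sheet is inlined as a string so the example is self-contained.

```python
import csv
import io
import shlex

# Stand-in for info_file.txt (tab-separated), copied from the question.
INFO_TSV = (
    "FILE\tTYPE\tSM\tLB\tID\tPU\tPL\n"
    "xfgh.fastq.gz\tSingle\tIND1\tIND1\tIND1\tPlatform\tIllumina\n"
    "IND2.fastq.gz\tSingle\tIND2\tIND2\tIND2\tPlatform\tIllumina\n"
    "zfgv.fastq.gz\tSingle\tIND3\tIND3\tIND3\tPlatform\tIllumina\n"
    "IND4_P1.fastq.gz\tSingle\tIND4\tIND4\tIND4\tPlatform\tIllumina\n"
)


def build_commands(handle, aligner="my_aligner"):
    """Return one shell command string per row of the sample sheet.

    `my_aligner` is a hypothetical placeholder, not a real program.
    """
    commands = []
    for row in csv.DictReader(handle, delimiter="\t"):
        fastq = f"PATH/TO/{row['FILE']}"
        sam = f"ALIGNEMENT/{row['SM']}/{row['ID']}.sam"
        commands.append(f"{aligner} {shlex.quote(fastq)} > {shlex.quote(sam)}")
    return commands


commands = build_commands(io.StringIO(INFO_TSV))
for command in commands:
    print(command)  # replace print with subprocess.run(...) to execute
```

With a real file, `build_commands(open("info_file.txt", newline=""))` would take the place of the `io.StringIO` stand-in.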

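The lambda-based answer mentioned above:

import pandas as pd

# info_file.txt is tab-separated, so the separator must be given explicitly
info_df = pd.read_csv("info_file.txt", sep='\t')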

rule all:
    input:
        expand("ALIGNEMENT/{sm}/{id}.sam", zip, sm = info_df["SM"], id = info_df["ID"])

rule alignement:
    input:
        # Look up the FILE entry whose ID matches the {id} wildcard; return a plain list of paths
        fastq_files=lambda wc: info_df[info_df['ID'] == wc.id]['FILE'].tolist(),
    output:
        sam_file = "ALIGNEMENT/{sm}/{id}.sam"
    shell:
        r"""
        echo {input.fastq_files} > {output.sam_file}
        """
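
The two ideas this Snakefile relies on can be sketched in plain Python. This is an emulation for illustration only, not Snakemake's own implementation: `str.format` stands in for `expand`, and `fastq_for` is a hypothetical helper playing the role of the input lambda. It shows why `zip` matters: without it, `expand` pairs every SM with every ID, whereas with `zip` the two columns are paired row by row.

```python
from itertools import product

# Values copied from the question's sample sheet.
files = ["xfgh.fastq.gz", "IND2.fastq.gz", "zfgv.fastq.gz", "IND4_P1.fastq.gz"]
sms = ["IND1", "IND2", "IND3", "IND4"]
ids = ["IND1", "IND2", "IND3", "IND4"]

pattern = "ALIGNEMENT/{sm}/{id}.sam"

# Emulates expand(pattern, sm=..., id=...): every SM combined with every ID.
all_combinations = [pattern.format(sm=s, id=i) for s, i in product(sms, ids)]

# Emulates expand(pattern, zip, sm=..., id=...): row-wise pairing only.
row_wise = [pattern.format(sm=s, id=i) for s, i in zip(sms, ids)]

# Emulates the input lambda: map the {id} wildcard back to its FILE entry.
file_by_id = dict(zip(ids, files))


def fastq_for(sample_id):
    """Return the fastq file(s) for one ID as a list of strings."""
    return [file_by_id[sample_id]]


print(len(all_combinations), len(row_wise))
print(fastq_for("IND4"))
```

The product form requests targets such as ALIGNEMENT/IND1/IND2.sam that correspond to no row of the sample sheet, which is why the row-wise form is the one used in `rule all`.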
