为什么这个 snakemake 脚本指令不起作用

Why doesn't this snakemake script directive work

提问人:hepcat72 提问时间:6/16/2023 更新时间:6/18/2023 访问量:97

问:

我正在创建一个 snakemake 工作流程。直到昨天,我从未使用过 run 或 script 指令。我有一个运行指令,效果很好,但抱怨它太长了,并说我应该创建一个单独的脚本并使用脚本指令:snakemake --lint

Lints for rule check_peak_read_len_overlap_params (line 90, /Users/rleach/PROJECT-local/ATACC/REPOS/ATACCompendium/workflow/rules/error_checks.smk):
    * Migrate long run directives into scripts or notebooks:
      Long run directives hamper workflow readability. Use the script or notebook directive instead. Note that the script or notebook directive does not involve boilerplate. Similar to run, you will have direct access to params, input, output, and wildcards.Only use the run directive for a
      handful of lines.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#external-scripts
      https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#jupyter-notebook-integration

所以我试着转这个:

rule check_peak_read_len_overlap_params:
    input:
        "results/QC/max_read_length.txt",
    output:
        "results/QC/parameter_validation.txt",
    params:
        frac_read_overlap=FRAC_READ_OVERLAP,
        min_peak_width=MAX_ARTIFACT_WIDTH + 1,
        summit_width=SUMMIT_FLANK_SIZE * 2,
    log:
        tail="results/QC/logs/check_peak_read_len_overlap_params.stderr",
    run:
        max_read_len = 0
        # Get the max read length from the input file
        infile = open(input[0])
        for line in infile:
            max_read_len = int(line.strip())
            # The file should only be one line
            break
        infile.close()

        max_peak_frac = params.min_peak_width / max_read_len
        max_summit_frac = params.summit_width / max_read_len

        if max_peak_frac < params.frac_read_overlap:
            raise ValueError(
                f"There exist reads (of length {max_read_len}) that, if "
                "mapped to the smallest allowed peak (of length "
                f"{params.min_peak_width}, based on a MAX_ARTIFACT_WIDTH "
                f"of {MAX_ARTIFACT_WIDTH}), would never be counted using a "
                f"FRAC_READ_OVERLAP of {params.frac_read_overlap}."
            )

        if max_summit_frac < params.frac_read_overlap:
            raise ValueError(
                f"There exist reads (of length {max_read_len}) that, if "
                f"mapped to any summit (of length {params.summit_width}), "
                "would never be counted using a FRAC_READ_OVERLAP of "
                f"{params.frac_read_overlap}."
            )

        with open(output[0], "w") as out:
            out.write("Parameters have validated successfully.")

进入这个:

rule check_peak_read_len_overlap_params:
    input:
        "results/QC/max_read_length.txt",
    output:
        "results/QC/parameter_validation.txt",
    params:
        frac_read_overlap=FRAC_READ_OVERLAP,
        max_artifact_width=MAX_ARTIFACT_WIDTH,
        summit_flank_size=SUMMIT_FLANK_SIZE,
    log:
        "results/QC/logs/parameter_validation.log",
    conda:
        "../envs/python3.yml"
    script:
        "scripts/check_peak_read_len_overlap_params.py"

(注意,我同时更改了参数,因此我可以在脚本中而不是在 params 指令下操作它们。起初,我只是将代码粘贴到一个函数中,然后调用该函数。然后我尝试调整代码以获得某种日志输出。我试着在顶部添加一个shebang。我尝试移植 snakemake(文档没有提到)。我尝试了一堆方法,但似乎没有任何效果。目前,这是我在脚本中的内容:run

import sys


def check_peak_read_len_overlap_params(
    infile,
    outfile,
    log,
    frac_read_overlap,
    max_artifact_width,
    summit_flank_size,
):
    log_handle = open(log, "w")
    sys.stderr = sys.stdout = log_handle

    print(
        "Parameters:\n"
        f"Input: {infile}\n"
        f"Output: {outfile}\n"
        f"Log: {log}\n"
        f"\tFRAC_READ_OVERLAP = {frac_read_overlap}\n"
        f"\tMAX_ARTIFACT_WIDTH = {max_artifact_width}\n"
        f"\tSUMMIT_FLANK_SIZE = {summit_flank_size}"
    )

    min_peak_width = max_artifact_width + 1
    summit_width = summit_flank_size * 2
    max_read_len = 0
    inhandle = open(infile)
    for line in inhandle:
        max_read_len = int(line.strip())
        # The file should only be one line
        break
    inhandle.close()

    max_peak_frac = min_peak_width / max_read_len
    max_summit_frac = summit_width / max_read_len

    if max_peak_frac < frac_read_overlap:
        raise ValueError(
            f"There exist reads (of length {max_read_len}) that, if "
            "mapped to the smallest allowed peak (of length "
            f"{min_peak_width}, based on a MAX_ARTIFACT_WIDTH "
            f"of {max_artifact_width}), would never be counted using a "
            f"FRAC_READ_OVERLAP of {frac_read_overlap}."
        )

    if max_summit_frac < frac_read_overlap:
        raise ValueError(
            f"There exist reads (of length {max_read_len}) that, if "
            f"mapped to any summit (of length {summit_width}), "
            "would never be counted using a FRAC_READ_OVERLAP of "
            f"{frac_read_overlap}."
        )

    with open(outfile, "w") as outhandle:
        outhandle.write("Parameters have validated successfully.")

check_peak_read_len_overlap_params(
    snakemake.input[0],
    snakemake.output[0],
    snakemake.log[0],
    snakemake.params.frac_read_overlap,
    snakemake.params.max_artifact_width,
    snakemake.params.summit_flank_size,
)

当我在研究这个问题时,我弄清楚了这个问题,但我将继续发布这个问题,然后回答,因为我在文档或通过谷歌搜索找不到任何内容来回答这个问题......

Python 指令 snakemake

评论

0赞 hepcat72 6/16/2023
当我尝试编辑问题时,我遇到错误,所以我将粘贴我忘记包含在此处的内容:每当我运行工作流时,它都会告诉我该规则以非零状态退出,但日志文件为空,并且未创建输出文件。我所做的一切似乎都没有改变这一点。

答:

1赞 hepcat72 6/16/2023 #1

当我尝试添加脚本指令时,我从文档中复制了这个^:

    script:
        "scripts/script.py"

...然后编辑了它。

问题是 docs 示例很糟糕。如果您遵循他们建议的目录结构

├── .gitignore
├── README.md
├── LICENSE.md
├── workflow
│   ├── rules
|   │   ├── module1.smk
|   │   └── module2.smk
│   ├── envs
|   │   ├── tool1.yaml
|   │   └── tool2.yaml
│   ├── scripts
|   │   ├── script1.py
|   │   └── script2.R
│   ├── notebooks
|   │   ├── notebook1.py.ipynb
|   │   └── notebook2.r.ipynb
│   ├── report
|   │   ├── plot1.rst
|   │   └── plot2.rst
|   └── Snakefile
├── config
│   ├── config.yaml
│   └── some-sheet.tsv
├── results
└── resources

脚本的路径必须为:

    script:
        "../scripts/script.py"

一旦我添加了它,它就开始工作了(或者更确切地说,我遇到了下一个错误)。

^ snakemake Version 是我正在运行的版本,但它与最新版本相同。7.25.0

评论

1赞 J_H 6/16/2023
建议您向 snakemake 人员提交文档错误。
1赞 hepcat72 6/16/2023
做。github.com/snakemake/snakemake/issues/2302