提问人:hepcat72 提问时间:6/16/2023 更新时间:6/18/2023 访问量:97
为什么这个 snakemake 脚本指令不起作用
Why doesn't this snakemake script directive work
问:
我正在创建一个 snakemake 工作流程。直到昨天,我从未使用过 run 或 script 指令。我有一个运行指令,效果很好,但抱怨它太长了,并说我应该创建一个单独的脚本并使用脚本指令:snakemake --lint
Lints for rule check_peak_read_len_overlap_params (line 90, /Users/rleach/PROJECT-local/ATACC/REPOS/ATACCompendium/workflow/rules/error_checks.smk):
* Migrate long run directives into scripts or notebooks:
Long run directives hamper workflow readability. Use the script or notebook directive instead. Note that the script or notebook directive does not involve boilerplate. Similar to run, you will have direct access to params, input, output, and wildcards.Only use the run directive for a
handful of lines.
Also see:
https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#external-scripts
https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#jupyter-notebook-integration
所以我试着转这个:
rule check_peak_read_len_overlap_params:
input:
"results/QC/max_read_length.txt",
output:
"results/QC/parameter_validation.txt",
params:
frac_read_overlap=FRAC_READ_OVERLAP,
min_peak_width=MAX_ARTIFACT_WIDTH + 1,
summit_width=SUMMIT_FLANK_SIZE * 2,
log:
tail="results/QC/logs/check_peak_read_len_overlap_params.stderr",
run:
max_read_len = 0
# Get the max read length from the input file
infile = open(input[0])
for line in infile:
max_read_len = int(line.strip())
# The file should only be one line
break
infile.close()
max_peak_frac = params.min_peak_width / max_read_len
max_summit_frac = params.summit_width / max_read_len
if max_peak_frac < params.frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
"mapped to the smallest allowed peak (of length "
f"{params.min_peak_width}, based on a MAX_ARTIFACT_WIDTH "
f"of {MAX_ARTIFACT_WIDTH}), would never be counted using a "
f"FRAC_READ_OVERLAP of {params.frac_read_overlap}."
)
if max_summit_frac < params.frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
f"mapped to any summit (of length {params.summit_width}), "
"would never be counted using a FRAC_READ_OVERLAP of "
f"{params.frac_read_overlap}."
)
with open(output[0], "w") as out:
out.write("Parameters have validated successfully.")
进入这个:
rule check_peak_read_len_overlap_params:
input:
"results/QC/max_read_length.txt",
output:
"results/QC/parameter_validation.txt",
params:
frac_read_overlap=FRAC_READ_OVERLAP,
max_artifact_width=MAX_ARTIFACT_WIDTH,
summit_flank_size=SUMMIT_FLANK_SIZE,
log:
"results/QC/logs/parameter_validation.log",
conda:
"../envs/python3.yml"
script:
"scripts/check_peak_read_len_overlap_params.py"
(注意,我同时更改了参数,因此我可以在脚本中而不是在 params 指令下操作它们。起初,我只是将代码粘贴到一个函数中,然后调用该函数。然后我尝试调整代码以获得某种日志输出。我试着在顶部添加一个shebang。我尝试移植 snakemake(文档没有提到)。我尝试了一堆方法,但似乎没有任何效果。目前,这是我在脚本中的内容:run
import sys
def check_peak_read_len_overlap_params(
infile,
outfile,
log,
frac_read_overlap,
max_artifact_width,
summit_flank_size,
):
log_handle = open(log, "w")
sys.stderr = sys.stdout = log_handle
print(
"Parameters:\n"
f"Input: {infile}\n"
f"Output: {outfile}\n"
f"Log: {log}\n"
f"\tFRAC_READ_OVERLAP = {frac_read_overlap}\n"
f"\tMAX_ARTIFACT_WIDTH = {max_artifact_width}\n"
f"\tSUMMIT_FLANK_SIZE = {summit_flank_size}"
)
min_peak_width = max_artifact_width + 1
summit_width = summit_flank_size * 2
max_read_len = 0
inhandle = open(infile)
for line in inhandle:
max_read_len = int(line.strip())
# The file should only be one line
break
inhandle.close()
max_peak_frac = min_peak_width / max_read_len
max_summit_frac = summit_width / max_read_len
if max_peak_frac < frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
"mapped to the smallest allowed peak (of length "
f"{min_peak_width}, based on a MAX_ARTIFACT_WIDTH "
f"of {max_artifact_width}), would never be counted using a "
f"FRAC_READ_OVERLAP of {frac_read_overlap}."
)
if max_summit_frac < frac_read_overlap:
raise ValueError(
f"There exist reads (of length {max_read_len}) that, if "
f"mapped to any summit (of length {summit_width}), "
"would never be counted using a FRAC_READ_OVERLAP of "
f"{frac_read_overlap}."
)
with open(outfile, "w") as outhandle:
outhandle.write("Parameters have validated successfully.")
check_peak_read_len_overlap_params(
snakemake.input[0],
snakemake.output[0],
snakemake.log[0],
snakemake.params.frac_read_overlap,
snakemake.params.max_artifact_width,
snakemake.params.summit_flank_size,
)
当我在研究这个问题时,我弄清楚了这个问题,但我将继续发布这个问题,然后回答,因为我在文档或通过谷歌搜索找不到任何内容来回答这个问题......
答:
1赞
hepcat72
6/16/2023
#1
当我尝试添加脚本指令时,我从文档中复制了这个^:
script:
"scripts/script.py"
...然后编辑了它。
问题是 docs 示例很糟糕。如果您遵循他们建议的目录结构:
├── .gitignore
├── README.md
├── LICENSE.md
├── workflow
│ ├── rules
| │ ├── module1.smk
| │ └── module2.smk
│ ├── envs
| │ ├── tool1.yaml
| │ └── tool2.yaml
│ ├── scripts
| │ ├── script1.py
| │ └── script2.R
│ ├── notebooks
| │ ├── notebook1.py.ipynb
| │ └── notebook2.r.ipynb
│ ├── report
| │ ├── plot1.rst
| │ └── plot2.rst
| └── Snakefile
├── config
│ ├── config.yaml
│ └── some-sheet.tsv
├── results
└── resources
脚本的路径必须为:
script:
"../scripts/script.py"
一旦我添加了它,它就开始工作了(或者更确切地说,我遇到了下一个错误)。
^ snakemake Version 是我正在运行的版本,但它与最新版本相同。7.25.0
评论
1赞
J_H
6/16/2023
建议您向 snakemake 人员提交文档错误。
1赞
hepcat72
6/16/2023
做。github.com/snakemake/snakemake/issues/2302
评论