"while read -r line; do" not recognizing counter variable

Question:

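I have code that works for processing each line of a file, but when I try to use a counter variable in the while loop to limit the iterations to an integer "n", it no longer works.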

Here is my code:


# Output file for biallelic SNPs
output_file=${SNAPTMP}/SNAP.proxy.ld.gwas.bestproxy.out

# Loop over the files
for file in ${SNAPTMP}/SNAP.*.proxy.ld.gwas.bestproxy; do
  echo "Processing file: $file"
  counter=0

  # Process each line of the file
  while read -r line; do
    # Extract the 6th field from the line
    field=$(echo "$line" | awk '{print $6}')

    # Check if the field is a biallelic SNP
    if [[ $(is_biallelic "$field") -eq 1 ]]; then
      # Append the line to the output file
      echo "$line" >> "$output_file"
    fi
  done < "$file"

done

This works as expected and produces the following output:

[-----.-------@---- SNAPPY]$ cat proxy/SNAPTMP/SNAP.proxy.ld.gwas.bestproxy.out
6 30656398 rs2249059 6 30609835 rs78802957 1 46563
6 30656398 rs2249059 6 30609835 rs78802957 1 46563
6 30656398 rs2249059 6 30607289 rs142580331 1 49109
6 30656398 rs2249059 6 30607189 rs113520162 1 49209
6 30656398 rs2249059 6 30607173 rs111808357 1 49225
6 30656398 rs2249059 6 30606141 rs112927484 1 50257
6 30656398 rs2249059 6 30604733 rs147842052 1 51665
...

(There are 49 lines in this file.)

My problem is that, for each file, I want to write only "n" lines that have a biallelic SNP in field 6 to my output file. I modified the code to:

n=4

snp_db_file=/project/richards/ethan.kreuzer/snp156.db

# Output file for biallelic SNPs
output_file=${SNAPTMP}/SNAP.proxy.ld.gwas.bestproxy.out

# Loop over the files
for file in ${SNAPTMP}/SNAP.*.proxy.ld.gwas.bestproxy; do
  echo "Processing file: $file"
  counter=0

  # Process each line of the file
  while read -r line; do
    # Extract the 6th field from the line
    field=$(echo "$line" | awk '{print $6}')

    # Check if the field is a biallelic SNP
    if [[ $(is_biallelic "$field") -eq 1 ]]; then
      # Append the line to the output file
      echo "$line" >> "$output_file"
      ((counter++))
      if ((counter >= n)); then
        break  # Break the inner loop after n iterations
      fi
    fi

  done < "$file"

done

But now I get:

[-----.-------@---- SNAPPY]$ cat proxy/SNAPTMP/SNAP.proxy.ld.gwas.bestproxy.out
6 30656398 rs2249059 6 30609835 rs78802957 1 46563

This seems like basic code, so I'm really not sure what I'm doing wrong.

bash while-loop bioinformatics genetics

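Running a separate copy of `awk` for every line of input is very inefficient; it's better to run one copy of `awk` over the whole input file. If necessary, you can still do that while reading the file in parallel with `read`. A better approach, though, is to use bash builtins to split the file's columns into fields and not use `awk` at all for such a trivial use case. (There are plenty of places where `awk` is the right tool for the job, but splitting fields into separate variables is something the shell builtin `read` can do by itself, and you're already using `read` anyway.)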
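The "one awk for the whole file, read in parallel" idea could look roughly like the following minimal sketch. It is bash-specific because it uses process substitution. The function name `print_lines_with_field6` and the demo rows are illustrative and not taken from the question. File descriptor 3 yields each full line, and file descriptor 4 yields the matching sixth column from a single `awk` process.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): read the original lines on FD 3
# and the 6th column, produced by ONE awk process, on FD 4, in lockstep.
print_lines_with_field6() {
  local file=$1 line field
  while IFS= read -r line <&3 && IFS= read -r field <&4; do
    printf '%s\t%s\n' "$field" "$line"
  done 3<"$file" 4< <(awk '{print $6}' "$file")
}

# Demo input: two rows copied from the sample output in the question.
demo_file=$(mktemp)
printf '%s\n' \
  '6 30656398 rs2249059 6 30609835 rs78802957 1 46563' \
  '6 30656398 rs2249059 6 30607289 rs142580331 1 49109' > "$demo_file"
print_lines_with_field6 "$demo_file"
rm -f "$demo_file"
```

Inside the loop, `$field` is available without spawning a process per line.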

0 votes, Charles Duffy, 6/15/2023

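Really, though, you'd get a better answer if you followed the suggestion to use `set -x` to trace execution. Once I see that trace, I'll have a much better idea of what's going on.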
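The following is a minimal sketch of what such a trace looks like. It runs on a made-up toy counter loop rather than the asker's script, and the function name `trace_counter_demo` is illustrative. For the real script, one would run `bash -x scriptname` or put `set -x` near the top. Bash then writes each command to stderr, prefixed with `+`, just before it runs.

```shell
#!/usr/bin/env bash
# Hypothetical toy loop traced with bash -x; 2>&1 merges the stderr trace
# into stdout so it can be read (or captured) together with normal output.
trace_counter_demo() {
  bash -xc '
n=2
counter=0
for item in a b c; do
  ((counter++))
  if ((counter >= n)); then break; fi
done
echo "stopped after $counter"
' 2>&1
}

trace_counter_demo
```

In a trace like this, the last `+` line shows the last command bash executed. When a script stops unexpectedly, that is usually what points at the culprit.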

1 vote, pjh, 6/15/2023

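My best guess is that the code is running with `set -o errexit` (or the equivalent `set -e`). `((counter++))` returns a non-zero status when the value of `counter` is zero, so the program silently stops. See "Bash exits when performing numeric calculation". This is one of many problems with using `set -o errexit`. See BashFAQ/105 ("Why doesn't set -e (or set -o errexit, or trap ERR) do what I expected?").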
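The following small sketch illustrates the behaviour described in this comment; the function names are made up for the demonstration. `((counter++))` evaluates to the old value, which is 0 on the first match, so its exit status is 1. Under `set -e` that ends the script right after the first line has been written, which matches the one-line output in the question. `((++counter))` or `counter=$((counter + 1))` avoid the problem.

```shell
#!/usr/bin/env bash
# ((expr)) returns status 1 when expr evaluates to 0.
status_of_post_increment() {
  local counter=0 status=0
  ((counter++)) || status=$?   # value of counter++ is the OLD value, 0
  echo "$status"
}

status_of_pre_increment() {
  local counter=0 status=0
  ((++counter)) || status=$?   # value of ++counter is the NEW value, 1
  echo "$status"
}

# With errexit, the failing ((counter++)) aborts after "line 1".
# "|| true" only keeps this demo function itself from failing.
errexit_demo() {
  bash -ec 'counter=0; echo "line 1"; ((counter++)); echo "line 2"' || true
}

# An errexit-safe increment: an assignment always succeeds here.
errexit_safe_demo() {
  bash -ec 'counter=0; echo "line 1"; counter=$((counter + 1)); echo "line 2"'
}

status_of_post_increment
status_of_pre_increment
errexit_demo
errexit_safe_demo
```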

1 vote, markp-fuso, 6/15/2023

fwiw, you can eliminate the double subprocess call (`field=$(echo "$line" | awk '{print $6}')`), and with it the unnecessary call to `awk`, by using `read -r f1 f2 f3 f4 f5 field frest <<< "$line"`.
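Here is a minimal sketch of that builtin `read` split, wrapped in a hypothetical helper named `sixth_field`. With the default `IFS`, `read` splits on runs of whitespace. `f1` to `f5` receive the first five columns, and `frest` absorbs everything after the sixth field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract column 6 with the builtin read (no subshell,
# no awk). f1..f5 are throwaway names; frest takes whatever follows field 6.
sixth_field() {
  local f1 f2 f3 f4 f5 field frest
  read -r f1 f2 f3 f4 f5 field frest <<< "$1"
  printf '%s\n' "$field"
}

sixth_field '6 30656398 rs2249059 6 30609835 rs78802957 1 46563'
```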

Answer:

0 votes, chepner, 6/14/2023, #1

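You don't need a counter. Let the `while` loop output all the matching lines, and use `head` to write only the first `n` of them to the output file. Once `head` exits, the `while` loop will also exit the first time it tries to write a line to the now-closed pipe.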

for file in ${SNAPTMP}/SNAP.*.proxy.ld.gwas.bestproxy; do
  echo "Processing file: $file"
  # Process each line of the file
  while read -r line; do
    field=$(echo "$line" | awk '{print $6}')    
    [[ $(is_biallelic "$field") -eq 1 ]] &&  echo "$line"
  done < "$file" | head -n "$n" >> "$output_file"
done
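The following self-contained sketch of the `head`-based loop can run without the asker's environment. The `is_biallelic` stub is hypothetical: the question never shows the real function, which presumably consults the SNP database. Here the stub simply prints 1 for rsIDs ending in an odd digit, mimicking only the print-1-or-0 convention the question's code relies on. The name `first_n_biallelic` is also illustrative.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL stub: prints 1 for rsIDs ending in an odd digit, else 0.
is_biallelic() {
  if [[ $1 == *[13579] ]]; then echo 1; else echo 0; fi
}

# Emit every matching line; head passes on only the first n of them.
first_n_biallelic() {
  local n=$1 file=$2 line field
  while read -r line; do
    field=$(echo "$line" | awk '{print $6}')
    [[ $(is_biallelic "$field") -eq 1 ]] && echo "$line"
  done < "$file" | head -n "$n"
}

demo_file=$(mktemp)
printf '%s\n' \
  '6 30656398 rs2249059 6 30609835 rs78802957 1 46563' \
  '6 30656398 rs2249059 6 30607189 rs113520162 1 49209' \
  '6 30656398 rs2249059 6 30607289 rs142580331 1 49109' \
  '6 30656398 rs2249059 6 30607173 rs111808357 1 49225' > "$demo_file"
first_n_biallelic 2 "$demo_file"   # keeps the rs78802957 and rs142580331 rows
rm -f "$demo_file"
```

In the real loop, the `>> "$output_file"` goes after `head`, as in the answer.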

Check whether `is_biallelic` could use its exit status, rather than its output, to decide whether `$line` gets printed; then you could write something like

is_biallelic "$field" && echo "$line"
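The following sketch shows the exit-status convention this suggests. The associative-array lookup is a made-up placeholder for whatever the real `is_biallelic` checks, and `filter_line` is a hypothetical helper. `declare -A` needs bash 4 or later.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL lookup table standing in for the real SNP database check.
declare -A BIALLELIC=( [rs78802957]=1 [rs142580331]=1 )

# Exit-status version: succeed (0) for a biallelic SNP, fail (1) otherwise,
# printing nothing, so callers can use it directly with && or if.
is_biallelic() {
  [[ -n $1 ]] && [[ -n ${BIALLELIC[$1]-} ]]
}

# Print the line only when its 6th field passes the check.
filter_line() {
  local line=$1 field
  read -r _ _ _ _ _ field _ <<< "$line"
  is_biallelic "$field" && echo "$line"
}

filter_line '6 30656398 rs2249059 6 30609835 rs78802957 1 46563'
filter_line '6 30656398 rs2249059 6 30607189 rs113520162 1 49209' || true
```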

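The whole `while` loop could probably also be replaced by a single `awk` script that calls `is_biallelic` as needed, instead of running `awk` on every line just to extract one field. Assuming `is_biallelic` is a standalone executable that exits 0 for a biallelic SNP, it could be as simple as the line below. A shell function would not work here, because `system()` runs its command through `/bin/sh` and cannot see bash functions.

awk '!system("is_biallelic " $6)' "$file" >> "$output_file"

`$6` has to sit outside the string literal so that awk substitutes the field value. The `!` is needed because `system()` returns the command's exit status, which is 0 on success.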
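The following runnable sketch of the `awk`/`system()` idea assumes `is_biallelic` is a standalone executable that reports its answer through its exit status. Because the question's `is_biallelic` is not shown, the script writes a hypothetical stub into a temporary directory and puts it on `PATH`. The stub treats rsIDs ending in an odd digit as biallelic. `filter_file_with_awk` is likewise an illustrative name, and the `head` stage carries over the n-line limit from the loop version.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL stub executable, created in a temp dir on PATH, since awk's
# system() runs /bin/sh and cannot call bash functions.
make_stub_is_biallelic() {
  local dir
  dir=$(mktemp -d)
  cat > "$dir/is_biallelic" <<'EOF'
#!/bin/sh
case $1 in
  *[13579]) exit 0 ;;
  *) exit 1 ;;
esac
EOF
  chmod +x "$dir/is_biallelic"
  printf '%s\n' "$dir"
}

# system() returns the exit status, so negate it: print a line only when
# is_biallelic exits 0. head keeps at most n matching lines.
filter_file_with_awk() {
  local n=$1 file=$2
  awk '!system("is_biallelic " $6)' "$file" | head -n "$n"
}

stub_dir=$(make_stub_is_biallelic)
trap 'rm -rf "$stub_dir"' EXIT
PATH="$stub_dir:$PATH"
```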
