How to have one progress bar with multiple downloads using tqdm/python?

Q:

Here is my working Python script for downloading FASTA sequences from UniProt (many thanks to the community).

'''
UniProt fasta downloader using accession ids from a text file,
show the download progress for each downloading sequence,
and make a list of inaccessible sequences
'''
import functools
import pathlib
import shutil
import requests
from tqdm.auto import tqdm
#Part I: Read the file with IDs and make a list of urls to download the respective sequences
with open ('errtest.txt', 'r') as infile:
    lines = infile.readlines()

listfile_name = infile.name
file_name = listfile_name.split('.', 1)[0]

downloaded = 0 #sequences downloaded

URL_list = []
for line in lines:
    access_id = line.strip()
    url_part1 = 'https://rest.uniprot.org/uniprotkb/'
    url_part2 = '.fasta'
    URL = url_part1+access_id+url_part2          
    URL_list.append(URL)

not_found = []
for url in URL_list:
    r = requests.get(url, stream=True, allow_redirects=True)
    file_size = int(r.headers.get('Content-Length', 0))
    if r.status_code != 200:
        Apart = url.removeprefix('https://rest.uniprot.org/uniprotkb/')
        short_id = Apart.removesuffix('.fasta')
        not_found.append (short_id)
        print (short_id, '-- not found')
    elif r.status_code == 200:
        path = pathlib.Path((file_name)+'seqs.fa').expanduser().resolve()
        path.parent.mkdir(parents=True, exist_ok=True)

        desc = "(Unknown total file size)" if file_size == 0 else ""
        r.raw.read = functools.partial(r.raw.read, decode_content=True)  # Decompress if needed
        with tqdm.wrapattr(r.raw, "read", total=file_size, desc=desc) as r_raw:
            with path.open("ab") as f:
                shutil.copyfileobj(r_raw, f)
        downloaded += 1
print ('Sequences with these accesion ids were not found:\n', not_found)
print (downloaded, 'sequences downloaded')

These are the contents of errtest.txt (some wrong IDs, which should be counted, and some correct ones):

wrong1
D3VN13
B9W4V6
wrong2
A0A8S0XZH6
wrong3

This is the typical output:

wrong1 -- not found

  0%|          | 0/477 [00:00<?, ?it/s]
100%|██████████| 477/477 [00:00<00:00, 239kB/s]

  0%|          | 0/473 [00:00<?, ?it/s]
100%|██████████| 473/473 [00:00<00:00, 42.4kB/s]
wrong2 -- not found

  0%|          | 0/534 [00:00<?, ?it/s]
100%|██████████| 534/534 [00:00<00:00, 268kB/s]
wrong3 -- not found
Sequences with these accesion ids were not found:
 ['wrong1', 'wrong2', 'wrong3']
3 sequences downloaded

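So far, so good. Next, I would like to have a single progress bar for all downloads. This text file contains only 3 valid IDs and 3 wrong IDs (which happens sometimes), so showing three progress bars one after another is fine. In practice, however, the list file will contain thousands of IDs, which means thousands of URLs and corresponding sequence downloads. It would therefore be better to have one progress bar that shows the overall download progress.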
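
For thousands of small records, one simple option is to let a single tqdm bar count finished requests rather than bytes, which avoids having to know the total size up front. The following is a minimal sketch under that assumption. `fetch_fasta`, `download_all`, and the `fetch` and `progress` parameters are hypothetical names introduced for illustration; they are not part of the script above.

```python
from typing import Callable, Iterable, Iterator, Optional

BASE_URL = "https://rest.uniprot.org/uniprotkb/"  # same endpoint as in the script above


def fetch_fasta(access_id: str) -> Optional[bytes]:
    """Return the FASTA record for one accession ID, or None if the request fails."""
    import requests  # third-party; imported here so download_all can be used without it

    r = requests.get(f"{BASE_URL}{access_id}.fasta", allow_redirects=True)
    return r.content if r.status_code == 200 else None


def download_all(
    ids: Iterable[str],
    out_path: str,
    fetch: Callable[[str], Optional[bytes]] = fetch_fasta,
    progress: Optional[Callable[[list], Iterator[str]]] = None,
) -> list:
    """Append every record that can be fetched to out_path; return the IDs that failed."""
    id_list = [line.strip() for line in ids if line.strip()]  # skip blank lines
    if progress is None:
        from tqdm.auto import tqdm  # third-party; one bar, one tick per accession ID

        def progress(items):
            return tqdm(items, desc="Downloading", unit="seq")

    not_found = []
    with open(out_path, "ab") as out_file:
        for access_id in progress(id_list):
            data = fetch(access_id)
            if data is None:
                not_found.append(access_id)
            else:
                out_file.write(data)
    return not_found
```

Passing the open errtest.txt file as `ids` works because iterating over a text file yields its lines. The bar then advances once per ID, whether or not the ID was found, and the returned list plays the role of `not_found` in the script.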

python download progress tqdm

A:

import functools
import pathlib
import shutil
import requests
from tqdm.auto import tqdm

# Part I: Read the file with IDs and make a list of URLs to download the respective sequences
with open('errtest.txt', 'r') as infile:
    lines = infile.readlines()

listfile_name = infile.name
file_name = listfile_name.split('.', 1)[0]

downloaded = 0

URL_list = []
total_file_size = 0  # Initialize total file size
not_found = []

for line in lines:
    access_id = line.strip()
    url_part1 = 'https://rest.uniprot.org/uniprotkb/'
    url_part2 = '.fasta'
    URL = url_part1 + access_id + url_part2
    URL_list.append(URL)
    # classify files
    r = requests.get(URL, stream=True, allow_redirects=True)
    if r.status_code != 200:
        Apart = URL.removeprefix('https://rest.uniprot.org/uniprotkb/')
        short_id = Apart.removesuffix('.fasta')
        not_found.append(short_id)
        print(short_id, '-- not found')
    else:
        file_size = int(r.headers.get('Content-Length', 0))
        total_file_size += file_size  # Add current file size to total file size

# Create unique progress bar
with tqdm(total=total_file_size, unit='B', unit_scale=True, unit_divisor=1024, desc='Downloading') as pbar:
    for URL in URL_list:
        r = requests.get(URL, stream=True, allow_redirects=True)
        if r.status_code == 200:
            path = pathlib.Path((file_name) + 'seqs.fa').expanduser().resolve()
            path.parent.mkdir(parents=True, exist_ok=True)

            r.raw.read = functools.partial(r.raw.read, decode_content=True)
            with path.open("ab") as f:
                shutil.copyfileobj(r.raw, f)
            downloaded += 1
        # Note: this runs for every URL, and file_size still holds the last value set in the first loop
        pbar.update(file_size)

print('Sequences with these accession IDs were not found:\n', not_found)
print(downloaded, 'sequences downloaded')

I got this in IDLE:

wrong1 -- not found
wrong2 -- not found
wrong3 -- not found
Downloading: 0%| | 0.00/1.45k [00:00<?, ?B/s]
Downloading: 36%|███▌ | 534/1.45k [00:00<00:01, 677B/s]
Downloading: 72%|███████▏ | 1.04k/1.45k [00:01<00:00, 677B/s]
Downloading: 1.56kB [00:02, 698B/s]
Downloading: 2.09kB [00:03, 665B/s]
Downloading: 2.61kB [00:04, 609B/s]
Downloading: 3.13kB [00:05, 603B/s]
Downloading: 3.13kB [00:05, 586B/s]
Sequences with these accession IDs were not found:
 ['wrong1', 'wrong2', 'wrong3']
3 sequences downloaded

0 upvotes Irfan 9/7/2023

It looks like the progress bar is updating, but in IDLE each update is displayed on a new line.
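
The bar in the output above runs past its 1.45k total because `pbar.update(file_size)` is called once for each of the six URLs, and `file_size` still holds the 534 bytes of the last successful response from the first loop: 6 × 534 = 3204 bytes, which tqdm shows as 3.13k. A possible fix, sketched here under stated assumptions, is to advance the bar by the bytes actually written, and only for responses that returned 200. The names `append_stream` and `download_with_one_bar` are hypothetical. If the server compresses its responses, `Content-Length` counts compressed bytes, so the bar may not finish exactly at 100%.

```python
from typing import BinaryIO, Iterable


def append_stream(chunks: Iterable[bytes], out_file: BinaryIO, pbar) -> int:
    """Write chunks to out_file, advancing pbar by the bytes actually written."""
    written = 0
    for chunk in chunks:
        out_file.write(chunk)
        pbar.update(len(chunk))  # real byte count, not a value left over from an earlier loop
        written += len(chunk)
    return written


def download_with_one_bar(urls: list, out_path: str) -> int:
    """Download every reachable URL into out_path behind one byte-based bar."""
    import requests  # third-party
    from tqdm.auto import tqdm  # third-party

    # First pass: keep only the URLs that answer 200 and add up their sizes.
    ok_urls, total = [], 0
    for url in urls:
        with requests.get(url, stream=True, allow_redirects=True) as r:
            if r.status_code == 200:
                ok_urls.append(url)
                total += int(r.headers.get("Content-Length", 0))

    # Second pass: stream each body into the file behind a single bar.
    downloaded = 0
    with tqdm(total=total, unit="B", unit_scale=True, unit_divisor=1024,
              desc="Downloading") as pbar, open(out_path, "ab") as out_file:
        for url in ok_urls:
            with requests.get(url, stream=True, allow_redirects=True) as r:
                if r.status_code == 200:
                    append_stream(r.iter_content(chunk_size=8192), out_file, pbar)
                    downloaded += 1
    return downloaded
```

This still requests every valid URL twice, as the code above does. As for the new-line behavior, IDLE's shell does not appear to honor the carriage return that tqdm uses to redraw the bar in place, so running the script from a regular terminal should give a single updating line.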
