Asked by: Irfan  Asked: 9/6/2023  Updated: 9/6/2023  Views: 62
How to have one progress bar with multiple downloads using tqdm/python?
Q:
Here is my working Python script for downloading FASTA sequences from UniProt (many thanks to the community).
'''
UniProt fasta downloader using accession ids from a text file,
show the download progress for each downloading sequence,
and make a list of inaccessible sequences
'''
import functools
import pathlib
import shutil
import requests
from tqdm.auto import tqdm
# Part I: Read the file with IDs and make a list of urls to download the respective sequences
with open('errtest.txt', 'r') as infile:
    lines = infile.readlines()
    listfile_name = infile.name
    file_name = listfile_name.split('.', 1)[0]
downloaded = 0  # sequences downloaded
URL_list = []
for line in lines:
    access_id = line.strip()
    url_part1 = 'https://rest.uniprot.org/uniprotkb/'
    url_part2 = '.fasta'
    URL = url_part1 + access_id + url_part2
    URL_list.append(URL)
not_found = []
for url in URL_list:
    r = requests.get(url, stream=True, allow_redirects=True)
    file_size = int(r.headers.get('Content-Length', 0))
    if r.status_code != 200:
        Apart = url.removeprefix('https://rest.uniprot.org/uniprotkb/')
        short_id = Apart.removesuffix('.fasta')
        not_found.append(short_id)
        print(short_id, '-- not found')
    elif r.status_code == 200:
        path = pathlib.Path((file_name) + 'seqs.fa').expanduser().resolve()
        path.parent.mkdir(parents=True, exist_ok=True)
        desc = "(Unknown total file size)" if file_size == 0 else ""
        r.raw.read = functools.partial(r.raw.read, decode_content=True)  # Decompress if needed
        with tqdm.wrapattr(r.raw, "read", total=file_size, desc=desc) as r_raw:
            with path.open("ab") as f:
                shutil.copyfileobj(r_raw, f)
        downloaded += 1
print('Sequences with these accesion ids were not found:\n', not_found)
print(downloaded, 'sequences downloaded')
These are the contents of errtest.txt (some wrong IDs, which should be counted, and some correct IDs):
wrong1
D3VN13
B9W4V6
wrong2
A0A8S0XZH6
wrong3
This is the typical output:
wrong1 -- not found
0%| | 0/477 [00:00<?, ?it/s]
100%|██████████| 477/477 [00:00<00:00, 239kB/s]
0%| | 0/473 [00:00<?, ?it/s]
100%|██████████| 473/473 [00:00<00:00, 42.4kB/s]
wrong2 -- not found
0%| | 0/534 [00:00<?, ?it/s]
100%|██████████| 534/534 [00:00<00:00, 268kB/s]
wrong3 -- not found
Sequences with these accesion ids were not found:
['wrong1', 'wrong2', 'wrong3']
3 sequences downloaded
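So far, so good. Next, I would like a single progress bar for all the downloads. This text file has only 3 valid IDs and 3 wrong ones (which happens sometimes), so showing three progress bars one after another is fine. In practice, though, the list file will contain thousands of IDs, and therefore thousands of URLs and corresponding sequence downloads. So it would be better to have one progress bar that shows the overall download progress.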
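One reading of this requirement is a bar that counts accession IDs rather than bytes, so the total is known from the list file alone. The following is a minimal sketch under that assumption, not a definitive implementation. The names `build_url`, `fetch_fasta`, `download_by_count` and `main`, the `timeout=30` value, and the choice to skip blank lines are illustrative and not part of the original script.

```python
BASE_URL = "https://rest.uniprot.org/uniprotkb/"


def build_url(access_id):
    """Build the FASTA URL for one accession ID (same pattern as the script)."""
    return f"{BASE_URL}{access_id}.fasta"


def fetch_fasta(url):
    """Return the response body as bytes, or None unless the status is 200."""
    import requests  # imported lazily; only this helper touches the network

    # timeout=30 is an assumption added for this sketch.
    r = requests.get(url, allow_redirects=True, timeout=30)
    return r.content if r.status_code == 200 else None


def download_by_count(access_ids, fetch, out_file, bar):
    """Write every found sequence to out_file; tick bar once per ID."""
    downloaded, not_found = 0, []
    for access_id in access_ids:
        data = fetch(build_url(access_id))
        if data is None:
            not_found.append(access_id)
        else:
            out_file.write(data)
            downloaded += 1
        bar.update(1)  # one tick per ID, found or not
    return downloaded, not_found


def main(list_path="errtest.txt", out_path="errtestseqs.fa"):
    """Hypothetical entry point wiring the pieces to a single tqdm bar."""
    from tqdm.auto import tqdm

    with open(list_path) as infile:
        ids = [line.strip() for line in infile if line.strip()]
    with open(out_path, "ab") as f, tqdm(
        total=len(ids), unit="seq", desc="Downloading"
    ) as bar:
        downloaded, not_found = download_by_count(ids, fetch_fasta, f, bar)
    print("Sequences with these accession ids were not found:\n", not_found)
    print(downloaded, "sequences downloaded")
```

Counting IDs avoids a separate sizing request per URL, at the cost of a bar that says nothing about bytes or transfer rate.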
A:
0 votes
curt
9/6/2023
#1
I think you can compute the total size before starting the download loop and then use a single progress bar, like this:
import functools
import pathlib
import shutil
import requests
from tqdm.auto import tqdm
# Part I: Read the file with IDs and make a list of URLs to download the respective sequences
with open('errtest.txt', 'r') as infile:
    lines = infile.readlines()
    listfile_name = infile.name
    file_name = listfile_name.split('.', 1)[0]
downloaded = 0
URL_list = []
total_file_size = 0  # Initialize total file size
not_found = []
for line in lines:
    access_id = line.strip()
    url_part1 = 'https://rest.uniprot.org/uniprotkb/'
    url_part2 = '.fasta'
    URL = url_part1 + access_id + url_part2
    URL_list.append(URL)
    # classify files
    r = requests.get(URL, stream=True, allow_redirects=True)
    if r.status_code != 200:
        Apart = URL.removeprefix('https://rest.uniprot.org/uniprotkb/')
        short_id = Apart.removesuffix('.fasta')
        not_found.append(short_id)
        print(short_id, '-- not found')
    else:
        file_size = int(r.headers.get('Content-Length', 0))
        total_file_size += file_size  # Add current file size to total file size
# Create unique progress bar
with tqdm(total=total_file_size, unit='B', unit_scale=True, unit_divisor=1024, desc='Downloading') as pbar:
    for URL in URL_list:
        r = requests.get(URL, stream=True, allow_redirects=True)
        if r.status_code == 200:
            path = pathlib.Path((file_name) + 'seqs.fa').expanduser().resolve()
            path.parent.mkdir(parents=True, exist_ok=True)
            r.raw.read = functools.partial(r.raw.read, decode_content=True)
            with path.open("ab") as f:
                shutil.copyfileobj(r.raw, f)
            downloaded += 1
        # NOTE: file_size is the last value set in the sizing loop above, not this file's size
        pbar.update(file_size)
print('Sequences with these accession IDs were not found:\n', not_found)
print(downloaded, 'sequences downloaded')
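The code above advances the bar with `pbar.update(file_size)`, where `file_size` is whatever the sizing loop last assigned, so the bar can run past its total. One alternative, sketched below under stated assumptions, is to advance the bar by the number of bytes actually written. `write_chunks`, `download_all`, `total_bytes` and `chunk_size` are hypothetical names introduced for illustration. The sketch also swaps `shutil.copyfileobj` on `r.raw` for `Response.iter_content`, which is a change of technique from the answer.

```python
def write_chunks(chunks, dst, pbar):
    """Write each chunk to dst and advance pbar by that chunk's real length."""
    written = 0
    for chunk in chunks:
        dst.write(chunk)
        pbar.update(len(chunk))
        written += len(chunk)
    return written


def download_all(urls, out_path, total_bytes=None, chunk_size=8192):
    """Append every reachable URL to out_path under one shared byte-level bar.

    total_bytes may come from a sizing pass like the one above; with None,
    tqdm shows a running byte count and rate instead of a percentage.
    """
    # Imported here so write_chunks can be used (and tested) on its own.
    import requests
    from tqdm.auto import tqdm

    downloaded, not_found = 0, []
    with open(out_path, "ab") as f, tqdm(
        total=total_bytes, unit="B", unit_scale=True, unit_divisor=1024,
        desc="Downloading",
    ) as pbar:
        for url in urls:
            with requests.get(url, stream=True, allow_redirects=True) as r:
                if r.status_code != 200:
                    not_found.append(url)
                    continue
                # iter_content yields the decoded body a chunk at a time.
                write_chunks(r.iter_content(chunk_size=chunk_size), f, pbar)
                downloaded += 1
    return downloaded, not_found
```

If the server compresses responses, `Content-Length` may describe the compressed size while the written bytes are decompressed, so a total taken from headers is best treated as an estimate.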
Comments
0 votes
Irfan
9/7/2023
I got this in IDLE:
wrong1 -- not found
wrong2 -- not found
wrong3 -- not found
Downloading: 0%| | 0.00/1.45k [00:00<?, ?B/s]
Downloading: 36%|███▌ | 534/1.45k [00:00<00:01, 677B/s]
Downloading: 72%|███████▏ | 1.04k/1.45k [00:01<00:00, 677B/s]
Downloading: 1.56kB [00:02, 698B/s]
Downloading: 2.09kB [00:03, 665B/s]
Downloading: 2.61kB [00:04, 609B/s]
Downloading: 3.13kB [00:05, 603B/s]
Downloading: 3.13kB [00:05, 586B/s]
Sequences with these accession IDs were not found:
['wrong1', 'wrong2', 'wrong3']
3 sequences downloaded
0 votes
Irfan
9/7/2023
It looks like the progress bar is updated, but in IDLE each update is shown on a new line.
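This behaviour is consistent with how the IDLE shell handles output: tqdm redraws its bar by emitting a carriage return, and the IDLE shell does not act on that character, so each redraw lands on a new line. Running the script from a regular terminal usually gives the in-place bar. For runs that stay inside IDLE, one option is to print only a few plain milestone lines. The sketch below is a minimal version of that idea. `running_in_idle` and `MilestoneReporter` are hypothetical names, and checking for `idlelib.run` in `sys.modules` is a heuristic assumed here, not a documented API.

```python
import sys


def running_in_idle():
    """Heuristic (an assumption): IDLE's run process imports idlelib.run."""
    return "idlelib.run" in sys.modules


class MilestoneReporter:
    """Print one plain line whenever progress reaches a new multiple of step_percent.

    It exposes update(amount), so it can stand in where a tqdm bar's
    update() method is called.
    """

    def __init__(self, total, step_percent=10, out=None):
        if total <= 0 or step_percent <= 0:
            raise ValueError("total and step_percent must be positive")
        self.total = total
        self.step = step_percent
        self.out = out  # None means print() writes to sys.stdout
        self.n = 0
        self._next = step_percent  # next milestone that triggers a line

    def update(self, amount=1):
        self.n += amount
        percent = min(100, 100 * self.n // self.total)
        reached = percent // self.step * self.step
        if reached >= self._next:
            print(f"Downloading: {reached}% ({self.n}/{self.total})", file=self.out)
            self._next = reached + self.step
```

A caller could then choose the reporter at start-up, for example using `MilestoneReporter(total)` when `running_in_idle()` returns true and a tqdm bar otherwise.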