Asked by: Katie · Asked: 11/7/2023 · Last edited by: Sam Mason · Updated: 11/7/2023 · Views: 51
Downloading and renaming images from multiple URLs with the same file name
Q:
I'm trying to download images from an archive. I have the image URLs and can successfully download each file with the code below. However, some of the images share the same file name (e.g. compressed.jpg), so when I run the script only a single compressed.jpg ends up on disk.
I'd like to rename the files as they're downloaded, so I end up with compressed1.jpg, compressed2.jpg, and so on. I'm very new to Python, so I got myself into a mess trying to append an incrementing number to the file name.
Thanks
import requests
image_url = [
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/105/093/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/984/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/107/697/thumbnail/compressed.jpg'
]
for img in image_url:
    file_name = img.split('/')[-1]
    print("Downloading file:%s" % file_name)
    r = requests.get(img, stream=True)
    with open(file_name, 'wb') as f:
        for chunk in r:
            f.write(chunk)
I've tried renaming with os and glob with no luck - how can I rename the files before (or while) downloading?
A:
0 votes
Ovski
11/7/2023
#1
You just need to add an index to the file name. To get an index from the for loop, use enumerate on the image_url list. Then split the file name with os.path.splitext to get its root and extension, so you can insert the index between them.
import requests
import os.path
image_url = [
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/105/093/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/984/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/107/697/thumbnail/compressed.jpg'
]
for index, img in enumerate(image_url):
    file_name_string = img.split('/')[-1]
    file_name_list = os.path.splitext(file_name_string)
    target_file = f"{file_name_list[0]}{index + 1}{file_name_list[1]}"
    print("Downloading file:%s" % target_file)
    r = requests.get(img, stream=True)
    with open(target_file, 'wb') as f:
        for chunk in r:
            f.write(chunk)
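For reference, this is what os.path.splitext returns for the name in question, and the numbered names the loop above builds (a standalone sketch, separate from the answer's code):

```python
import os.path

# splitext returns a (root, extension) tuple
root, ext = os.path.splitext('compressed.jpg')
print(root, ext)  # compressed .jpg

# the numbered names produced for four URLs
names = [f"{root}{i + 1}{ext}" for i in range(4)]
print(names)  # ['compressed1.jpg', 'compressed2.jpg', 'compressed3.jpg', 'compressed4.jpg']
```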
0 votes
Marco Parola
11/7/2023
#2
You can keep a counter for the images and append it to the file name:
import requests
import os
image_url = [
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/105/093/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/984/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/107/697/thumbnail/compressed.jpg'
]
for i, img in enumerate(image_url, start=1):
    file_name = img.split('/')[-1]
    # Split into root and extension so the number goes before ".jpg"
    file_root, file_extension = os.path.splitext(file_name)
    # Rename the file with an incremental number
    new_file_name = file_root + str(i) + file_extension
    print("Downloading file: %s" % new_file_name)
    r = requests.get(img, stream=True)
    with open(new_file_name, 'wb') as f:
        for chunk in r:
            f.write(chunk)
0 votes
Sam Mason
11/7/2023
#3
If all of these URLs share a common prefix, I'd be tempted to just use the suffix, with the slashes replaced. I'd also add some error checking to make sure the request succeeded.
The following code saves the files under names like: 000_103_975_thumbnail_compressed.jpg
import requests
import pathlib
image_urls = [
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/105/093/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/984/thumbnail/compressed.jpg',
'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/107/697/thumbnail/compressed.jpg'
]
prefix = 'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/'
for url in image_urls:
    # turn the url into something suitable for local use
    out = pathlib.Path(url.removeprefix(prefix).replace('/', '_'))
    # no point fetching something we've already got
    # you can delete the file to retry if you really want that
    if out.exists():
        print(f"already saved {url} as {out}")
        continue
    # open the file early, failures will result in an empty file and hence won't be retried
    with open(out, 'wb') as fd, requests.get(url, stream=True) as resp:
        # don't want to save HTTP 404 or 501, leave these empty
        if not resp.ok:
            print(f"HTTP server error while fetching {url}:", resp)
            continue
        for chunk in resp.iter_content(2**18):
            fd.write(chunk)
    print(f"{url} saved to {out}")
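The filename transformation can be checked on its own, without any network access (note that str.removeprefix requires Python 3.9+):

```python
import pathlib

prefix = 'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/'
url = 'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg'

# strip the common prefix and flatten the remaining path into a single name
out = pathlib.Path(url.removeprefix(prefix).replace('/', '_'))
print(out)  # 000_103_975_thumbnail_compressed.jpg
```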