Use smart_open to download a .gz stream from http and upload to s3 bucket

Asked by: ffi23 Asked: 3/28/2023 Updated: 3/29/2023 Views: 489

Q:

I want to stream-download a .txt.gz file over HTTP and stream it to an S3 bucket. I've gotten this far, but it doesn't work. What am I missing?

from smart_open import open as sopen

chunk_size = (16 * 1024 * 1024)
http_url = 'http://someurl'

with sopen(http_url, 'rb', transport_params={'headers': {'Subscription-Key': 'somekey'}}) as fin:
    with sopen('s3://bucket/filename.txt.gz', 'wb') as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(chunk_size)
python amazon-s3 boto3 smart-open

A:

1 upvote ffi23 3/29/2023 #1

It turns out I could have made this much simpler.

Though I'm not sure whether smart_open is decompressing and recompressing the file under the hood?

from smart_open import open as sopen

http_url = 'http://someurl'

with sopen(http_url, 'rb', transport_params={'headers': {'Subscription-Key': 'somekey'}}) as fin:
    with sopen('s3://bucket/filename.txt.gz', 'wb') as fout:
        # In binary mode each "line" is a bytes chunk ending at b'\n'
        for line in fin:
            fout.write(line)
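
For reference, the loop in the question passes chunk_size (an int) to fout.write instead of the bytes it just read, which is the likely reason it failed. Below is a minimal sketch of a corrected chunked copy. It also illustrates the compression concern: a raw copy leaves a gzip payload byte-identical, while writing the same bytes through a gzip layer wraps it a second time. The helper name copy_stream and the in-memory streams are assumptions for illustration only. They stand in for the smart_open handles so the example runs without network or S3 access.

```python
import gzip
import io


def copy_stream(fin, fout, chunk_size=16 * 1024 * 1024):
    """Copy fin to fout in fixed-size chunks and return the bytes copied."""
    total = 0
    while True:
        buf = fin.read(chunk_size)
        if not buf:
            break
        fout.write(buf)  # write the data that was read, not chunk_size
        total += len(buf)
    return total


# Hypothetical stand-in for the remote .txt.gz file.
original_text = b"line one\nline two\n" * 1000
gz_payload = gzip.compress(original_text)

# Raw copy: the target receives exactly the source bytes.
src = io.BytesIO(gz_payload)
dst = io.BytesIO()
copied = copy_stream(src, dst, chunk_size=1024)
same_bytes = dst.getvalue() == gz_payload
round_trip = gzip.decompress(dst.getvalue()) == original_text

# Copy through a gzip writer: the already-gzipped bytes get compressed again.
double = io.BytesIO()
with gzip.GzipFile(fileobj=double, mode="wb") as gz_out:
    copy_stream(io.BytesIO(gz_payload), gz_out, chunk_size=1024)
once_unwrapped = gzip.decompress(double.getvalue())  # still gzip data
```

As far as I know, smart_open infers compression from the file extension by default, so a target key ending in .gz would be gzip-compressed on write, while a source URL without a .gz suffix would be read as-is. If the source is already gzipped, that combination could compress it twice. Recent smart_open versions accept compression='disable' in open(), which should make both handles pass raw bytes through. Check this against the installed version. The standard library's shutil.copyfileobj(fin, fout, chunk_size) does the same job as the helper above.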