Saving GIANT JSONP to Database

Asked by: Duk Asked: 2/28/2023 Last edited by: Duk Updated: 2/28/2023 Views: 18

Q:

Hey, I got this JSONP from archive.org:

https://archive.org/advancedsearch.php?q=collection%3Ainternetarchivebooks&fl[]=creator&fl[]=format&fl[]=genre&fl[]=language&fl[]=name&fl[]=title&fl[]=type&fl[]=year&sort[]=&sort[]=&sort[]=&rows=100000000&page=1&output=json&callback=callback&save=yes

Here is the same query with only 2 results:

https://archive.org/advancedsearch.php?q=collection%3Ainternetarchivebooks&fl[]=creator&fl[]=format&fl[]=genre&fl[]=language&fl[]=name&fl[]=title&fl[]=type&fl[]=year&sort[]=&sort[]=&sort[]=&rows=100000000&page=1&output=json&callback=callback&save=yes

Is this JSONP? How can I save it?

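I want to save it into smaller .json files, or insert it directly into a database, without running out of RAM. The full size is 2 GB. I downloaded it, split it to remove the "callback( )" wrapper, put it back together, and tried to cut it into smaller .json files with Python's "json.loads()". But it seems to break at some point.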

So my question is: how do I handle this giant JSONP? Is there a way to stream it directly from the server into a database?

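How would you do it? In the end, the data should be in my database. My first step would be to create smaller JSON files and then process those. Is there an easier way?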

I tried this, but it does not seem right:

import os
import json
import requests

# specify the URL where the JSONP data is located
url = 'https://archive.org/advancedsearch.php?q=collection%3Ainternetarchivebooks&fl[]=creator&fl[]=format&fl[]=genre&fl[]=language&fl[]=name&fl[]=title&fl[]=type&fl[]=year&sort[]=&sort[]=&sort[]=&rows=100000000&page=1&output=json&callback=callback&save=yes'

# set the size of each chunk
size_of_the_chunk = 2000

# create a new directory to save the smaller JSON files
dir_name = 'data_split'
if not os.path.exists(dir_name):
    os.mkdir(dir_name)

# send a request to the URL and get the JSONP response
response = requests.get(url, stream=True)
jsonp = ''
for chunk in response.iter_content(chunk_size=1024):
    if chunk:
        jsonp += chunk.decode()

# split the JSONP data into smaller JSON lists and save each list to a separate file
count = 0
for start_idx in range(0, len(jsonp), size_of_the_chunk):
    end_idx = start_idx + size_of_the_chunk
    json_str = jsonp[start_idx:end_idx]
    start = json_str.index('(') + 1
    end = json_str.rindex(')')
    data = json.loads(json_str[start:end])
    filename = os.path.join(dir_name, f'{count+1}.json')
    with open(filename, 'w') as f:
        json.dump(data, f, ensure_ascii=False, indent=True)
    count += 1

print(f'Successfully split {count * size_of_the_chunk} records into {count} files.')

Tags: JSON, API, JSONP

A:

0 votes Duk 2/28/2023 #1

Got it. I had to treat it as JavaScript.

import json

# Read the downloaded JSONP (JavaScript) file; this loads the whole file into memory
with open('data.json', 'r') as file:
    data = file.read()

# Strip the JSONP wrapper by keeping only the text from the first '{' to the last '}'
json_data = json.loads(data[data.index('{'):data.rindex('}')+1])

# Save the result as a plain JSON file
with open('tweets.json', 'w') as file:
    json.dump(json_data, file)

Now it is JSON. Got it into my database.
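
For the original goal of smaller files and a database load, here is a minimal sketch that follows on from the snippet above. It assumes the payload has the shape `callback({... "response": {"docs": [...]}})`, which is not shown in this thread. The helper names (`strip_jsonp`, `batched`, `write_chunks`, `insert_docs`), the `docs` table, and the choice of SQLite are all illustrative, not something the thread specifies. Note that it chunks the parsed list of records rather than slicing the raw text every N characters, since an arbitrary slice of a JSON string is almost never valid JSON on its own. Like the snippet above, it still parses the whole payload in memory, so it does not address the RAM concern by itself.

```python
import json
import os
import sqlite3
from itertools import islice


def strip_jsonp(text):
    """Return the JSON object inside a JSONP wrapper such as callback({...})."""
    start = text.index('{')
    end = text.rindex('}') + 1
    return json.loads(text[start:end])


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_chunks(docs, dir_name, size):
    """Write the records to numbered JSON files, `size` records per file."""
    os.makedirs(dir_name, exist_ok=True)
    paths = []
    for number, batch in enumerate(batched(docs, size), start=1):
        path = os.path.join(dir_name, f'{number}.json')
        with open(path, 'w', encoding='utf-8') as f:
            json.dump(batch, f, ensure_ascii=False)
        paths.append(path)
    return paths


def insert_docs(conn, docs, batch_size=1000):
    """Store each record as one JSON string per row, committing per batch."""
    # Hypothetical single-column table; a real schema would likely have more columns.
    conn.execute('CREATE TABLE IF NOT EXISTS docs (body TEXT)')
    total = 0
    for batch in batched(docs, batch_size):
        rows = [(json.dumps(doc, ensure_ascii=False),) for doc in batch]
        conn.executemany('INSERT INTO docs (body) VALUES (?)', rows)
        conn.commit()
        total += len(batch)
    return total


if __name__ == '__main__':
    # Tiny stand-in payload; the real one would come from the downloaded file.
    sample = 'callback({"response": {"docs": [{"title": "A"}, {"title": "B"}, {"title": "C"}]}})'
    docs = strip_jsonp(sample)['response']['docs']
    conn = sqlite3.connect(':memory:')
    print(insert_docs(conn, docs, batch_size=2))
```

Since the URL in the question already carries `rows` and `page` parameters, requesting a modest number of rows per page and running each page through something like `insert_docs` might keep every payload small enough to parse comfortably, though I have not tried that against archive.org.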