Asked by: Eric Zarycki | Asked: 11/16/2023 | Updated: 11/16/2023 | Views: 42
Gathering all top-level comments from r/worldnews live thread
Q:
I'm a student trying to collect all top-level comments from this r/worldnews live thread for a school research project: https://www.reddit.com/r/worldnews/comments/1735w17/rworldnews_live_thread_for_2023_israelhamas/. I'm currently coding in Python with the PRAW API and the pandas library. Here is the code I have written so far:
url = "https://www.reddit.com/r/worldnews/comments/1735w17/rworldnews_live_thread_for_2023_israelhamas/"
submission = reddit.submission(url=url)
comments_list = []
def process_comment(comment):
if isinstance(comment, praw.models.Comment) and comment.is_root:
comments_list.append({
'author': comment.author.name if comment.author else '[deleted]',
'body': comment.body,
'score': comment.score,
'edited': comment.edited,
'created_utc': comment.created_utc,
'permalink': f"https://www.reddit.com{comment.permalink}"
})
submission.comments.replace_more(limit=None, threshold=0)
for top_level_comment in submission.comments.list():
process_comment(top_level_comment)
comments_df = pd.DataFrame(comments_list)
But with limit=None the code times out, and other limits (100, 300, 500) return only ~700 comments. Any help collecting the top-level comments from this Reddit thread would be greatly appreciated.
I've gone through what must be hundreds of pages of documentation and Reddit threads, and tried the following techniques:
- writing retry/"timeout" handling around the Reddit API calls, then resuming the comment collection after a pause
- collecting comments in batches and then calling replace_more again, but to no avail (a rough sketch of this batch-and-pause idea follows the list). I also went through the Reddit API rate-limit documentation hoping there was a way to work around those limits.
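For reference, the batch-and-pause idea can be sketched roughly as follows. It assumes reddit is an already-authenticated praw.Reddit instance; the batch size, pause length, and round cap are illustrative values, not tested recommendations:

import time

url = "https://www.reddit.com/r/worldnews/comments/1735w17/rworldnews_live_thread_for_2023_israelhamas/"
submission = reddit.submission(url=url)

for round_number in range(200):  # cap the number of batches as a safety net
    try:
        # Resolve up to 64 MoreComments placeholders per round; replace_more
        # returns the placeholders it did not replace, so an empty list means done.
        remaining = submission.comments.replace_more(limit=64)
        if not remaining:
            break
    except Exception as exc:  # e.g. a timeout or rate-limit error
        print(f"Batch {round_number} failed, pausing before retrying: {exc}")
    time.sleep(10)  # rest between batches to stay within Reddit's rate limits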
A:
0 votes
jeffreyohene
11/16/2023
#1
I was able to get around the timeout issue and extract 190k+ comments by using a recursive function instead of the replace_more method. Maybe this will help:
# Same setup as in the question: assumes praw, pandas (as pd), and an
# authenticated reddit instance are already available.
url = "https://www.reddit.com/r/worldnews/comments/1735w17/rworldnews_live_thread_for_2023_israelhamas/"
submission = reddit.submission(url=url)
comments_list = []
def process_comment(comment):
    if isinstance(comment, praw.models.Comment) and comment.is_root:
        comments_list.append({
            'author': comment.author.name if comment.author else '[deleted]',
            'body': comment.body,
            'score': comment.score,
            'edited': comment.edited,
            'created_utc': comment.created_utc,
            'permalink': f"https://www.reddit.com{comment.permalink}"
        })
def gather_comments(comment_list):
    # Process regular comments directly; expand each MoreComments placeholder
    # and recurse on the children it yields, so nothing is processed twice.
    pending = []
    for comment in comment_list:
        if isinstance(comment, praw.models.MoreComments):
            try:
                pending.extend(comment.comments())
            except Exception as e:
                print(f"Error replacing MoreComments: {e}")
        else:
            process_comment(comment)
    if pending:
        gather_comments(pending)
top_level_comments = submission.comments
gather_comments(top_level_comments)
# Create DataFrame
comments_df = pd.DataFrame(comments_list)
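As an optional follow-up (not part of the answer above), once comments_df is built you could drop any duplicate rows and save the result for the research project; the de-duplication column and the filename are just examples:

comments_df = comments_df.drop_duplicates(subset="permalink")
comments_df.to_csv("worldnews_live_thread_top_level_comments.csv", index=False)
print(f"Collected {len(comments_df)} top-level comments")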
Comments