Python Selenium: How to Stop While Loop When Reaching the End of a Scrollable div in Web Scraping

Asked by: asma  Asked: 11/4/2023  Updated: 11/4/2023  Views: 21

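Q:

I am writing a web scraping script with Python and Selenium. I have a while loop that scrolls the page and collects restaurant data. I want to stop the loop when I reach the end of the page, but I am not sure how to detect that condition. Here is my code: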

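import time

import pandas as pd
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` (a Selenium WebDriver) and get_name_check() are defined elsewhere in the script.
# Assumption: the list of collected names starts out empty.
restaurant_names = []

try: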
    # Locate the scrollable div element for restaurant results
    scrollable_main_div = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '//div[@aria-label="نتائج عن restaurants in riyadh"]'))
    )
    
    i = 1 

    while True:
        # Scroll down to the end of the div
        driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_main_div)
        time.sleep(2)

        # Locate and store a single restaurant element
        restaurant = driver.find_element(By.XPATH, f'(//div[@aria-label="نتائج عن restaurants in riyadh"]//div[contains(@class,"Nv2PK THOPZb CpccDe ")])[{i}]')

        # Get the restaurant's name
        name = get_name_check(restaurant)

        # Check for duplicate restaurant names
        if name not in restaurant_names:
            i += 1
            restaurant_names.append(name)
            print(name)
            print("_____________________________")

        time.sleep(1)

        # This condition is intended to stop scrolling when the end of the page is reached
        if driver.execute_script('arguments[0].scrollTop >= arguments[0].scrollHeight', scrollable_main_div):
            break

except TimeoutException:
    print("Timeout Exception: Check the page or adjust the waiting time")
except Exception as e:
    print(f"An error occurred: {e}")

# Create a DataFrame and save it to a CSV file
df = pd.DataFrame(restaurant_names)
df.to_csv("names.csv", index=False)

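I tried scrolling the div and collecting the restaurant names, and the names are stored in the restaurant_names list successfully. However, I need help adding a condition that stops the while loop when there is no more data to collect in the div, so that I can create the DataFrame and save it to a CSV file.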
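
For reference, here is a minimal sketch of one possible stopping condition, not a definitive fix. Two things are worth noting about the check in the code above: `execute_script` only returns a value when the JavaScript contains an explicit `return`, so the expression as written yields `None` and the `break` is never reached; and `scrollTop` can grow at most to `scrollHeight - clientHeight`, so it does not reach `scrollHeight` while the content overflows. The hypothetical helper `scroll_to_end` below instead stops once `scrollHeight` has not grown for a few consecutive scrolls. The names `pause` and `max_idle_rounds` are illustrative, and the sketch assumes the page lazy-loads more results as the div is scrolled.

```python
import time


def scroll_to_end(driver, element, pause=2.0, max_idle_rounds=3):
    """Scroll `element` until its scrollHeight stops growing.

    `driver` is anything with Selenium's execute_script(script, *args) method.
    Returns the number of scroll steps performed.
    """
    idle_rounds = 0
    steps = 0
    # Note the explicit `return`: without it execute_script gives back None.
    last_height = driver.execute_script("return arguments[0].scrollHeight", element)

    while idle_rounds < max_idle_rounds:
        # Jump to the current bottom of the scrollable div.
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight", element
        )
        steps += 1
        time.sleep(pause)  # give lazy-loaded results time to render

        new_height = driver.execute_script("return arguments[0].scrollHeight", element)
        if new_height == last_height:
            idle_rounds += 1  # nothing new was loaded this round
        else:
            idle_rounds = 0   # more content arrived, keep scrolling
            last_height = new_height

    return steps
```

With the code from the question this could be called as `scroll_to_end(driver, scrollable_main_div)` before collecting all restaurant elements in a single pass. Requiring several idle rounds rather than one is a guard against slow network responses being mistaken for the end of the list.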

python selenium-webdriver web-scraping while-loop

Comments

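0 votes MD Kawsar 11/4/2023
This depends entirely on the site's elements. You could wait until some end-of-list element appears, or something similar. If you can share the website, it would be easier to help!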
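
The suggestion in this comment could be sketched roughly as follows, assuming the site renders some marker element once the last result has loaded. Whether such a marker exists, and what locator matches it, is an assumption that has to be checked against the actual page. `scroll_until_marker`, `end_marker_locator`, `pause`, and `max_scrolls` are hypothetical names used only for illustration.

```python
import time


def scroll_until_marker(driver, element, end_marker_locator, pause=2.0, max_scrolls=200):
    """Scroll `element` until an end-of-list marker exists or the budget runs out.

    `end_marker_locator` is a (by, value) tuple, e.g. (By.XPATH, "...").
    Returns True if the marker was found, False otherwise.
    """
    for _ in range(max_scrolls):
        # find_elements returns an empty list (no exception) when nothing matches.
        if driver.find_elements(*end_marker_locator):
            return True
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight", element
        )
        time.sleep(pause)  # wait for the next batch of results to load

    # One last check after the final scroll.
    return bool(driver.find_elements(*end_marker_locator))
```

A call might look like `scroll_until_marker(driver, scrollable_main_div, (By.XPATH, end_marker_xpath))`, where `end_marker_xpath` is a placeholder for whatever expression matches the marker on the target site. The `max_scrolls` cap keeps the loop from running forever if the marker never appears.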

A: No answers yet