Find element by xpath in multiple pages

Asked by: Arthur · Asked: 10/23/2023 · Modified: 10/23/2023 · Viewed: 48 times

Question:

I'm trying out Selenium for web scraping on a website, but I've run into a problem:

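The site has multiple pages, and the information I need is always in elements that have an ID. For example, on the first page the IDs run from "card0" to "card50". However, the pattern repeats on the second page, starting again at "card0" and going up to "card50".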

I'm trying to locate these elements with find_element and By.XPATH, but I can't get this to repeat in a way that works. Here is the code snippet:

element = driver.find_element(By.XPATH,"//*[text()[contains(.,id='card')]]")
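For reference, in the XPath above id='card' refers to a child element named id, not the id attribute (that would be @id), and text()[...] filters text nodes rather than attributes. The comparison yields a boolean, which contains() converts to the string "false" here, so the expression ends up looking for elements whose text contains the word "false". If you want to repeat the lookup for each index, card0 through card50, a minimal sketch follows. It assumes a Selenium-style driver object; card_xpath and find_cards_by_index are hypothetical helper names, and "xpath" is the string value of Selenium's By.XPATH.

```python
# Minimal sketch: look up card0 .. card50 one at a time by exact id.
# XPATH has the same value as Selenium's By.XPATH ("xpath").
XPATH = "xpath"


def card_xpath(index: int) -> str:
    """Build an XPath matching the element whose id attribute is card<index>."""
    # @id is the attribute; a bare "id" would mean a child element named <id>.
    return f'//*[@id="card{index}"]'


def find_cards_by_index(driver, last_index: int = 50) -> list:
    """Return the card elements found on the current page, in index order."""
    cards = []
    for index in range(last_index + 1):
        # find_elements returns an empty list instead of raising
        # NoSuchElementException when an id is missing on this page.
        cards.extend(driver.find_elements(XPATH, card_xpath(index)))
    return cards
```

This makes one lookup per id; a single selector that matches all cards at once, as in the first answer, needs only one call per page.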

Thanks in advance for your help.

python selenium-webdriver xpath findelement

Answers:

1 vote · Payrfix · 10/23/2023 · #1

Assuming your HTML looks like this:

<div id="card0">...</div>
<div id="card1">...</div>
<div id="card2">...</div>
<div id="card3">...</div>
<div id="card4">...</div>
<div id="card5">...</div>
...

If you want to get all of these div elements using XPATH, you can do this:

elements = driver.find_elements(By.XPATH, '//*[contains(@id, "card")]')

You need to use driver.find_elements (with an "s"), because you want multiple elements.

You can also do this more simply with CSS_SELECTOR:

elements = driver.find_elements(By.CSS_SELECTOR, '[id*="card"]')
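Both selectors match by substring, so they would also pick up unrelated ids such as "scorecard" or "card-header" if the page has any. If that matters, one option is to narrow the CSS selector to a prefix match (^=) and then keep only ids that are exactly "card" followed by digits. The sketch below assumes a Selenium-style driver; find_numbered_cards is a hypothetical helper, and "css selector" is the string value of Selenium's By.CSS_SELECTOR.

```python
import re

# CSS_SELECTOR has the same value as Selenium's By.CSS_SELECTOR ("css selector").
CSS_SELECTOR = "css selector"

# Matches "card0", "card17", "card50", but not "cardHeader" or "card-1".
CARD_ID = re.compile(r"card\d+")


def find_numbered_cards(driver) -> list:
    """Return elements whose id is exactly 'card' followed by one or more digits."""
    # [id^="card"] is a prefix match, narrower than the substring match [id*="card"].
    candidates = driver.find_elements(CSS_SELECTOR, '[id^="card"]')
    return [
        element
        for element in candidates
        # get_attribute may return None, so fall back to "" before matching.
        if CARD_ID.fullmatch(element.get_attribute("id") or "")
    ]
```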

So to do this across multiple pages, you would typically do something like this:

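next_page_is_present = True
while next_page_is_present:
    elements = driver.find_elements(By.CSS_SELECTOR, '[id*="card"]')
    # Do what you want with the elements here
    # Check whether there is still a next page ("a.next" is a placeholder selector)
    next_buttons = driver.find_elements(By.CSS_SELECTOR, "a.next")
    next_page_is_present = len(next_buttons) > 0
    if next_page_is_present:
        next_buttons[0].click()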

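You can't store these elements in a variable and use them after the loop: once you have navigated to another page, accessing them raises a StaleElementReferenceException.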
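One way around this is to copy what you need (ids, text) into plain Python values while you are still on the page. A minimal sketch, assuming a Selenium-style driver; snapshot_cards is a hypothetical helper name and "css selector" is the string value of By.CSS_SELECTOR:

```python
# CSS_SELECTOR has the same value as Selenium's By.CSS_SELECTOR ("css selector").
CSS_SELECTOR = "css selector"


def snapshot_cards(driver) -> list:
    """Return (id, text) pairs for the cards on the current page.

    The result holds plain strings, so it stays valid after navigating away,
    unlike the WebElement objects themselves.
    """
    cards = driver.find_elements(CSS_SELECTOR, '[id*="card"]')
    return [(card.get_attribute("id"), card.text) for card in cards]
```

Inside the pagination loop you would call all_cards.extend(snapshot_cards(driver)) before clicking to the next page, then work with all_cards after the loop. Since card0 to card50 repeat on every page, the ids alone are not unique across pages; if you need a unique key, pair each one with a page counter.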

0 votes · Michael King · 10/23/2023 · #2

Here's one I made. You may need to inspect the elements and find the right CSS selectors for your site, but other than that it should work:

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Set the download directory
prefs = {
    "download.default_directory": r"F:\models",
    "download.prompt_for_download": False
}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
# chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920,1080")  # Chrome expects width,height
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_experimental_option("prefs", prefs)
# Launch the Chrome browser
driver = webdriver.Chrome(options=chrome_options)
try:
    # Navigate to the login page
    url = "url goes here"
    driver.get(url)
    # Wait until the login button can be clicked, then click it
    login_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "a.login"))
    )
    login_button.click()
    # Wait for the login form to appear
    login_form = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "login_form"))
    )
    # Fill in the email and password fields
    email_input = login_form.find_element(By.CSS_SELECTOR, "input[name='member[email]']")
    email_input.send_keys("email goes here")
    password_input = login_form.find_element(By.CSS_SELECTOR, "input[name='member[password]']")
    password_input.send_keys("password goes here")
    # Click the sign in button
    signin_button = login_form.find_element(By.ID, "signInButton")
    signin_button.click()
    while True:
        try:
            # Find and click every download button on the current page
            download_buttons = WebDriverWait(driver, 10).until(
                EC.presence_of_all_elements_located((By.CSS_SELECTOR, "span.gc-icon.gc-icon-download"))
            )
            for button in download_buttons:
                button.click()
            # Find and click the next page button
            next_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, "li.pagination-next.ng-scope"))
            )
            next_button.click()
        except WebDriverException:
            # TimeoutException (no more pages) and failed clicks both subclass
            # WebDriverException, so either one ends the loop
            break
finally:
    # Close the browser
    driver.quit()
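One thing to watch for in this loop: it only stops when a wait times out or a click fails. On many paginated sites the "next" item stays in the page on the last page and is merely marked as disabled, in which case the loop may keep clicking it. A minimal stop check, assuming the site adds a "disabled" class or an aria-disabled="true" attribute to the li (both are assumptions; check your page's markup), might look like this, with is_last_page being a hypothetical helper:

```python
def is_last_page(next_item) -> bool:
    """Return True if the pagination 'next' <li> looks disabled.

    Assumes the site signals the last page with a "disabled" class or
    aria-disabled="true"; adjust to whatever your page actually does.
    """
    # get_attribute may return None when the attribute is absent.
    classes = (next_item.get_attribute("class") or "").split()
    return "disabled" in classes or next_item.get_attribute("aria-disabled") == "true"
```

Inside the while loop you would call is_last_page(next_button) just before next_button.click() and break when it returns True.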