Asked by: Arthur · Asked: 10/23/2023 · Modified: 10/23/2023 · Viewed: 48
Find element by xpath in multiple pages
Question:
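I'm testing Selenium for web scraping on a website, but I've run into a problem:
The site has multiple pages, and the information I need is always inside elements with an ID. For example, on the first page the IDs range from "card0" to "card50". The pattern then repeats on the second page, starting again at "card0" and going up to "card50".
I'm trying to locate these elements with find_element and By.XPATH, but I can't get it to repeat in a way that works. Here is the code snippet: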
element = driver.find_element(By.XPATH,"//*[text()[contains(.,id='card')]]")
Thanks, everyone, for your help.
Answers:
1 vote
Payrfix
10/23/2023
#1
Assuming your HTML looks like this:
<div id="card0">...</div>
<div id="card1">...</div>
<div id="card2">...</div>
<div id="card3">...</div>
<div id="card4">...</div>
<div id="card5">...</div>
...
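and you want to get all of these div elements using XPATH, you can do this:
elements = driver.find_elements(By.XPATH, '//*[contains(@id, "card")]')
You need to use driver.find_elements (with an "s") because you want multiple elements; driver.find_element returns only the first match.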
You can also do this more simply with CSS_SELECTOR:
elements = driver.find_elements(By.CSS_SELECTOR, '[id*="card"]')
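Note that both contains(@id, "card") and [id*="card"] are substring matches, so they would also pick up ids such as "discard" or "cardHeader" if the page happens to have them. The sketch below reproduces the matching rules in plain Python on a hypothetical list of ids (no browser needed) and contrasts them with a prefix match (XPath starts-with(@id, "card"), CSS [id^="card"]) and a stricter "card followed by digits" pattern. The id values are made up for illustration.

```python
import re

# Hypothetical id values, as they might appear on one results page.
ids = ["card0", "card1", "card50", "discard", "cardHeader", "footer"]


def substring_match(values, needle="card"):
    """Same rule as XPath contains(@id, "card") or CSS [id*="card"]."""
    return [v for v in values if needle in v]


def prefix_match(values, prefix="card"):
    """Same rule as XPath starts-with(@id, "card") or CSS [id^="card"]."""
    return [v for v in values if v.startswith(prefix)]


# Stricter: the whole id must be "card" followed only by digits (card0 ... card50).
CARD_ID = re.compile(r"card\d+")


def card_number_match(values):
    return [v for v in values if CARD_ID.fullmatch(v)]


print(substring_match(ids))    # ['card0', 'card1', 'card50', 'discard', 'cardHeader']
print(prefix_match(ids))       # ['card0', 'card1', 'card50', 'cardHeader']
print(card_number_match(ids))  # ['card0', 'card1', 'card50']
```

In Selenium you could combine the two ideas: fetch candidates with driver.find_elements(By.CSS_SELECTOR, '[id^="card"]') and keep only the elements whose el.get_attribute("id") passes CARD_ID.fullmatch.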
So across multiple pages you would typically do something like this:
nextPageIsPresent = True
while nextPageIsPresent:
    elements = driver.find_elements(By.CSS_SELECTOR, '[id*="card"]')
    # Do what you want with the elements on this page
    # Check whether there is still a next page: if so, click it;
    # otherwise set nextPageIsPresent = False so the loop ends
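You can't store these elements in a variable and use them after the loop: once you have navigated to another page, accessing them raises a StaleElementReferenceException. Extract the data you need (text, attributes) while you are still on the page.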
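A minimal sketch of that loop, assuming two hypothetical helpers you would implement with Selenium: read_cards() returns the data you need from the current page (for example the .text of each [id*="card"] element), and go_to_next_page() moves to the next page and returns False when there is none. Because only plain strings are kept across iterations, no element reference outlives its page. The usage at the bottom drives it with fake pages so it runs without a browser.

```python
from typing import Callable, List


def scrape_all_pages(
    read_cards: Callable[[], List[str]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 1000,
) -> List[str]:
    """Collect card data from every page.

    read_cards: hypothetical helper returning plain data (e.g. element.text)
        for the cards on the current page. Returning strings rather than
        WebElements avoids StaleElementReferenceException after navigating.
    go_to_next_page: hypothetical helper that moves to the next page and
        returns False when there is no next page.
    max_pages: safety limit so a broken "next page" check cannot loop forever.
    """
    results: List[str] = []
    for _ in range(max_pages):
        # Extract everything needed from this page before leaving it.
        results.extend(read_cards())
        if not go_to_next_page():
            break
    return results


# Usage with fake pages (no browser): both pages restart their ids at card0.
pages = [["page1-card0", "page1-card1"], ["page2-card0", "page2-card1"]]
current = {"index": 0}


def fake_read_cards() -> List[str]:
    return list(pages[current["index"]])


def fake_go_to_next_page() -> bool:
    if current["index"] + 1 < len(pages):
        current["index"] += 1
        return True
    return False


print(scrape_all_pages(fake_read_cards, fake_go_to_next_page))
# ['page1-card0', 'page1-card1', 'page2-card0', 'page2-card1']
```

With a real driver, read_cards could be lambda: [el.text for el in driver.find_elements(By.CSS_SELECTOR, '[id*="card"]')], and go_to_next_page could click the site's "next" link and return False when that link cannot be found (its selector depends on the site and is not given in the question).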
0 votes
Michael King
10/23/2023
#2
Here's one I made. You may need to inspect the elements and find the right CSS selectors for your site, but otherwise it should work:
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set the download directory
prefs = {
    "download.default_directory": r"F:\models",
    "download.prompt_for_download": False,
}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
# chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920,1080")  # Chrome expects "width,height"
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_experimental_option("prefs", prefs)

# Launch the Chrome browser
driver = webdriver.Chrome(options=chrome_options)
try:
    # Navigate to the login page
    url = "url goes here"
    driver.get(url)

    # Find and click the login button
    login_button = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "a.login"))
    )
    login_button.click()

    # Wait for the login form to appear
    login_form = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "login_form"))
    )

    # Fill in the email and password fields
    email_input = login_form.find_element(By.CSS_SELECTOR, "input[name='member[email]']")
    email_input.send_keys("email goes here")
    password_input = login_form.find_element(By.CSS_SELECTOR, "input[name='member[password]']")
    password_input.send_keys("password goes here")

    # Click the sign in button
    signin_button = login_form.find_element(By.ID, "signInButton")
    signin_button.click()

    while True:
        try:
            # Find and click every download button on the current page
            download_buttons = WebDriverWait(driver, 10).until(
                EC.presence_of_all_elements_located((By.CSS_SELECTOR, "span.gc-icon.gc-icon-download"))
            )
            for button in download_buttons:
                button.click()

            # Find and click the next page button
            next_button = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "li.pagination-next.ng-scope"))
            )
            next_button.click()
        except WebDriverException:
            # TimeoutException is a subclass of WebDriverException: if there are
            # no more pages or the buttons are not clickable, leave the loop
            break
finally:
    # Close the browser
    driver.quit()