Asked by: Deka Halane · Asked: 10/27/2023 · Updated: 10/28/2023 · Views: 34
ElementNotInteractableException and StaleElementReferenceException scraping website using Selenium
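Q:
I am trying to scrape the content of every link on this website: Binance Listing
However, the code keeps raising either ElementNotInteractableException or StaleElementReferenceException. When I try to handle one error, I get stuck in a loop where new errors keep appearing. This is my first time using Selenium, so any help would be appreciated!
Here is the code: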
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException
import re
import datetime
# Initialize the Chrome WebDriver (change the path to your WebDriver)
driver = webdriver.Chrome()
# Navigate to the webpage with your HTML content
driver.get("https://www.binance.com/en/support/announcement/new-cryptocurrency-listing?c=48&navId=48")
# Wait for the cookie consent pop-up to appear and then accept it
wait = WebDriverWait(driver, 20)
cookie_consent = wait.until(EC.presence_of_element_located((By.ID, 'onetrust-banner-sdk')))
cookie_consent.click()
# wait = WebDriverWait(driver, 20)
# Click on the accept button
accept_button = wait.until(EC.element_to_be_clickable((By.ID, 'onetrust-accept-btn-handler')))
# accept_button.send_keys("arguments[0].scrollIntoView();")
driver.execute_script("arguments[0].scrollIntoView();", accept_button)
accept_button.click()
# link = None
# Find all div elements within the specified class
div_elements = driver.find_elements(By.CSS_SELECTOR, '.css-148156o .css-1tl1y3y')
timeout = 20
# Extract links and titles from the div elements
for div_element in div_elements:
    # Create a WebDriverWait instance with expected conditions
    div_elements = driver.find_elements(By.CSS_SELECTOR, '.css-148156o .css-1tl1y3y')
    wait = WebDriverWait(driver, timeout)
    try:
        # Define a function to perform the action and return True
        # def click_checkout_link(driver):
        # global link
        link_element = div_element.find_element(By.CSS_SELECTOR, "a")  # Find the link element
        # link_element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'a'))) # Click the link
        link = link_element.get_attribute('href')  # Get the "href" attribute
        print("Link: ", link)
        link_element.click()
        meta_element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, '//meta[@name="description"]'))
        )
        content = meta_element.get_attribute('content')
        content = content.split(" ")
        # print("OG: ", content)
        content = [string for string in content if string.isupper() or string[0].isnumeric()]
        content = tuple(content)
        print(content)
        timestamp = None
        date = None
        for string in content:
            if ":" in string:
                timestamp = string
            else:
                try:
                    if datetime.datetime.strptime(string, "%Y-%m-%d"):
                        date = string
                except ValueError:
                    continue
        print("Date: ", date)
        print("Time: ", timestamp)
        # Use the until method to wait for the condition to be met
        # link_element = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a')))
        # link_element.click()
    except StaleElementReferenceException as e:
        # Handle the StaleElementReferenceException (if needed)
        print("Element is stale, handle this case if necessary")
        print(e)
        continue
    # link = div_element.find_element(By.CSS_SELECTOR, 'a')
    # title = div_element.find_element(By.CSS_SELECTOR, 'head > title').text
    # Click on the link to navigate to the linked page
    # link_url = link.get_attribute('href')
    # # driver.execute_script("arguments[0].click();", link)
    # print(f"Link: {link_url}")
    # Wait for the page to load (you can adjust the sleep time)
    time.sleep(2)
    # try:
    # Find and extract data from the linked page
    time.sleep(2)  # Sleep for a while before going back (you can adjust this)
    # Go back to the main page
    driver.back()
# Close the WebDriver when done
driver.quit()
I tried the solutions to the same problem on StackOverflow and used try and except blocks to catch NoSuchElementException:
accept_button = wait.until(EC.element_to_be_clickable((By.ID, 'onetrust-accept-btn-handler')))
and
wait = WebDriverWait(driver, 20)
cookie_consent = wait.until(EC.presence_of_element_located((By.ID, 'onetrust-banner-sdk')))
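However, when I try to use wait.until(EC.presence_of_element_located for the line below, I end up on the wrong website (binance.com), and I don't know why. It only happens when I use the wait.until() function. The script should return to the original page after scraping each individual link.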
link_element = div_element.find_element(By.CSS_SELECTOR, "a") # Find the link element
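Also, when I run in debug mode I do not get the ElementNotInteractableException. I understand that the StaleElementReferenceException has to do with re-locating the same elements after the page refreshes on returning to the list of links, but I don't know how to do that.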
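A:
To avoid ElementNotInteractableException, click with native JS through the JS executor (driver.execute_script).
You get StaleElementReferenceException because the for ... in loop iterates over the list that existed when the loop started. Assigning a new value to div_elements inside the loop does not restart the iteration, so after driver.back() each div_element is still an old element reference.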
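This effect can be reproduced without Selenium. The snippet below is a plain-Python illustration only: `fetch` is a hypothetical stand-in for `driver.find_elements`, returning fresh objects on every call, and the two loop functions are made-up names. It shows that rebinding the list name inside a `for ... in` loop does not change which objects the loop hands out, while indexing into the freshly fetched list does.

```python
def fetch(generation):
    """Hypothetical stand-in for driver.find_elements(): new objects on each call."""
    return [f"element-{i}@gen{generation}" for i in range(3)]


def loop_for_in():
    """Iterate with for ... in while rebinding the list name inside the loop."""
    items = fetch(0)
    seen = []
    generation = 0
    for item in items:  # the iterator is bound to the original list object
        generation += 1
        items = fetch(generation)  # rebinding the name does not affect the iterator
        seen.append(item)  # still an object from the first fetch
    return seen


def loop_by_index():
    """Iterate by index and read from the freshly fetched list each time."""
    items = fetch(0)
    seen = []
    for i in range(len(items)):
        items = fetch(i + 1)  # re-fetch, as after navigating back
        seen.append(items[i])  # object from the most recent fetch
    return seen


print(loop_for_in())
print(loop_by_index())
```

In the first loop every item comes from generation 0, which corresponds to the stale references; in the second, each item comes from the most recent fetch.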
To avoid this, loop with for i in range(...), re-initialize the array at the start of each iteration, and get the array element by index.
# previous code until getting div elements
div_elements = driver.find_elements(By.CSS_SELECTOR, '.css-148156o .css-1tl1y3y')
timeout = 20
for i in range(len(div_elements)):
    div_elements = driver.find_elements(By.CSS_SELECTOR, '.css-148156o .css-1tl1y3y')
    wait = WebDriverWait(driver, timeout)
    link_element = div_elements[i].find_element(By.CSS_SELECTOR, "a")
    link = link_element.get_attribute('href')
    print("Link: ", link)
    driver.execute_script('arguments[0].click();', link_element)
    # your further code
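A different approach, not the one in the answer above, is to read every href first and then open each URL directly with `driver.get`, so no element reference has to survive a page change. The sketch below is only an illustration under assumptions: `collect_hrefs`, `visit_each` and `handle_page` are hypothetical names, and the locator strategy is passed as the plain string "css selector" (the value of Selenium's `By.CSS_SELECTOR`) so the helpers work with any driver-like object.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def collect_hrefs(driver, selector):
    """Return the href of the first <a> inside each element matching selector."""
    hrefs = []
    for element in driver.find_elements(CSS, selector):
        anchor = element.find_element(CSS, "a")
        href = anchor.get_attribute("href")
        if href:  # skip anchors without an href
            hrefs.append(href)
    return hrefs


def visit_each(driver, hrefs, handle_page):
    """Load each URL directly and collect whatever handle_page(driver) returns."""
    results = []
    for href in hrefs:
        driver.get(href)  # direct navigation: no click, no driver.back()
        results.append(handle_page(driver))
    return results
```

With a real driver this might be called as `visit_each(driver, collect_hrefs(driver, '.css-148156o .css-1tl1y3y'), handle_page)`, where `handle_page` is whatever extraction you need on the linked page, for example reading the meta description.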