ElementNotInteractableException and StaleElementReferenceException scraping website using Selenium

Asked by Deka Halane on 10/27/2023 · Updated 10/28/2023 · Viewed 34 times

Q:

I am trying to scrape the content behind each link on this page: Binance new cryptocurrency listings

However, the code keeps raising either ElementNotInteractableException or StaleElementReferenceException. Whenever I try to handle one error, I get stuck in a loop where new errors keep appearing. This is my first time using Selenium, so any help would be greatly appreciated!

Here is the code:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException
import re 
import datetime

# Initialize the Chrome WebDriver (change the path to your WebDriver)
driver = webdriver.Chrome()

# Navigate to the webpage with your HTML content
driver.get("https://www.binance.com/en/support/announcement/new-cryptocurrency-listing?c=48&navId=48")

# Wait for the cookie consent pop-up to appear and then accept it
wait = WebDriverWait(driver, 20)
cookie_consent = wait.until(EC.presence_of_element_located((By.ID, 'onetrust-banner-sdk')))
cookie_consent.click()
# wait = WebDriverWait(driver, 20)
# Click on the accept button
accept_button = wait.until(EC.element_to_be_clickable((By.ID, 'onetrust-accept-btn-handler')))
# accept_button.send_keys("arguments[0].scrollIntoView();")
driver.execute_script("arguments[0].scrollIntoView();", accept_button)
accept_button.click()


# link = None
# Find all div elements within the specified class
div_elements = driver.find_elements(By.CSS_SELECTOR, '.css-148156o .css-1tl1y3y')
timeout = 20
# Extract links and titles from the div elements
for div_element in div_elements:
    # Create a WebDriverWait instance with expected conditions
    div_elements = driver.find_elements(By.CSS_SELECTOR, '.css-148156o .css-1tl1y3y')
    wait = WebDriverWait(driver, timeout)
    try: 
        # Define a function to perform the action and return True
        # def click_checkout_link(driver):
            # global link
        link_element = div_element.find_element(By.CSS_SELECTOR, "a")  # Find the link element
        # link_element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'a')))  # Click the link
        link = link_element.get_attribute('href')  # Get the "href" attribute
        print("Link: ", link)
        link_element.click()
        meta_element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, '//meta[@name="description"]'))
        )
        content = meta_element.get_attribute('content')
        content = content.split(" ")
        # print("OG: ", content)
        content = [string for string in content if string.isupper() or string[0].isnumeric()] 
        content = tuple(content)
        print(content)
        timestamp = None
        date = None
        for string in content:
            if ":" in string:
                timestamp = string
            else: 
                try:
                    if datetime.datetime.strptime(string, "%Y-%m-%d"):
                        date = string
                except ValueError:
                    continue
        print("Date: ", date)
        print("Time: ", timestamp)
        # Use the until method to wait for the condition to be met
        # link_element = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a')))
        # link_element.click()
    except StaleElementReferenceException as e:
        # Handle the StaleElementReferenceException (if needed)
        print("Element is stale, handle this case if necessary")
        print(e)
        continue
    # link = div_element.find_element(By.CSS_SELECTOR, 'a')
    
    # title = div_element.find_element(By.CSS_SELECTOR, 'head > title').text

    # Click on the link to navigate to the linked page
    # link_url = link.get_attribute('href')
    # # driver.execute_script("arguments[0].click();", link)
    # print(f"Link: {link_url}")

    # Wait for the page to load (you can adjust the sleep time)
    time.sleep(2)

    # try:
        # Find and extract data from the linked page
    

    time.sleep(2)  # Sleep for a while before going back (you can adjust this)

    # Go back to the main page
    driver.back()

# Close the WebDriver when done
driver.quit()

I tried looking up the same problems on StackOverflow and used try and except blocks to catch NoSuchElementException (a sketch of such a catch follows the snippet below):

accept_button = wait.until(EC.element_to_be_clickable((By.ID, 'onetrust-accept-btn-handler')))

wait = WebDriverWait(driver, 20)
cookie_consent = wait.until(EC.presence_of_element_located((By.ID, 'onetrust-banner-sdk')))
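
For context, a catch of that kind would look roughly like the sketch below; it is only an illustration, assuming the same OneTrust accept button as in the code above:

from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

try:
    driver.find_element(By.ID, 'onetrust-accept-btn-handler').click()
except NoSuchElementException:
    pass  # banner never appeared, so there is nothing to dismiss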

However, when I tried to use the line link_element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'a'))) in place of link_element = div_element.find_element(By.CSS_SELECTOR, "a"), I ended up on the wrong website (binance.com). I don't know why that happens, and it only happens when I use the wait.until() function. The script should return to the original page after scraping each individual link.

Also, when I run it in debug mode, I do not get the ElementNotInteractableException. I know the StaleElementReferenceException has to do with re-locating the same elements after the page refreshes on returning to the list of links, but I don't know how to do that.
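
For reference, one way to re-locate an element after navigating back is to wrap the lookup and click in a small retry helper. This is only a minimal sketch, not the asker's code; it assumes the listing page keeps the same CSS selector across navigations, and click_fresh is a hypothetical helper name:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException

def click_fresh(driver, wait, css_selector, index, attempts=3):
    # Re-find the elements on every attempt so we never reuse a reference
    # that went stale after driver.back() reloaded the page
    for _ in range(attempts):
        try:
            elements = wait.until(
                EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
            )
            elements[index].click()
            return True
        except StaleElementReferenceException:
            continue  # the DOM changed under us; look the element up again
    return False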

python selenium-webdriver binance



Answers:

1 vote · Answered by Yaroslavm on 10/28/2023 · #1

To avoid the ElementNotInteractableException, perform the click with native JS through the JavaScript executor.
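
A minimal sketch of such a click, where element stands for any WebElement that Selenium refuses to click natively:

# execute_script runs the click inside the page's own JS context,
# bypassing Selenium's visibility and interactability checks
driver.execute_script('arguments[0].click();', element)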

The StaleElementReferenceException appears because a for ... in loop iterates over the element references captured when the loop started; reassigning div_elements inside the loop body does not refresh the loop variable, so after navigating back you are still holding references to the old, stale elements.

To avoid this, use for i in range(...) instead: re-locate the array inside the loop and fetch each element from the freshly located array by index.

# previous code until getting div elements
div_elements = driver.find_elements(By.CSS_SELECTOR, '.css-148156o .css-1tl1y3y')
timeout = 20
for i in range(len(div_elements)):
    # Re-locate the divs on every iteration so the references are fresh
    # after navigating back to the listing page
    div_elements = driver.find_elements(By.CSS_SELECTOR, '.css-148156o .css-1tl1y3y')
    wait = WebDriverWait(driver, timeout)
    link_element = div_elements[i].find_element(By.CSS_SELECTOR, "a")
    link = link_element.get_attribute('href')
    print("Link: ", link)
    # JS click bypasses the interactability checks that raise
    # ElementNotInteractableException
    driver.execute_script('arguments[0].click();', link_element)
# your further code
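
As a side note on the design choice: collecting all the hrefs up front and then visiting each one with driver.get() sidesteps both exceptions entirely, because plain strings cannot go stale and no element reference has to survive a navigation. A minimal sketch, assuming the same CSS selectors as the question (they are auto-generated and may change):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.binance.com/en/support/announcement/new-cryptocurrency-listing?c=48&navId=48")

# Collect every announcement URL first; strings cannot go stale
links = [
    a.get_attribute('href')
    for a in driver.find_elements(By.CSS_SELECTOR, '.css-148156o .css-1tl1y3y a')
]

for link in links:
    driver.get(link)  # navigate directly instead of click() + back()
    meta = driver.find_element(By.XPATH, '//meta[@name="description"]')
    print(link, meta.get_attribute('content'))

driver.quit()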