Python - Webscraping - get data from grid and flex field

Asked by motylas on 11/13/2023 · Last edited by motylas · Updated 11/13/2023 · Views: 38

Q:

I'm using Selenium, but I can't get data out of the DIV marked `flex` on https://www.jpg.store/collection/hungrycowsbymuesliswap?tab=items

I need the value of the `asset_id` attribute (highlighted in yellow in the page source).

My code only reaches the div above the value I'm looking for:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")  
chrome_options.add_argument("--disable-gpu")  # run headless
driver = webdriver.Chrome(options=chrome_options)
url = 'https://www.jpg.store/collection/hungrycowsbymuesliswap?tab=items'
driver.set_page_load_timeout(4) 
driver.get(url)

element = driver.find_element(By.XPATH, "//body/div/div[1]/main/div[2]/div/section/div/div[2]/div/div/div[1]")

print(element.get_attribute('outerHTML'))

If I add one more `div` to the XPath, the code throws an error: `selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//body/div/div[1]/main/div[2]/div/section/div/div[2]/div/div/div[1]/div"}`

python selenium-webdriver web-scraping beautifulsoup flexbox


A:

0 votes · Yaroslavm · 11/13/2023 · #1

You can search for the array of elements with the locator `div[asset_id]`, wait for their presence with `WebDriverWait`, and then read the `asset_id` attribute of each element found.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=chrome_options)
url = 'https://www.jpg.store/collection/hungrycowsbymuesliswap?tab=items'
driver.get(url)
wait = WebDriverWait(driver, 10)
asset_els = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div[asset_id]")))

for element in asset_els:
    print(element.get_attribute('asset_id'))
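
Since the question is also tagged beautifulsoup: once the page has rendered, an alternative to calling `get_attribute` per element is to parse `driver.page_source` yourself. Below is a minimal sketch of that attribute-extraction step using only the standard library's `html.parser` (the HTML snippet is a made-up stand-in for the real markup, which you would obtain from `driver.page_source` after the wait above):

```python
from html.parser import HTMLParser

class AssetIdCollector(HTMLParser):
    """Collects the asset_id attribute of every <div> that carries one."""
    def __init__(self):
        super().__init__()
        self.asset_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
            if "asset_id" in attrs:
                self.asset_ids.append(attrs["asset_id"])

# Stand-in markup; in practice, feed driver.page_source once the page has rendered.
html = """
<div class="flex">
  <div asset_id="abc123"></div>
  <div asset_id="def456"></div>
</div>
"""

parser = AssetIdCollector()
parser.feed(html)
print(parser.asset_ids)  # ['abc123', 'def456']
```

The same extraction with BeautifulSoup would be `[d["asset_id"] for d in soup.select("div[asset_id]")]`; either way, the key point is that the attribute values only exist after the JavaScript-rendered page has loaded, which is why the `WebDriverWait` in the answer is needed first.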