
Driver page source not catching new page information

Asked by: Paul Corcoran  Asked: 11/18/2023  Last edited by: Paul Corcoran  Modified: 11/18/2023  Views: 23

Q:

I am using selenium/Bs4 to return games scraped from this page; an example of the output, in list format, is shown below. However, when I use selenium to click and change the time-frame parameters so that different games appear, and then try to scrape the updated driver.page_source with bs4, it does not seem to pick up the new JavaScript-rendered content. Any ideas on how to correctly update the driver's page source here? You may need to set the headless option to True for this to run.

[{'match': 'Breidablik - Fylkir', 'time': '21:15', 'odds': [], 'bestOddsBookie': ['Unibet', 'Betfair', 'Betfair', 'Unibet', 'BetVictor', 'Interwetten', '1xBet', 'ComeOn', 'William Hill', 'Unibet']},

Edit: adding a refresh to the driver page source seems to have worked.
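For reference, a minimal sketch of the refresh-then-rescrape idea mentioned in the edit, reusing the driver, imports, and selector from the code below; the 10-second sleep is an illustrative assumption, and an explicit WebDriverWait on an element of the odds table would be more robust:

# ... after clicking the time-frame filters with Selenium ...

# Reload the page so the DOM reflects the newly selected filters
driver.refresh()

# Give the JavaScript time to re-render the odds table (assumed delay;
# waiting for a specific element would be sturdier than a fixed sleep)
sleep(10)

# Read the page source only *after* the wait, then parse it with bs4
soup = BeautifulSoup(driver.page_source, 'html.parser')
matches = soup.find_all("td", class_="table-main__tt")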

import pandas as pd
import requests
from bs4 import BeautifulSoup
import numpy as np
import warnings
from tabulate import tabulate
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from time import sleep, time

warnings.filterwarnings('ignore')

scraped_match_names = []
scraped_match_odds = []
bookLines = []
url = f'https://www.betexplorer.com/odds-movements/soccer/'
option = Options()
option.headless = False
driver = webdriver.Chrome("C:/Users/paulc/Documents/PremNet/chromedriver.exe",
                          options=option)
driver.get(url)

#click the cookie pop up
WebDriverWait(driver, 15).until(
        EC.element_to_be_clickable((By.XPATH, '//*[@id="onetrust-accept-btn-handler"]'))).click()

# Change the hours filter to <1 hour
WebDriverWait(driver, 15).until(
        EC.element_to_be_clickable((By.XPATH, '/html/body/div[2]/div[5]/div/div/div[1]/section/div[2]/ul[1]/li[2]'))).click()

WebDriverWait(driver, 15).until(
        EC.element_to_be_clickable((By.XPATH, '/html/body/div[2]/div[5]/div/div/div[1]/section/div[2]/ul[1]/li[2]/div/ul/li[1]'))).click()

# change the time for next day

WebDriverWait(driver, 15).until(
        EC.element_to_be_clickable((By.XPATH, '/html/body/div[2]/div[5]/div/div/div[1]/section/div[2]/ul[2]/li[2]/div/ul/li'))).click()

WebDriverWait(driver, 15).until(
        EC.element_to_be_clickable((By.XPATH, '/html/body/div[2]/div[5]/div/div/div[1]/section/div[2]/ul[2]/li[2]/div/ul/li[1]'))).click()

# scroll down the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# extract the page source
pageSource = driver.execute_script("return document.getElementsByTagName('html')[0].outerHTML")

sleep(10)
soup = BeautifulSoup(pageSource, 'html.parser')

matches = soup.find_all("td", class_="table-main__tt")
Tags: python selenium-webdriver beautifulsoup

Comments


A: No answers yet