My XPaths don't work in Scrapy Splash, but work in Selenium

Q:

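I'm trying to list all the scholarships at https://bigfuture.collegeboard.org/scholarships/. I was able to scrape all the links with Selenium and store them in a list. However, Selenium doesn't scale to scraping the data at each of those URLs. I'm now trying Scrapy with Splash, but neither XPath nor CSS selectors return anything. This is my first time web scraping, so I'm quite lost. I'd appreciate any help!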

import pandas as pd
import scrapy
from scrapy_splash import SplashRequest


class ScholarshipSpider(scrapy.Spider):
    name = 'scholarship'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.items_list = []  # scraped rows, written to CSV in closed()

    def start_requests(self):
        # links.txt holds one URL per line (collected earlier with Selenium)
        with open("links.txt") as f:
            urls = [line.strip() for line in f if line.strip()]
        for url in urls:
            # Render through Splash; 'wait' is the pause in seconds after page load
            yield SplashRequest(url, self.parse, args={'wait': 7, 'html': 1, 'png': 1})

    def parse(self, response):
        item = {
            'name': response.xpath('//*[@id="main-content"]/div/div[2]/div/div/div[1]/section[1]/div/div[1]/h1/text()').get()
            #other items here
        }

        self.logger.info(item)
        self.items_list.append(item)

        print(f"Name: {item['name']}")

    def closed(self, reason):
        # Runs once when the spider finishes: dump all collected items to CSV
        df = pd.DataFrame(self.items_list)
        df.to_csv('scraped_data.csv', index=False)

When I tried Selenium, the XPaths worked, but my code stopped working after a while. Scrapy seems like the best option, but nothing I try works.

By the way, I'm using Jupyter Notebook.
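
One possible reason an XPath matches in Selenium but not in Splash, offered as an assumption and not a diagnosis of this site, is that the two renderers produce slightly different DOM trees, so a long positional path copied from the browser no longer lines up. The sketch below illustrates the idea with hypothetical, simplified markup (`RENDER_A`, `RENDER_B`) and the standard library's `xml.etree.ElementTree` in place of Scrapy's selectors, purely so it runs on its own. The path constants and the `extract` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same heading rendered with and without an
# extra sibling <div>, as two different renderers might produce it.
RENDER_A = (
    '<html><body><div id="main-content">'
    '<div><div>banner</div><div><h1>Example Scholarship</h1></div></div>'
    '</div></body></html>'
)
RENDER_B = (
    '<html><body><div id="main-content">'
    '<div><div><h1>Example Scholarship</h1></div></div>'
    '</div></body></html>'
)

# Positional path: depends on the exact number and order of wrapper divs.
POSITIONAL = "./body/div[@id='main-content']/div/div[2]/h1"
# Anchored path: depends only on the container id and the tag name.
ANCHORED = ".//div[@id='main-content']//h1"


def extract(markup, path):
    """Return the text of the first node matching path, or None."""
    root = ET.fromstring(markup)
    node = root.find(path)
    return node.text if node is not None else None


for label, markup in (("A", RENDER_A), ("B", RENDER_B)):
    print(label, "positional:", extract(markup, POSITIONAL))
    print(label, "anchored:  ", extract(markup, ANCHORED))
```

In this toy case the positional path finds the heading in only one of the two renderings, while the anchored path finds it in both. Whether the same thing is happening here can only be confirmed by inspecting the HTML that Splash actually returns for one of the scholarship URLs.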

python web-scraping xpath scrapy scrapy-splash
