How to navigate through pages iteratively when scraping data page by page?

Asked by surya bharath on 10/9/2023 (modified 10/9/2023, viewed 13 times)

Question:

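I'm trying to build a Chrome extension that scrapes data from LinkedIn. I'm having trouble navigating through the pages because the results are paginated: the extension only extracts the results from the first page.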

When I did this using only popup-based code, it worked. Now I'm trying to do it in content.js, but navigating to the next page and scraping it doesn't work.
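One possible difference between the two setups, offered as an assumption rather than a diagnosis: if clicking LinkedIn's Next button triggers a full page load (a client-side route change would not), the content script is re-injected and its in-memory `results` array and pending `setTimeout` are discarded, whereas a popup's script context survives for as long as the popup stays open. A minimal sketch of making the content script resumable by saving progress in storage between page loads follows. `STATE_KEY`, `startRun`, `processCurrentPage` and the storage wrappers are hypothetical names, and the `chrome.storage.local` adapter assumes Manifest V3, where those methods return promises.

```javascript
// Hypothetical storage key for the in-progress run.
const STATE_KEY = 'scrapeState';

// In-memory stand-in for a promise-based key/value store (useful for testing).
function createMemoryStorage() {
  const data = new Map();
  return {
    get: async (key) => data.get(key),
    set: async (key, value) => {
      data.set(key, value);
    },
    remove: async (key) => {
      data.delete(key);
    },
  };
}

// Same interface backed by chrome.storage.local (assumes Manifest V3 promises).
// `chrome` is only referenced when these methods are actually called.
function createChromeStorage() {
  return {
    get: async (key) => (await chrome.storage.local.get(key))[key],
    set: (key, value) => chrome.storage.local.set({ [key]: value }),
    remove: (key) => chrome.storage.local.remove(key),
  };
}

// Call when the popup sends "startExtraction": records that a run is active.
async function startRun(storage, max) {
  await storage.set(STATE_KEY, { page: 1, max, results: [] });
}

// Call on every page load. Returns null if no run is active; otherwise scrapes
// this page, saves progress, and reports whether the caller should click "Next".
async function processCurrentPage(storage, extractPage) {
  const state = await storage.get(STATE_KEY);
  if (!state) return null;
  const results = [...state.results, ...extractPage()];
  if (state.page >= state.max) {
    await storage.remove(STATE_KEY); // run finished: clear saved progress
    return { done: true, results };
  }
  await storage.set(STATE_KEY, { ...state, page: state.page + 1, results });
  return { done: false, results };
}
```

In content.js, the message listener would call `startRun(createChromeStorage(), message.max)` and then process the current page, while a top-level call to `processCurrentPage(createChromeStorage(), performExtraction)` on each load would click the Next button while `done` is false and send `results` with `chrome.runtime.sendMessage` once it is true. If no Next button is found, the caller should also clear the saved state so the run does not stay active.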

content.js

// Collect { profileUrl, summary } for every search result on the current page.
const performExtraction = () => {
  const pageData = [];
  // Scroll to the bottom of the page before collecting the result items.
  window.scroll(0, document.body.scrollHeight);
  const liElements = document.querySelectorAll(
    'ul.reusable-search__entity-result-list li.reusable-search__result-container'
  );

  for (let i = 0; i < liElements.length; i++) {
    const li = liElements[i];
    const anchor = li.querySelector('a.app-aware-link');
    const paragraph = li.querySelector('p.entity-result__content-summary');

    // Skip results that lack a profile link or a summary.
    if (anchor && paragraph) {
      const profileUrl = anchor.href;
      // Collapse runs of whitespace in the summary text.
      const summary = paragraph.textContent.replace(/\s+/g, ' ').trim();
      pageData.push({ profileUrl, summary });
    }
  }

  return pageData;
};

// Send the collected results back to the rest of the extension.
const sendResults = (results) => {
  /* eslint-disable no-undef */
  chrome.runtime.sendMessage({
    extractionComplete: true,
    data: results,
  });
};

// Scrape the current page, then click "Next" and recurse after a fixed
// 2-second delay until `max` pages have been processed or no Next button exists.
const extractDataFromPage = (max, currentPage = 1, results = []) => {
  const pageData = performExtraction();
  results.push(...pageData);

  const nextButton =
    currentPage < max ? document.querySelector('button[aria-label="Next"]') : null;

  if (nextButton) {
    nextButton.click();
    setTimeout(() => {
      extractDataFromPage(max, currentPage + 1, results);
    }, 2000);
  } else {
    sendResults(results);
  }
};

chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.action === 'startExtraction') {
    console.log('Received startExtraction message:', message);
    const max = message.max;

    extractDataFromPage(max);
  }
});
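The recursive version above waits a fixed 2000 ms after clicking Next, so if the next page takes longer to render, `performExtraction` may read the previous page again or an empty list. A minimal sketch of an alternative, assuming a page change can be detected by comparing some "signature" of the DOM before and after the click: an async loop that polls until the signature changes and gives up after a timeout. `waitUntil`, `scrapePages`, `pageSignature` and `clickNext` are hypothetical names; the DOM access is passed in so the loop itself can run outside the browser.

```javascript
// Resolve once predicate() returns true, checking every intervalMs;
// reject after timeoutMs so a page that never changes cannot hang the loop.
function waitUntil(predicate, { timeoutMs = 10000, intervalMs = 200 } = {}) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const tick = () => {
      if (predicate()) return resolve();
      if (Date.now() - start >= timeoutMs) {
        return reject(new Error('Timed out waiting for the next page'));
      }
      setTimeout(tick, intervalMs);
    };
    tick();
  });
}

// Scrape up to `max` pages. The DOM-specific pieces are injected:
//   extractPage()   -> array of results on the current page
//   pageSignature() -> a value that changes when a new page has rendered
//   clickNext()     -> clicks "Next"; returns false if there is no button
async function scrapePages({ max, extractPage, pageSignature, clickNext, waitOptions }) {
  const results = [];
  for (let page = 1; page <= max; page++) {
    results.push(...extractPage());
    if (page === max) break;
    const before = pageSignature();
    if (!clickNext()) break; // no Next button: treat as the last page
    await waitUntil(() => pageSignature() !== before, waitOptions);
  }
  return results;
}
```

Inside the `startExtraction` listener, one could pass `extractPage: performExtraction`, a `pageSignature` such as the `href` of the first `a.app-aware-link` (an assumption about what changes between pages), and a `clickNext` that clicks `button[aria-label="Next"]` and returns whether it existed and was enabled, then send the resolved array with `chrome.runtime.sendMessage({ extractionComplete: true, data })`.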

Tags: javascript, reactjs, dom, google-chrome-extension

Answers: none yet.