Asked by: surya bharath  Asked: 10/9/2023  Updated: 10/9/2023  Views: 13
How to navigate through pages iteratively when scraping data page by page?
Q:
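I am trying to build a Chrome extension to scrape data from LinkedIn. I am having trouble navigating through the pages because the results are paginated: it only extracts the results from the first page.
When I tried this with popup-only code, it worked. Now I am trying to do it in content.js, and navigating to the next page and then scraping does not work.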
content.js
const performExtraction = () => {
  const pageData = [];
  window.scroll(0, document.body.scrollHeight);
  const liElements = document.querySelectorAll(
    'ul.reusable-search__entity-result-list li.reusable-search__result-container'
  );
  for (let i = 0; i < liElements.length; i++) {
    const li = liElements[i];
    const anchor = li.querySelector('a.app-aware-link');
    const paragraph = li.querySelector('p.entity-result__content-summary');
    if (anchor && paragraph) {
      const profileUrl = anchor.href;
      const summary = paragraph.textContent.replace(/\s+/g, ' ').trim();
      pageData.push({ profileUrl, summary });
    }
  }
  return pageData;
};

const extractDataFromPage = (max, currentPage = 1, results = []) => {
  const pageData = performExtraction();
  results.push(...pageData);
  if (currentPage < max) {
    const nextButton = document.querySelector('button[aria-label="Next"]');
    if (nextButton) {
      nextButton.click();
      setTimeout(() => {
        extractDataFromPage(max, currentPage + 1, results);
      }, 2000);
    } else {
      /*eslint-disable no-undef */
      chrome.runtime.sendMessage({
        extractionComplete: true,
        data: results,
      });
    }
  } else {
    /*eslint-disable no-undef */
    chrome.runtime.sendMessage({
      extractionComplete: true,
      data: results,
    });
  }
};

chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.action === 'startExtraction') {
    console.log('Received startExtraction message:', message);
    const max = message.max;
    extractDataFromPage(max);
  }
});
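
For comparison, here is a minimal sketch of the same page-by-page loop written with async/await, where the fixed 2000 ms delay is replaced by polling until the visible results change. This is only an illustration under stated assumptions, not a confirmed fix for the LinkedIn case: collectPages, waitFor, sleep and the callbacks extract, goNext and pageKey are hypothetical names introduced here, and whether the "Next" button and the result list behave this way on the real page is untested.

```javascript
// Sketch only: collectPages, waitFor and sleep are hypothetical helpers,
// not part of the original extension or of any Chrome API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until predicate() is truthy or the timeout elapses.
// Resolves to true if the predicate became truthy, false otherwise.
const waitFor = async (predicate, { timeout = 10000, interval = 250 } = {}) => {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    if (predicate()) return true;
    await sleep(interval);
  }
  return Boolean(predicate());
};

// extract(): returns an array of rows for the page currently shown.
// goNext():  triggers navigation to the next page; returns false if it cannot.
// pageKey(): returns a value that changes once the next page has rendered.
const collectPages = async ({ max, extract, goNext, pageKey, waitOptions }) => {
  const results = [];
  for (let page = 1; page <= max; page++) {
    results.push(...extract());
    if (page === max) break;           // reached the requested page count
    const before = pageKey();
    if (!goNext()) break;              // no usable "Next" control
    const changed = await waitFor(() => pageKey() !== before, waitOptions);
    if (!changed) break;               // next page never appeared; stop here
  }
  return results;
};

// Assumed wiring inside content.js (browser only, left commented out):
// collectPages({
//   max,
//   extract: performExtraction,
//   goNext: () => {
//     const btn = document.querySelector('button[aria-label="Next"]');
//     if (!btn || btn.disabled) return false;
//     btn.click();
//     return true;
//   },
//   pageKey: () =>
//     document.querySelector('li.reusable-search__result-container a.app-aware-link')?.href,
// }).then((data) => chrome.runtime.sendMessage({ extractionComplete: true, data }));
```

Passing the DOM access in as callbacks keeps the loop itself free of page-specific selectors, so it can be exercised outside the browser with fake pages. The choice of pageKey (here, the first result's link) is an assumption; any value that reliably differs between consecutive pages would serve.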
A: No answers yet