Asked by: T6VK · Asked: 10/28/2023 · Last edited by: T6VK · Updated: 10/28/2023 · Views: 70
How do I scrape data from within an HTML element based on an existing URL?
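Q:
I have a script that saves RSS data to a spreadsheet, but it still has shortcomings and problems.
The data I currently get is in the form: title, time, article link. https://i.stack.imgur.com/9YTAF.png
I want the script to also retrieve a description from each article link, based on an HTML tag or class, so that the data I get is: title, time, description, article link.
For example, I want to retrieve the description from the div with the class entry-content on the article page https://e-ficiencia.com/samsung-climate-solutions-acudira-cyr-2023/
My hope is that the data I get in the spreadsheet will look like this:
https://i.stack.imgur.com/kT00s.png https://docs.google.com/spreadsheets/d/1lPn7xHEEI1NknN8l9w6hu4SkQburm8s-NdAjPsPc-NM/edit#gid=0
Here is my Google Apps Script: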
function myFunction() {
  getURLData();
}

// Fetch each RSS feed and append one [title, pubDate, link] row per item to Sheet1.
function getURLData() {
  var currentData = [];
  var urltoCheck = ["https://e-ficiencia.com/feed/", "https://www.climanoticias.com/feed/all", "https://www.proinstalaciones.com/actualidad/noticias?format=feed"];
  for (var i = 0; i < urltoCheck.length; i++) {
    var ficiencaData = UrlFetchApp.fetch(urltoCheck[i]);
    var xml = ficiencaData.getContentText();
    let response = XmlService.parse(xml);
    var root = response.getRootElement();
    let channel = root.getChild('channel');
    let items = channel.getChildren('item');
    items.forEach(item => {
      let title = item.getChild('title').getText();
      let pubDateb = item.getChild('pubDate').getText();
      let link = item.getChild('link').getText();
      currentData.push([title, pubDateb, link]);
    });
  }
  // Guard: with no items, currentData[0] is undefined and getRange would throw.
  if (currentData.length === 0) return;
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName("Sheet1");
  // Append below the last populated row.
  var currentDataRange = sheet.getRange(sheet.getLastRow() + 1, 1, currentData.length, currentData[0].length);
  currentDataRange.setValues(currentData);
}
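
For the description column described above, one possible approach is sketched below. It is only a rough sketch under stated assumptions: extractDivText is a hypothetical helper name that does not appear in the script, and it uses regular expressions on the raw page HTML rather than XmlService, on the assumption that article pages are often not well-formed XML. It returns the plain text of the first div carrying the given class (entry-content in the example from the question), and it may need adjusting for sites whose markup differs.

```javascript
// Hypothetical helper (not part of the original script): returns the plain
// text of the first <div> whose class attribute contains `className`,
// or an empty string when no such div is found.
function extractDivText(html, className) {
  // Escape regex metacharacters so the class name is matched literally.
  var safe = className.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match an opening <div ...> whose class list contains the exact class token.
  var openRe = new RegExp(
    '<div\\b[^>]*\\bclass\\s*=\\s*["\'](?:[^"\']*\\s)?' + safe +
    '(?:\\s[^"\']*)?["\'][^>]*>', 'i');
  var open = openRe.exec(html);
  if (!open) return '';
  var start = open.index + open[0].length;

  // Count nested <div> and </div> tags to find the matching closing tag.
  var tagRe = /<(\/?)div\b[^>]*>/gi;
  tagRe.lastIndex = start;
  var depth = 1;
  var end = html.length;
  var m;
  while ((m = tagRe.exec(html)) !== null) {
    depth += m[1] ? -1 : 1;
    if (depth === 0) { end = m.index; break; }
  }

  // Drop scripts/styles, strip remaining tags, and collapse whitespace.
  return html.slice(start, end)
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/&nbsp;/g, ' ')
    .replace(/&amp;/g, '&')
    .replace(/\s+/g, ' ')
    .trim();
}

// Assumed usage inside items.forEach in getURLData (Apps Script only):
//   var html = UrlFetchApp.fetch(link, { muteHttpExceptions: true }).getContentText();
//   var description = extractDivText(html, 'entry-content');
//   currentData.push([title, pubDateb, description, link]);
```

Fetching every article adds one HTTP request per feed item, so a long feed may run slowly or hit Apps Script quota limits. Google Sheets also limits how much text a single cell can hold, so very long descriptions may need truncating.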
A: No answers yet
Comments
My hope is that the data I get on the Spreadsheet will be like this
For example, I want to retrieve the description of a div class called entry-content from the article link https://e-ficiencia.com/samsung-climate-solutions-acudira-cyr-2023/
Your answer is not wrong, but it doesn't meet my expectations.