How do I scrape data from within an HTML element based on an existing URL?

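Q:

I have a script that saves RSS data to a spreadsheet, but it still has some shortcomings.

The data I get has the form: title, time, article link. https://i.stack.imgur.com/9YTAF.png

I would like the script to also retrieve a description from each article link, based on a tag or HTML class, so that the data I get is: title, time, description, article link.

For example, I want to retrieve the description from the div with the class entry-content in the article at https://e-ficiencia.com/samsung-climate-solutions-acudira-cyr-2023/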
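A minimal sketch of the extraction step, assuming the article HTML is already available as a string. Apps Script has no built-in HTML DOM parser, and XmlService generally rejects pages that are not well-formed XML, so this sketch falls back on plain string processing. The function name extractDivText is hypothetical, and the approach is only an approximation of real HTML parsing: it may misbehave on unusual markup.

```javascript
/**
 * Hypothetical helper: returns the plain text inside the first <div> whose
 * class list contains className, or '' when no such div is found.
 */
function extractDivText(html, className) {
  // Escape the class name so it can be embedded in a regular expression.
  var escaped = className.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match an opening <div> whose class attribute lists className as a whole token.
  var openRe = new RegExp(
    '<div\\b[^>]*\\sclass\\s*=\\s*(["\'])(?:[^"\']*\\s)?' + escaped +
    '(?:\\s[^"\']*)?\\1[^>]*>', 'i');
  var m = openRe.exec(html);
  if (!m) return '';

  // Walk the following <div> and </div> tags to find the matching close.
  var start = m.index + m[0].length;
  var tagRe = /<(\/?)div\b[^>]*>/gi;
  tagRe.lastIndex = start;
  var depth = 1;
  var end = html.length;
  var t;
  while ((t = tagRe.exec(html)) !== null) {
    depth += t[1] ? -1 : 1;
    if (depth === 0) {
      end = t.index;
      break;
    }
  }

  // Drop scripts, styles and tags, decode a few common entities,
  // then collapse whitespace.
  return html.slice(start, end)
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&')
    .replace(/\s+/g, ' ')
    .trim();
}
```

In Apps Script the html argument would typically come from UrlFetchApp.fetch(link).getContentText(), with 'entry-content' passed as className.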

I would like the data in the spreadsheet to look like this:

https://i.stack.imgur.com/kT00s.png https://docs.google.com/spreadsheets/d/1lPn7xHEEI1NknN8l9w6hu4SkQburm8s-NdAjPsPc-NM/edit#gid=0

My Google Apps Script follows:

function myFunction() {
  getURLData();
}

// Reads each RSS feed and appends [title, pubDate, link] rows to Sheet1.
function getURLData() {
  var currentData = [];
  var urltoCheck = [
    "https://e-ficiencia.com/feed/",
    "https://www.climanoticias.com/feed/all",
    "https://www.proinstalaciones.com/actualidad/noticias?format=feed"
  ];
  for (var i = 0; i < urltoCheck.length; i++) {
    var ficiencaData = UrlFetchApp.fetch(urltoCheck[i]);
    var xml = ficiencaData.getContentText();
    let response = XmlService.parse(xml);
    var root = response.getRootElement();
    let channel = root.getChild('channel');
    let items = channel.getChildren('item');
    items.forEach(item => {
      let title = item.getChild('title').getText();
      let pubDateb = item.getChild('pubDate').getText();
      let link = item.getChild('link').getText();
      currentData.push([title, pubDateb, link]);
    });
  }
  // Nothing to write: getRange would fail with zero rows.
  if (currentData.length === 0) return;
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName("Sheet1");
  var currentDataRange = sheet.getRange(sheet.getLastRow() + 1, 1, currentData.length, currentData[0].length);
  currentDataRange.setValues(currentData);
}
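A rough sketch of how the rows collected above might gain a description column before being written to the sheet. The names addDescriptions and fetchHtmlWithUrlFetchApp are hypothetical. The fetcher and the extractor are passed in as functions, so the row-building logic does not depend on any particular way of parsing the article HTML.

```javascript
/**
 * Hypothetical helper: turns [title, pubDate, link] rows into
 * [title, pubDate, description, link] rows.
 * fetchHtml(link) should return the page HTML as a string;
 * extract(html) should return the description text.
 */
function addDescriptions(rows, fetchHtml, extract) {
  return rows.map(function (row) {
    var description = '';
    try {
      description = extract(fetchHtml(row[2]));
    } catch (e) {
      // Keep the row with an empty description if one article fails.
      description = '';
    }
    return [row[0], row[1], description, row[2]];
  });
}

/**
 * Possible fetcher for Apps Script (only defined here, not called).
 * muteHttpExceptions lets a 404 or 500 come back as a response
 * instead of throwing.
 */
function fetchHtmlWithUrlFetchApp(url) {
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  return response.getResponseCode() === 200 ? response.getContentText() : '';
}
```

Inside getURLData, currentData could then be passed through addDescriptions just before the getRange call, with fetchHtmlWithUrlFetchApp as the fetcher and an extractor that returns the text of the entry-content div. Note that this makes one extra HTTP request per article, which counts against the Apps Script quotas and execution time limit.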

google-apps-script google-sheets web-scraping rss rss-reader

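Unfortunately, I cannot open the spreadsheet you provided. Can you check it again? I also have to apologize for my poor English skill. Unfortunately, from "My hope is that the data I get on the Spreadsheet will be like this", I cannot understand your expected values. Can I ask you about your expected values?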

For example, I want to retrieve the description of a div class called entry-content from the article link https://e-ficiencia.com/samsung-climate-solutions-acudira-cyr-2023/

0 votes T6VK 10/28/2023

@Tanaike i.stack.imgur.com/kT00s.png

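0 votes Tanaike 10/28/2023

Thank you for replying. Based on your reply, I proposed a modified script as an answer. Please confirm it. If I misunderstood your expected result, I apologize.

0 votes Tanaike 10/28/2023

Thank you for replying. I have to apologize for my poor English skill. From "Your answer is not wrong, but it doesn't meet my expectations.", I understand that my understanding of your question was not correct. In this case, I have to delete my answer. I deeply apologize again for my poor English skill. When I can correctly understand your question, I would like to think of a solution.

0 votes T6VK 10/28/2023

I'm sorry for the trouble and disappointment I caused you. I hope you were not offended by my earlier reply.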

A: No answers yet
