Trouble Extracting Elements from Google Support Pages Using IMPORTXML in Google Sheets

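Q:

I am trying to use the IMPORTXML function in Google Sheets to extract elements from a specific Google support page. The formula works for other URLs, but when I use it on a Google support page it returns a "Could not fetch url" error: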

=IMPORTXML("https://support.google.com/looker-studio/answer/11521624?hl=en", "//h2")

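The headings definitely exist on the page.
The page does not require a login and is publicly accessible.
IMPORTXML works successfully with other, non-Google websites.
I would prefer to solve this with IMPORTXML itself rather than with alternatives such as Google Apps Script or manual copying.

Is this a specific limitation of IMPORTXML on Google's own pages? Is there a known workaround, or a specific adjustment to the IMPORTXML query, that gets around the problem while still using IMPORTXML?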

google-sheets web-scraping google-sheets-formula

A:

function fetchAllH2Text(url) {
  const html = UrlFetchApp.fetch(url).getContentText();

  // Use regular expression to match content between <h2> tags
  const h2Matches = html.match(/<h2[^>]*>([\s\S]*?)<\/h2>/g);

  if (h2Matches) {
    // Extract text content from the matches
    const h2TextArray = h2Matches.map(match => match.replace(/<\/?h2[^>]*>/g, '').trim());
    return h2TextArray;
  } else {
    return ["No <h2> tags found"];
  }
}

This lets you call it as a custom function, like this:

=fetchAllH2Text("https://support.google.com/looker-studio/answer/11521624?hl=en")

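The page at that URL is not well-formed XML, which appears to be why IMPORTXML did not work in the first place, so we use a regular expression to extract the text between the <h2> tags instead. As a result, some of the returned values still contain markup that was nested inside the <h2> tag. From here, you can use standard formulas to clean up the results and keep what you are looking for.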
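
If you would rather do that cleanup in the script than with formulas, the leftover nested markup can be stripped before the values are returned. The following is only a minimal sketch: `stripInnerTags` and `extractH2Text` are hypothetical helper names that are not part of the original answer, and the short entity list is an assumption about what the page is likely to contain.

```javascript
/**
 * Removes any tags left inside an extracted heading and decodes a few
 * common HTML entities. Hypothetical helper, not part of the original answer.
 */
function stripInnerTags(fragment) {
  return String(fragment)
    .replace(/<[^>]+>/g, '')   // drop all tags, e.g. <h2 ...>, <a ...>, </a>
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&')    // decoded last so "&amp;lt;" is not decoded twice
    .replace(/\s+/g, ' ')      // collapse runs of whitespace
    .trim();
}

/**
 * Returns the plain text of every <h2> element found in an HTML string.
 * Works on a string, so it can be tried outside Apps Script as well.
 */
function extractH2Text(html) {
  const matches = String(html).match(/<h2\b[^>]*>[\s\S]*?<\/h2>/gi) || [];
  return matches.map(stripInnerTags);
}
```

Inside Apps Script, the string passed to `extractH2Text` would be the result of `UrlFetchApp.fetch(url).getContentText()`, as in the function above. Keeping the parsing separate from the fetch makes it easy to check the regular expression against a small sample of HTML.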

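Update 2:

Alternatively, if you want to clean the data with the regular expression itself, you can use this script, which returns only the long dates from the matches.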

function fetchDatesFromH2Tags(url) {
  const html = UrlFetchApp.fetch(url).getContentText();

  // Match only <h2> elements whose content is a long date such as "January 5, 2024",
  // optionally preceded by an <a> element inside the heading
  const h2Matches = html.match(/<h2[^>]*>(?:<a[^>]*>.*?<\/a>)?(?:January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}, \d{4}<\/h2>/gi);

  if (h2Matches) {
    // Extract text content from the matches
    const h2TextArray = h2Matches.map(match => match.replace(/<\/?h2[^>]*>|<a[^>]*>|<\/a>/g, '').trim());
    return h2TextArray;
  } else {
    return ["No matching dates found"];
  }
}

Be sure to use the new function name:

=fetchDatesFromH2Tags("https://support.google.com/looker-studio/answer/11521624?hl=en")
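
The values returned above are plain strings. If you want values that can be sorted or formatted as dates, one option is to convert each long date to a `Date` object before returning it. This is a rough sketch rather than part of the original answer: `parseLongDate` and `MONTHS` are hypothetical names, and it assumes the English "Month D, YYYY" format that the regular expression above matches.

```javascript
// Month names in order, so that the array index is the zero-based month number.
const MONTHS = [
  'january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december'
];

/**
 * Converts a string such as "March 4, 2024" to a Date in the script's
 * local time zone. Returns null when the text is not a valid long date.
 * Hypothetical helper, not part of the original answer.
 */
function parseLongDate(text) {
  const m = /^\s*([A-Za-z]+) (\d{1,2}), (\d{4})\s*$/.exec(String(text));
  if (!m) return null;

  const monthIndex = MONTHS.indexOf(m[1].toLowerCase());
  if (monthIndex === -1) return null;

  const day = Number(m[2]);
  const date = new Date(Number(m[3]), monthIndex, day);

  // Reject impossible days such as "February 31", which Date would roll over.
  return date.getDate() === day ? date : null;
}
```

To use it, you could map the array returned by `fetchDatesFromH2Tags` through `parseLongDate` before returning it. Custom functions should be able to return `Date` values to the sheet, but check how they are displayed in your spreadsheet's locale.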
