Load Quarto html map data from json for Leaflet map generated in R

Question:

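I have created a Quarto blog post which contains many `leaflet` maps generated in R. As the data for each map is embedded in the html file, the file itself is very large. This is causing problems for the server hosting the file.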

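I want to make the html file smaller. The `embed-resources: false` YAML option in Quarto means that libraries (e.g. `leaflet.js`) are stored in separate files. This helps, but the data is still stored in the html (once per map). I am trying to load the data itself from a separate file. Here is a minimal example `qmd` file: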

---
format:
  html:
    embed-resources: false  
---

```{r}
leaflet::leaflet(elementId = "map1") |>
    leaflet::addTiles() |>
    leaflet::addMarkers(lng = 174.768, lat = -36.852, popup = "The birthplace of R")
```

When I `quarto render` this, it creates an html file which displays the map when opened in a browser. The file contains the following `<div>` with the data for the map:

<div class="leaflet html-widget html-fill-item-overflow-hidden html-fill-item" id="map1" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="map1">{**json**}</script>
</div>

Where I have written `{**json**}`, there is one long line of json containing the map coordinates, the CRS and various options.
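To get a feel for how much of a rendered page is taken up by this embedded data, the data tags can be measured directly. The following is a minimal sketch using only the Python standard library; `WidgetDataSizer` and `widget_data_sizes` are hypothetical names introduced here, and the sample html is a stand-in for a real rendered page.

```python
from html.parser import HTMLParser


class WidgetDataSizer(HTMLParser):
    """Collect the size of each <script type="application/json" data-for="..."> body."""

    def __init__(self):
        super().__init__()
        self.sizes = {}       # element id -> UTF-8 size in bytes of the embedded json
        self._current = None  # id of the data script currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_data = attrs.get("type") == "application/json" and "data-for" in attrs
        if tag == "script" and is_data:
            self._current = attrs["data-for"]
            self.sizes[self._current] = 0

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so accumulate.
        if self._current is not None:
            self.sizes[self._current] += len(data.encode("utf-8"))

    def handle_endtag(self, tag):
        if tag == "script":
            self._current = None


def widget_data_sizes(html_text):
    """Return a dict mapping each widget's element id to its embedded json size."""
    parser = WidgetDataSizer()
    parser.feed(html_text)
    parser.close()
    return parser.sizes


if __name__ == "__main__":
    # Hypothetical stand-in for a rendered page; a real page would be read from disk.
    sample = (
        '<div><div id="map1"></div>'
        '<script type="application/json" data-for="map1">{"x":{"calls":[]}}</script></div>'
    )
    for el_id, size in widget_data_sizes(sample).items():
        print(f"{el_id}: {size} bytes of embedded json")
```

Summing the values and comparing against the total file size gives a rough idea of what moving the data out of the html could save.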

It seemed to me that I might be able to copy the json contents into a file and then change the `<script>` tag to load the data from that file:

<script src="page_data/map1.json" type="application/json" data-for="map1"></script>

However, I now know that this is not possible. Instead, I tried adding a script to inject the json into the `innerHTML` of the required element (testing with Live Server):

<script>
  fetch('./page_data/map1.json')
    .then((response) => response.json())
    .then((json) => (
      document.querySelectorAll('[data-for="map1"]')[0].innerHTML = 
        JSON.stringify(json).replaceAll("\/", "/"))
    );
</script>

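This works in that it loads the exact json contents into the tag, as if they had been hardcoded into the html file (the `replaceAll()` is needed to make them identical, as a few escape characters are otherwise added before the backslashes).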

However, this alone does not display the map, and the console throws the following error:

Uncaught SyntaxError: Unexpected end of JSON input
    at JSON.parse (<anonymous>)
    at htmlwidgets.js:646:27

The relevant lines of `htmlwidgets.js` are:

var scriptData = document.querySelector("script[data-for='" + el.id + "'][type='application/json']");
var data = JSON.parse(scriptData.textContent || scriptData.text);

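That is, by the time the loaded script looks for the data, the `fetch()` request has not yet updated the `innerHTML` of the `<script data-for="map1"></script>` tag, so there is nothing to parse.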
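The failure can be reproduced outside the browser. The sketch below uses Python's `json.loads` purely as an analogy for `JSON.parse` (htmlwidgets itself is JavaScript), and `parse_widget_data` is a hypothetical helper: parsing the text of a tag that has not been filled yet fails, while parsing the same tag after the data has been injected succeeds.

```python
import json


def parse_widget_data(text_content):
    """Parse whatever text the data tag holds right now; return None if it cannot be parsed."""
    try:
        return json.loads(text_content)
    except json.JSONDecodeError:
        # An empty tag gives the parser nothing to read, which is the Python
        # counterpart of "Unexpected end of JSON input" in the browser.
        return None


if __name__ == "__main__":
    print(parse_widget_data(""))           # tag not yet filled by fetch()
    print(parse_widget_data('{"x": {}}'))  # tag filled before parsing
```

Nothing is wrong with the data itself; the outcome depends only on whether the parse happens before or after the tag is filled.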

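With this in mind, I moved the `fetch()` request, along with the `htmlwidgets.js` and other `<script>` tags, to try to delay their loading. There are currently about 10 lines of `<script>` tags in the `<head>`, like this: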

<script src="page_files/libs/htmlwidgets-1.6.2/htmlwidgets.js"></script>
<script src="page_files/libs/jquery-1.12.4/jquery.min.js"></script>

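If I move these from the `<head>` to between the `</body>` and `</html>` tags, the map renders about half the time. So it seems there is some kind of race between them loading and the script that injects the json into the `<script data-for="map1"></script>` tag.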

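To make sure that loading happens in the correct order, I removed the scripts from the `<head>` of the html and used this async `loadScript()` function to load them dynamically, so that they are only loaded after the data has loaded: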

fetch('./map1.json')
  .then((response) => response.json())
  .then((json) => (
    document.querySelectorAll('[data-for="map1"]')[0].innerHTML =
      JSON.stringify(json).replaceAll("\/", "/")))
  .then(() => loadScript("page_files/libs/htmlwidgets-1.6.2/htmlwidgets.js"))
  .then(() => loadScript("page_files/libs/jquery-1.12.4/jquery.min.js"));
/* etc. - for all scripts on the page, in the order they appear in the html */

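Now the scripts only load after the json has been injected into the `<script data-for="map1"></script>` tag. However, it does not render the map at all, and the html widget is not registered (i.e. `document.getElementById("map1").htmlwidget_data_init_result` returns `undefined` in the console).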

Am I missing something about the order in which events should happen on a static Quarto-generated web page with `htmlwidgets`?

Is there a way to have a Quarto html file load the data for a `leaflet` map generated in R from a json file and render the map?

javascript r leaflet quarto r-leaflet

Answer:

6 upvotes SamR 11/13/2023 #1

Inevitably, two days after posting a bounty, I found a solution. This approach reduced my real html file from 22.5 MB to 165 KB. The steps are:

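1. Remove all `<script src = "*.js">` tags from the `<head>` of the html, storing the URLs so they can be loaded after the data in step 4 (to avoid the problem of them loading when there is no data to parse).
2. Find the JS in `<script>` tags hardcoded into the body of the html, and move the code into separate *.js files to be loaded in step 4 (to prevent errors caused by them loading before the scripts in the `<head>` have loaded).
3. Remove the hardcoded map (and any other `htmlwidgets`) json data from the `<script type="application/json">` tags and save it to separate json files in the `./page_files/data/` folder.
4. Insert a JS script into the `<head>` which uses Promises with chained `.then()` statements to do the following (in this order):
- Inject the relevant data from each json file back into the html.
- Dynamically load the scripts from the `<head>`.
- Dynamically load the scripts from the `<body>`.
- Render all elements with `HTMLWidgets.staticRender()`.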

I wrote a Python script to do this automatically for any html file. It can be added as a `post-render` option in the Quarto project YAML, e.g.:

project:
  type: website
  post-render: remove_hardcoded_data.py

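This will convert all html files in the folder. Alternatively, if you are using Quarto outside a project, the script can be placed in a folder containing one or more html files and run with `./remove_hardcoded_data.py`.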

Python script

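This requires Beautiful Soup 4. It creates a minimal html file for every html file in the folder, with `"_min"` appended to the output name (e.g. if the input is `"./page.html"`, the output will be `"./page_min.html"`). Quarto will already have created a `"./page_files/"` folder which needs to be uploaded to the server, and the script copies the json data into that folder.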

Files can be excluded from this script by adding them to the `files_to_exclude` list of the `make_all_html_min()` function.
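The selection rule (skip files the script itself created, skip anything explicitly excluded) can be tried in isolation. This is a minimal sketch, not the script's own code: `select_html_files` is a hypothetical helper, it matches on the `"_min.html"` suffix instead of the script's looser `"min.html"`, and it compares by file name.

```python
from pathlib import Path
import tempfile


def select_html_files(folder, files_to_exclude=()):
    """Return the html files in `folder` that would be converted, sorted by name."""
    excluded = set(files_to_exclude)
    return sorted(
        f for f in Path(folder).glob("*.html")
        if not f.name.endswith("_min.html") and f.name not in excluded
    )


if __name__ == "__main__":
    # Build a throwaway folder with an input page, an already converted
    # page and a page that should be excluded.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["page.html", "page_min.html", "about.html"]:
            (Path(tmp) / name).write_text("<html></html>", encoding="utf-8")
        selected = select_html_files(tmp, files_to_exclude=["about.html"])
        print([f.name for f in selected])
```

Filtering on the output suffix keeps the script safe to run repeatedly in the same folder.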

#!/usr/bin/env python3
# coding: utf-8

from bs4 import BeautifulSoup
from pathlib import Path
import re

def load_page(page_path):
    with open(page_path, "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    return soup

# 1. Remove all script tags but keep their src
def get_script_links(soup):
    script_links = []
    for script in soup.findAll("script"):
        if script.has_attr("src"):
            script_links.append(script.attrs["src"])
            script.decompose()
    return script_links

# 2. Move quarto-html-after-body and any other scripts to files
#    so they're not loaded before htmlwidgets etc. are loaded
def get_body_scripts(soup, page_name):
    body_scripts = []
    for i, script in enumerate(soup.html.body.findAll("script")):
        # don't copy the data scripts here
        if not script.has_attr("data-for"):
            if script.has_attr("id"):
                out_file = f"./{page_name}_files/libs/{script.attrs['id']}.js"
            else:
                out_file = f"./{page_name}_files/libs/body_script_{i}.js"
            with open(out_file, "w", encoding= "utf-8") as f:
                f.write(script.get_text())             
            body_scripts.append(out_file)
            script.decompose()
    return body_scripts

# 3. Remove the hardcoded json data and write to file
def remove_json_data(json_tag, page_name):
    Path(f"./{page_name}_files/data/").mkdir(exist_ok=True)
    el_id = json_tag.attrs['data-for']
    with open(f"./{page_name}_files/data/{el_id}.json", "w", encoding="utf-8") as f:
        f.write(json_tag.get_text()) 
    json_tag.string.replace_with("")
    return el_id

# 4. Create the javascript to load the data and scripts
def create_load_data_js(soup, page_name):
  script_links = get_script_links(soup)
  body_scripts = get_body_scripts(soup, page_name)
  json_tags = [script for script in soup.findAll("script") if script.has_attr("data-for")]
  el_ids = [remove_json_data(json_tag, page_name) for json_tag in json_tags]   

  load_function = """
    const loadScript = (file_url, async = true, type = "text/javascript", appendToHead = true) => {
        return new Promise((resolve, reject) => {
            try {
                const scriptEle = document.createElement("script");
                scriptEle.type = type;
                scriptEle.async = async;
                scriptEle.src = file_url;
                scriptEle.addEventListener("load", (ev) => {
                    resolve({ status: true });
                });
                scriptEle.addEventListener("error", (ev) => {
                    reject({
                        status: false,
                        message: `Failed to load the script ${file_url}`
                    });
                });
                appendToHead ? document.head.appendChild(scriptEle) : document.body.appendChild(scriptEle);
            } catch (error) {
                reject(error);
            }
        });
    };
  """

  # Start the chain from an already-resolved promise, so that a page
  # without any widget data still produces a valid chain (indexing
  # el_ids[0] would raise an IndexError on such a page)
  load_data_first_element = "Promise.resolve()"

  # One fetch-and-inject link in the chain per widget
  load_data_next_elements = "".join([f"""
      .then(() => fetch("./{page_name}_files/data/{el_id}.json"))
      .then((response) => response.json())
      .then(
        (json) =>
          (document.querySelectorAll('[data-for="{el_id}"]')[0].innerHTML =
            JSON.stringify(json).replaceAll("/", "/"))
      )
    """ for el_id in el_ids])

  then_load_scripts = "\n".join([f'.then(() => loadScript("{script}"))' for script in script_links])
  then_body_scripts = "\n".join([f'.then(() => loadScript("{script}"))' for script in body_scripts])
  then_render_mermaid = ".then(() => window.mermaid && window.mermaid.init())" # mermaid charts will not render otherwise; guarded so pages without mermaid do not break the chain
  then_render_html = ".then(() => window.HTMLWidgets.staticRender());"

  script_content = f"""
  {load_function}
  {load_data_first_element}
  {load_data_next_elements}
  {then_load_scripts}
  {then_body_scripts}
  {then_render_mermaid}
  {then_render_html}
  """
  return script_content

def insert_main_js_script(soup, page_name):
    load_data_js = create_load_data_js(soup, page_name)
    s = soup.new_tag("script")
    s.string = load_data_js 
    soup.html.head.append(s)   

def save_new_html(soup, page_name):
    outfile = f"{page_name}_min.html"
    with open(outfile, "w", encoding='utf-8') as file:
        file.write(str(soup))
    print(f"File created: {outfile}")

def create_page_min(page_path):
    soup = load_page(page_path)
    page_name = re.sub("\\.html$", "", page_path.name)
    print(f"Converting {page_path}")
    insert_main_js_script(soup, page_name)
    save_new_html(soup, page_name)

def make_all_html_min(files_to_exclude = ["example_file_to_exclude.html"]):
    # .endswith("min") is quick and dirty shortcut to not apply this script to files it creates
    files_to_make_min = [f for f in Path("./").glob("*.html") if not f.name.endswith("min.html")]
    files_to_make_min = list(set(files_to_make_min) - set([Path(f) for f in files_to_exclude]))
    for page_path in files_to_make_min:
        create_page_min(page_path)

make_all_html_min()
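Because the generated page uses `fetch()` to load its json files, it has to be served over HTTP to be tested; most browsers refuse `fetch()` requests from a page opened directly from the file system. The question used Live Server for this. As an alternative, here is a minimal sketch using Python's built-in `http.server`; `serve_folder` is a hypothetical helper, and the folder layout in the demo is a stand-in for real Quarto output.

```python
import functools
import http.client
import http.server
import tempfile
import threading
from pathlib import Path


def serve_folder(folder, port=0):
    """Serve `folder` on localhost in a background thread; port=0 picks a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(folder)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_text(port, path):
    """GET `path` from the local server and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the data folder the script writes to
        data_dir = Path(tmp) / "page_files" / "data"
        data_dir.mkdir(parents=True)
        (data_dir / "map1.json").write_text('{"x": {}}', encoding="utf-8")

        server = serve_folder(tmp)
        print(fetch_text(server.server_address[1], "/page_files/data/map1.json"))
        server.shutdown()
        server.server_close()
```

For manual testing, point `serve_folder` at the folder containing the `_min.html` files, keep the process alive, and open the printed port in a browser.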

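I arrived at this after continuing to try things in the absence of any answers. Once I had solved it, I decided to answer the question myself in case anyone else runs into the same problem. However, I cannot award the bounty to myself, so I am open to other solutions.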
