XSLT string to node conversion and disable-output-escaping

Question:

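I am exporting an Access database to XML and then need to transform that XML to prepare the data so that FrameMaker can create a publication. As part of this, I need to create cross-references inside the text that Access outputs. My current approach is to insert <idref>some text</idref> into the text, but on export to XML that markup comes out escaped (see the &amp;lt;idref&amp;gt; sequences in the input below), and because of that the HTML parser I am using does not turn the text into nodes as I had hoped. I have Saxon EE 9.8.3 but have not yet tested html-parse, because the solution given in my previous question (Using parse-xml to convert text to nodes in XML) solved my initial parsing problem.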

Here is a version of the XML input:

<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:od="urn:schemas-microsoft-com:officedata" generated="2023-09-29T08:29:47">
<TEQuery>
<IntID>PR090F</IntID>
<TEName>Exempt Lease From Taxable Owner</TEName>
<Description>
&lt;div&gt;&lt;font face=&quot;Times New Roman&quot; color=black&gt;&amp;nbsp;Leased &amp;lt;idref&amp;gt;PR001F&amp;lt;/idref&amp;gt; properties that qualify for this exemption are reported under one of the following expenditures: &lt;/font&gt;&lt;/div&gt;

&lt;ul&gt;
 &lt;ul&gt;
  &lt;ul&gt;
   &lt;ul&gt;
    &lt;ul&gt;
     &lt;ul&gt;
      &lt;ul&gt;
       &lt;li&gt;&lt;font face=&quot;Times New Roman&quot; color=black&gt;&amp;lt;idref&amp;gt;PR001F&amp;lt;/idref&amp;gt;, &lt;/font&gt;&lt;/li&gt;
       &lt;li&gt;&lt;font face=&quot;Times New Roman&quot; color=black&gt;PR007F, &lt;/font&gt;&lt;/li&gt;
       &lt;li&gt;&lt;font face=&quot;Times New Roman&quot; color=black&gt;PR079F, &lt;/font&gt;&lt;/li&gt;
       &lt;li&gt;&lt;font face=&quot;Times New Roman&quot; color=black&gt;PR083F, &lt;/font&gt;&lt;/li&gt;
       &lt;li&gt;&lt;font face=&quot;Times New Roman&quot; color=black&gt;PR085F, &lt;/font&gt;&lt;/li&gt;
       &lt;li&gt;&lt;font face=&quot;Times New Roman&quot; color=black&gt;PR086F, &lt;/font&gt;&lt;/li&gt;
       &lt;li&gt;&lt;font face=&quot;Times New Roman&quot; color=black&gt;PR087F, .&lt;/font&gt;&lt;/li&gt;
      &lt;/ul&gt;
     &lt;/ul&gt;
    &lt;/ul&gt;
   &lt;/ul&gt;
  &lt;/ul&gt;
 &lt;/ul&gt;
&lt;/ul&gt;</Description>
<TaxSort>2</TaxSort>
</TEQuery>
</dataroot>

The output I want is:

<dataroot xmlns:od="urn:schemas-microsoft-com:officedata"
          generated="2023-09-26T10:37:15">

   <TaxExpenditure id="PR090F" TAXSORT="2">Exempt Lease From Taxable Owner
      <Description>
Leased <idref>PR001F</idref> properties that qualify for this exemption are reported under one of the following expenditures:
<unorderedList>
            <listitem><idref>PR001F</idref>, </listitem>
            <listitem>PR007F, </listitem>
            <listitem>PR079F, </listitem>
            <listitem>PR083F, </listitem>
            <listitem>PR085F, </listitem>
            <listitem>PR086F, </listitem>
            <listitem>PR087F, </listitem>
         </unorderedList>
      </Description>
   </TaxExpenditure>
</dataroot>

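I actually have a "solution", in that I can use disable-output-escaping, but I have read elsewhere that it is a more advanced tool and usually not the ideal solution. Is this the best way to solve my problem?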

*Note that I am using David Carlisle's htmlparse, which works with XSLT 2.0.

Here is the XSLT I am using:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:dc="data:,dpc"
  exclude-result-prefixes="#all">
  
<!-- xsl:import declarations must precede all other top-level declarations -->
<xsl:import href="https://raw.githubusercontent.com/davidcarlisle/web-xslt/main/htmlparse/htmlparse.xsl"/>

<xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />

<xsl:mode on-no-match="shallow-copy"/>

<xsl:template match="TEQuery">
    <TaxExpenditure>
      <xsl:attribute name="id" select="IntID"/>
      <xsl:attribute name="TAXSORT" select="TaxSort"/>
      <xsl:value-of select = "TEName"/>
      <xsl:apply-templates select="@* | node()" />
    </TaxExpenditure>
</xsl:template>

<xsl:template match="Description">
  <xsl:copy>
    <xsl:apply-templates select="dc:htmlparse(., '', true())"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="li">
  <listitem>
    <xsl:value-of disable-output-escaping = "yes"   select="."/>
  </listitem>
</xsl:template>

<xsl:template match="idref">
  <idref>
   <xsl:apply-templates/>
  </idref>
</xsl:template>

<xsl:template match="ul[ul] | font | div">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="ul[not(ul)]">
  <unorderedList>
    <xsl:apply-templates/>
  </unorderedList>
</xsl:template>

<xsl:template match="IntID"/>
<xsl:template match="TaxSort"/>
<xsl:template match="TEName"/>
</xsl:stylesheet>

Tags: XML, XSLT, HTML parsing

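You say "my approach is to insert <idref>some text</idref>", but it is not clear what you are inserting it into, or how. If it ends up escaped, it sounds as though you are inserting lexical XML as text into a parsed XML tree structure. If you have a parsed tree structure, you should be inserting nodes into it, not lexical XML strings.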

0 upvotes MadeFrame 9/30/2023

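So, in the Access database I need to let users add some kind of text that, after export to XML and transformation, ends up as an XML node. The exact text that I add to a cell in the database is converted, on export to XML, into what is shown in my input XML. I think I understand what you are saying about lexical XML versus nodes, but given the workflow I have through Access I do not know of any way around this, so I think the goal is to convert the lexical strings into nodes.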
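
To illustrate the distinction drawn in the comments above between lexical XML and nodes, here is a minimal sketch. The element names demo, asText and asNode are made up for this example and are not part of the original stylesheet. It can be run against any input document.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <demo>
      <!-- A string that only looks like markup: it becomes a text node,
           so the serializer escapes its angle brackets. -->
      <asText>
        <xsl:value-of select="'&lt;idref&gt;PR001F&lt;/idref&gt;'"/>
      </asText>
      <!-- A literal result element: this creates a real idref element node. -->
      <asNode>
        <idref>PR001F</idref>
      </asNode>
    </demo>
  </xsl:template>

</xsl:stylesheet>
```

In the serialized result, the content of asText should appear with escaped angle brackets, while asNode contains an actual idref element. Using disable-output-escaping only changes how the first case is written out by the serializer; it does not create nodes that later templates could match.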

Answer:

1 upvote Siebe Jongebloed 9/30/2023 #1

Your XSLT already produces the expected result, but I see two alternatives.

dc:htmlparse seems able to handle all kinds of (invalid) HTML, so why not call dc:htmlparse again, like this:

  <xsl:template match="li">
    <listitem>
      <xsl:apply-templates select="dc:htmlparse(., '', true())"/>
    </listitem>
  </xsl:template>

If you are sure that what is being inserted is valid XML, you can also use:

  <xsl:template match="li">
    <listitem>
      <xsl:apply-templates select="parse-xml-fragment(.)"/>
    </listitem>
  </xsl:template>
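
As a way to try the parse-xml-fragment option in isolation, without the htmlparse import, here is a minimal self-contained sketch. The note element is a hypothetical stand-in for any element whose text holds escaped, well-formed markup.

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Copy everything that no other template handles. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- "note" is a hypothetical element holding escaped markup as text. -->
  <xsl:template match="note">
    <xsl:copy>
      <!-- parse-xml-fragment() returns a document node; process its
           children so the parsed elements go through the templates. -->
      <xsl:apply-templates select="parse-xml-fragment(.)/node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Given the input <note>See &lt;idref&gt;PR001F&lt;/idref&gt; for details</note>, this should produce a note element containing a real idref element. Note that parse-xml-fragment requires well-formed XML: text containing &nbsp; or unquoted attributes such as color=black, as in the Access export shown in the question, would raise an error, which is why the outer HTML still needs a tolerant parser such as dc:htmlparse.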

Edit

This XSLT handles recursively escaped XML:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:dc="data:,dpc" exclude-result-prefixes="#all">
  
  <!-- xsl:import declarations must precede all other top-level declarations -->
  <xsl:import href="https://raw.githubusercontent.com/davidcarlisle/web-xslt/main/htmlparse/htmlparse.xsl"/>
  <xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:template match="TEQuery">
    <TaxExpenditure>
      <xsl:attribute name="id" select="IntID"/>
      <xsl:attribute name="TAXSORT" select="TaxSort"/>
      <xsl:value-of select = "TEName"/>
      <xsl:apply-templates select="@* | node()" />
    </TaxExpenditure>
  </xsl:template>
  
  <!-- This template does (optional recursively) what you need without the need of matching specific elements-->
  <xsl:template match="text()[contains(.,'&lt;')]">
    <xsl:apply-templates select="dc:htmlparse(., '', true())"/>
  </xsl:template>
  
  <xsl:template match="ul[ul] | font | div">
    <xsl:apply-templates/>
  </xsl:template>
  
  <xsl:template match="ul[not(ul)]">
    <unorderedList>
      <xsl:apply-templates/>
    </unorderedList>
  </xsl:template>
  
  <xsl:template match="IntID"/>
  <xsl:template match="TaxSort"/>
  <xsl:template match="TEName"/>
</xsl:stylesheet>
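
The recursion in the stylesheet above comes from the text() template: the nodes it produces are sent through apply-templates again, so any text node in the parsed result that still contains "<" matches the same template and is parsed once more. A minimal sketch of that mechanism, assuming well-formed escaped content and substituting the standard parse-xml-fragment function for dc:htmlparse so that it runs without the import:

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Copy everything that no other template handles. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Any text node that still contains "<" is parsed into nodes.
       Those nodes are processed again, so text nodes inside them that
       were escaped twice match this template a second time. -->
  <xsl:template match="text()[contains(., '&lt;')]">
    <xsl:apply-templates select="parse-xml-fragment(.)/node()"/>
  </xsl:template>

</xsl:stylesheet>
```

With a hypothetical input such as <Description>&lt;p&gt;Leased &amp;lt;idref&amp;gt;PR001F&amp;lt;/idref&amp;gt; properties&lt;/p&gt;</Description>, the first pass should yield a p element whose text still contains escaped idref markup, and the second pass turns that into an idref element. The trade-off of this sketch is strictness: a text node with a literal "<" that is not markup (for example "a < b") would make parse-xml-fragment fail, whereas a tolerant HTML parser may cope with it.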
