Regex Issue for Title Case on String Containing HTML Markup

Q:

Currently, I am running the following replace approach ...

const str = '<span style="font-weight:bold;color:Blue;">ch</span>edilpakkam,tiruvallur';
const rex = (/(\b[a-z])/g);
 
const result = str.toLowerCase().replace(rex, function (letter) {
  //console.log(letter.toUpperCase())
  return letter.toUpperCase();
});

console.log(result);


... on this source ...

<span style="font-weight:bold;color:Blue;">ch</span>edilpakkam,tiruvallur

... with the following result ...

<Span Style="Font-Weight:Bold;Color:Blue;">Ch</Span>Edilpakkam,Tiruvallur

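But what I want to achieve are the following points ...

Keep the span bound to the string.
Capitalize the first letter of the first word and of each following word.
Expected output: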

<span style="font-weight:bold;color:Blue;">Ch</span>edilpakkam,Tiruvallur

javascript regex dom replace html-parsing

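You quoted correctly. But the OP wants to achieve this task by somehow "parsing" a string of HTML code such as the one in the question. Your approach takes a plain string such as 'chedilpakkam, tiruvallur', handles the capitalizing task and then, in a non-generic process, slices off exactly the first 2 letters, wraps them in predefined HTML code and appends the rest of the string. Thus, if the OP wants to process markup where the span wraps che instead of ch and carries a different style, this approach will fail.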

1 upvote Peter Seliger 3/10/2021 #2

Toto has already commented on the difficulties of "parsing" HTML code via regex.
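To make that difficulty concrete, here is a minimal sketch that lists every word a boundary-based regex would touch in the question's string. The helper name listWordStarts is made up for this illustration and is not part of the original question or answer.

```javascript
// The question's source string.
const source = '<span style="font-weight:bold;color:Blue;">ch</span>edilpakkam,tiruvallur';

// Hypothetical helper: returns every run of letters that starts at a word boundary,
// after lowercasing the input as the question's attempt does.
function listWordStarts(text) {
  return text.toLowerCase().match(/\b[a-z]+/g) || [];
}

const words = listWordStarts(source);
console.log(words);
// Tag names, attribute names and style values all end up in the list,
// next to the three pieces of real text content: "ch", "edilpakkam", "tiruvallur".
```

The regex has no notion of where markup ends and text begins, which is why the answer hands the parsing to the DOM instead.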

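The following generic (markup-agnostic) approach makes use of a sandbox-like div element in order to benefit from its DOM parsing/accessing capabilities.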

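First, one needs to collect all text nodes of the temporary sandbox. Then, for each text node's textContent, one has to decide whether or not to start capitalizing all words from the very beginning of the string.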
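The collecting step can be tried without a browser. The sketch below is an assumption-laden illustration: it walks plain objects that only mimic the nodeType, childNodes and textContent properties of real DOM nodes, and the names collectTextNodes and mockSandbox are hypothetical.

```javascript
// Node type constants as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Depth-first collection of text nodes, in document order.
function collectTextNodes(node) {
  if (node.nodeType === TEXT_NODE) {
    return [node];
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return []; // anything else, e.g. a comment node, is skipped
  }
  return Array.from(node.childNodes).flatMap(child => collectTextNodes(child));
}

// Hypothetical stand-in for <div><span>ch</span>edilpakkam,tiruvallur<!-- note --></div>.
const mockSandbox = {
  nodeType: ELEMENT_NODE,
  childNodes: [
    { nodeType: ELEMENT_NODE, childNodes: [{ nodeType: TEXT_NODE, textContent: 'ch' }] },
    { nodeType: TEXT_NODE, textContent: 'edilpakkam,tiruvallur' },
    { nodeType: 8, textContent: ' note ' }, // 8 is the DOM's comment node type
  ],
};

const texts = collectTextNodes(mockSandbox).map(node => node.textContent);
console.log(texts); // [ 'ch', 'edilpakkam,tiruvallur' ]
```

Because only nodeType and childNodes are read, the same function should also accept a real element created in a browser.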

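The cases for capitalizing every word in the string, including the first occurring one, are ...

the text node's previous sibling either does not exist ...
... or is a block-level element.
the text node itself starts with a whitespace(-sequence).

For all other cases one also wants to capture/capitalize every first word character after a word boundary ... except for the word at the very beginning of the string.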
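The two regex variants described above can be checked on plain strings before they are wired to the DOM. This is a minimal sketch: the helper capitalizeWords and its boolean parameter capitalizeFirstWord are hypothetical names that stand for the outcome of the sibling/whitespace checks. It assumes an engine with lookbehind support (ES2018).

```javascript
// Hypothetical helper: `capitalizeFirstWord` stands for the result of the checks listed above.
function capitalizeWords(text, capitalizeFirstWord) {
  const regex = capitalizeFirstWord
    ? /\b(\w)/g         // every first word character after a word boundary
    : /(?<!^)\b(\w)/g;  // same, but skip a word at the very start of the string
  return text.replace(regex, (match, firstChar) => firstChar.toUpperCase());
}

console.log(capitalizeWords('ch', true));                      // "Ch"
console.log(capitalizeWords('edilpakkam,tiruvallur', false));  // "edilpakkam,Tiruvallur"
console.log(capitalizeWords(' edilpakkam, tiruvallur', true)); // " Edilpakkam, Tiruvallur"
```

Since no m flag is set, ^ matches only at the start of the whole string, so the negative lookbehind excludes exactly one position.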

function collectContentTextNodesRecursively(list, node) {
  return list.concat(
    (node.nodeType === 1) // element-node?

    ? Array
      .from(node.childNodes)
      .reduce(collectContentTextNodesRecursively, [])

    : (node.nodeType === 3) // text-node?
      ? node
      : []
  );
}

function getNodeSpecificWordCapitalizingRegex(textNode) {
  const prevNode = textNode.previousSibling;
  // block-level check; a non-global regex keeps `test` free of `lastIndex` state.
  const isAssumeBlockBefore = (prevNode === null) || (/^(?:address|article|aside|blockquote|details|dialog|dd|div|dl|dt|fieldset|figcaption|figure|footer|form|h1|h2|h3|h4|h5|h6|header|hgroup|hr|li|main|nav|ol|p|pre|section|table|ul)$/).test(prevNode.nodeName.toLowerCase());

  // either assume a previous block element, or the current text starts with whitespace.
  return (isAssumeBlockBefore || (/^\s+/).test(textNode.textContent))

    // capture every first word character after word boundary.
    ? (/\b(\w)/g)
    // capture every first word character after word boundary except at beginning of line.
    : (/(?<!^)\b(\w)/g);
}


function capitalizeEachTextContentWordWithinCode(code) {
  const sandbox = document.createElement('div');
  sandbox.innerHTML = code;

  collectContentTextNodesRecursively([], sandbox).forEach(textNode => {

    textNode.textContent = textNode.textContent.replace(
      getNodeSpecificWordCapitalizingRegex(textNode),
      (match, capture) => capture.toUpperCase()
    ); 
  });
  return sandbox.innerHTML; 
}


const htmlCode = [
  '<span style="font-weight:bold;color:blue;">ch</span>edilpakkam,tiruvallur, chedilpakkam,tiruvallur',
  '<span style="font-weight:bold;color:blue;">ch</span> edilpakkam,tiruvallur, chedilpakkam,tiruvallur',
  '<span style="font-weight:bold;color:blue;">ch</span> edilpakkam, tiruvallur,chedilpakkam, tiruvallur',
  '<span style="font-weight:bold;color:blue;">ch</span>edilpakkam, tiruvallur,chedilpakkam, tiruvallur',
].join('<br\/>');

document.body.innerHTML = capitalizeEachTextContentWordWithinCode(htmlCode);

console.log(document.body.innerHTML.split('<br>'));
