Regular expression for finding links in text

Question:

Please help me write a regular expression that finds all links (.com|.org|.ru) in a text that are not already wrapped in an <a> tag.

Example text:
 1. https://www.cyberforum.ru/newthread.php?do=newthread&f=323
 2. www.cyberforum.ru
 3. <a href="https://www.cyberforum.ru/newthread.php?do=newthread&f=323">www.cyberforum.ru/newthread.php?do=newthread&f=323</a>
 4. <a href="www.cyberforum.ru/newthread.php?do=newthread&f=323">www.cyberforum.ru/newthread.php?do=newthread&f=323</a>

Items 1 and 2 should match the regular expression, but items 3 and 4 should not.
I tried /(?<!["'<>])(\b(https?://)?([\w.](com|org|ru)[\w.?&=/])\b)/ but it does not work properly.

PHP regular expression matching

/**
 * Wraps links in <a></a> tag.
 * Skip links which are in href or in <a></a> tag already.
 *
 * @param string $text
 * @return string
 */
private static function replaceLinks(string $text): string
{
    return preg_replace_callback(
        '/\b(https?:\/\/)?([\w.-]*(\.com|\.org|\.ru|\.local)[\w.?&=\/]*)\b/',
        function ($matches) use ($text) {
            // with PREG_OFFSET_CAPTURE, $matches[0] is [matched text, byte offset]
            [$link, $offset] = $matches[0];
            // check the previous char to skip links that are in href or already inside <a></a>
            $previousChar = $offset > 0 ? $text[$offset - 1] : '';
            if (!in_array($previousChar, ['"', '\'', '<', '>', ';'], true)) {
                return "<a target='_blank' href=\"{$link}\">{$link}</a>";
            }

            // without replace
            return $matches[0][0];
        },
        $text,
        -1,
        $count,
        PREG_OFFSET_CAPTURE
    );
}
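
As an alternative to checking the previous character in a callback, here is a minimal sketch that skips whole <a>...</a> elements inside the pattern itself, using the PCRE backtracking verbs (*SKIP)(*FAIL). This is a different technique from the method above, not a rewrite of it. The variable names and the exact character classes are assumptions for illustration, and like any regex over HTML it will not cope with every kind of markup.

```php
<?php
// Sample text taken from the question.
$text = <<<'TEXT'
 1. https://www.cyberforum.ru/newthread.php?do=newthread&f=323
 2. www.cyberforum.ru
 3. <a href="https://www.cyberforum.ru/newthread.php?do=newthread&f=323">www.cyberforum.ru/newthread.php?do=newthread&f=323</a>
 4. <a href="www.cyberforum.ru/newthread.php?do=newthread&f=323">www.cyberforum.ru/newthread.php?do=newthread&f=323</a>
TEXT;

// First alternative: consume a complete <a ...>...</a> element, then force
// a failure and tell the engine to resume after it, so nothing inside the
// element can match.
// Second alternative: an optional scheme, a host ending in .com/.org/.ru,
// and an optional path or query string.
$pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)'
    . '|(?:https?://)?\b[\w.-]+\.(?:com|org|ru)\b[\w.?&=/-]*~is';

preg_match_all($pattern, $text, $matches);
$links = $matches[0];

print_r($links);
```

On the sample text this should pick up items 1 and 2 only. The trailing character class still accepts dots, so a link followed directly by a sentence-ending period may need extra trimming.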

0 votes ThW 2/28/2022 #2

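This is an approach that combines DOM with regular expressions. It limits the changes to the content of text nodes inside the body element and avoids modifying other nodes such as comments or attributes.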

What happens:

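Iterate over the text nodes, skipping those inside existing link elements: /html/body//text()[not(ancestor::a)]
Use preg_split() to split the text at each matched http(s) URL, keeping the URLs in the result
Iterate over that list and add each part to a fragment, as a link if it is a URL or as a text node if it is not.
Replace the original text node with the new fragment.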
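
The first step can be tried in isolation. The sketch below only shows which text nodes the XPath expression selects. The sample markup and the $selected variable are made up for illustration and are not part of the answer.

```php
<?php
// Hypothetical sample markup, not taken from the answer.
$html = '<p>Visit http://example.tld or '
    . '<a href="http://example.tld/x">http://example.tld/x</a> now.</p>';

$document = new DOMDocument();
// loadHTML() wraps the fragment in <html><body> automatically.
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Collect the content of every text node that has no <a> ancestor.
$selected = [];
foreach ($xpath->evaluate('/html/body//text()[not(ancestor::a)]') as $textNode) {
    $selected[] = $textNode->textContent;
}

var_export($selected);
```

The text inside the existing link is left out, so only the two text nodes around it are collected.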

$html = <<<'HTML'
<html>
  <body>
    Some link http://example.tld to replace.
    <div>Another link http://example.tld/another to replace.</div>
    <a href="http://example.tld/in-link">http://example.tld/in-link</a>
  </body>
</html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);


$linkPattern = '\b(?:https?:\/\/)(?:[\w.?&=\/-]*)\b';
$splitPattern = '(('.$linkPattern.'))'; 
$matchPattern = '(^'.$linkPattern.'$)';

// iterate over text nodes inside the body
$expression = '/html/body//text()[not(ancestor::a)]';
foreach ($xpath->evaluate($expression) as $textNode) {
    // split the text content at the search string and capture any part
    $parts = preg_split(
        $splitPattern, 
        $textNode->textContent, 
        -1, 
        PREG_SPLIT_DELIM_CAPTURE
    );
    // there should be at least two parts, otherwise the node contains no URL
    if (count($parts) < 2) {
        continue;
    }
    // a fragment allows treating several nodes as one
    $fragment = $document->createDocumentFragment();
    foreach ($parts as $part) {
        // it's a URL
        if (preg_match($matchPattern, $part)) {
            // create the new a
            $fragment->appendChild(
                $a = $document->createElement('a')
            );
            $a->setAttribute('href', $part);
            $a->textContent = $part;
        } else {
            // add the part as a new text node
            $fragment->appendChild($document->createTextNode($part));
        }   
    }
    // replace the text node with the fragment
    $textNode->parentNode->replaceChild($fragment, $textNode);
}

echo $document->saveHTML();
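
To see what the split step produces on its own, here is a small sketch that runs the same $splitPattern against a plain string. The sample sentence and the $kind label are assumptions added for illustration.

```php
<?php
// Same pattern pieces as in the code above.
$linkPattern = '\b(?:https?:\/\/)(?:[\w.?&=\/-]*)\b';
$splitPattern = '((' . $linkPattern . '))';

// Hypothetical sample text, not from the original answer.
$text = 'See http://example.tld/a?b=1 and http://example.tld.';

// PREG_SPLIT_DELIM_CAPTURE keeps the captured delimiter (the URL) in the
// result instead of dropping it.
$parts = preg_split($splitPattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($parts as $index => $part) {
    // With a single capture group, the URLs land at the odd indexes.
    $kind = $index % 2 === 1 ? 'url ' : 'text';
    printf("%s: %s\n", $kind, var_export($part, true));
}
```

Because the pattern ends in \b, the period that closes the sentence is not swallowed into the second URL and comes back as its own text part.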
