Modifying this regex to match img src, alt and title in any order [duplicate]

Q:

How to extract img src, title and alt from html using php? [duplicate] (10 answers)

How do you parse and process HTML/XML in PHP? (31 answers)

Closed last month.

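I have this regex to match an image tag's src and its alt or title attribute, but it only works if src comes first. How should I modify it to match all three in any order? Or is there a more accurate way to do this by parsing the HTML elements? I assume that with a regex I might get array elements back without knowing which is which.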

For example, right now it matches:

img src="landscape.jpg" title="My landscape"

but not

img title="My landscape" src="landscape.jpg"

The current regex is:

preg_match_all('#<img\s+[^>]*src="([^"]*)"(?:\s+[^>]*(?:alt|title)="([^"]+)")?[^>]*>#is', $url_contents, $image_matches);

php regex html-parsing

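If you follow the answers to the question mentioned by @InSync and want to get all three attributes in any order, with two of them optional, you can wrap the two capture groups in lookaheads inside optional groups and match src as usual, similar to this regex101 demo. Such regexes are usually neither efficient nor HTML-conformant, and they quickly get large. That is why an HTML parser is recommended: it handles this far better.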
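The lookahead idea described in this comment could be sketched roughly as follows. This is not the pattern from the linked regex101 demo (which is not reproduced here), only a minimal illustration under some assumptions: attribute values are double-quoted, the sample `$html` string and the group names `alt`, `title` and `src` are made up for the example, and `PREG_UNMATCHED_AS_NULL` requires PHP 7.2 or later.

```php
<?php
// Hypothetical sample input with the attributes in different orders.
$html = '<img title="My landscape" src="landscape.jpg">'
      . '<img src="portrait.jpg" alt="A portrait">';

// alt and title are each captured by a lookahead wrapped in an optional
// group, so they may appear anywhere inside the tag or be absent.
// src is then matched normally and is required.
$pattern = '#<img\b'
         . '(?:(?=[^>]*?\salt="(?<alt>[^"]*)"))?'
         . '(?:(?=[^>]*?\stitle="(?<title>[^"]*)"))?'
         . '[^>]*?\ssrc="(?<src>[^"]*)"[^>]*>#is';

// PREG_UNMATCHED_AS_NULL reports missing attributes as null instead of ''.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

$images = [];
foreach ($matches as $m) {
    $images[] = [
        'src'   => $m['src'],
        'alt'   => $m['alt'],
        'title' => $m['title'],
    ];
}

print_r($images);
```

Named groups address the "which element is which" concern from the question, since each value is read by attribute name rather than by position.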

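0 upvotes bobble bubble 9/29/2023

See also → How to extract img src, title and alt from html using php

1 upvote adrianTNT 9/29/2023

@InSync, @bobblebubble thanks, I ended up using DOMDocument instead (I posted an answer to my own question); it seems much better than regex :D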

A:

2 upvotes MonkeyZeus 9/29/2023 #1

You can use:

(?<=<img)(?: (src|title|alt)="([^"]+)")?(?: (src|title|alt)="([^"]+)")?(?: (src|title|alt)="([^"]+)")?

(?<=<img) - behind me is an opening <img tag
(?: (src|title|alt)="([^"]+)")? - look for a src, title, or alt attribute followed by its value, and put both into capture groups
(?: (src|title|alt)="([^"]+)")? - again
(?: (src|title|alt)="([^"]+)")? - again

https://regex101.com/r/GXyAZf/1
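As a rough sketch of how this pattern might be used from PHP, the capture groups can be read in (name, value) pairs to build one associative array per image. The sample `$html` string and the variable names are assumptions made for illustration. Note that the pattern as written expects the three attributes to follow `<img` directly, separated by exactly one space each.

```php
<?php
// Hypothetical sample input; note the single spaces between attributes.
$html = '<img title="My landscape" src="landscape.jpg">'
      . '<img src="portrait.jpg" alt="A portrait">';

// One optional name="value" unit, repeated three times after the lookbehind.
$unit    = '(?: (src|title|alt)="([^"]+)")?';
$pattern = '~(?<=<img)' . str_repeat($unit, 3) . '~';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$images = [];
foreach ($matches as $m) {
    $attrs = [];
    // Groups come in pairs: (1,2), (3,4), (5,6) hold (name, value).
    for ($i = 1; $i <= 5; $i += 2) {
        if (!empty($m[$i])) {
            $attrs[$m[$i]] = $m[$i + 1];
        }
    }
    $images[] = $attrs;
}

print_r($images);
```

Keying each value by the captured attribute name removes the need to know which position an attribute occupied in the tag.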

1 upvote adrianTNT 9/29/2023 #2

I found a simple example using DOMDocument. It does just what I wanted, and it seems more reliable than what I was attempting with regex.

<?php

// $url is assumed to hold the address (or file path) of the page to parse
$dom = new DOMDocument();

// Collect warnings about malformed HTML instead of silencing them with @
libxml_use_internal_errors(true);

// Loading HTML content in $dom
$dom->loadHTMLFile($url);
libxml_clear_errors();

// Selecting all images, i.e. img tag objects
$images = $dom->getElementsByTagName('img');

// Extracting attributes from each object
foreach ($images as $element) {

    // Extracting value of src attribute of the current image object
    $src = $element->getAttribute('src');

    // Extracting value of alt attribute of the current image object
    $alt = $element->getAttribute('alt');

    // Extracting value of height attribute of the current image object
    $height = $element->getAttribute('height');

    // Extracting value of width attribute of the current image object
    $width = $element->getAttribute('width');

    // Output an image tag rebuilt from the extracted attributes;
    // values are escaped so quotes inside them cannot break the markup
    echo '<img src="' . htmlspecialchars($src) . '" alt="' . htmlspecialchars($alt)
        . '" height="' . htmlspecialchars($height) . '" width="' . htmlspecialchars($width) . '"/>';
}

?>
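For the three attributes named in the question (src, alt and title), a variant of the same approach might look like the sketch below. It parses an HTML string rather than a URL; the sample `$html` value and the choice to store missing attributes as null are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical sample input with the attributes in different orders.
$html = '<img title="My landscape" src="landscape.jpg">'
      . '<img src="portrait.jpg" alt="A portrait">';

$dom = new DOMDocument();

// Keep libxml warnings about imperfect markup out of the output.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$images = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    $row = [];
    foreach (['src', 'alt', 'title'] as $name) {
        // getAttribute() returns '' for a missing attribute, so check
        // hasAttribute() first to tell "missing" apart from "empty".
        $row[$name] = $img->hasAttribute($name) ? $img->getAttribute($name) : null;
    }
    $images[] = $row;
}

print_r($images);
```

Because the parser reads attributes by name, their order in the tag does not matter, which is the limitation the original regex ran into.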
