How to parse content with HTML::Parser

Question:

I have web pages with different URLs.
I wrote a script that gets the URLs from a page using the Perl module WWW::Mechanize.

my @links = $mech->find_all_links( text_regex => qr/client_update/i );
foreach (@links) {
    push(@new_arr, $_->url(), "\n");
}

Now I need to get only the grey URLs, by checking the tag name together with its attribute and value:

<td class="highlight-grey" data-highlight-colour="grey"><a href="http://cache.download.it/download/soker/client_update.php">cache.download.it/download/soker/client_update.php</a></td>

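By the way, I cannot install modules such as "HTML::TreeBuilder" for this task.

html perl html-parsing www-mechanize

OK, now there is a real question. The links are inside a td, so with find_all_links the information you need is lost along the way. You have to get all the matching td elements first and then collect the links they contain. I have always used Mechanize + TreeBuilder, so I am not sure how to achieve this with HTML::Parser; I suggest rephrasing your question so that people who know it better can see more clearly what you want. Also, consider installing TreeBuilder locally with local::lib.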
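
The approach described in this comment (find the right td cells first, then collect the links inside them) can be sketched with HTML::Parser roughly as follows. This is only a minimal sketch, not the asker's code: it assumes HTML::Parser is installed, that the cells are not nested inside each other, and it uses a made-up second cell (`highlight-red`, `example.com`) purely for contrast.

```perl
use strict;
use warnings;
use HTML::Parser;

# Sample markup modelled on the question. The second cell is a made-up
# non-grey example, added only to show that it gets skipped.
my $html = <<'HTML';
<table><tr>
<td class="highlight-grey" data-highlight-colour="grey"><a href="http://cache.download.it/download/soker/client_update.php">cache.download.it/download/soker/client_update.php</a></td>
<td class="highlight-red" data-highlight-colour="red"><a href="http://example.com/other/client_update.php">example.com/other/client_update.php</a></td>
</tr></table>
HTML

my $in_grey_td = 0;    # true while the parser is inside a grey <td>
my @grey_urls;

my $parser = HTML::Parser->new(
    api_version => 3,
    # Called for every start tag with the lower-cased tag name and a
    # hash reference of its attributes.
    start_h => [ sub {
        my ($tag, $attr) = @_;
        if ($tag eq 'td') {
            $in_grey_td = ($attr->{'data-highlight-colour'} // '') eq 'grey';
        }
        elsif ($tag eq 'a' && $in_grey_td && defined $attr->{href}) {
            push @grey_urls, $attr->{href};
        }
    }, 'tagname, attr' ],
    # Leaving a <td> resets the flag (nested tables are not handled here).
    end_h => [ sub {
        my ($tag) = @_;
        $in_grey_td = 0 if $tag eq 'td';
    }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

print "$_\n" for @grey_urls;
```

With a live page, the string returned by `$mech->content()` would presumably take the place of `$html`.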

0 votes Quentin 8/26/2015

perlmonks.org/?node=693828

0 votes ostapv 8/27/2015

Found a solution:

Answer:

0 votes ostapv 8/27/2015 #1

    I have found a solution:



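    my $webpage = {
        username    =>  'xx',
        password    =>  'xx',
        url         =>  'xx',
        form_name   =>  'loginform',
        # Markup that immediately precedes the URL of a link in a grey cell
        regex       =>  qr/data-highlight-colour="grey"><a\s+href="/xsm,
    };

    my @links = $mech->find_all_links(text_regex => qr/client_update_urls/i);
    # content() returns the whole page as one string, not a list of lines
    my $raw_content = $mech->response()->content();

    my @prod_urls;
    foreach my $link (@links) {
        my $url = $link->url();
        # \Q...\E makes regex metacharacters in the URL ('.', '?') literal.
        # Note the negation: this keeps the links that are NOT in a grey cell.
        if ($raw_content !~ /$webpage->{regex}\Q$url\E/xsm) {
            push @prod_urls, $url;
        }
    }

    for (@prod_urls) { print "$_\n" }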
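
To see which way the check in this answer sorts the links, here is a small self-contained sketch that runs the same kind of regex test against a sample string instead of a live page. The sample content, the second URL and the variable names are assumptions made for illustration; `\Q...\E` and the closing quote after the URL are additions in this sketch so that the URL is matched literally and completely.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $mech->response()->content().
# The second cell ("red", example.com) is made up for contrast.
my $content = <<'HTML';
<td class="highlight-grey" data-highlight-colour="grey"><a href="http://cache.download.it/download/soker/client_update.php">grey link</a></td>
<td class="highlight-red" data-highlight-colour="red"><a href="http://example.com/prod/client_update.php">other link</a></td>
HTML

# Hypothetical stand-in for the URLs returned by find_all_links.
my @urls = (
    'http://cache.download.it/download/soker/client_update.php',
    'http://example.com/prod/client_update.php',
);

# Markup that immediately precedes the href value inside a grey cell.
my $grey_prefix = qr/data-highlight-colour="grey"><a\s+href="/;

my (@grey_urls, @other_urls);
for my $url (@urls) {
    # A match means the link sits directly inside a grey cell.
    if ($content =~ /$grey_prefix\Q$url\E"/) {
        push @grey_urls, $url;
    }
    else {
        push @other_urls, $url;
    }
}

print "grey:  $_\n" for @grey_urls;
print "other: $_\n" for @other_urls;
```

A positive match (`=~`) selects the grey links that the question asks for, while the negated match (`!~`) used in the answer selects everything else.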
