HTML Parsing: Get content from inner tags

Question:

Test input file:

# cat test.html 
<div>line 1<div>Another 1</div></div>
<div>line 2<div>Another 2</div></div>
<div>line 3<div>Another 3</div></div>

Expected output:

Another 1
Another 2
Another 3

Script:

#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;

# $tree->ignore_ignorable_whitespace(0);
# $tree->no_space_compacting(1);

$tree->parse_file("test.html");

foreach my $a ($tree->find("div")) 
{
  print $a->as_text."\n";
}

Script output:

line 1Another 1
Another 1
line 2Another 2
Another 2
line 3Another 3
Another 3

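Problem: I am looking for help extracting content from the inner div tag only. For each line of input, my script first prints "line 1Another 1" and then "Another 1". However, I am only interested in "Another 1".

I tried playing with ignore_ignorable_whitespace and no_space_compacting (see the commented-out lines in the script), but that did not help. Either I am not using them correctly, or I am barking up the wrong tree.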
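One possible direction, offered as a minimal sketch rather than a definitive fix: the doubled output appears to come from find("div") matching both the outer and the inner div, while as_text returns the text of an element together with all of its descendants. The two whitespace options affect how whitespace is handled during parsing, not which elements are matched. The sketch below assumes that "inner" means a div with another div somewhere above it in the tree. It inlines the contents of test.html as a string so that it runs on its own, and the names $html, @inner and @texts are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Same markup as test.html, inlined so the sketch is self-contained.
my $html = <<'HTML';
<div>line 1<div>Another 1</div></div>
<div>line 2<div>Another 2</div></div>
<div>line 3<div>Another 3</div></div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Assumption: an "inner" div is one that has a div ancestor.
# look_down accepts a coderef criterion and passes the candidate as $_[0].
my @inner = $tree->look_down(
    _tag => 'div',
    sub {
        my $parent = $_[0]->parent;
        return 0 unless defined $parent;
        # look_up starts at $parent itself and walks towards the root.
        my $ancestor = $parent->look_up(_tag => 'div');
        return defined $ancestor;
    },
);

# Collect the text of each matching element, then print one per line.
my @texts = map { $_->as_text } @inner;
print "$_\n" for @texts;
```

With the sample input this should print only the "Another N" lines. If only direct children of a div are wanted, comparing $_[0]->parent->tag with 'div' inside the coderef would be a narrower variant of the same idea.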

Tags: html, perl, html-parsing
