Asked by: ThomasB, Asked: 12/16/2019, Modified: 12/16/2019, Viewed: 47 times
simple_html_dom parsing question with data attributes
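Question:
I've been struggling with this problem for a while. I'm trying to parse an HTML document that contains many div tags. Inside those div tags are further div tags that carry data attributes I need to extract.
However, I do need to keep the original loop over div class="row". That part can't be changed.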
$test_html = '
<div class="row">
    <div class="item">
        <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
        <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
        <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
        <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
    </div>
    <div class="item">
        <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
        <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
        <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
        <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
    </div>
    <div class="item">
        <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
        <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
        <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
        <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
    </div>
    <div class="item">
        <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
        <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
        <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
        <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
    </div>
</div>
<div class="row">
    <div class="item">
        <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
        <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
        <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
        <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
    </div>
    <div class="item">
        <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
        <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
        <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
        <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
    </div>
    <div class="item">
        <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
        <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
        <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
        <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
    </div>
    <div class="item">
        <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
        <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
        <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
        <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
    </div>
</div>
';
require_once 'simple_html_dom.php'; // simple_html_dom library file, assumed to sit next to this script
$html = new simple_html_dom();
$html->load($test_html, true, false); // lowercase tags, keep line breaks
$data = [];
foreach ($html->find('div.row') as $a) {
    $item = [];
    $item['book']['data'] = $a->find('div.item', 0)->outertext;
    $item['book']['div1']['data-op-info'] = "someinfo1"; // desired output
    $item['book']['div2']['data-op-info'] = "someinfo2"; // desired output
    $item['book']['div3']['data-op-info'] = "someinfo3"; // desired output
    $item['book']['div4']['data-op-info'] = "someinfo4"; // desired output
    //$item['book']['div1'] = $a->find('div.item', 1)->outertext;
    //$item['book']['div2'] = $a->find('div.item', 2)->outertext;
    //$item['book']['div3'] = $a->find('div.item', 3)->outertext;
    $data[] = $item;
}
print_r($data);
I hope someone can help me; I've been trying to solve this for a while.
Answer:
0 votes
Joffrey Schmitz
12/16/2019
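You can query the child nodes of the div.item node and use the getAttribute() function to read the attribute values you need.
Here I'm assuming the child nodes always appear in the same order. You may need to change the selector to something more specific: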
// Inside the foreach ($html->find('div.row') as $a) loop from the question.
// Relies on the div.op-item children always appearing in the same order.
$itemNode = $a->find('div.item', 0); // first div.item in this row
$item['book']['data'] = $itemNode->outertext;
$item['book']['div1']['data-op-info'] = $itemNode->find('div.op-item', 0)->getAttribute('data-op-info');
$item['book']['div2']['data-op-info'] = $itemNode->find('div.op-item', 1)->getAttribute('data-op-info');
$item['book']['div3']['data-op-info'] = $itemNode->find('div.op-item', 2)->getAttribute('data-op-info');
$item['book']['div4']['data-op-info'] = $itemNode->find('div.op-item', 3)->getAttribute('data-op-info');
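If the number of div.op-item children per item can vary, the fixed indexes 0 to 3 can be replaced by a loop. The sketch below is a swapped-in technique, not simple_html_dom: it uses PHP's built-in DOM extension (DOMDocument and DOMXPath) so it runs without the library. The helper names extractOpInfo() and hasClass() are hypothetical. It keeps the outer loop over div.row, takes the first div.item in each row as the answer does, and numbers the op-item children div1, div2, ... in document order.

```php
<?php
// Sketch: same extraction with PHP's built-in DOM extension (DOMDocument + DOMXPath)
// instead of simple_html_dom. extractOpInfo() and hasClass() are hypothetical helpers.

// XPath predicate equivalent to the CSS class selector ".name".
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' " . $name . " ')";
}

function extractOpInfo(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $data = [];

    // Keep the outer loop over div.row, as the question requires.
    foreach ($xpath->query('//div[' . hasClass('row') . ']') as $row) {
        // First div.item in this row, like $a->find('div.item', 0).
        $itemNode = $xpath->query('.//div[' . hasClass('item') . ']', $row)->item(0);
        if ($itemNode === null) {
            continue; // row without a div.item: skip it instead of erroring
        }

        $item = ['book' => ['data' => $doc->saveHTML($itemNode)]];

        // Number every div.op-item in document order: div1, div2, ...
        $i = 1;
        foreach ($xpath->query('.//div[' . hasClass('op-item') . ']', $itemNode) as $op) {
            $item['book']['div' . $i]['data-op-info'] = $op->getAttribute('data-op-info');
            $i++;
        }
        $data[] = $item;
    }
    return $data;
}
```

Calling print_r(extractOpInfo($test_html)) on the question's sample should give one entry per row, each with div1 to div4 mapped to someinfo1 to someinfo4. Unlike hard-coded indexes, the loop does not fail when an item has fewer than four op-item children, although it still depends on document order.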