simple_html_dom parsing question with data attributes

Asked by: ThomasB  Asked: 12/16/2019  Modified: 12/16/2019  Viewed: 47 times

Question:

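I've been struggling with this problem for a while. I'm trying to parse an HTML document that contains many div tags. Inside those div tags are other div tags, and those inner divs carry data attributes that I need to extract.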

However, I need to keep the original loop over div class="row". That part can't change.
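To illustrate the extraction step in isolation, here is a minimal sketch that reads the data attributes from the inner divs. It swaps in PHP's built-in DOM extension (`DOMDocument`/`DOMXPath`) instead of simple_html_dom so it runs without the library. It assumes the `data-op-` prefix from the sample markup and collects every such attribute per `op-item` div, because the divs carry different sets (some have `data-op-source` and `data-op-status`, some only `data-op-info`). The helper name `collectDataAttributes()` is hypothetical.

```php
<?php
// Sketch using PHP's built-in DOM extension, not simple_html_dom.
// collectDataAttributes() is a hypothetical helper name.

// Return every attribute of $el whose name starts with $prefix, keyed by name.
function collectDataAttributes(DOMElement $el, string $prefix = 'data-op-'): array
{
    $out = [];
    foreach ($el->attributes as $attr) {
        // Keep only attributes such as data-op-info, data-op-source, data-op-status
        if (strncmp($attr->name, $prefix, strlen($prefix)) === 0) {
            $out[$attr->name] = $attr->value;
        }
    }
    return $out;
}

// Trimmed sample taken from the question's markup
$html = '<div class="item">'
      . '<div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>'
      . '<div class="op-item spread-price" data-op-info="someinfo2">some content b</div>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_NOERROR);
$xpath = new DOMXPath($doc);

// Match divs whose space-separated class list contains the token "op-item"
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' op-item ')]";

$result = [];
foreach ($xpath->query($query) as $node) {
    $result[] = collectDataAttributes($node);
}

print_r($result);
```

The `concat(' ', normalize-space(@class), ' ')` idiom matches whole class tokens, so the outer `div.item` is not picked up even though "op-item" contains the text "item".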


    require_once 'simple_html_dom.php'; // adjust to wherever the library file lives

    $test_html = '
        <div class="row">
            <div class="item">
                <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
                <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
                <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
                <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
            </div>
            <div class="item">
                <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
                <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
                <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
                <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
            </div>
            <div class="item">
                <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
                <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
                <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
                <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
            </div>
            <div class="item">
                <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
                <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
                <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
                <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
            </div>
        </div>
        <div class="row">
            <div class="item">
                <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
                <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
                <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
                <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
            </div>
            <div class="item">
                <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
                <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
                <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
                <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
            </div>
            <div class="item">
                <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
                <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
                <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
                <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
            </div>
            <div class="item">
                <div class="op-item op-spread" data-op-info="someinfo1" data-op-source="somesource1" data-op-status="some status">some content 1</div>
                <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
                <div class="op-item op-spread" data-op-info="someinfo3" data-op-status="somemoney3" data-op-source="somesource3">some content 3</div>
                <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
            </div>
        </div>

        '; 

    $html = new simple_html_dom();
    $html->load($test_html, true, false);

    $data = [];

    foreach ($html->find('div.row') as $a) {
        $item = []; // start each row with a fresh array

        $item['book']['data'] = $a->find('div.item', 0)->outertext;
        $item['book']['div1']['data-op-info'] = "someinfo1"; // desired output
        $item['book']['div2']['data-op-info'] = "someinfo2"; // desired output
        $item['book']['div3']['data-op-info'] = "someinfo3"; // desired output
        $item['book']['div4']['data-op-info'] = "someinfo4"; // desired output

        //$item['book']['div1'] = $a->find('div.item',1)->outertext;
        //$item['book']['div2'] = $a->find('div.item',2)->outertext;
        //$item['book']['div3'] = $a->find('div.item',3)->outertext;

        $data[] = $item;
    }

    print_r($data);


I hope someone can help me; I've been trying to solve this for a while.

php parsing simple-html-dom

Answer:

0 votes  Joffrey Schmitz  12/16/2019

You can query the child nodes of the item node and use the getAttribute() function to get what you need.

Here I'm assuming the child nodes always appear in the same order. You may need to change the selector to something more specific:

    // Inside the foreach over $html->find('div.row') as $a:
    $itemNode = $a->find('div.item', 0);

    $item['book']['data'] = $itemNode->outertext;
    $item['book']['div1']['data-op-info'] = $itemNode->find('div.op-item', 0)->getAttribute('data-op-info');
    $item['book']['div2']['data-op-info'] = $itemNode->find('div.op-item', 1)->getAttribute('data-op-info');
    $item['book']['div3']['data-op-info'] = $itemNode->find('div.op-item', 2)->getAttribute('data-op-info');
    $item['book']['div4']['data-op-info'] = $itemNode->find('div.op-item', 3)->getAttribute('data-op-info');
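If the number of op-item divs per item can vary, the hard-coded indices 0 to 3 will break: simple_html_dom's find() with an index returns null when no such element exists, and calling getAttribute() on null is a fatal error. The sketch below keeps the outer loop over div.row but numbers the children div1, div2, … as they are encountered. It swaps in PHP's built-in `DOMDocument`/`DOMXPath` so it runs without the library; `hasClass()` is a hypothetical helper, and the trimmed sample markup (a second row with only two op-items) is an assumption for illustration.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of simple_html_dom.
// Keeps the outer loop over div.row and numbers the op-item children
// div1, div2, ... instead of hard-coding indices 0..3.

$test_html = <<<HTML
<div class="row">
  <div class="item">
    <div class="op-item op-spread" data-op-info="someinfo1">some content 1</div>
    <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
    <div class="op-item op-spread" data-op-info="someinfo3">some content 3</div>
    <div class="op-item spread-price" data-op-info="someinfo4">some content 4</div>
  </div>
</div>
<div class="row">
  <div class="item">
    <div class="op-item op-spread" data-op-info="someinfo1">some content 1</div>
    <div class="op-item spread-price" data-op-info="someinfo2">some content b</div>
  </div>
</div>
HTML;

// Hypothetical helper: XPath predicate matching one class token in @class
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$doc = new DOMDocument();
$doc->loadHTML($test_html, LIBXML_NOERROR);
$xpath = new DOMXPath($doc);

$data = [];
foreach ($xpath->query('//div[' . hasClass('row') . ']') as $row) {
    $item = [];

    // First div.item inside this row, like find('div.item', 0)
    $itemNode = $xpath->query('.//div[' . hasClass('item') . ']', $row)->item(0);
    if ($itemNode === null) {
        continue; // skip rows without an item instead of dereferencing null
    }
    $item['book']['data'] = $doc->saveHTML($itemNode); // outer HTML of the item

    // Number the direct op-item children in document order
    $n = 1;
    foreach ($xpath->query('./div[' . hasClass('op-item') . ']', $itemNode) as $op) {
        $item['book']['div' . $n]['data-op-info'] = $op->getAttribute('data-op-info');
        $n++;
    }

    $data[] = $item;
}

print_r($data);
```

The second row produces only div1 and div2 keys rather than a fatal error, which is the main reason to prefer an inner loop over fixed indices when the markup is not guaranteed to be uniform.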