提问人:toscho 提问时间:9/7/2022 更新时间:9/7/2022 访问量:81
从外部页面抓取 DIV 中的特定元素
Grabbing specific elements inside DIV from external page
问:
我需要删除这些 div 中的每一个中的以下元素(页面包含其中的几个),但实际上我不知道该怎么做......所以,我需要帮助不要拔掉我的头发。class="product-grid-item"
1 - div 中的链接和图像:class="product-element-top2
;
<a href="https://...this_link" class="product-image-link">
(只需要链接)
<img width="300" height="300" src="https://...this_image_url...
(只需要这个图片网址)
2 - h3 标签内的标题,如下所示;
<h3 class="wd-entities-title"><a href="https://...linkhere">The title goes here
(只是标题)
3 - 最后但并非最不重要的一点是,我需要抓住这个价格;
<span class="price"><span class="woocommerce-Price-amount amount"><bdi><span class="woocommerce-Price-currencySymbol">€</span>20,00</bdi></span></span>
(只是“20.00 欧元”)
以下是完整的 HTML
:
<div class="product-grid-item" data-loop="1">
<div class="product-element-top">
<a href="https://...linkhere" class="product-image-link">
<img width="300" height="300" src="https://image-goes-here.jpg" class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail"> </a>
<div class="top-information wd-fill">
<h3 class="wd-entities-title"><a href="https://...linkhere">The title goes here</a></h3>
<span class="price"><span class="woocommerce-Price-amount amount"><bdi><span class="woocommerce-Price-currencySymbol">€</span>20,00</bdi></span></span>
<div class="wd-add-btn wd-add-btn-replace woodmart-add-btn">
<a href="https://...linkhere" data-quantity="1" class="button product_type_variable add_to_cart_button add-to-cart-loop"><span>Options</span></a></div>
</div>
<div class="wd-buttons wd-pos-r-t color-scheme-light woodmart-buttons">
<div class="wd-compare-btn product-compare-button wd-action-btn wd-style-icon wd-compare-icon">
<a href="https://...linkhere" data-added-text="Compare Products">Buy</a>
</div>
<div class="quick-view wd-action-btn wd-style-icon wd-quick-view-icon wd-quick-view-btn">
<a href="https://...linkhere" class="open-quick-view quick-view-button">quick view</a>
</div>
<div class="wd-wishlist-btn wd-action-btn wd-style-icon wd-wishlist-icon woodmart-wishlist-btn">
<a class="" href="https://linkhere/wishlist/" data-key="dcf36756534755" data-product-id="387654" data-added-text="See Wishlist">Wishlist</a>
</div>
</div>
<div class="quick-shop-wrapper wd-fill wd-scroll">
<div class="quick-shop-close wd-action-btn wd-style-text wd-cross-icon"><a href="#" rel="nofollow noopener">Close</a></div>
<div class="quick-shop-form wd-scroll-content">
</div>
</div>
</div>
</div>
我笨拙的尝试之一:
$html = file_get_contents("https://url-here.goetohere");
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$finder = new DomXPath($DOM);
$classname = 'product-grid-item';
$classname = 'product-element-top2';
$classname = 'product-element-top2';
$classname = 'wd-entities-title';
$classname = 'price';
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $node) {
echo 'here »» ' . htmlentities($node->nodeValue) . '<br>';
}
答:
假设在尝试任何 DOM 处理之前正确获取了 HTML,那么构造一些基本的 XPath 表达式来查找指示的内容是相当简单的。
根据注释,您将在输出中注意到 2 个 div。page contains several of them
product-grid-item
$html='
<div class="product-grid-item" data-loop="1">
<div class="product-element-top">
<a href="https://...linkhere" class="product-image-link">
<img width="300" height="300" src="https://image-goes-here.jpg" class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail">
</a>
<div class="top-information wd-fill">
<h3 class="wd-entities-title">
<a href="https://...linkhere">The title goes here</a>
</h3>
<span class="price">
<span class="woocommerce-Price-amount amount">
<bdi>
<span class="woocommerce-Price-currencySymbol">€</span>20,00
</bdi>
</span>
</span>
<div class="wd-add-btn wd-add-btn-replace woodmart-add-btn">
<a href="https://...linkhere" data-quantity="1" class="button product_type_variable add_to_cart_button add-to-cart-loop">
<span>Options</span>
</a>
</div>
</div>
<div class="wd-buttons wd-pos-r-t color-scheme-light woodmart-buttons">
<div class="wd-compare-btn product-compare-button wd-action-btn wd-style-icon wd-compare-icon">
<a href="https://...linkhere" data-added-text="Compare Products">Buy</a>
</div>
<div class="quick-view wd-action-btn wd-style-icon wd-quick-view-icon wd-quick-view-btn">
<a href="https://...linkhere" class="open-quick-view quick-view-button">quick view</a>
</div>
<div class="wd-wishlist-btn wd-action-btn wd-style-icon wd-wishlist-icon woodmart-wishlist-btn">
<a class="" href="https://linkhere/wishlist/" data-key="dcf36756534755" data-product-id="387654" data-added-text="See Wishlist">Wishlist</a>
</div>
</div>
<div class="quick-shop-wrapper wd-fill wd-scroll">
<div class="quick-shop-close wd-action-btn wd-style-text wd-cross-icon">
<a href="#" rel="nofollow noopener">Close</a>
</div>
<div class="quick-shop-form wd-scroll-content"></div>
</div>
</div>
</div>
<div class="product-grid-item" data-loop="1">
<div class="product-element-top">
<a href="https://www.example.com/banana" class="product-image-link">
<img width="300" height="300" src="https://www.example.com/kittykat.jpg" class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail">
</a>
<div class="top-information wd-fill">
<h3 class="wd-entities-title">
<a href="https://www.example.com/womble">Oh look, another title!</a>
</h3>
<span class="price">
<span class="woocommerce-Price-amount amount">
<bdi>
<span class="woocommerce-Price-currencySymbol">€</span>540,00
</bdi>
</span>
</span>
<div class="wd-add-btn wd-add-btn-replace woodmart-add-btn">
<a href="https://www.example.com/gorilla" data-quantity="1" class="button product_type_variable add_to_cart_button add-to-cart-loop">
<span>Options</span>
</a>
</div>
</div>
<div class="wd-buttons wd-pos-r-t color-scheme-light woodmart-buttons">
<div class="wd-compare-btn product-compare-button wd-action-btn wd-style-icon wd-compare-icon">
<a href="https:www.example.com/buy" data-added-text="Compare Products">Buy</a>
</div>
<div class="quick-view wd-action-btn wd-style-icon wd-quick-view-icon wd-quick-view-btn">
<a href="https://www.example.com/view" class="open-quick-view quick-view-button">quick view</a>
</div>
<div class="wd-wishlist-btn wd-action-btn wd-style-icon wd-wishlist-icon woodmart-wishlist-btn">
<a class="" href="https://www.example.com/wishlist/" data-key="dcf36756534755" data-product-id="387654" data-added-text="See Wishlist">Wishlist</a>
</div>
</div>
<div class="quick-shop-wrapper wd-fill wd-scroll">
<div class="quick-shop-close wd-action-btn wd-style-text wd-cross-icon">
<a href="#" rel="nofollow noopener">Close</a>
</div>
<div class="quick-shop-form wd-scroll-content"></div>
</div>
</div>
</div>';
处理下载的 HTML
# set the libxml parameters and create new DOMDocument/XPath objects.
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->loadHTML( $html );
libxml_clear_errors();
$xp=new DOMXPath( $dom );
# some basic XPath expressions
$exprs=(object)array(
'product-link' => '//a[@class="product-image-link"]',
'product-img-src' => '//a[@class="product-image-link"]/img',
'h3-title-text' => '//h3[@class="wd-entities-title"]',
'price' => '//span[@class="price"]/span/bdi'
);
# find the keys (for convenience) to be used below
$keys=array_keys( get_object_vars( $exprs ) );
# store results here
$res=array();
# loop through all patterns and issue XPath query.
foreach( $exprs as $key => $expr ){
# add key to output and set as an array.
$res[ $key ]=[];
$col=$xp->query( $expr );
# find the data if the query succeeds
if( $col && $col->length > 0 ){
foreach( $col as $node ){
switch( $key ){
case $keys[0]:$res[$key][]=$node->getAttribute('href');break;
case $keys[1]:$res[$key][]=$node->getAttribute('src');break;
case $keys[2]:$res[$key][]=trim($node->textContent);break;
case $keys[3]:$res[$key][]=trim($node->textContent);break;
}
}
}
}
# show the result or do really interesting things with the data
printf('<pre>%s</pre>',print_r($res,true));
其结果是:
Array
(
[product-link] => Array
(
[0] => https://...linkhere
[1] => https://www.example.com/banana
)
[product-img-src] => Array
(
[0] => https://image-goes-here.jpg
[1] => https://www.example.com/kittykat.jpg
)
[h3-title-text] => Array
(
[0] => The title goes here
[1] => Oh look, another title!
)
[price] => Array
(
[0] => â¬20,00
[1] => â¬540,00
)
)
评论
PHP
[0] »» $var_prod_link_0 = 'the link 0 here'; [1] »» $var_prod_link_1 = 'the link 1 here';
[0] »» $var_prod_img_0 = 'the image 0 link here'; [1] »» $var_prod_img_1 = 'the image 1 link here';
[0] »» $var_title_text_0 = 'the title 0 here'; [1] »» $var_title_text_1 = 'the title 1 here';
[0] »» $var_price_0 = 'the price 0 here'; [1] »» $var_price_1 = 'the price 1 here';
$res
$res
评论