Parsing html using xpath in Python2.7

Asked by: coucou  Asked: 6/20/2017  Last edited by: J-Wincoucou  Updated: 6/20/2017  Views: 157

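Q:

I'm trying to parse some html code (carthtml) in Python2.7 and Flask0.12. There are 3 different items in 'carthtml'. I tried to collect all of these items into 'items' in 'def getCart()' using xpath.

I want to print the name of each item one after another, so I used a for loop: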

for idx, item in enumerate(items):
    product_name = get_value_by_xpath(item,
                                    '//div[@class="product-name"]/a/text()')
    print product_name

The expected output is:

Piqué Polo Romper
Neon Little Brother Jumpsuit
OshKosh Mary Jane Sneakers

But my actual output is:

Piqué Polo Romper
Piqué Polo Romper
Piqué Polo Romper

I suspect that inside the for loop, 'item' does not hold 1 item but all 3 items on every iteration. Any help would be appreciated.

Here is my code:

app.py

# -*- coding: utf-8 -*-
from flask import Flask, request, jsonify
from lxml import html
from lxml import etree
from datetime import datetime
import traceback
import requests
import sys
import logging.handlers


reload(sys)
sys.setdefaultencoding('utf-8')

app = Flask(__name__)

@app.route('/')
def getCart():    
    html_tree = parse_htmlpage(carthtml)
    items = get_elements_by_xpath(html_tree, '//div[@class="primary-content"]//div[@class="mini-cart-product clearfix"]')
    product_list = []
    if items is False or items is None:
        #logger.debug("[T/F:T, e_id:" + str(e_id) + ", API_URL:/add_to_cart, Msg:No data from CARTERS]")
        return jsonify({'result_code': 0})
    if len(items) > 0:
        for idx, item in enumerate(items):
            product_name = get_value_by_xpath(item,
                                           '//div[@class="product-name"]/a/text()')
            print product_name

    return


def parse_htmlpage(html_src):
    detail_html = html.fromstring(html_src)
    page_tree = etree.ElementTree(detail_html)

    return page_tree

def get_elements_by_xpath(page_tree, target_xpath):
    target_value_list = page_tree.xpath(target_xpath)
    return target_value_list

def get_value_by_xpath(page_tree, target_xpath):
    target_value = page_tree.xpath(target_xpath)
    return target_value

carthtml

<div class="primary-content">
<div class="mini-cart-product clearfix">
    <div class="mini-cart-image">
        <a href="/carters-baby-boy-one-pieces/190795039832.html"><img src="https://www.carters.com/dw/image/v2/AAMK_PRD/on/demandware.static/-/Sites-carters_master_catalog/default/dw182a85c8/hi-res/118H023_Default.jpg?sw=470" alt="Piqué Polo Romper" title="Piqué Polo Romper"></a>
    </div>

    <div class="mini-cart-attributes">
        <div class="product-name">
            <a href="/carters-baby-boy-one-pieces/190795039832.html">Piqué Polo Romper</a>
        </div>
    </div>
</div>


<div class="mini-cart-product clearfix">
    <div class="mini-cart-image">
        <a href="/carters-baby-boy-one-pieces/190795419986.html"><img src="https://www.carters.com/dw/image/v2/AAMK_PRD/on/demandware.static/-/Sites-carters_master_catalog/default/dw540ec9a5/hi-res/127G525_Default.jpg?sw=470" alt="Neon Little Brother Jumpsuit" title="Neon Little Brother Jumpsuit"></a>
    </div>


    <div class="mini-cart-attributes">
        <div class="product-name">
            <a href="/carters-baby-boy-one-pieces/190795419986.html">Neon Little Brother Jumpsuit</a>
        </div>
    </div>
</div>


<div class="mini-cart-product clearfix">

    <div class="mini-cart-image">
        <a href="/oshkosh-baby-girl-shoes-casual-shoes/888737142503.html"><img src="https://www.carters.com/dw/image/v2/AAMK_PRD/on/demandware.static/-/Sites-carters_master_catalog/default/dw32891bda/hi-res/OF150011_Navy.jpg?sw=470" alt="OshKosh Mary Jane Sneakers" title="OshKosh Mary Jane Sneakers"></a>
    </div>

    <div class="mini-cart-attributes">

        <div class="product-name">
            <a href="/on/demandware.store/Sites-Carters-Site/default/RedirectURL-CookieMigration?url=https%3a%2f%2fwww%2ecarters%2ecom%2fs%2fSites-Carters-Site%2fdw%2fshared_session_redirect%3furl%3dhttps%253A%252F%252Fwww%2eoshkosh%2ecom%252Foshkosh-baby-girl-shoes-casual-shoes%252F888737142503%2ehtml%253Fsrd%253Dtrue">OshKosh Mary Jane Sneakers</a>
        </div>
    </div>
</div>
</div>
python-2.7 xpath html parsing

Comments

0 upvotes corn3lius 6/20/2017
Could it be the missing return target_value?
0 upvotes coucou 6/20/2017
@corn3lius Oh, that was just my mistake. I've added it. Thanks :) But that's not the problem.
0 upvotes Andersson 6/20/2017
Try product_name = get_value_by_xpath(item, './/div[@class="product-name"]/a/text()'). Also note that your condition if items is False or items is None will never return True, because a list, empty or not, cannot be None or False!
0 upvotes coucou 6/21/2017
@Andersson Thank you very much. It was an xpath problem. I changed "//div[@class ..." to ".//div[@class..." and that fixed it :)
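
The difference between the two expressions discussed in the comments can be sketched as follows. This is a minimal illustration, not the asker's actual application: the cart markup is trimmed down to the product names, the href values are placeholders, and the names CART_HTML, absolute_names and relative_names are made up for this example. It assumes lxml is installed.

```python
# -*- coding: utf-8 -*-
from lxml import html

# Trimmed-down stand-in for the carthtml in the question (names only;
# the "#" hrefs are placeholders, not the real links).
CART_HTML = u"""
<div class="primary-content">
  <div class="mini-cart-product clearfix">
    <div class="product-name"><a href="#">Piqué Polo Romper</a></div>
  </div>
  <div class="mini-cart-product clearfix">
    <div class="product-name"><a href="#">Neon Little Brother Jumpsuit</a></div>
  </div>
  <div class="mini-cart-product clearfix">
    <div class="product-name"><a href="#">OshKosh Mary Jane Sneakers</a></div>
  </div>
</div>
"""

tree = html.fromstring(CART_HTML)
items = tree.xpath('//div[@class="primary-content"]'
                   '//div[@class="mini-cart-product clearfix"]')

# A path starting with '//' is absolute: it is evaluated from the document
# root, regardless of which element .xpath() is called on.
absolute_names = [item.xpath('//div[@class="product-name"]/a/text()')[0]
                  for item in items]

# A path starting with './/' is relative: it only searches the descendants
# of the element it is called on.
relative_names = [item.xpath('.//div[@class="product-name"]/a/text()')[0]
                  for item in items]

print(absolute_names)
print(relative_names)
```

In this sketch each 'item' really is a single cart entry. The absolute form still finds every product name in the document on each iteration, so taking the first match repeats the first name three times, while the relative form yields one distinct name per item.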

A: No answers yet
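
Andersson's side remark about the 'items is False or items is None' check can also be illustrated with a small sketch. It assumes lxml is installed and uses a made-up, empty cart fragment; the variable names are hypothetical and chosen only for this example.

```python
from lxml import html

# Hypothetical empty cart: the container exists but holds no products.
tree = html.fromstring('<div class="primary-content"></div>')
items = tree.xpath('//div[@class="mini-cart-product clearfix"]')

# For an expression that selects nodes, xpath() returns a list,
# even when nothing matches.
result_is_list = isinstance(items, list)

# An empty list is neither the object False nor None, so the identity
# check used in the question does not detect an empty cart.
caught_by_identity_check = items is False or items is None

# A plain truthiness test does detect it.
caught_by_truthiness = not items

print(result_is_list, caught_by_identity_check, caught_by_truthiness)
```

Under these assumptions, replacing the identity check with 'if not items:' would be one way to handle the empty-cart case in getCart().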