Selenium returns duplicate items when using if-else to assign a value to a dataclass

Asked by: Dung8466 Asked: 10/1/2023 Last edited by: Dung8466 Updated: 10/3/2023 Views: 34

Q:

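I'm trying to scrape products (name, price, image, URL) from a website. The price can be either a currency string (e.g. 1.000.000đ) or "Giá Liên Hệ", so I want to check whether the price is "Giá Liên Hệ" and, if so, set the price to 0. However, it seems to return duplicate items.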

My dataclass (current_price may need to be changed to Decimal):

@dataclass
class Item:
  url: str
  name: str
  current_price: str
  place: str
  img: str
  date_add: datetime.datetime
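
As a side note on the parenthetical above: dataclass annotations are not enforced at runtime, so assigning a `Decimal` to a field annotated `str` raises no error, but the annotation can be made to match what the function actually stores. A minimal sketch, assuming the same field names as above; the sample values are made up for illustration:

```python
import datetime
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Item:
    url: str
    name: str
    current_price: Decimal  # was `str`; the scraper stores a Decimal here
    place: str
    img: str
    date_add: datetime.datetime


# Made-up sample values, only to show construction.
sample = Item(
    url="https://example.com/p/1",
    name="Phone A",
    current_price=Decimal("1000000"),
    place="Cellphones",
    img="https://example.com/p/1.jpg",
    date_add=datetime.datetime.now(),
)
```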

My function:

def cellphones(query):
  lists = []
  search = query.replace(' ', '%20')
  url = f"https://cellphones.com.vn/catalogsearch/result?q={search}"
  driver.get(url)
  content = driver.find_element(By.CSS_SELECTOR, "div[id*='search-catalog-page']")
  items = content.find_elements(By.CSS_SELECTOR, "div[class*='product-info']")
  for _ in items:
    if _.find_element(By.CSS_SELECTOR, "p[class*='product__price--show']").text == "Giá Liên Hệ":
      item = Item(
        url=_.find_element(By.CSS_SELECTOR, "a").get_attribute('href'),
        name=_.find_element(By.CSS_SELECTOR, "h3").text,
        current_price=Decimal("0"),
        place="Cellphones",
        img=_.find_element(By.CSS_SELECTOR, "img").get_attribute('src'),
        date_add=datetime.datetime.now()
      )
    else:
      item = Item(
        url=_.find_element(By.CSS_SELECTOR, "a").get_attribute('href'),
        name=_.find_element(By.CSS_SELECTOR, "h3").text,
        current_price=Decimal(_.find_element(By.CSS_SELECTOR, "p[class*='product__price--show']").text.replace("₫", "").replace(".","").replace(" ","")),
        place="Cellphones",
        img=_.find_element(By.CSS_SELECTOR, "img").get_attribute('src'),
        date_add=datetime.datetime.now()
      )
    item, created = Product.objects.update_or_create(
        name=item.name,
        place=item.place,
        defaults={
            'current_price': item.current_price,
            'url': item.url,
            'img': item.img,
            'date_add': item.date_add
        }
    )
    lists.append(item)
  return lists
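
The two branches above differ only in how `current_price` is computed, so the price handling could be pulled into one helper and `Item(...)` built once. A minimal sketch, assuming the only two formats are the ones described in the question and that `.` is a thousands separator; `parse_price` and `CONTACT_PRICE` are hypothetical names, not part of the original code:

```python
from decimal import Decimal

# Placeholder text shown by the site instead of a numeric price.
CONTACT_PRICE = "Giá Liên Hệ"


def parse_price(text: str) -> Decimal:
    """Convert a scraped price string such as '1.000.000đ' to a Decimal.

    Returns Decimal('0') for the placeholder text.
    Hypothetical helper; assumes '.' is a thousands separator.
    """
    text = text.strip()
    if text == CONTACT_PRICE:
        return Decimal("0")
    # Keep ASCII digits only: drops the currency sign, dots and spaces.
    digits = "".join(ch for ch in text if ch in "0123456789")
    if not digits:
        raise ValueError(f"unrecognised price: {text!r}")
    return Decimal(digits)
```

Inside the loop, the price text would then be read once and passed through as `current_price=parse_price(price_text)`.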

My template:

{% block content %}

<p>You searched '<span>{{context.name}}</span>'</p>
<p>Return {{lists|length}} products.</p>
<ul class="d-flex flex-row flex-wrap align-content-center justify-content-around align-items-center list-unstyled">
    {% for p in lists %}
        <li>
            <div class="card h-100 mt-2 border-dark mb-3" style="max-width: 18rem;">
                <img src="{{p.img}}" class="card-img-top" style="height:17.813em;width:auto;" alt="product image">
                <div class="card-body">
                    <h5 class="card-title">{{p.name}}</h5>
                    {% if p.current_price == 0 %}
                        <p class="card-text">Giá Liên Hệ</p>
                    {% else %}
                        <p class="card-text">{{p.current_price|intcomma}}</p>
                    {% endif %}
                    <p class="card-text">Store: {{p.place}}</p>
                    <p class="card-text">Add at {{p.date_add}}</p>
                    <a href="{{p.url}}" class="btn btn-primary">Go to store</a>
                </div>
            </div>
        </li>
    {% endfor %}
</ul>
{% endblock %}

python selenium-webdriver css-selectors

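Comments

0 votes John Gordon 10/1/2023
"but it seems to return duplicate items": this suggests that there are actual duplicate items on the page. Are there?
0 votes Dung8466 10/2/2023
Hi! Here is a screenshot of the duplicate items: imgur.com/a/OaHlltp. I'll also update the post to include the template.
0 votes John Gordon 10/2/2023
I think you misunderstood my question. Are there duplicate items on the cellphones.com.vn page?
0 votes Dung8466 10/2/2023
Oh, sorry. There are no duplicate items on the Cellphones page. The search only shows what I scrape and store to the database; it doesn't show the items in the database.
0 votes John Gordon 10/2/2023
If you delete all the Product objects in the database and run the code once, does it still create duplicates? Or are duplicates only created when you run the code again?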

A:

0 votes Dung8466 10/3/2023 #1

OK, I gave up looking for a clean solution to this, so I just check whether item.name is already in the list and, if not, append it:

def cellphones(query):
  lists = []
  search = query.replace(' ', '%20')
  url = f"https://cellphones.com.vn/catalogsearch/result?q={search}"
  driver.get(url)
  content = driver.find_element(
    By.CSS_SELECTOR, "div[id*='search-catalog-page']")
  items = content.find_elements(
    By.CSS_SELECTOR, "div[class*='product-info']")
  print(items)
  for _ in items:
    try:
        a = _.find_element(By.CSS_SELECTOR, "a").get_attribute('href')
        b = _.find_element(By.CSS_SELECTOR, "h3").text
        c = _.find_element(By.CSS_SELECTOR, "p[class*='product__price--show']").text
        d = _.find_element(By.CSS_SELECTOR, "img").get_attribute('src')
        e = datetime.datetime.now(tz=timezone.utc)
        if c == "Giá Liên Hệ":
            c = Decimal("0")
        else:
            c = Decimal(c.replace("₫", "").replace(".","").replace(" ",""))
        newItem = Item(
            url=a,
            name=b,
            current_price=c,
            place="Cellphones",
            img=d,
            date_add=e
        )
        print(newItem)
        item, created = Product.objects.update_or_create(
            name=newItem.name,
            place=newItem.place,
            defaults={
                'current_price': newItem.current_price,
                'url': newItem.url,
                'img': newItem.img,
                'date_add': newItem.date_add
            }
        )
        print(item)
        print(created)
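        # Append only if no product with the same name is already in the list
        if not any(obj.name == item.name for obj in lists):
            lists.append(item)
    except Exception:
        # Skip product cards that are missing an element or fail to parse
        pass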
  return lists
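
The `any(...)` check above rescans the whole list for every product. The same idea can be sketched with a set of already-seen keys, which keeps the first occurrence and preserves order. This is only an illustration: `dedupe_by_key` is a hypothetical helper, plain dicts stand in for the scraped products, and the `(name, place)` key is an assumption that mirrors the lookup fields passed to `update_or_create`:

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def dedupe_by_key(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Return items with duplicates (by key) removed, keeping first occurrences."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k in seen:
            continue  # an item with this key was already collected
        seen.add(k)
        unique.append(item)
    return unique


# Plain dicts standing in for scraped products (made-up data).
scraped = [
    {"name": "Phone A", "place": "Cellphones"},
    {"name": "Phone B", "place": "Cellphones"},
    {"name": "Phone A", "place": "Cellphones"},
]
unique = dedupe_by_key(scraped, key=lambda p: (p["name"], p["place"]))
print([p["name"] for p in unique])  # ['Phone A', 'Phone B']
```

Note that this only hides duplicates in the returned list; it does not explain why the same product is matched more than once in the first place.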