How can I remove the whitespace from a number I get from web scraping so it converts to float? Error: could not convert string to float: '1\xa0364'

Q:

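In my code I am trying to get price data from a website. The site puts a space inside the price, and float() raises an error: could not convert string to float: '1\xa0364'. The code should extract the price from the website, but the space inside the price causes the error. I'm not sure whether the rest of the code works, because execution never gets past this line to the other functionality.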

The actual price is 1364, but I get: '1\xa0364'

Here is the code:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.reebok.se/zig-kinetica-ii-edge-gore-tex/H05172.html'
headers = {"user-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0'}

def check_price():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')

    title = soup.find(class_='gl-heading gl-heading--regular gl-heading--italic name___1EbZs').get_text()
    print(title)
    price = soup.find(class_='gl-price-item gl-price-item--sale notranslate').get_text()
    # Fails here: the text contains a no-break space (U+00A0) between the digits,
    # so float() raises ValueError: could not convert string to float: '1\xa0364'
    converted_price = float(price[0:5])
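For reference, the \xa0 in the error message is a single character, U+00A0 NO-BREAK SPACE, which the page uses as a thousands separator. The short check below (standard library only) shows that Python classifies it as whitespace, yet float() still rejects it when it sits between digits.

```python
import unicodedata

raw = "1\xa0364"  # the string from the error message

print(unicodedata.name(raw[1]))  # NO-BREAK SPACE
print(raw[1].isspace())          # True: Python treats it as whitespace
try:
    float(raw)
except ValueError as exc:
    print(exc)  # could not convert string to float: '1\xa0364'
```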

python-3.x web-scraping precision

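This is an interesting and important point. I checked a new product: for a three-digit price the code works and the value passes through float(). Only for four-digit prices does the site put a space between the first digit and the other three, and that space is always \xa0, e.g. 1052 is 1\xa0052. If I understand correctly, your solution only works if the price is always the same fixed value; if it changes to any other value, it won't work. Your code gives me 1 364, which is still not 1364.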

0 votes · Parazok · 4/2/2022

Thanks for the explanation; I updated my answer and fixed the extra whitespace. But I think @Cireo's answer is much better than mine. Check it out.

0 votes · Python_bug · 4/2/2022

Cheers mate, his answer solved the problem.

1 vote · Cireo · 4/2/2022 · answer #2

If you just want to remove the whitespace, you can do it in either of these ways:

split + join

>>> ''.join("1\xa0364".split())
'1364'

regex replace

>>> import re
>>> re.sub(r"\s", "", "1\xa0364")
'1364'
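Both techniques work on whatever string is scraped, not just the literal "1\xa0364": str.split() with no argument and \s in a str regex both treat U+00A0 as whitespace. A minimal sketch, with hypothetical helper names, that wraps each approach and converts the result to float:

```python
import re

def to_float_split(text: str) -> float:
    # str.split() with no argument splits on any Unicode whitespace,
    # including the no-break space U+00A0; joining the pieces drops it.
    return float("".join(text.split()))

def to_float_regex(text: str) -> float:
    # In a str pattern, \s matches Unicode whitespace, U+00A0 included.
    return float(re.sub(r"\s", "", text))

# Prices with and without the separator are handled the same way.
for sample in ["1\xa0364", "1364", "999", "12\xa0345", " 1\xa0052 "]:
    print(repr(sample), to_float_split(sample), to_float_regex(sample))
```

Because nothing is hard-coded, a price of 1 300 or 999 converts just as well as 1 364.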

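You may also find this answer helpful; it basically extracts the digits and the decimal point from a string and ignores everything else: Python remove comma in dollar amount. It can sometimes give false positives, though, for example: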

>>> other_option("Error: 404 file not found.  Try again in 10 seconds")
404.10
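The linked answer's other_option is not reproduced here; the sketch below is a hypothetical stand-in (digits_and_dots) that keeps only digits and dots, which is enough to reproduce both the intended result and the false positive. The " kr" suffix in the first sample is an assumed currency label, not taken from the page.

```python
import re

def digits_and_dots(text: str) -> str:
    # Hypothetical stand-in for the linked approach:
    # delete every character that is not a digit or a '.'.
    return re.sub(r"[^\d.]", "", text)

print(digits_and_dots("1\xa0364 kr"))  # 1364
print(digits_and_dots("Error: 404 file not found.  Try again in 10 seconds"))  # 404.10
```

Note that a dot used as a thousands separator (1.364) would also survive and be read as a decimal point.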

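This is how I implemented it, and it works now: price = soup.find(class_='gl-price-item gl-price-item--sale notranslate').get_text(); import re; price = re.sub("\s", "", "1\xa0364"); converted_price = float.fromhex(price[0:5]). I just wonder what happens if the price changes to a value without the space, or to something other than 1364. If it's not 1364, do I have to look up the price online and update the value? My code is supposed to extract the data in real time, so if the price is 1 300 it will throw an error.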

0 votes · Cireo · 4/2/2022

Is this really a hex number? That seems shocking. The re.sub solution works for any number of spaces (including 0 or 100, trailing, leading, etc.).

0 votes · Python_bug · 4/2/2022

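It's not a hex number. It's a four-digit number with a space instead of a comma or a dot. The number is 1364. The website writes it as 1 364; they might also write 1.364 or simply 1364.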
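If the site might switch between 1 364, 1\xa0364, 1.364 and 1364, a stricter parser can accept exactly those shapes and refuse anything else instead of guessing. This is a sketch under an assumption not confirmed by the site: prices are whole numbers, and a space, a no-break space or a dot only ever separates groups of three digits. Decimal prices such as 12.5 are deliberately rejected; parse_whole_price is a hypothetical name.

```python
import re

# Either 1-3 digits followed by separated groups of exactly three digits
# (space, U+00A0 or '.'), or a plain run of digits.
PRICE = re.compile(r"\d{1,3}(?:[ \xa0.]\d{3})+|\d+")

def parse_whole_price(text: str) -> float:
    cleaned = text.strip()  # also strips a leading/trailing U+00A0
    if not PRICE.fullmatch(cleaned):
        raise ValueError(f"unrecognised price format: {text!r}")
    # Drop the thousands separators, then convert.
    return float(re.sub(r"[ \xa0.]", "", cleaned))

for text in ["1\xa0364", "1 364", "1.364", "1364", "999"]:
    print(repr(text), parse_whole_price(text))
```

Raising on an unexpected format makes a change on the site visible immediately rather than silently producing a wrong number.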

1 vote · Cireo · 4/2/2022

float.fromhex('10') == 16.0. Also, price[0:5] will truncate any int >= 100000, or anything >= 100.00. Your logic is very confusing.
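A quick check of both points: float.fromhex() interprets the digits as base 16, so '1364' becomes 0x1364 = 4964, and slicing to five characters cuts longer strings. The sample strings are illustrative only.

```python
price = "1364"

# float.fromhex() reads the digits as hexadecimal, so the value changes.
print(float.fromhex("10"))   # 16.0 (0x10)
print(float.fromhex(price))  # 4964.0 (0x1364), not 1364.0
print(float(price))          # 1364.0

# Slicing to five characters silently drops digits from longer strings.
print("123456"[0:5])         # 12345
print("123.45"[0:5])         # 123.4
```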

1 vote · Cireo · 4/2/2022

Happy learning! Also, please do str_price = soup.find(...) and then price = float(re.sub(r"\s", "", str_price))

1 vote · QHarr · 4/2/2022 · answer #3

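You could also use a regex to pull the already-formatted values out of a script tag, where the price uses "." as the decimal separator: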

import requests, re

URL = 'https://www.reebok.se/zig-kinetica-ii-edge-gore-tex/H05172.html'
HEADERS ={"user-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0'}

def check_price():
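    page = requests.get(URL, headers=HEADERS)
    # For each key, capture the value of the first "key":value pair in the page source
    # that is not preceded by Brand", (this skips the brand's own "name" entry)
    name, price = [re.search(f'(?<!Brand",)"{i}":"?(.*?)[",]', page.text).group(1) for i in ['name', 'price']]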
    print(f'{name}: {float(price)}')
    
check_price()
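A variation on the same idea parses the embedded data with json instead of matching individual keys. This assumes, without having checked reebok.se, that the product data sits in a <script type="application/ld+json"> block with a schema.org Product whose offers carry a price; SAMPLE_HTML and product_from_ld_json are made up for illustration, and fetching the real page with requests is left out so the sketch runs offline.

```python
import json
import re

# Made-up page fragment standing in for page.text.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Example product",
 "brand": {"@type": "Brand", "name": "Example brand"},
 "offers": {"@type": "Offer", "price": 1364, "priceCurrency": "SEK"}}
</script>
</head></html>
"""

# Capture the body of every JSON-LD script tag (re.S lets '.' span newlines).
LD_JSON = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>', re.S
)

def product_from_ld_json(html: str) -> tuple[str, float]:
    for block in LD_JSON.findall(html):
        data = json.loads(block)
        if isinstance(data, dict) and data.get("@type") == "Product":
            offers = data["offers"]
            if isinstance(offers, list):  # schema.org allows one or many offers
                offers = offers[0]
            return data["name"], float(offers["price"])
    raise ValueError("no Product JSON-LD block found")

name, price = product_from_ld_json(SAMPLE_HTML)
print(f"{name}: {price}")  # Example product: 1364.0
```

Parsing the JSON avoids the lookbehind trick for the brand's "name", at the cost of depending on the page actually embedding JSON-LD.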
