Asked by: Yasser Mohamed · Asked: 8/9/2023 · Last edited by: Yasser Mohamed · Updated: 8/11/2023 · Views: 181
in XML parser.feed(text) xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0
Q:
I tested this code in Ping AI and it worked, but it does not work in my Vstudio:
import urllib.request
import urllib.parse
import urllib.error
import xml.etree.ElementTree as ET
import ssl
api_key = False
if api_key is False:
    api_key = 42
    service_url = 'http://py4e-data.dr-chuck.net/xml?'
else:
    service_url = 'https://maps.googleapis.com/maps/api/geocode/xml?'
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
address = 'http://py4e-data.dr-chuck.net/comments_42.xml'
parm = dict()
parm['address'] = address
if api_key is not False:
    parm['key'] = api_key
url = service_url+urllib.parse.urlencode(parm)
print('retrieve ', url)
uh = urllib.request.urlopen(url, context=ctx).read()
print('retrieved ', len(uh), 'characters')
datas = uh.decode()
tree = ET.fromstring(datas)
suum = 0
count = 0
counts = tree.findall('.//count')
for i in counts:
    suum += int(i.text)
    count += 1
print(suum)
print(count)
My output should be:
Retrieving http://py4e-data.dr-chuck.net/comments_42.xml
Retrieved 4189 characters
Count: 50
Sum: 2...
But my output is:
retrieve http://py4e-data.dr-chuck.net/xml?address=http%3A%2F%2Fpy4e-data.dr-chuck.net%2Fcomments_42.xml&key=42
retrieved 36 characters
Traceback (most recent call last):
  File "e:\ياسر\python_file\project_etree.py", line 28, in <module>
    tree = ET.fromstring(datas)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\beaut\AppData\Local\Programs\Python\Python311\Lib\xml\etree\ElementTree.py", line 1338, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0
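I don't know why I get "retrieved 36 characters".
I have to use XML parsing because this is a quiz, and I am trying to understand why it does not work for me and where the problem is.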
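Note for readers: the "retrieve" line in the output above shows that the request went to the service endpoint (.../xml?address=...&key=42), with the comments_42.xml URL passed as a query parameter, and that only 36 characters came back. An error at line 1, column 0 means the parser rejected the very first character, so the body was probably not XML at all. Below is a minimal sketch that rebuilds that URL and prints the start of a body when parsing fails. The helper names build_url and parse_or_report are hypothetical (they are not in the original code), and the non-XML text is a placeholder, not the actual server response.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def build_url(service_url, address, api_key):
    """Rebuild the URL the way the question's code does (hypothetical helper)."""
    parm = {'address': address, 'key': api_key}
    return service_url + urllib.parse.urlencode(parm)


def parse_or_report(body):
    """Parse XML text; on failure print the start of the body and return None.

    Hypothetical helper: it only makes a ParseError easier to diagnose.
    """
    try:
        return ET.fromstring(body)
    except ET.ParseError as err:
        print('Not XML:', err)
        print('Body starts with:', repr(body[:80]))
        return None


# The file URL ends up as a query parameter instead of being fetched directly.
url = build_url('http://py4e-data.dr-chuck.net/xml?',
                'http://py4e-data.dr-chuck.net/comments_42.xml', 42)
print(url)

# Placeholder text standing in for a short non-XML response body.
print(parse_or_report('Not an XML document'))
```

Printing the raw body before calling ET.fromstring is usually the quickest way to see what the server actually returned.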
Answers:
0 votes
Andrej Kesely
8/10/2023
#1
You can try beautifulsoup, which uses lxml under the hood:
import requests
from bs4 import BeautifulSoup
url = "http://py4e-data.dr-chuck.net/comments_42.xml"
soup = BeautifulSoup(requests.get(url).content, "xml")
comments = soup.select("comment")
count, s = 0, 0
for c in soup.select("comment"):
    count += 1
    s += sum(int(c.text) for c in c.select("count"))
print(count)
print(s)
Prints:
50
2553
Comments
0 votes
Yasser Mohamed
8/10/2023
I have to use XML parsing because this is a quiz.
0 votes
Hermann12
8/10/2023
#2
Pandas will show the data as a DataFrame:
import requests
import pandas as pd
import numpy as np
url=r"https://py4e-data.dr-chuck.net/comments_42.xml"
request = requests.get(url)
print(f"Retrieving: {url}")
print(f"Retrieved {len(request.text)} characters")
df = pd.read_xml(request.text, xpath='//comment')
#print(df)
print("Count:", df.shape[0])
sums = df.select_dtypes(np.number).sum().rename('total')
print("Sum:", sums)
Output:
Retrieving: https://py4e-data.dr-chuck.net/comments_42.xml
Retrieved 4189 characters
Count: 50
Sum: count 2553
Name: total, dtype: int64
For xml.etree.ElementTree, use a Session:
import requests
import xml.etree.ElementTree as ET
s = requests.Session()
url=r"https://py4e-data.dr-chuck.net/comments_42.xml"
r = s.get(url)
print(r.status_code)
print(type(r.text))
tree = ET.fromstring(r.text)
print(tree)
for elem in tree.iter():
    # do your things
    print(elem.tag)
Or, if you need urllib, that works too:
import urllib.request
import xml.etree.ElementTree as ET
url="https://py4e-data.dr-chuck.net/comments_42.xml"
with urllib.request.urlopen(url) as f:
    xml = f.read()
    # xml is a byte string
    # print(xml)
root = ET.fromstring(xml)
for elem in root.iter():
    # do what you like with the xml content
    print(elem.text)
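To get the count and sum that the question asks for, the loop can collect the count elements instead of printing every text node. This is a minimal sketch, assuming count elements sit inside comment elements as the selectors in the other answers suggest. The helper name count_and_sum is hypothetical, and the inline sample is made-up data, not the real file.

```python
import xml.etree.ElementTree as ET


def count_and_sum(xml_data):
    """Return (number of <count> elements, sum of their integer values)."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    values = [int(node.text) for node in root.findall('.//count')]
    return len(values), sum(values)


# Made-up sample with the assumed shape; not the real comments_42.xml data.
sample = b"""<commentinfo>
  <comments>
    <comment><name>A</name><count>3</count></comment>
    <comment><name>B</name><count>4</count></comment>
  </comments>
</commentinfo>"""

print(count_and_sum(sample))  # (2, 7)
```

With the real file, the bytes returned by f.read() in the snippet above can be passed in directly, for example count_and_sum(xml).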
-1 votes
jdweng
8/10/2023
#3
You can use PowerShell:
using assembly System.Xml.Linq
$uri = 'https://py4e-data.dr-chuck.net/comments_42.xml'
$doc = [System.Xml.Linq.XDocument]::Load($uri)
$comments = $doc.Descendants('comment')
$groups = [System.Linq.Enumerable]::GroupBy($comments, [Func[object,object]]{ param($x) $x[0].Element('name').Value}, [Func[object,object]]{ param($y) $y[0].Element('count').Value})
$dict = [System.Linq.Enumerable]::ToDictionary($groups, [Func[object,object]]{ param($x) $x.Key}, [Func[object,object]]{ param($y) $y})
$dict
Result:
Key Value
--- -----
Romina 97
Laurie 97
Bayli 90
Siyona 90
Taisha 88
Alanda 87
Ameelia 87
Prasheeta 80
Asif 79
Risa 79
Zi 78
Danyil 76
Ediomi 76
Barry 72
Lance 72
Hattie 66
Mathu 66
Bowie 65
Samara 65
Uchenna 64
Shauni 61
Georgia 61
Rivan 59
Kenan 58
Hassan 57
Isma 57
Samanthalee 54
Alexa 51
Caine 49
Grady 47
Anne 40
Rihan 38
Alexei 37
Indie 36
Rhuairidh 36
Annoushka 32
Kenzi 25
Shahd 24
Irvine 22
Carys 21
Skye 19
Atiya 18
Rohan 18
Nuala 14
Maram 12
Carlo 12
Japleen 9
Breeanna 7
Zaaine 3
Inika 2
Comments