Document is empty (lxml.etree.ParserError: Document is empty)

Asked by: Shehan Jayalath | Asked: 10/8/2021 | Updated: 4/5/2022 | Views: 2952

Q:

What could be the cause of this error?

I think it happens because the web page in question did not load completely. Is that right?

Traceback (most recent call last):
  File "/home/ubuntu/.local/share/virtualenvs/Project-RDkr7CyY/lib/python3.7/site-packages/pyquery/pyquery.py", line 57, in fromstring
    result = getattr(etree, meth)(context)
  File "src/lxml/etree.pyx", line 3213, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1765, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1



Traceback (most recent call last):
  File "/home/ubuntu/services/Project/src/parser.py", line 9, in __init__
    self._parser = HTML(html=text)
  File "/home/ubuntu/.local/share/virtualenvs/projects-RDkr7CyY/lib/python3.7/site-packages/requests_html.py", line 421, in __init__
    element=PyQuery(html)('html') or PyQuery(f'<html>{html}</html>')('html'),
  File "/home/ubuntu/.local/share/virtualenvs/projects-RDkr7CyY/lib/python3.7/site-packages/pyquery/pyquery.py", line 217, in __init__
    elements = fromstring(context, self.parser)
  File "/home/ubuntu/.local/share/virtualenvs/projects-RDkr7CyY/lib/python3.7/site-packages/pyquery/pyquery.py", line 61, in fromstring
    result = getattr(lxml.html, meth)(context)
  File "/home/ubuntu/.local/share/virtualenvs/projects-RDkr7CyY/lib/python3.7/site-packages/lxml/html/__init__.py", line 876, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/home/ubuntu/.local/share/virtualenvs/projects-RDkr7CyY/lib/python3.7/site-packages/lxml/html/__init__.py", line 765, in document_fromstring
    "Document is empty")
lxml.etree.ParserError: Document is empty
Tags: xml, html, parsing


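A:

0 votes | fmalina | 4/5/2022 | #1

For me, this was caused by leading or trailing whitespace, although I did not manage to reproduce it afterwards. Calling str.strip() on the input before parsing fixed the "Document is empty" error: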

from lxml.html import fromstring

html = html.strip()  # drop leading/trailing whitespace before parsing
dom = fromstring(html)
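
If the page really did come back empty, as the question suspects, stripping alone will not help, because there is still nothing to parse. Below is a minimal sketch of a more defensive variant of the same idea, assuming the fetched text may be None, empty, or whitespace-only. The helper name parse_if_nonempty and its parse parameter are hypothetical and not part of lxml or requests_html. Any parser callable, such as lxml.html.fromstring, can be passed in.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def parse_if_nonempty(text: Optional[str], parse: Callable[[str], T]) -> Optional[T]:
    """Return parse(text), or None when there is nothing to parse.

    `parse` is any parser callable, for example lxml.html.fromstring.
    """
    if text is None:
        return None
    cleaned = text.strip()  # same fix as above: remove surrounding whitespace
    if not cleaned:
        # Nothing left to parse; an empty string is what makes
        # lxml.html.fromstring raise "Document is empty".
        return None
    return parse(cleaned)
```

Usage would look something like dom = parse_if_nonempty(text, fromstring). A None result then means the response body was empty, so the caller can log the URL and retry the request instead of crashing inside the parser.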