Asked by: Chweng Mega  Asked: 6/15/2017  Last edited by: Chweng Mega  Updated: 6/15/2017  Views: 1478
Python HTMLParser: AttributeError
Q:
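I'm using HTMLParser (Python 2.7) to parse a page that I pull down with urllib2. When I try to store data into lists during the feed() call, I get an AttributeError. However, if I comment out the __init__ method, the exception goes away.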
main.py
# -*- coding: utf-8 -*-
from HTMLParser import HTMLParser
import urllib2
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

class MyHTMLParser(HTMLParser):
    def __init__(self):
        self.terms = []
        self.definitions = []

    def handle_starttag(self, tag, attrs):
        # retrieve the terms
        if tag == 'div':
            for attribute, value in attrs:
                if value == 'word':
                    self.terms.append(attrs[1][1])
                # retrieve the definitions
                if value == 'desc':
                    if attrs[1][1]:
                        self.definitions.append(attrs[1][1])
                    else:
                        self.definitions.append(None)

parser = MyHTMLParser()
# open page and retrieve source page
response = urllib2.urlopen('http://localhost/')
html = response.read().decode('utf-8')
response.close()
# extract the terms and definitions
parser.feed(html)
Output:
Traceback (most recent call last):
  File "/Users/megachweng/Project/Anki-Youdao/combined.py", line 35, in <module>
    parser.feed(html)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 116, in feed
    self.rawdata = self.rawdata + data
AttributeError: MyHTMLParser instance has no attribute 'rawdata'
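A minimal sketch of why this traceback appears: HTMLParser.__init__ calls self.reset(), and reset() is what creates self.rawdata, the buffer that feed() appends to. Overriding __init__ without calling the base constructor skips that step, so the first feed() call fails. BrokenParser, FixedParser and try_feed are hypothetical names, and the try/except import is an assumption added so the sketch runs on both Python 2.7 and Python 3.

```python
from __future__ import print_function

try:
    from HTMLParser import HTMLParser  # Python 2.7 module name
except ImportError:
    from html.parser import HTMLParser  # Python 3 module name


class BrokenParser(HTMLParser):
    def __init__(self):
        # Base constructor NOT called: reset() never runs, so self.rawdata is missing
        self.terms = []


class FixedParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)  # runs reset(), which creates self.rawdata
        self.terms = []

    def handle_starttag(self, tag, attrs):
        self.terms.append(tag)


def try_feed(parser_cls, html):
    """Return the collected tags, or the error text if feed() fails."""
    parser = parser_cls()
    try:
        parser.feed(html)
    except AttributeError as exc:
        return "AttributeError: %s" % exc
    return parser.terms


print(try_feed(BrokenParser, "<div></div>"))          # an AttributeError mentioning 'rawdata'
print(try_feed(FixedParser, "<div><p>hi</p></div>"))  # ['div', 'p']
```

The only difference between the two classes is the explicit HTMLParser.__init__(self) call.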
A:
1 vote
Andrea
6/15/2017
#1
I don't think you are initializing HTMLParser correctly. Maybe you don't need to initialize it at all. This works for me:
# -*- coding: utf-8 -*-
from HTMLParser import HTMLParser
import urllib2
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print "Encountered a start tag:", tag
        # retrieve the terms
        if tag == 'div':
            for attribute, value in attrs:
                if value == 'word':
                    self.terms.append(attrs[1][1])
                # retrieve the definitions
                if value == 'desc':
                    if attrs[1][1]:
                        self.definitions.append(attrs[1][1])
                    else:
                        self.definitions.append(None)

parser = MyHTMLParser()
# open page and retrieve source page
response = urllib2.urlopen('http://localhost/')
html = response.read().decode('utf-8')
response.close()
# extract the terms and definitions
parser.feed(html)
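As written, this first version never creates self.terms or self.definitions, so it only works until a matching div is reached. If you want to avoid writing __init__ at all, one alternative (not the approach used in this answer) is to override reset() instead: the base __init__ calls reset(), so lists created there already exist when feed() runs, and they are cleared whenever the parser is reset. TagCollector is a hypothetical name, and the markup below is illustrative only.

```python
try:
    from HTMLParser import HTMLParser  # Python 2.7 module name
except ImportError:
    from html.parser import HTMLParser  # Python 3 module name


class TagCollector(HTMLParser):
    # No __init__ override: HTMLParser.__init__ runs unchanged and calls reset()

    def reset(self):
        HTMLParser.reset(self)  # keep the base class state (rawdata etc.)
        self.terms = []         # created on construction and on every reset()

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            # look the attribute up by name instead of by position
            self.terms.append(dict(attrs).get('class'))


parser = TagCollector()
parser.feed("<div class='word'>a</div><div class='desc'>b</div>")
print(parser.terms)  # ['word', 'desc']
```

The trade-off is that reset() must always call HTMLParser.reset(self), otherwise the same rawdata error returns.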
Update:
# -*- coding: utf-8 -*-
from HTMLParser import HTMLParser
import urllib2
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.terms = []
        self.definitions = []

    def handle_starttag(self, tag, attrs):
        # retrieve the terms
        for attribute in attrs:
            if attribute[0] == 'align':
                self.terms.append(attribute[1])
                self.definitions.append(attribute[1])

parser = MyHTMLParser()
html = "<table align='center'><tr><td align='left'><p>ciao</p></td></tr></table>"
# extract the terms and definitions
parser.feed(html)
print parser.terms
print parser.definitions
Output:
['center', 'left']
['center', 'left']
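Applied to the original question, the same fix (calling HTMLParser.__init__ first) lets the parser fill both lists. The question's page is not shown, so the markup below, with the text carried in a title attribute of div class="word" and div class="desc" elements, is purely a hypothetical stand-in for whatever attrs[1][1] pointed at; TermParser is also a hypothetical name. Looking attributes up by name with dict(attrs) avoids depending on attribute order.

```python
try:
    from HTMLParser import HTMLParser  # Python 2.7 module name
except ImportError:
    from html.parser import HTMLParser  # Python 3 module name


class TermParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)  # required: sets up rawdata and other parser state
        self.terms = []
        self.definitions = []

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        attr_map = dict(attrs)  # e.g. {'class': 'word', 'title': 'apple'}
        css_class = attr_map.get('class')
        if css_class == 'word':
            self.terms.append(attr_map.get('title'))
        elif css_class == 'desc':
            # store None when the (hypothetical) title attribute is missing or empty
            self.definitions.append(attr_map.get('title') or None)


# Hypothetical markup standing in for the page fetched with urllib2
html = ('<div class="word" title="apple"></div>'
        '<div class="desc" title="a fruit"></div>'
        '<div class="word" title="pear"></div>'
        '<div class="desc" title=""></div>')

parser = TermParser()
parser.feed(html)
parser.close()
print(parser.terms)        # ['apple', 'pear']
print(parser.definitions)  # ['a fruit', None]
```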
Comments
0 votes
Chweng Mega
6/15/2017
Yes, it definitely works, but any idea how to store attrs[1][1] in a list so that I can access it? (print parser.terms gives: MyHTMLParser instance has no attribute 'terms')
0 votes
Chweng Mega
6/15/2017
#2
OK, I found the solution: super().__init__ doesn't work here, so I have to hard-code the base class name instead:
def __init__(self):
    HTMLParser.__init__(self)
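Why the base class name has to be hard-coded on Python 2.7: there, HTMLParser derives from markupbase.ParserBase, which is an old-style class, so super() cannot be used with it (the "instance has no attribute" wording in the traceback is the old-style error message), and the argument-less super() form does not exist in Python 2 at all. On Python 3 the module is named html.parser, every class is new-style, and super().__init__() works. A minimal Python 3-only sketch reusing the align example from the first answer:

```python
from html.parser import HTMLParser  # Python 3 module name


class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()  # fine in Python 3, where every class is new-style
        self.terms = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'align':
                self.terms.append(value)


parser = MyHTMLParser()
parser.feed("<table align='center'><tr><td align='left'><p>ciao</p></td></tr></table>")
parser.close()
print(parser.terms)  # ['center', 'left']
```

The explicit HTMLParser.__init__(self) call still works on Python 3 as well, so it is the portable choice if the code must run on both versions.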
Comments