Why is my code giving me an AttributeError?

Asked by justjanga · Asked 4/22/2023 · Modified 4/23/2023 · Viewed 37 times

Q:

I'm trying to walk through several levels of HTML to retrieve links related to legislation. However, once I reach the second level of links, instead of retrieving the list of links associated with the individual bills, I get this error:

Exception has occurred: AttributeError
'NoneType' object has no attribute 'startswith'
  File "C:\Users\Justin\Desktop\ilgascrapetest1.py", line 14, in <module>
    if href.startswith('/legislation/BillStatus.asp?'):
       ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'startswith'

Here is the code so far:

import requests
from bs4 import BeautifulSoup

url = 'https://www.ilga.gov/legislation/default.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the House Bills section
house_bills = soup.find('a', {"name": "h_bills"}).parent

# Iterate through all links in the House Bills section
for link in house_bills.find_all('a'):
    href = link.get('href')
    if href.startswith('/legislation/BillStatus.asp?'):
        bill_url = url + href
        bill_response = requests.get(bill_url)
        bill_soup = BeautifulSoup(bill_response.content, 'html.parser')

        # Find the table cell with width
        td = bill_soup.find('td', {'width': '100%'})
        
        # Iterate through all the <li> elements in table
        for li in td.find_all('li'):
            print(li.text)   

I was able to retrieve the list of links from the "House Bills" table in the first page's HTML and iterate through it, but at the next level, which should give the list of links to the individual bills, I get the error instead of the bill links from HB0001 through HB4042. Why am I getting this error?

html python-3.x beautifulsoup html-parsing

Comments

2 votes Carcigenicate 4/22/2023
That means `href` is `None`, which means `link.get('href')` returned `None`; likely meaning that the element doesn't have that attribute.
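To see what the comment describes in isolation: BeautifulSoup's `Tag.get()` behaves like `dict.get()`, returning `None` when the attribute is absent rather than raising. A minimal sketch (using a made-up anchor similar to the named anchors on the ILGA page):

```python
from bs4 import BeautifulSoup

# An <a> with a name but no href, like the section anchors on the page
soup = BeautifulSoup('<a name="h_bills">House Bills</a>', 'html.parser')
link = soup.find('a')

href = link.get('href')
print(href)  # None -- .get() returns None for a missing attribute

# Calling href.startswith(...) here would raise the exact AttributeError
# from the question, because None has no string methods.
```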

A:

0 votes VANN Universal 4/22/2023 #1

There are several `<a>` elements on this site that have no `href`, so `link.get('href')` returns `None` in those cases. You can't call `startswith()` on `None`, so you have to add a check for whether `href` is `None`:

import requests
from bs4 import BeautifulSoup

url = 'https://www.ilga.gov/legislation/default.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the House Bills section
house_bills = soup.find('a', {"name": "h_bills"}).parent

# Iterate through all links in the House Bills section
for link in house_bills.find_all('a'):
    href = link.get('href')
    if not href:
        continue  # Ignore links without href
    if href.startswith('/legislation/BillStatus.asp?'):
        bill_url = url + href
        bill_response = requests.get(bill_url)
        bill_soup = BeautifulSoup(bill_response.content, 'html.parser')

        # Find the table cell with width
        td = bill_soup.find('td', {'width': '100%'})
        
        # Iterate through all the <li> elements in table
        for li in td.find_all('li'):
            print(li.text)   

Also, you mixed up the URLs: first you need to open the "grplist.asp" pages, and only there do the links start with "BillStatus.asp". To visit only the links in the House Bills section, you need to select the `div` following the `a` with name `h_bills`, not its parent. I also changed your code so that `bill_url` is no longer built from the full URL containing "/default.asp".

import requests
from bs4 import BeautifulSoup

url = 'https://www.ilga.gov/legislation/default.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the House Bills section (next div after a with name "h_bills")
house_bills = soup.find('a', {"name": "h_bills"}).find_next_sibling("div")

# Iterate through all links in the House Bills section
for link in house_bills.find_all('a'):
    href = link.get('href')
    if not href:
        continue  # Ignore links without href

    if href.startswith('grplist.asp?'):
        bill_url = "https://www.ilga.gov/legislation/" + href

        bill_response = requests.get(bill_url)
        if bill_response.status_code != 200:  # Prevent crash when response is not valid
            continue

        bill_soup = BeautifulSoup(bill_response.content, 'html.parser')

        # Find the table cell with width
        td = bill_soup.find('td', {'width': '100%'})
        
        # Iterate through all the <li> elements in table
        for li in td.find_all('li'):
            print(li.text)
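As a side note on the URL handling above: instead of hard-coding the `"https://www.ilga.gov/legislation/"` prefix, the standard library's `urllib.parse.urljoin` can resolve a relative `href` against the page it was found on. A small sketch (not part of the original answer; the query strings are illustrative):

```python
from urllib.parse import urljoin

# The page the hrefs were scraped from
base = 'https://www.ilga.gov/legislation/default.asp'

# A relative href is resolved against the directory of the base page
print(urljoin(base, 'grplist.asp?ChapterID=0'))
# -> https://www.ilga.gov/legislation/grplist.asp?ChapterID=0

# A root-relative href replaces the whole path
print(urljoin(base, '/legislation/BillStatus.asp?DocNum=1'))
# -> https://www.ilga.gov/legislation/BillStatus.asp?DocNum=1
```

This avoids the original bug, where `url + href` produced an invalid address like `.../default.asp/legislation/BillStatus.asp?...`.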

Comments

0 votes justjanga 4/23/2023
OK, that got rid of the error, but now it doesn't print anything.
0 votes VANN Universal 4/23/2023
Sorry, I didn't test my answer properly. It now contains working code that gets all the links.
0 votes justjanga 4/23/2023
Ah, I hadn't even considered /default.asp being part of the url. It's working now, thanks!