How do I get the headers of an API call in the network traffic of a request using scapy?

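Asked by: Mohit Aswani  Asked: 11/2/2023  Updated: 11/2/2023  Views: 26

Q:

I want to scrape Twitter posts (tweets). Take the following URL as an example: https://twitter.com/UNTechEnvoy/status/1704972265866014829

When requesting the URL above, the network traffic shows the following API call, sent with specific headers, which fetches the post data.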

https://api.twitter.com/graphql/5GOHgZe-8U2j5sVHQzEm9A/TweetResultByRestId?variables=%7B%22tweetId%22%3A%221704972265866014829%22%2C%22withCommunity%22%3Afalse%2C%22includePromotedContent%22%3Afalse%2C%22withVoice%22%3Afalse%7D&features=%7B%22creator_subscriptions_tweet_preview_api_enabled%22%3Atrue%2C%22c9s_tweet_anatomy_moderator_badge_enabled%22%3Atrue%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22responsive_web_twitter_article_tweet_consumption_enabled%22%3Afalse%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22responsive_web_home_pinned_timelines_enabled%22%3Atrue%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Atrue%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Atrue%2C%22longform_notetweets_rich_text_read_enabled%22%3Atrue%2C%22longform_notetweets_inline_media_enabled%22%3Atrue%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Atrue%2C%22responsive_web_media_download_video_enabled%22%3Afalse%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%7D

headers = {
  'authority': 'api.twitter.com',
  'authorization': 'Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA',
  'content-type': 'application/json',
  'cookie': 'guest_id_marketing=v1%3A169883004211703651; guest_id_ads=v1%3A169883004211703651; personalization_id="v1_z3S9HEXBgiQBLPn9TMbSLA=="; guest_id=v1%3A169883006823417906; gt=1719644188290040005; guest_id=v1%3A169865337165479828; guest_id_ads=v1%3A169865337165479828; guest_id_marketing=v1%3A169865337165479828; personalization_id="v1_PoXKYFsBsEAzLKCo41vjqw=="',
  'origin': 'https://twitter.com',
  'referer': 'https://twitter.com/',
  'sec-ch-ua': '"Chromium";v="118", "Brave";v="118", "Not=A?Brand";v="99"',
  'sec-ch-ua-platform': '"Windows"',
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36',
  'x-client-transaction-id': 'H4Tcw9J6LDFN6U2WzYR4exzeOdxZ4+gpEzzZwMqFERoUjGB+92eN6XgJdb9vwzLr9r2s7R+mX1T/a9ExhV4HL7rb/TGGHg',
  'x-guest-token': '1719644188277379350',
  'x-twitter-active-user': 'yes',
  'x-twitter-client-language': 'en-US'
}

Note that the guest token expires every 1-2 hours, so the headers have to be refreshed before a script can keep using them to scrape Twitter posts.
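To make the refresh requirement concrete, here is a minimal sketch of a time-based header cache. Everything in it is an assumption for illustration: the `HeaderCache` class, the one-hour `ttl_seconds` default (chosen from the 1-2 hour expiry mentioned above), and the `fetch` callable, which stands in for whatever mechanism eventually captures fresh headers. It does not show how to obtain the headers themselves.

```python
import time
from typing import Callable, Dict, Optional


class HeaderCache:
    """Hypothetical helper: re-fetch headers once they are older than a TTL."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, str]],      # assumed: returns fresh headers
        ttl_seconds: float = 3600.0,              # assumed: lower bound of the 1-2 h expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._headers: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, str]:
        """Return cached headers, refreshing them first if they have expired."""
        now = self._clock()
        if self._headers is None or now - self._fetched_at >= self._ttl:
            self._headers = self._fetch()
            self._fetched_at = now
        return self._headers
```

A scraping loop would then call `cache.get()` before each request instead of reusing a hard-coded `headers` dict, so an expired guest token is replaced without restarting the script.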

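To do this, I found an approach that uses the scapy library to retrieve the headers of the 'api.twitter..' URL, but I am unable to get them.

I searched online and tried the partial code below.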

import requests, threading
from scapy.all import sniff
from scapy.layers.http import HTTPRequest

def sniff_traffic():
    sniff(filter="tcp and (port 80 or port 443)", prn=process_packet)

def process_packet(packet):
    if HTTPRequest in packet:
        host = packet[HTTPRequest].Host
        path = packet[HTTPRequest].Path
        headers = packet[HTTPRequest].fields

def run(url):
    t = threading.Thread(target=sniff_traffic)
    t.start()
    response = requests.get(url)
    t.join()

run('https://twitter.com/UNTechEnvoy/status/1704972265866014829')
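One likely reason the `HTTPRequest in packet` check above never fires for this URL is that the request goes over HTTPS: on port 443 a sniffer only sees TLS-encrypted bytes, and header dissection only works on plaintext HTTP. The standard-library sketch below illustrates the difference. `parse_http_request` is a hypothetical helper, not part of scapy, and the sample bytes are made up for the demonstration.

```python
from email.parser import BytesParser
from typing import Dict, Optional, Tuple


def parse_http_request(raw: bytes) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Parse a plaintext HTTP/1.x request into (method, path, headers).

    Returns None when the payload does not start with an HTTP request line,
    which is what happens with TLS records captured on port 443.
    """
    head, _, _ = raw.partition(b"\r\n\r\n")            # drop any body
    request_line, _, header_block = head.partition(b"\r\n")
    parts = request_line.split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return None
    method, path = parts[0].decode("latin-1"), parts[1].decode("latin-1")
    # Reuse the RFC 822 style header parser from the standard library.
    msg = BytesParser().parsebytes(header_block + b"\r\n\r\n", headersonly=True)
    headers = {name.lower(): value for name, value in msg.items()}
    return method, path, headers


if __name__ == "__main__":
    plaintext = b"GET /graphql HTTP/1.1\r\nHost: example.com\r\nX-Guest-Token: 123\r\n\r\n"
    tls_like = b"\x16\x03\x01\x00\x05hello"            # made-up bytes shaped like a TLS record
    print(parse_http_request(plaintext))
    print(parse_http_request(tls_like))
```

The plaintext payload yields a method, path, and header dict, while the TLS-like payload yields `None`. Reading the `authorization` or `x-guest-token` headers of an HTTPS call therefore requires something that sees the traffic before encryption or after decryption, rather than a raw packet capture alone.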

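Could you help me get the headers of the API URL that is called in the network traffic? If there is any other approach besides 'mitmproxy', please share that too. Thanks in advance.

python selenium-webdriver scapy packet-sniffers sniffing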

A: No answers yet