C# Unity - How To Parse a Steam News Page to Extract an IMG URL? The Data Received from the GET Request Seems Incomplete. Is It XHTML?

Asked by: Katerlad  Asked: 9/8/2022  Last edited by: Katerlad  Updated: 9/9/2022  Views: 155

Q:

Problem:

I'm having trouble parsing this page to find the enclosure XML tag that contains the image link.

https://store.steampowered.com/feeds/news/app/1348750/?cc=US&l=english&snr=1_2108_9__2107

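I'm using Unity and C#, and I get the data either as a UTF-8 encoded string or as a raw byte[] from Unity's UnityWebRequest.Get method.

When saved to a file or printed to a text box, much of the data appears scrambled.

I believe the data should be XHTML, and I'm not sure how to convert it to a readable format so I can start parsing out the URL.

Expected: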

<item>
      <title>Update v1.3</title>
      <description>Changelog:&lt;br&gt;&lt;ul class=&quot;bb_ul&quot;&gt;&lt;li&gt;Arry and Miri can now gain affection.&lt;br&gt;&lt;/li&gt;&lt;li&gt;The player can now view Arry and Miri's epilogue scenes. (You will have to answer their proposal one more time.)&lt;br&gt;&lt;/li&gt;&lt;li&gt;The Arry and Miri achievements have been fixed and should unlock when you reach the epilogue.&lt;br&gt;&lt;/li&gt;&lt;li&gt;The ability management backend has been 100% REDONE, and all associated bugs should be fixed.&lt;br&gt;&lt;/li&gt;&lt;li&gt;Specifically, ability use counts and levels will no longer reset if they're no longer in a player preset.&lt;/li&gt;&lt;/ul&gt;&lt;br&gt;&lt;div class=&quot;bb_h2&quot;&gt;ACTION REQUIRED:&lt;/div&gt;If you take one thing from this update, it's that &lt;b&gt;Arry and Miri's&lt;/b&gt; conversation lines were &lt;b&gt;broken&lt;/b&gt; before this update, with all their epilogue scenes being &lt;b&gt;hidden&lt;/b&gt;. &lt;br&gt;&lt;br&gt;&lt;b&gt;If you've been stuck at their proposal, doing it one more time should unlock their achievements and lead you to their epilogue scenes.&lt;/b&gt;&lt;br&gt;&lt;br&gt;Additionally, I've begun work on the logbook! It's coming along nicely.&lt;br&gt;&lt;br&gt;&lt;img src=&quot;https://i.imgur.com/FBdDow9.png&quot; /&gt;</description>
      <link><![CDATA[https://store.steampowered.com/news/app/1348750/view/3178989079011859448]]></link>
      <pubDate>Tue, 10 May 2022 02:28:06 +0000</pubDate>
      <author>niku_treat</author>
      <guid isPermaLink="true">https://store.steampowered.com/news/app/1348750/view/3178989079011859448</guid>
      <enclosure url="https://cdn.akamai.steamstatic.com/steamcommunity/public/images/clans/38198503/25733639d563297f7a04c4fd68537f5b9aba3d67.png" length="0" type="image/png" />
    </item>
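
In the expected item above, the description holds entity-escaped HTML, so the inline image URL only becomes visible after the XML entities are decoded. The following is a minimal sketch of one way to pull it out, assuming the item has already been loaded as an XElement. The class name DescriptionImages and the method FromItem are hypothetical, and the regular expression is a simple pattern for double-quoted src attributes, not a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class DescriptionImages
{
    // Hypothetical helper: returns every <img src="..."> URL found in an item's description.
    public static List<string> FromItem(XElement item)
    {
        var urls = new List<string>();

        // Casting the element to string yields its decoded text,
        // so &lt;img ...&gt; in the feed becomes <img ...> here.
        string html = (string)item.Element("description") ?? string.Empty;

        // Assumption: src values are double-quoted, as in the sample item.
        foreach (Match m in Regex.Matches(html, "<img[^>]*\\ssrc=\"([^\"]+)\"", RegexOptions.IgnoreCase))
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }
}
```

The feed's own image is also exposed separately through the url attribute of the enclosure element, which does not need any HTML handling.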

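Actual:

These are just fragments of the data. At the top is some HTML that it was able to display, and the bottom looks like an XML section, or is it XHTML?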

<?xml version="1.0"?>
<string>&lt;!DOCTYPE html&gt;
&lt;html class=" responsive" lang="en"&gt;
&lt;head&gt;
    &lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8"&gt;
            &lt;meta name="viewport" content="width=device-width,initial-scale=1"&gt;
        &lt;meta name="theme-color" content="#171a21"&gt;
        &lt;title&gt;Hearts of the Dungeon List - Steam News Hub&lt;/title&gt;
    &lt;link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"&gt;
ot;:&amp;quot;Changelog:\n[list]\n[*]Arry and Miri can now gain affection.\n[*]The player can now view Arry and Miri's epilogue scenes. (You will have to answer their proposal one more time.)\n[*]The Arry and Miri achievements have been fixed and should unlock when you reach the epilogue.\n[*]The ability management backend has been 100% REDONE, and all associated bugs should be fixed.\n[*]Specifically, ability use counts and levels will no longer reset if they're no longer in a player preset.\n[\/list]\n\n[h2]ACTION REQUIRED:[\/h2]\nIf you take one thing from this update, it's that [b]Arry and Miri's[\/b] conversation lines were [b]broken[\/b] before this update, with all their epilogue scenes being [b]hidden[\/b]. \n\n[b]If you've been stuck at their proposal, doing it one more time should unlock their achievements and lead you to their epilogue scenes.[\/b]\n\nAdditionally, I've begun work on the logbook! It's coming along nicely.\n\n[img]https:\/\/i.imgur.com\/FBdDow9.png[\/img]&amp;quot;,&amp;quot;commentcount&amp;quot;:1,&amp;quot;tags&amp;quot;:[&amp;quot;mod_reviewed&amp;quot;,&amp;quot;ModAct_870845553_1652150701_0&amp;quot;],&amp;quot;language&amp;quot;:0,&amp;quot;hidden&amp;quot;:0,&amp;quot;forum_topic_id&amp;quot;:&amp;quot;3274690571081137373&amp;quot;,&amp;quot;event_gid&amp;quot;:&amp;quot;3178989079011859448&amp;quot;,&amp;quot;voteupcount&amp;quot;:11,&amp;quot;votedowncount&amp;quot;:0,&amp;quot;ban_check_result&amp;quot;:0},&amp;quot;published&amp;quot;:1,&amp;quot;hidden&amp;quot;:0,&amp;quot;rtime32_visibility_start&amp;quot;:0,&amp;quot;rtime32_visibility_end&amp;quot;:0,&amp;quot;broadcaster_accountid&amp;quot;:0,&amp;quot;follower_count&amp;quot;:0,&amp;quot;ignore_count&amp;quot;:0,&amp;quot;forum_topic_id&amp;quot;:&amp;quot;3274690571081137373&amp;quot;,&amp;quot;rtime32_last_modified&amp;quot;:1653189104,&amp;quot;news_post_gid&amp;quot;:&amp;quot;0&amp;quot;,&amp;quot;rtime_mod_reviewed&amp;quot;:1652150697,&amp;quot;featured_app_tagid&
amp;quot;:0,&amp;quot;referenced_appids&amp;quot;:[],&amp;quot;build_id&amp;quot;:0,&amp;quot;build_branch&amp;quot
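
The fragment above begins with an HTML doctype and a "Steam News Hub" title, which suggests the body that was received is an HTML page, not the RSS document shown under "Expected". A minimal sketch of a guard that could be run before any feed parsing is shown below. The class name FeedCheck and the method LooksLikeRss are hypothetical, and the check assumes an RSS 2.0 style document whose root element is named rss.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class FeedCheck
{
    // Hypothetical helper: true only if the text parses as XML with an <rss> root element.
    public static bool LooksLikeRss(string body)
    {
        if (string.IsNullOrWhiteSpace(body))
        {
            return false;
        }

        try
        {
            XDocument doc = XDocument.Parse(body);
            return doc.Root != null && doc.Root.Name.LocalName == "rss";
        }
        catch (XmlException)
        {
            // Ordinary HTML is usually not well-formed XML, so parsing fails here.
            return false;
        }
    }
}
```

If this returns false for the downloaded text, the problem is likely in what the server sent back for that request, not in how the text was decoded.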

Sample code

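using System;
using System.Collections;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using UnityEngine;
using UnityEngine.Networking;

public static IEnumerator GetCoroutine(string url, Action<string> OnError, Action<string> OnSuccess)
{
    using (UnityWebRequest result = UnityWebRequest.Get(url))
    {
        // Suspend the coroutine until the request completes.
        yield return result.SendWebRequest();

        if (result.result == UnityWebRequest.Result.ConnectionError || result.result == UnityWebRequest.Result.ProtocolError)
        {
            OnError(result.error);
        }
        else
        {
            // Response body decoded as UTF-8 text.
            string rawUTF8Text = result.downloadHandler.text;

            // Dump the body to a file for inspection.
            string filePath = Application.persistentDataPath + "/xmlsave.xml";
            FileStream file = File.Create(filePath);

            // Note: this writes a binary serialization stream, not the raw text.
            BinaryFormatter formatter = new BinaryFormatter();
            formatter.Serialize(file, rawUTF8Text);

            file.Close();

            OnSuccess(rawUTF8Text);
        }
    }
}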
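
BinaryFormatter.Serialize writes a binary serialization stream, so a file saved that way is not the plain response text even when the response itself is fine. A minimal sketch of saving the body as plain UTF-8 text is shown below, for inspecting exactly what was received. The class name ResponseDump and the method Save are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class ResponseDump
{
    // Hypothetical helper: writes the response body to disk as plain UTF-8 text.
    public static void Save(string filePath, string body)
    {
        // UTF8Encoding(false) omits the byte order mark at the start of the file.
        File.WriteAllText(filePath, body, new UTF8Encoding(false));
    }
}
```

In the coroutine above, this could be called with the same filePath and rawUTF8Text values in place of the FileStream and BinaryFormatter lines.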
C# XML parsing XHTML HTML parsing

Comments

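0 votes  Neil  9/8/2022
The page you linked to is an XML RSS/Atom feed. There is no need to grep the XML by hand; just use XDocument, XmlReader, or some other XML parser.
0 votes  Katerlad  9/9/2022
I added a code example showing how I write the returned GET body to a file. So do I have to convert the UTF-8 text to an XDocument, and then parse out the information I want?
0 votes  jdweng  9/9/2022
Use XDocument doc = XDocument.Parse(string);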
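
The XDocument suggestion in the comments can be sketched as follows, assuming the downloaded text really is the RSS document shown under "Expected" and that its elements carry no XML namespace. The class name FeedEnclosures and the method GetUrls are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FeedEnclosures
{
    // Hypothetical helper: collects the url attribute of every <enclosure> directly under an <item>.
    public static List<string> GetUrls(string rssText)
    {
        XDocument doc = XDocument.Parse(rssText);

        return doc.Descendants("item")
                  .Elements("enclosure")
                  .Select(e => (string)e.Attribute("url"))
                  .Where(url => !string.IsNullOrEmpty(url))
                  .ToList();
    }
}
```

XDocument.Parse throws an XmlException when the text is not well-formed XML, so a caller would probably want to catch that and report it through the error callback.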

A: No answers yet