HttpWebRequest downloads garbage

Asked by: someotherguy  Asked: 10/13/2023  Updated: 10/13/2023  Views: 43

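Q:

I am trying to download information from the sec.gov EDGAR site using HttpWebRequest in VB. In Mozilla Firefox, I enabled the HTTP Header Live extension and navigated to a sample page containing the information I want. I then tested the captured headers one at a time until I found the one that returned the desired information when sent to the site. Here it is: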

    GET: https://www.sec.gov/cgi-bin/browse-edgar?company=Elevance+Health&match=starts-with&filenum=&State=&Country=&SIC=&myowner=exclude&action=getcompany

    Host: www.sec.gov

    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/118.0

    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8

    Accept-Language: en-US,en;q=0.5

    Accept-Encoding: gzip, deflate, br

    Referer: https://www.sec.gov/edgar/searchedgar/companysearch

    Connection: keep-alive

    Cookie: _ga_300V1CHKH1=GS1.1.1696190499.2.0.1696190499.0.0.0; _ga=GA1.2.1228185708.1696111268; _gid=GA1.2.1029964278.1696111298; _4c_=%7B%22_4c_s_%22%3A%22lZHNboMwEIRfJdpzjGwDtuFWpWrVQ1u16s8xAtsEK1GMgIa0Ee%2FeNURNjy0X1p93RuvZEwy13UPORCZYRtOUMREvYWs%2FO8hP0DoTfgfIwSRaV9RQUvBSk6S0CSlMWRBpC6ZYLOOqSGAJR%2FSSNFZKciEll%2BMSdHP2OIH2xqIXyyKmIkqqDhX9FxLCE4p103rzoft1%2F9mEvsGWi85s8cLYg9N2PTjT15MBpxdaW7ep%2B4A5nXDThgNWg9sbP1xkgssL%2FZFlIkZatn7obFDeuNZW%2FrhgLHR7TALeJ0kYF69s2059eOpcHwbtrI42%2FnAGmN7MyMzeXKBm8bJ6Rv7wizytHu%2FPqAkhp1jsvC52wRT3soTbq%2FXr3fX0Nq6YSiVV0bQsxrhQMM6JI%2BBSqCyTkmKi%2FQ5yJRIavnG2nhaQ%2FqV7zoGE8e3%2BP9Jx%2FAY%3D%22%7D; ak_bmsc=E207DEF9828569099367BF9FFE88E631~000000000000000000000000000000~YAAQPu8uF8eAscuKAQAAeRDV7BXngdLEy8GovoCQxRcwHda6So3C7OOGm4brY4YktF38AfUj9OxxWzN+yX6W7fZtKwjXockPlZJOWVWA7Pby6KxXDafPEK9+vUau+sMcGHHrMz3Zm7xNfPHNqnmVYOVPtez7skG0HofR1dvVIXUAv0nsZtZ1mRJC/4coDSVY5CHZm+H4mmSJIGuQeFwQfnkjjGLCzRK4loqIYnoplhilVVsEVaB63cR93txbh1aeZ/wc4QX9X+zO3xxXynno9JqznFIDxYcPbFX5tu4em4hyZbubsUUNpaip3wSmOpk2k17yra1P310pBIzinFVbQuFARXHjNN6Q0C0bS1kv8S9cKwyFut9/pFa5bneBOWE6Ma+zl+YwkSzjvktCR3/I3VTT2jLCwSjAO6s74YbEy43Vshu6bXmH9fz/zPSbBDUjRK5EI3mLfgPOigliaDCSw54DxlqrCsNq/aIQwsZR0LS/Fv2ZCEhMVyCBnzUTuwozy8UeeC/hnQNsozuXlHNpLsIOi98TKgzrnNk6dxcDfDVYTUs1RzODfuvVctg=; bm_mi=AD441E8F4B657B1B896684E18B33880C~YAAQPu8uF6mAscuKAQAAzQrV7BWY2Gn+RGjgejVQKZMrP/Bjj5uSK3xAx/3LI2MpmNmNEsDp7nVJ40PNfpV5k9fzdPR4cDsckhMxp2WOqzz8o1nsTNOvJglfxb5XVGMUYt47xSQZiwSgw4IR//xcNIw1a8CcRQOGDa4vn4vNE0UQDBIhilPTN+Yw7qppH0P46ZP47cVEbd797VIubiSPS7uH2elsRhBojGZvETUJTCSIovjSA+R1l7K9YWwXMeHUV2SUNkR49WatJZ5mkULdcSmQ9mKb3Y5nt+hmoBodic984+lOWxoqoU5GkaJRO2dYjnAv0ZwiFe1XoTiZnD5oxPUxDvmSvrU=~1; bm_sv=24020AB96471CE67DC70E0A5BE862C68~YAAQPu8uF6qAscuKAQAAzQrV7BVMIu/drYxXiDXHq9zWrVtZyRqFRT5+4+KecShEwcBjuihOE4ueUrlZU/m8+QyWnuyJbH//z7wTKkhbbPEx6KP1A3CCKCz+DlJvPF5qNd9yWOZO9p0o85NlRU38OzHj1EUrA9MPl5XAyvBeQ0wKt7MMhR0n1RMGm+1Nczbn1+ONtlcTqv/0BXcYnu14mUN0/k0KGLL3hvIoudOGWz7AALAkhfOfj+nNXRAs~1; _gat_UA-30394047-1=1; 
    _gat_GSA_ENOR0=1; _gat_GSA_ENOR1=1

    Upgrade-Insecure-Requests: 1

    Sec-Fetch-Dest: document

    Sec-Fetch-Mode: navigate

    Sec-Fetch-Site: same-origin

    Sec-Fetch-User: ?1

I then wrote the following code:

    Imports System.Net

    Imports System.IO

    Imports System.Text

    Dim postReq As HttpWebRequest = Nothing

    Dim postresponse As HttpWebResponse = Nothing

    Dim tempCookies As New CookieContainer

    ServicePointManager.SecurityProtocol = CType(768, SecurityProtocolType) Or CType(3072, SecurityProtocolType)

    postReq = DirectCast(System.Net.WebRequest.Create("https://www.sec.gov/cgi-bin/browse-edgar?company=Elevance+Health&match=starts-with&filenum=&State=&Country=&SIC=&myowner=exclude&action=getcompany"), System.Net.HttpWebRequest)

    postReq.Method = WebRequestMethods.Http.Get

    postReq.Host = "www.sec.gov"

    postReq.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/118.0"

    postReq.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,text/css,text/javascript,*/*;q=0.8"

    postReq.Headers.Item("Accept-Language") = "en-US,en;q=0.5"

    postReq.Headers.Item("Accept-Encoding") = "gzip, deflate, br"

    postReq.Referer = "https://www.sec.gov/edgar/searchedgar/companysearch"

    postReq.Connection = "keep(-alive)"

    postReq.Headers.Set("Cookie", "ak_bmsc=E095D8D23EF310F6F95709E37A739307~000000000000000000000000000000~YAAQEqosF/9GP9h7AQAAt/F1CQ3hDOoLRTYowkbgk8jx96g7/ZNfK3JVttsRRtAzND7g35DDavqTE2bzNqB1PkpKlCSbnppwndY7dp1sjvpVvXEAYiphlRjA8mKouXEP6EfsjVVRZamc2awyclysYmL5bz/ENo3ChyTZQsR68cjnZZ8Ggq/RQ1wt4LoLSgXpCBakgIo7xnzyKqCD+Q26AnUeFxpGlfUaRkcp7zcC07BogYNzMXUJWp43c9oZNkvhaerKQen02DpGCgOnMWIFaBzIRn1OeZJVZKW2cQAZT5dDQxnIhufBc27mJ8V0lEsqaWdGUG4mrH19+PYBh7xkKAN1xdXWuI13tvhaOUEIbLXL5FxQi2LatqQVk3oOAXcoAgL2KsxQWg==; _ga=GA1.2.2062415035.1632246298; _gid=GA1.2.1645803808.1632246298; _4c_=fVJbbqQwELxK5O8x%2BMXDc4C9wn5GYDeDFQLIODiTaO6e9sCElUZaPix3uarsruabxB5GcualFEKVstBcyRN5g%2BtCzt%2FEzGld0%2FLhB3ImfQjzcs7zGGO2gMku05qDvTQ%2BX6Dxpt%2F2Znqfm%2FG6QVkf3gdyImaygA5cZ5xnHIHwhSUViuF%2B9pP9MOE1XOdEitC%2BLPYNDyyszsBrdDb0d7VgB9qDu%2FQhwYLd4dmnAnfRjXaKh6xi6kAPGZMS4dZPcYEk%2FeM8dNPnixYITxgB%2BXuXLFjiCXh%2Fp%2F2bQuvGS4YN50hZXEiv34PZAcxyw%2BiGzRgnSf7DZJoh8XEAtxP53KYgpahYwZnCgAJGXpeKpQ8Z3tl9HARY3TKoGlo0naXKtILqzhjKWWGl0I0otCW7J%2BqFELoqmEKT1T08CgEd07ajjeGMKltqWrOKUVnLsgYLopW%2FHunvqNGcab6%2Fi9ePZ83D7sgPcslKvLCWD7L6bWJed7Z4alk%2Bt7yNhqb0YPyP9Cmt2%2B0H; bm_sv=956CC891F583A3B2E42B6B0A73EDAE69~F8u8D/WEP9Z13rYWm/odEOWtLBL4e6mDQt6SHCYmpu3DiYSW4fPZe5KOdH8Mn8zzWYOvz/Lw9v1sNeLJfPTWdOk0NJXYDLh2aCto/1ugU8Kip4IfZoT93HxnLigDkLG51YoIPRqkdvLrU4Uum19z3g==; _gat_UA-30394047-1=1")
        
    postReq.Headers("upgrade-insecure-requests") = 1

    postReq.Headers("Sec-Fetch-Dest") = "document"

    postReq.Headers("Sec-Fetch-Mode") = "navigate"

    postReq.Headers("Sec-Fetch-Site") = "same-origin"

    postReq.Headers("Sec-Fetch-User") = "?1"

    postresponse = DirectCast(postReq.GetResponse(), HttpWebResponse)

    Dim postreqreader As New StreamReader(postresponse.GetResponseStream())

    Dim MyStr As String = postreqreader.ReadToEnd

The code runs without errors, but MyStr contains:

    �      �]{S�����)z3u�5�M �)�aG�Uf�f����4
I*  "s�~�{�;�e qP���$�~�>�w^�!9��z~��vQC�g�����i�q� |U��ڬ�jV�P30�Ўl�5A�58�u��/
�`0���t��Ѝz�*8��Y\y�!�p;%���@K�a�WdG.ת*��
��E�8�;Qx(�k;���ޢn��%N�]��[�#fd���
�S��h�ఋqġh����H������1!�c�B�3���p��|���e�����#:Ӿс�w�-�P����,�&���b�s�C�%s����5"�43|߱M�HL0"���}ρk��%��<?�f��oٮ�
�A�yJ�{�^�-upd2�~7z��q�SIEI�r�R�Et4zd��-v�
\�by���QI��~���!��nĻr��iX��zέ���oXA��+��Ӕ7�z� �E���a��َw'�FO�Am�-�`pY���1O��-�"*�R��n����<������*�
��6Cģ/@����e:�l��٠6E�F��w�9̂��3=��f�-��y����v`�p�!�E��:��N�pkA�$�!��yo[��3�YU�>}�v���t�'rh`[Q��3���w�a��\|@Hmَ
�]۲�K��Ɔ�9��ʻ�Ka�;ط��}g������D>�����r���aw�?�N�ˆ >S�q� U/�{Y�X����>��nT��n���� �K�Rsp��Gæ�i E���w����
\7}�U�¾唜�J@�qjq�y���)e�p����M���Y��>��՝��2zW3�X�A;����,������ݛ������v�ky�����3�e��  ��3c)e�&Ԛ@��Zi0㉶��G��:�4�K�˾C@$G ����ZB��q5�����0\���Ç�QF����ef�p�S`}� ��   g�k%`*��`�m���W����Ct�&� u�f0����^'V    �=�Į:0�l�nsb�#�"]�� S(����̯P�C`�]�L�e�`ؚĈΌrD���i��״T�����r�74�k�Š����Ex���x�F(��uR�oE��߉�e��U%��#k���ӳ""6��d�$���(�9|�  Ѝ�$\�a|Y�T�/?�]Mc��m�^+���w�79t�tdy� ��;)��rA��v�.�p�:Ɲ�yE]��f���,|<�a��NJ�++�֛+Ǒ@���#�1°��'��1��X�⊇a�)�p{�nӃ�ʍ��]�m�욕�&:�����/�I�)i�=�$��ޛ3E���Ȼ�A����w9@�� "���E��踚;�p�y�Q���+��֠�e8㚤r�����ʲ,�/�K@[��*_j��5��P����;�F�^iT.�O�W0��dQ7��kl\�L��.��sX�W^E�T=�ȩ�P�C|��#�C�;�����G��Qw�]3��<w�KB�c�k`8p�}�CP��'�V�����O[�Ώj��*�
1C렸;�f�q�Y�Y���mЉ��gv�W�F`�Y��(�a��}�y����ĩB��"�K'I*Q{ģ��W�f��*�V���U��K���.N+���V �h�0r<��<�D�I��XJb�G lX�MQ��<�C�LJP��M`�7��$�&+#��8p�ETi4?��@]��n,Jp`�:]$���K��{���3z1jS�2/�xY}��k�����h���K2/K�[��k*Qj%9n�{�n�&eӉ�e�/|�Q����<�&�d�a) n�!8T("���-S �(�CD*�nȥ�ΊP4�2�� "^��S"���L�{@�Y~D��΋��p����~�5! ct���.�.��q��c���=�Cw�Ӈӱ�<�=�i�x�_�#����/��c�]|wB���fn?���q�ʗd�`I+���:�j�8�K�%@;�P5`�&�P4��zu��ŇBd��@�@G�G#�v^S���:Đ����0��fL]R~�|.I��D4�O�M�}ǹ&��q�軝)�y��x<må�%��=̢|�-��8�%��=Ԅ�"��r��l92Z��O8� �D��8k�
1ӈL��l/��<�<�D{�    h����U�{)�е�trdFΨ�G�tNLDص��SC1]��U��d�e�؀���[L�ov0I��M�?iS�F�$�~����x8uz��F9>Xu8�u�l,v4H#���H�z�|&c,�u�\����Q����>N�&p��/�! &Kl1��}����1u9�J3"<�f~���ʒ��d8|T���Wk�
B�zl����ɀ��ڀ��3[�
L�衤�M�N���
��K�cr�7
�����`�R�S�-��ǎm�B��ϳ��fI�,�!������8s�{�宑t���ta�}af��d�����*ӥ~>�\���dZ�(��� R+6�����_�ݔ�%��rC�ı��2�ɨf�F�t��<��4�-���ʦ�����Ă\�*Mb�e!�P��^�$�CT�w���s]��#75�jvI\����:�!O��x��\^R�%�yp�<7�JƋ�S˗Ts ;�OV���:��Xr�x�Ȧ Hi:Pd@iY�B����|g�t��yIN�Ӭ�<"�Y���l-��:����tq
�űV��A0�{�jX�b�4W?��    ��C[��or�8�l�v綑��Ft�)��d�*f�� �`�ٻY�P{���C�~�E2���Ll���B�7�ıo���?1Q�]k}��}����kb�   c��
9sE���/$SL$[8a�efi�I1�;V�����x�G�@����Ҥ����\��7�4�h��V �?љ�$�텃L��hDa}|����  �|!��4x���jYQA�k�BV�⼼b���6��H�UTT1���+Ws���ѧ�E�U](�E�MyQK��@�+�!H1p�,�RNפG,��5)[AYV$�q՜�� �����p�V�o5
��S�=5t��T#Hf�p��!Hi[��m�$��^�*�*!�3���]�%5�M~5Ԩ�R Y���sz�K��]r�4�M56���V�{�c� �l�>�v��[��
��:$)�|I۵�~,�����H2:{,�D貰��G.�d}]�g]� ti�"+yE�'���X�� ~��w�H��Tg    [�ju�J`-n>�s��Ӗ�:�ʚͶX�?4�P�V'Zw�#��'��/{P�2In%���<��Q��ƅn�q���N�.�S��0���t�����
P�����6�Z�%u�|�X�u��bAL�6�v��@h�u��,���H�ZDS�DQՉhd5/O��µ��E��t?�|/�

/��,M�+5�%���+ʺ"=QI��nΧ�'ϟ�a��֓�*���
���K��؉]�G��{I��3�+����
����ギ����V��x[pM��mi�;b^K3��Z�{�u�[#�{��
1���3o�����Ͽ�7���v�v� ���.
���";��V�Q����Ǒ��ye��$�R�|ŧŌ/<��J+`��-�IWY������z��q�9����X����T�c�h+�D�Y���9����}
��"�����=���Q��Ր�p(���oGC�4A(�"��jHZ���wU�#�
��!m� ���!i[2��l���ѐ\!E�"Wx��K��ȭ��gL��4��ri�zn�V�g{�'-+�K�ro��WY��l�&jA��|~�tV�z�=U--�Ӟ�ǽ �5u�T��5�,�n��S^e�7�{UԴ�4��J7�3�m�*#q�����r^��u�w�vR��u��A��RZm�)mŲ�G���8��FZ@�x�C�
ӦO��A�H���T/�F��k��J����R��b�
Sᕍ� �^(,�   �
=��Rq�I��K���󨋃�'����G�9��A<��5̵��JZަl����~����UxYߘ:�D=���Ќn���!lsnژ%l]?[R��^$]Sr��XT��ߔm�S�'ܽ�<��g����XٖmE�s��$�U$�Z*�\*۲ϣ$�<�>�d^Hr�F��-=r��ꈜ��#o�N��ɼ�$������5$��aI�+Sޖ�2��ґ��2�m�-S�ް������r_���w�^XGֽܡ�҃����M\��-���F؂厕���-X��i�ZX�r�$��N1-@�$X�������ar� �]�l��������?!�dU����;BҠ�o ȼ����)Za�o

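rather than the output I get when I send the headers above to the site from the browser, or when I Ctrl-click the HTTP address in the code. For what it's worth, I have successfully used this general approach to download data from several other websites. I have experimented extensively with the code above to work out which lines can be removed while still downloading successfully. However, I either get nothing at all or get the result above.

Any ideas about what is happening here, or any suggestions for how to deal with the problem?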

vb.net httpwebrequest

Comments

0 upvotes tkausl 10/13/2023
It is probably compressed, since you explicitly accept gzip, deflate, br.
0 upvotes Jimi 10/13/2023
postReq.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate;q=0.8") postReq.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate -- By the way, br is not supported -- and that is not the way to set keep-alive (postReq.KeepAlive = True) or cookies. Not sure what you are going to get.
0 upvotes someotherguy 10/13/2023
"postReq.AutomaticDecompression = DecompressionMethods.GZip" did it. Problem solved. Thank you very much!
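
For readers landing here, the fix from the comments can be put together roughly as follows. This is only a minimal sketch, not the asker's final code. It assumes .NET Framework's HttpWebRequest, and it assumes the pasted browser cookies and Sec-Fetch-* headers are not needed for this page (the thread does not confirm that). The module name EdgarDownload and the 200-character preview are illustrative only. The key line is AutomaticDecompression, which asks the framework to unpack gzip or deflate responses before the StreamReader sees them. The sketch does not request br, since the comments note it is not supported here.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module EdgarDownload   ' hypothetical name, for illustration only
    Sub Main()
        ' 3072 in the question is SecurityProtocolType.Tls12.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12

        Dim url As String = "https://www.sec.gov/cgi-bin/browse-edgar?company=Elevance+Health&match=starts-with&filenum=&State=&Country=&SIC=&myowner=exclude&action=getcompany"
        Dim req As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)

        req.Method = WebRequestMethods.Http.Get
        req.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/118.0"
        req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
        req.Headers.Add(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.5")
        req.Referer = "https://www.sec.gov/edgar/searchedgar/companysearch"

        ' Use the KeepAlive property instead of writing a Connection header by hand.
        req.KeepAlive = True

        ' Assumption: an empty container is enough; any cookies the server
        ' sets during the exchange are stored here rather than pasted in.
        req.CookieContainer = New CookieContainer()

        ' The actual fix: decompress gzip/deflate bodies transparently.
        ' No manual Accept-Encoding header is set, so "br" is never requested.
        req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate

        Using resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(resp.GetResponseStream(), Encoding.UTF8)
                Dim html As String = reader.ReadToEnd()
                ' Print a short preview; readable HTML here means decompression worked.
                Console.WriteLine(html.Substring(0, Math.Min(200, html.Length)))
            End Using
        End Using
    End Sub
End Module
```

If the preview still looks like binary data, check resp.ContentEncoding. A value other than gzip or deflate would mean the server chose an encoding this sketch does not handle.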

A: No answers yet