How to use htmlunit to record all requests fired during rendering a page?

Asked by: defalt1996 Asked: 4/23/2023 Updated: 4/30/2023 Views: 98

Q:

I am using HtmlUnit to try to record every request fired while loading a local HTML file. This is the test file:

<script type="text/javascript">
  !(function () {
    var adc = function (str) {
      return decodeURIComponent(escape(window.atob(str)));
    };
    document.write(
      adc(
        "PGEgaHJlZj0iaHR0cHM6Ly9kc3A4dTRqaGE0NHJyLmNsb3VkZnJvbnQubmV0L2RpcmVjdC8xOTgyMzQxMDE/YWR4PUFsZ29yaXgoUHJvKSZhcHA9MjgxOTQwMjkyJnByaWNlPTAuOTEwMSZyZD1MVldrTUJRTUxGQzhrIiB0YXJnZXQ9Il9ibGFuayI+PGltZyBzcmM9Imh0dHBzOi8vZHNwOHU0amhhNDRyci5jbG91ZGZyb250Lm5ldC9pbXAvMTk4MjM0MTAxP2FkeD1BbGdvcml4KFBybykmYXBwPTI4MTk0MDI5MiZwcmljZT0wLjkxMDEmcmQ9TFZXa01CUU1MRkM4ayIgd2lkdGg9IjMyMCIgaGVpZ2h0PSI1MCI+PC9hPjxpbWcgc3JjPSJodHRwczovL2QybWsybmg4dmZmNzY4LmNsb3VkZnJvbnQubmV0L3YxL3BpeGVsP2E9MTAyOSZiPTEwNDYmYz0xJmQ9ZmZkNDkyNTFjYjkwOGI5NSZlPTU4Njg0YWYzN2Q0MTFhNGImZj0wLjkxMDEmZz0wLjkxMDE0Jmg9MTA0NSZpPWZiMTJmZjY5ZjJkZmFiZjAmaz04MzM1NTEzMTAxMDI5ODI2NjAmcmQ9TFZXa01CUU1MRkM4ayIgYm9yZGVyPSIwIiB3aWR0aD0iMSIgaGVpZ2h0PSIxIi8+"
      )
    );
    document.write(
      adc(
        "PGltZyBzcmM9Imh0dHBzOi8vdXNlLnRyay5zdnItYWxnb3JpeC5jb20vaW1wP2NycHY9MyZpbmZvPTlFbVpwWkNNdWdETnVFak14NHlNeTBEY3BWbkp5a2pNd1FUT3hnak05UVdkaVpDTTlRSGR5Tm5KeDBEZDBsbVltRVRQdEJuWW1Bek45STNjeU5uSngwVGJtQm5KdzBEYzRWbUp3MFRhd0ZtSngwVGUwRm1KelFUTndZVFBrbDJjbWdUTTVJak41RURPMkVUUDBKbkp4a2pMdzBUYmhaU000TWpOdUFUUHRaU013RVRPdUFUUHRKbUp3a3pNOWtuWW1FMFVWMXpZbUV6TTJRek54MERjbVEyTmhKV04xUWpNelUyTTNnell3Z1RZalZHTndVR09rSlRZNVVUTW1oVFo5RW5jJnByaWNlPSR7QVVDVElPTl9QUklDRX0mcz02MDU0MyZyPWU4ZjE1OWEyZDhlMDRlY2E4MGM4NzNlMzI0NTViYTdkIiB3aWR0aD0iMSIgaGVpZ2h0PSIxIiBzdHlsZT0iZGlzcGxheTpub25lOyI+PGRpdiBpZD0iZG9qczIwMTJiMDVhIiBkYXRhLXdpZHRoPSIzMjAiIGRhdGEtaGVpZ2h0PSI1MCIgZGF0YS10cms9J2h0dHBzOi8vdXNlLnRyay5zdnItYWxnb3JpeC5jb20vaW1wP2NycHY9MyZpbmZvPTlFbVpwWkNNdWdETnVFak14NHlNeTBEY3BWbkp5a2pNd1FUT3hnak05UVdkaVpDTTlRSGR5Tm5KeDBEZDBsbVltRVRQdEJuWW1Bek45STNjeU5uSngwVGJtQm5KdzBEYzRWbUp3MFRhd0ZtSngwVGUwRm1KelFUTndZVFBrbDJjbWdUTTVJak41RURPMkVUUDBKbkp4a2pMdzBUYmhaU000TWpOdUFUUHRaU013RVRPdUFUUHRKbUp3a3pNOWtuWW1FMFVWMXpZbUV6TTJRek54MERjbVEyTmhKV04xUWpNelUyTTNnell3Z1RZalZHTndVR09rSlRZNVVUTW1oVFo5RW5jJnByaWNlPSR7QVVDVElPTl9QUklDRX0mcz02MDU0MyZyPWU4ZjE1OWEyZDhlMDRlY2E4MGM4NzNlMzI0NTViYTdkJyBkYXRhLWlkPSdBbGdvcmlYLWU4ZjE1OWEyZDhlMDRlY2E4MGM4NzNlMzI0NTViYTdkJz48c2NyaXB0IHR5cGU9J3RleHQvamF2YXNjcmlwdCcgYXN5bmMgc3JjPSJodHRwczovL3Ryay5zdnItYWxnb3JpeC5jb20vc3RhdGljL200LmpzP3Q9OTM0NDIzIj48L3NjcmlwdD48L2Rpdj4="
      ).replace(new RegExp(adc("XCR7QVVDVElPTl9QUklDRX0="), "g"), "0.6381")
    );
  })();
</script>
<img
  src="https://use.trk.svr-algorix.com/win?crpv=3&info=9EmZpZCMugDNuEjMx4yMy0DcpVnJykjMwQTOxgjM9QWdiZCM9QHdyNnJx0Dd0lmYmETPtBnYmAzN9I3cyNnJx0TbmBnJw0Dc4VmJw0TawFmJx0Te0FmJzQTNwYTPkl2cmgTM5IjN5EDO2ETP0JnJxkjLw0TbhZSM4MjNuATPtZSMwETOuATPtJmJwkzM9knYmE0UV1zYmEzM2QzNx0DcmQ2NhJWN1QjMzU2M3gzYwgTYjVGNwUGOkJTY5UTMmhTZ9Enc&price=0.6381&s=60543&r=e8f159a2d8e04eca80c873e32455ba7d"
  width="1"
  height="1"
  style="display: none"
/>

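When it is rendered in Chrome, the Network tab lists the requested URLs: [screenshot of the Network tab]

Including the local file itself, 7 requests are fired. That is what I expect to see in the output of my code.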
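One way to cross-check which URLs the page is expected to request is to decode the Base64 payloads offline. The sketch below is not part of the original question: it assumes the page's adc() helper amounts to "Base64 to UTF-8 text", and the class and method names (PayloadInspector, decode, extractSrcUrls) are made up for illustration. It only looks at src attributes, so it will not see requests that a loaded script fires on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PayloadInspector {

    // Java counterpart of the page's adc() helper: Base64 -> bytes -> UTF-8 text.
    static String decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Collects the values of src="..." / src='...' attributes in document order.
    static List<String> extractSrcUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = Pattern.compile("\\bsrc\\s*=\\s*([\"'])(.*?)\\1").matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        // The short token passed to new RegExp(...) in the test file;
        // it decodes to the placeholder pattern \${AUCTION_PRICE}.
        System.out.println(decode("XCR7QVVDVElPTl9QUklDRX0="));

        // Hypothetical markup, used only to demonstrate the extraction step.
        String html = "<img src=\"https://example.com/a.gif\">"
                + "<script src='https://example.com/b.js'></script>";
        System.out.println(extractSrcUrls(html));
    }
}
```

Passing each long string from the document.write() calls through decode() and then extractSrcUrls() gives the image and script URLs that a browser would fetch directly from the injected markup.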

My code is as follows:

public class RenderHTML extends WebConnectionWrapper {

    static List<String> list = new ArrayList<String>();


    public RenderHTML(WebClient webClient) throws IllegalArgumentException {
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // Log the URL of the request
        System.out.println(request.getUrl().toString());
        return super.getResponse(request);
    }



    public static void main(String[] args) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Wrap the client with the URLRecorder
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.waitForBackgroundJavaScriptStartingBefore(100_000);
            webClient.waitForBackgroundJavaScript(100_000);
            webClient.getOptions().setCssEnabled(true);
            webClient.getOptions().setRedirectEnabled(true);
            webClient.getOptions().setUseInsecureSSL(false);
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getCookieManager().setCookiesEnabled(true);
            webClient.setAjaxController(new AjaxController());
            webClient.getCookieManager().setCookiesEnabled(true);

            webClient.setWebConnection(new RenderHTML( webClient));

            // Load the local HTML file
            HtmlPage page = webClient.getPage("file:///Users/derrickguo/work/project/project_java/analyze_demand_tool_maven/src/main/lib/algorix_us_adm.html");
            
        }
    }
}

But it only prints: [screenshot of the console output]

One fired request, and then the process finishes.

Can anyone help me get all of the fired requests? Thanks!

Tags: java html html-parsing htmlunit

Comments

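0 votes Samuel Marchant 4/23/2023
A program that simulates output is usually run against localhost. The point here, however, is that the full file:/// path may need to be a relative path that ties the producing program's binary to the application structure. You may need to specify the path relative to the application's base folder, or to a dedicated folder for page sources within the application.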
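0 votes Rob 4/23/2023
Please do not post images of code, data, error messages, etc. Copy or type the text into the question. See How to Ask.
0 votes Rob 4/23/2023
Note: type="text/javascript" is the default for the <script> tag and is unnecessary. Also, the <img> tag does not use and does not need a trailing slash, and never has in any HTML specification.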

A:

0 votes RBRi 4/30/2023 #1

HtmlUnit is a headless browser, and by default it does not download images. You can turn that on:

webClient.getOptions().setDownloadImages(true);

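I ran some tests with version 3.1.0 and was able to see all of the requests.

Keep in mind that waitForBackgroundJavaScriptStartingBefore() and waitForBackgroundJavaScript() are not options. They are method calls that you have to make after getting the page or after a click (though that is not needed in your case).

My test code: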

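// Imports assume HtmlUnit 3.x (package org.htmlunit); 2.x used com.gargoylesoftware.htmlunit instead.
import java.io.IOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.WebRequest;
import org.htmlunit.WebResponse;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.util.WebConnectionWrapper;

public class Issue76084456 extends WebConnectionWrapper {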

    public Issue76084456(WebClient webClient) throws IllegalArgumentException {
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // Log the URL of the request
        System.out.println("#######" + request.getUrl().toString());
        return super.getResponse(request);
    }

    public static void main(String[] args) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // Wrap the client with the URLRecorder
            webClient.setWebConnection(new Issue76084456(webClient));

            webClient.getOptions().setDownloadImages(true);

            // Load the local HTML file
            HtmlPage page = webClient.getPage("file:///C:/RBRi/htmlunit/algorix_us_adm.html");
        }
    }
}
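
If the URLs should be collected for later inspection rather than printed, the same wrapper idea can store them in a list. The following is a library-free model of that pattern, not HtmlUnit code: Connection, RecordingConnection and recordedUrls() are hypothetical names, and the stub connection stands in for the real network layer. The list is synchronized on the assumption that background JavaScript may issue requests from another thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RecordingDemo {

    // Hypothetical stand-in for a browser connection; not an HtmlUnit type.
    interface Connection {
        String fetch(String url);
    }

    // Decorator that remembers every URL before delegating,
    // mirroring what the getResponse() override does with println.
    static class RecordingConnection implements Connection {
        private final Connection delegate;
        private final List<String> urls =
                Collections.synchronizedList(new ArrayList<>());

        RecordingConnection(Connection delegate) {
            this.delegate = delegate;
        }

        @Override
        public String fetch(String url) {
            urls.add(url);
            return delegate.fetch(url);
        }

        // Returns a snapshot so callers can iterate without holding the lock.
        List<String> recordedUrls() {
            synchronized (urls) {
                return new ArrayList<>(urls);
            }
        }
    }

    public static void main(String[] args) {
        // Stub delegate: returns a fake body instead of touching the network.
        RecordingConnection conn =
                new RecordingConnection(url -> "stub body for " + url);
        conn.fetch("file:///tmp/page.html");
        conn.fetch("https://example.com/pixel.gif");
        System.out.println(conn.recordedUrls());
    }
}
```

In the HtmlUnit version, the equivalent change would be to add request.getUrl().toString() to such a list inside getResponse() and read the list after getPage() returns.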