Asked by: BenyaminDev  Asked: 9/25/2023  Last edited by: Theodor Zoulias, BenyaminDev  Updated: 9/27/2023  Views: 81
Running many Selenium tasks concurrently and preventing system crashes
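Q:
I am fetching the population rank of each country from a website using Selenium and HtmlAgilityPack (C#, .NET 7). The code works for 10 countries, but when I request all countries, the large number of tasks slows the system down and crashes it. What is the right approach?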
static async void GetData()
{
    string website = "......";
    List<string> Countries = new List<string>()
    {
        // 195 Countries
    };
    List<Task<JObject>> Tasks = new List<Task<JObject>>();
    foreach (string countryName in Countries)
    {
        Tasks.Add(FetchData(website + "/" + countryName));
    }
    await Task.WhenAll(Tasks);
    foreach (JObject populationRank in Tasks.Select(task => task.Result))
    {
        WriteLine(populationRank);
    }
}
static Task<JObject> FetchData(string URL)
{
    return Task.Run(async () =>
    {
        // CustomizedDriver is a method, so it is called without "new".
        ChromeDriver myDriver = CustomizedDriver();
        myDriver.Navigate().GoToUrl(URL);
        HtmlDocument Document = new HtmlDocument();
        Document.LoadHtml(await myDriver.GetPageSourceAsync());
        JObject Object = new JObject()
        {
            ["PopulationRank"] = Document.DocumentNode.SelectSingleNode("//div[@id='popRank']").InnerText
        };
        myDriver.Quit();
        myDriver.Dispose();
        return Object;
    });
}
static Task<string> GetPageSourceAsync(this IWebDriver driver)
{
    return Task.Run(() =>
    {
        while (true)
        {
            string PageState = (string)((IJavaScriptExecutor)driver).ExecuteScript("return document.readyState");
            if (PageState == "interactive" || PageState == "complete")
                return driver.PageSource;
        }
    });
}
static ChromeDriver CustomizedDriver()
{
    ChromeDriverService chromeService = ChromeDriverService.CreateDefaultService();
    chromeService.HideCommandPromptWindow = true;
    ChromeOptions options = new ChromeOptions();
    options.PageLoadStrategy = PageLoadStrategy.None;
    options.AddArgument("--headless --disable-cookies --blink-settings=imagesEnabled=false");
    return new ChromeDriver(chromeService, options);
}
A:
0 votes
sadbuttrue
9/25/2023
#1
You should not create a large number of tasks at once. Instead, use Parallel.ForEachAsync, as in the following example:
// Returns Task rather than void so callers can await it and observe exceptions.
static async Task GetData()
{
    string website = "......";
    List<string> Countries = new List<string>()
    {
        // 195 Countries
    };
    var results = new ConcurrentBag<JObject>();
    var parallelOptions = new ParallelOptions()
    {
        MaxDegreeOfParallelism = 10 // Controls how many countries are processed in parallel.
    };
    // Await the loop so all results are collected before they are printed.
    await Parallel.ForEachAsync(Countries, parallelOptions, async (country, token) =>
    {
        var result = await FetchData(website + "/" + country);
        results.Add(result);
    });
    foreach (JObject populationRank in results)
    {
        WriteLine(populationRank);
    }
}
static async Task<JObject> FetchData(string URL)
{
    // CustomizedDriver is a method, so it is called without "new".
    ChromeDriver myDriver = CustomizedDriver();
    try
    {
        myDriver.Navigate().GoToUrl(URL);
        HtmlDocument Document = new HtmlDocument();
        Document.LoadHtml(await myDriver.GetPageSourceAsync());
        JObject Object = new JObject()
        {
            ["PopulationRank"] = Document.DocumentNode.SelectSingleNode("//div[@id='popRank']").InnerText
        };
        return Object;
    }
    finally
    {
        // Close the browser even if the fetch throws, so Chrome processes do not pile up.
        myDriver.Quit();
        myDriver.Dispose();
    }
}
static Task<string> GetPageSourceAsync(this IWebDriver driver)
{
    return Task.Run(() =>
    {
        while (true)
        {
            string PageState = (string)((IJavaScriptExecutor)driver).ExecuteScript("return document.readyState");
            if (PageState == "interactive" || PageState == "complete")
                return driver.PageSource;
            // Pause between polls so the loop does not spin a CPU core while the page loads.
            Thread.Sleep(100);
        }
    });
}
static ChromeDriver CustomizedDriver()
{
    ChromeDriverService chromeService = ChromeDriverService.CreateDefaultService();
    chromeService.HideCommandPromptWindow = true;
    ChromeOptions options = new ChromeOptions();
    options.PageLoadStrategy = PageLoadStrategy.None;
    // Each switch is passed as its own argument instead of one space-separated string.
    options.AddArguments("--headless", "--disable-cookies", "--blink-settings=imagesEnabled=false");
    return new ChromeDriver(chromeService, options);
}
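To check that the cap actually holds, here is a minimal sketch that swaps the browser for a short delay and records the peak number of simultaneous fetches. FakeFetchAsync, the 40 placeholder country names, and the cap of 4 are assumptions made for illustration and are not part of the code above. It is written as a top-level program for .NET 6 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var gate = new object();
int running = 0, peak = 0;

// Hypothetical stand-in for FetchData: waits briefly instead of driving a browser.
async Task<string> FakeFetchAsync(string country, CancellationToken token)
{
    // Count this fetch as running and remember the highest count seen.
    lock (gate) { running++; peak = Math.Max(peak, running); }
    await Task.Delay(50, token);
    lock (gate) { running--; }
    return country + ":done";
}

// Placeholder inputs; the real list would hold the country names.
var countries = Enumerable.Range(1, 40).Select(i => "country" + i).ToList();
var results = new ConcurrentBag<string>();
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

// At most MaxDegreeOfParallelism bodies run at once; the await completes when all are done.
await Parallel.ForEachAsync(countries, options, async (country, token) =>
{
    results.Add(await FakeFetchAsync(country, token));
});

Console.WriteLine($"fetched {results.Count}, peak concurrency {peak}");
```

The peak never exceeds the configured cap, which is the property that keeps the number of open Chrome instances bounded in the real code.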
Comments
0 votes
BenyaminDev
9/27/2023
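Your approach is responsive, but unfortunately it takes 10 minutes and the system lags a bit. Can this be split across several threads? Somehow run on all CPU cores?
0 votes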
sadbuttrue
9/27/2023
You can increase the number of threads by raising the MaxDegreeOfParallelism value.
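One way to pick that value, sketched here under assumptions: start from the machine's core count and bound it with a fixed ceiling. ChooseDegree and the ceiling of 16 are hypothetical and should be tuned by measurement, since each Chrome instance is a separate process with its own CPU and memory cost.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: one worker per core, never below 1 and never above the ceiling.
int ChooseDegree(int cores, int ceiling) => Math.Max(1, Math.Min(cores, ceiling));

var parallelOptions = new ParallelOptions
{
    // Environment.ProcessorCount is the number of logical processors available to the process.
    MaxDegreeOfParallelism = ChooseDegree(Environment.ProcessorCount, ceiling: 16)
};

Console.WriteLine($"MaxDegreeOfParallelism = {parallelOptions.MaxDegreeOfParallelism}");
```

If the system still lags at this setting, lowering the ceiling is likely to help more than raising it.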