Find URL in web page content using PowerShell

Asked by: Ashar  Asked: 9/6/2023  Last edited by: VLAZ, Ashar  Updated: 9/6/2023  Views: 53

Q:

Using PowerShell, I need to find the URL https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip in the content of https://www.windwardstudios.com/version/version-downloads.

In other words, I need https:\\<anything>\JavaRESTfulEngine<anything>.zip

First I tried $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/\d{2}\.X/\d+\.\d+\.\d+/JavaRESTfulEngine-.*?\.zip', which worked and gave me the desired URL.

To generalize it further, I tried $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip', but now it does not work.

Below is my PowerShell script.

# URL of the website to scrape
$websiteUrl = 'https://www.windwardstudios.com/version/version-downloads'

# Use Invoke-WebRequest to fetch the web page content
$response = Invoke-WebRequest -Uri $websiteUrl

# Check if the request was successful
if ($response.StatusCode -eq 200) {
    # Parse the HTML content to find the zip file URL using a regular expression
    $htmlContent = $response.Content
    $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'
    $zipFileUrls = [regex]::Matches($htmlContent, $regexPattern) | ForEach-Object { $_.Value }

    if ($zipFileUrls.Count -gt 0) {
        Write-Host "Found zip file URLs:"
        $zipFileUrls | ForEach-Object { Write-Host $_ }
    } else {
        Write-Host "Zip file URLs not found on the page."
    }
} else {
    Write-Host "Failed to fetch the web page. Status code: $($response.StatusCode)"
}

Output:

Zip file URLs not found on the page.

Expected output:

https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip

Can you please suggest a fix?

Tags: regex, powershell, string-matching, web-content

Comments

0 upvotes  Wiktor Stribiżew  9/6/2023
There is a / where you put [^/]*. Try https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip
0 upvotes  Ashar  9/6/2023
@WiktorStribiżew It worked. To also generalize cdn\.windwardstudios\.com/Archive, I tried https://(\S+?)/JavaRESTfulEngine-.*?\.zip and that worked too. Please post your answer so that I can accept it.

A:

1 upvote  Wiktor Stribiżew  9/6/2023  #1

You can use

https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip

See the regex demo.

Details
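As a rough sketch of how this pattern might be used from PowerShell, the snippet below wraps the matching step in a small helper. The function name Find-UrlInContent and the sample markup are illustrative assumptions, not part of the original post or of the real page; the sample only stands in for $response.Content.

```powershell
# Hypothetical helper (the name is illustrative, not from the original post):
# returns every distinct substring of $Content that matches $Pattern.
function Find-UrlInContent {
    param(
        [Parameter(Mandatory)][string]$Content,
        [Parameter(Mandatory)][string]$Pattern
    )
    [regex]::Matches($Content, $Pattern) |
        ForEach-Object { $_.Value } |
        Select-Object -Unique
}

$pattern = 'https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip'

# Stand-in for $response.Content; the real page markup is assumed, not known.
$sample = @'
<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip">zip</a>
<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip">mirror</a>
'@

# @() forces an array so .Count behaves the same for zero, one, or many results.
$found = @(Find-UrlInContent -Content $sample -Pattern $pattern)
$found | ForEach-Object { Write-Host $_ }
```

Against the live page, the same helper could presumably be called with -Content (Invoke-WebRequest -Uri $websiteUrl).Content. Wrapping the result in @() avoids the case where a single match comes back as a bare string rather than a collection.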

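  • https://cdn\.windwardstudios\.com/Archive/ - the literal string https://cdn.windwardstudios.com/Archive/
  • (\S+?) - Group 1: one or more non-whitespace characters, as few as possible
  • /JavaRESTfulEngine- - the literal string /JavaRESTfulEngine-
  • .*? - any zero or more characters other than line break characters, as few as possible
  • \.zip - the literal string .zip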
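To illustrate the breakdown above, here is a minimal sketch that runs both the pattern from the question and the pattern from this answer against one sample string. The HTML fragment is an assumption built from the URL given in the question; the variable names are illustrative only.

```powershell
# Assumed HTML fragment containing the URL from the question.
$html = '<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip">Download</a>'

# Pattern from the question: [^/]+ cannot cross a slash, so it covers only one
# path segment, while the URL has two (23.X/23.3.0) after Archive/.
$oneSegment = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'

# Pattern from this answer: \S+? may include slashes, so it can span both segments.
$anySegments = 'https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip'

$failed  = [regex]::Matches($html, $oneSegment)
$matched = [regex]::Matches($html, $anySegments)

$failedCount = $failed.Count                      # matches found by the question's pattern
$urls        = @($matched | ForEach-Object { $_.Value })
$group1      = $matched[0].Groups[1].Value        # text captured by (\S+?)

Write-Host "Question pattern matches: $failedCount"
Write-Host "Answer pattern matches:   $($urls.Count)"
Write-Host "Group 1:                  $group1"
```

On this sample the question's pattern finds nothing, while the answer's pattern matches the full URL and Group 1 holds 23.X/23.3.0. Note that \S+? is permissive: on markup where several URLs sit on one line without whitespace between them, it could capture more than intended, so a stricter class may be preferable in that case.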