Asked by: FozenOption Asked: 10/1/2022 Updated: 10/2/2022 Views: 109
Replace multiple patterns one at a time with Python
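Q:
I have a list of URLs with multiple query parameters, for example:
https://www.somesite.com/path/path2/path3?param1=value1&param2=value2
What I want to get is this:
https://www.somesite.com/path/path2/path3?param1=PAYLOAD&param2=value2
https://www.somesite.com/path/path2/path3?param1=value1&param2=PAYLOAD
In other words, I want to iterate over each parameter (basically every match between "=" and "&") and replace one value at a time. Thanks in advance.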
A:
0 votes
Eftal Gezer
10/1/2022
#1
from urllib.parse import urlparse
import re

urls = ["https://www.somesite.com/path/path2/path3?param1=value1&param2=value2&param3=value3",
        "https://www.anothersite.com/path/path2/path3?param1=value1&param2=value2&param3=value3"]
parseds = [urlparse(url) for url in urls]
newurls = []
for parsed in parseds:
    params = parsed.query.split("&")
    for i, param in enumerate(params):
        # Swap only the i-th parameter's value. Calling str.replace on the
        # whole query would also hit other parameters containing the same text.
        newparams = params.copy()
        newparams[i] = re.sub("=.*", "=PAYLOAD", param, count=1)
        newurls.append(
            parsed.scheme +
            "://" +
            parsed.netloc +
            parsed.path +
            "?" +
            "&".join(newparams)
        )
newurls
gives
['https://www.somesite.com/path/path2/path3?param1=PAYLOAD&param2=value2&param3=value3',
'https://www.somesite.com/path/path2/path3?param1=value1&param2=PAYLOAD&param3=value3',
'https://www.somesite.com/path/path2/path3?param1=value1&param2=value2&param3=PAYLOAD',
'https://www.anothersite.com/path/path2/path3?param1=PAYLOAD&param2=value2&param3=value3',
'https://www.anothersite.com/path/path2/path3?param1=value1&param2=PAYLOAD&param3=value3',
'https://www.anothersite.com/path/path2/path3?param1=value1&param2=value2&param3=PAYLOAD']
Comments
0 votes
Eftal Gezer
10/1/2022
@FozenOption If the order matters, we can sort the parameters with a regular expression.
0 votes
FozenOption
10/1/2022
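This somehow only handles two parameters at a time. If there is a param3, it gives me param1 and param2, or param1 and param3, or param2 and param3, and leaves out the third parameter. What changes would you suggest?
0 votes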
FozenOption
10/2/2022
#2
I've solved it:
from urllib.parse import urlparse

url = "https://github.com/search?p=2&q=user&type=Code&name=djalel"
parsed = urlparse(url)
params = parsed.query.split("&")
new_queries = []
for i, param in enumerate(params):
    # Work on a copy so that each variant changes exactly one parameter.
    # enumerate gives the position directly; params.index(param) would
    # return the first match when two parameters are identical.
    changed = params.copy()
    changed[i] = param.split("=")[0] + "=PAYLOAD"
    new_queries.append("&".join(changed))
for query in new_queries:
    print(parsed.scheme + "://" + parsed.netloc + parsed.path + "?" + query)
Output:
https://github.com/search?p=PAYLOAD&q=user&type=Code&name=djalel
https://github.com/search?p=2&q=PAYLOAD&type=Code&name=djalel
https://github.com/search?p=2&q=user&type=PAYLOAD&name=djalel
https://github.com/search?p=2&q=user&type=Code&name=PAYLOAD
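For what it's worth, the same idea can probably be written with the standard library's query-string helpers instead of splitting on "&" and "=" by hand. This is only a sketch: `payload_variants` is a made-up helper name, not something from urllib, and it assumes the payload may be percent-encoded by `urlencode`.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def payload_variants(url, payload="PAYLOAD"):
    """Return one URL per query parameter, with only that parameter's value replaced.

    Hypothetical helper for illustration; not part of urllib.
    """
    parts = urlsplit(url)
    # keep_blank_values=True keeps parameters such as "debug=" in the list.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    variants = []
    for i, (name, _) in enumerate(pairs):
        changed = list(pairs)          # copy, so each variant changes one pair only
        changed[i] = (name, payload)
        variants.append(parts._replace(query=urlencode(changed)).geturl())
    return variants


for variant in payload_variants("https://github.com/search?p=2&q=user&type=Code&name=djalel"):
    print(variant)
```

For the URL above this should print the same four lines as the output shown. One caveat: `urlencode` escapes special characters, so if the payload has to be sent raw (spaces, angle brackets and so on), joining the pairs manually as in the code above may be the better fit.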
urllib