Asked by: DBA_player Asked: 10/16/2023 Updated: 10/19/2023 Views: 59
SQL RegEx Group
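Q:
I'm having a problem with regexp_extract in Hive. The problem seems to revolve around grouping. I'm basically extracting information from a URL between / characters, and I want to select not the first match but the Nth match.
The URL might look like this: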
https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/shjagdhjasg/text/regex-match-all-characters-between-two-strings
I want to extract the "text" part.
For example, I'm trying:
regexp_extract(cs.context_page_url, '(\/.?\/)', 5)
A:
0 votes
leftjoin
10/16/2023
#1
You can use split(), which is also regex-based.
Demo:
select split(str, '/+')[6]
from
(select 'https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/shjagdhjasg/text/regex-match-all-characters-between-two-strings' str
)s
Returns:
_c0
----
text
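Also, / is not a special character in regex and does not need to be escaped.
/+ means one or more /.
If you remove the +, "text" will be at position 7, because https: is followed by two slashes (the empty string between them becomes its own array element) and array indexes start at 0.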
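To make the index shift concrete, here is a minimal sketch that runs both patterns against the same sample URL. The column aliases with_plus and without_plus are illustrative names only, not part of the original demo.

```sql
-- With '/+' the run of slashes after https: counts as one separator,
-- so "text" is element 6 of the array.
-- With '/' the empty string between the two slashes is its own element,
-- so "text" moves to element 7.
select split(str, '/+')[6] as with_plus,
       split(str, '/')[7] as without_plus
from
(select 'https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/shjagdhjasg/text/regex-match-all-characters-between-two-strings' str
)s
```

Both columns should come back as text for this URL. URLs with a different number of path segments would need a different index.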
0 votes
Ryszard Czech
10/19/2023
#2
Use REGEXP_EXTRACT:
REGEXP_EXTRACT(URL,'^(?:[^/]*/){7}([^/]*)', 1)
Explanation:
^ asserts position at the start of a line
Non-capturing group (?:[^/]*/){7}
  {7} matches the previous group exactly 7 times
  [^/] matches a single character that is not /
  * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  / matches the character / literally
1st Capturing Group ([^/]*)
  [^/] matches a single character that is not /
  * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
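As a rough sketch of how this applies to the sample URL from the question: the last argument of regexp_extract is the index of the capture group to return, not the number of the match, so the count of segments to skip goes in the {n} quantifier and the index stays 1. The aliases after_7_slashes and after_3_slashes are illustrative names only.

```sql
-- {7} skips seven "segment plus slash" pieces:
--   https:/  /  stackoverflow.com/  questions/  dghahjsg/  ahfgajhfg/  shjagdhjasg/
-- and capture group 1 then holds the next segment.
-- Changing the quantifier selects a different segment, the group index stays 1.
select regexp_extract(url, '^(?:[^/]*/){7}([^/]*)', 1) as after_7_slashes,
       regexp_extract(url, '^(?:[^/]*/){3}([^/]*)', 1) as after_3_slashes
from
(select 'https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/shjagdhjasg/text/regex-match-all-characters-between-two-strings' url
)s
```

For this URL the first column should be text and the second questions. Note that the empty piece between the two slashes after https: counts as one of the skipped segments.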
Comments:
regexp_extract(URL,'^(?:[^/]*/){7}([^/]*)', 1)