SQL RegEx Group

Asked by: DBA_player  Asked: 10/16/2023  Modified: 10/19/2023  Views: 59

Question:

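I'm having a problem with regexp_extract in Hive. The issue seems to revolve around grouping. I'm basically extracting information from a URL between / characters, and I want to select not the first match but the Nth match.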

A URL might look like this:

https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/shjagdhjasg/text/regex-match-all-characters-between-two-strings

I want to extract the "text" part.

For example, I'm trying:

regexp_extract(cs.context_page_url, '(\/.?\/)', 5)
sql regex hive
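Why the attempt above does not do what was intended: the last argument of regexp_extract selects a capturing group of the pattern, not the Nth occurrence of a match. '(\/.?\/)' has only one group, and .? allows at most one character between the two slashes, so the first match is simply the "//" of "https://". The sketch below mirrors this with Python's re module, assuming it behaves like the Java regex engine Hive uses for these simple patterns; the helper nth_match is hypothetical and only illustrates what "the Nth match" would mean.

```python
import re

URL = ("https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/"
       "shjagdhjasg/text/regex-match-all-characters-between-two-strings")

# The attempted pattern: exactly one capturing group, and '.?' allows
# at most ONE character between the two slashes.
attempt = re.compile(r"(\/.?\/)")
print(attempt.groups)               # 1, so there is no group 5

m = attempt.search(URL)
print(m.group(1))                   # '//' taken from 'https://'
try:
    m.group(5)                      # group index, not match occurrence
except IndexError as exc:
    print("group 5:", exc)


def nth_match(pattern, text, n):
    """Hypothetical helper: return the n-th (1-based) match of pattern, or None."""
    for i, match in enumerate(re.finditer(pattern, text), start=1):
        if i == n:
            return match.group(0)
    return None


# Non-empty path pieces: 'https:' is 1st, 'stackoverflow.com' 2nd, ..., 'text' 7th.
print(nth_match(r"[^/]+", URL, 7))  # text
```

Hive's regexp_extract has no "occurrence" argument, which is why the answers below either count segments inside the pattern or switch to split().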

Comments:

0 votes  Wiktor Stribiżew 10/16/2023
I think regexp_extract(URL,'^(?:[^/]*/){7}([^/]*)', 1) will do.
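A quick check of the pattern from the comment, sketched with Python's re module. Assumption: for this pattern Python behaves like the Java regex engine behind Hive, and re.search stands in for regexp_extract(URL, pattern, 1).

```python
import re

URL = ("https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/"
       "shjagdhjasg/text/regex-match-all-characters-between-two-strings")

# (?:[^/]*/){7} consumes seven "non-slashes followed by a slash" chunks:
# 'https:/', '/', 'stackoverflow.com/', 'questions/', 'dghahjsg/',
# 'ahfgajhfg/', 'shjagdhjasg/'. Group 1 then takes the next segment.
pattern = re.compile(r"^(?:[^/]*/){7}([^/]*)")

m = pattern.search(URL)
print(m.group(1))   # text
```

Note that the second chunk is just "/": [^/]* may match zero characters, which is how the empty piece between the two slashes of "https://" is counted.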

Answers:

0 votes  leftjoin 10/16/2023 #1

You can use split(), which is also regex-based.

Demo:

select split(str, '/+')[6] 
from
(select 'https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/shjagdhjasg/text/regex-match-all-characters-between-two-strings' str
)s

Returns:

_c0
----
text
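The same split can be reproduced outside Hive. A minimal Python sketch, assuming re.split treats '/+' the same way as the Java regex that Hive's split() uses (it should for this simple pattern); indexing starts at 0 in both.

```python
import re

URL = ("https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/"
       "shjagdhjasg/text/regex-match-all-characters-between-two-strings")

# '/+' treats a run of slashes as one separator, so 'https://' yields no empty piece.
parts = re.split(r"/+", URL)
print(parts[:7])  # ['https:', 'stackoverflow.com', 'questions', 'dghahjsg', 'ahfgajhfg', 'shjagdhjasg', 'text']
print(parts[6])   # text
```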

Also, / is not a special character in regex, so it does not need to be escaped.

/+ means one or more /.

If you remove the +, text will be at index 7, because there are two slashes after https: and array indexes start at 0.
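The remarks about escaping and the + can be checked side by side: an escaped \/ and a plain / match the same character, and dropping the + leaves an empty element between the two slashes of "https://", which moves text from index 6 to index 7. A Python sketch, assuming re.split behaves like Hive's Java-regex-based split() for these patterns.

```python
import re

URL = ("https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/"
       "shjagdhjasg/text/regex-match-all-characters-between-two-strings")

with_plus = re.split(r"/+", URL)     # a run of slashes counts as one separator
without_plus = re.split(r"/", URL)   # every single slash is a separator
escaped = re.split(r"\/", URL)       # '\/' is just '/', escaping is optional

print(with_plus[6])                  # text
print(repr(without_plus[1]))         # '' (between the two slashes of 'https://')
print(without_plus[7])               # text
print(escaped == without_plus)       # True
```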

0 votes  Ryszard Czech 10/19/2023 #2

Use REGEXP_EXTRACT:

REGEXP_EXTRACT(URL,'^(?:[^/]*/){7}([^/]*)', 1)

Regex proof

Explanation:

^ asserts position at the start of the string
Non-capturing group (?:[^/]*/){7}
  {7} matches the previous token exactly 7 times
  Match a single character not present in the list [^/] (any character except /)
    * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  / matches the character /
1st Capturing Group ([^/]*)
  Match a single character not present in the list [^/] (any character except /)
    * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
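Since the {7} quantifier is what fixes the position, the same pattern generalizes to any segment by changing that number. A minimal Python sketch; the helper nth_segment is hypothetical, its n counts slashes exactly like {7} does (so the empty piece inside "https://" is counted), and Python's re is assumed to behave like Hive's Java regex for this pattern.

```python
import re

URL = ("https://stackoverflow.com/questions/dghahjsg/ahfgajhfg/"
       "shjagdhjasg/text/regex-match-all-characters-between-two-strings")


def nth_segment(url, n):
    """Hypothetical helper: return the segment after the n-th '/', or None.

    Mirrors '^(?:[^/]*/){n}([^/]*)': n chunks of "non-slashes + slash" are
    skipped, then group 1 captures everything up to the next slash.
    """
    m = re.search(r"^(?:[^/]*/){%d}([^/]*)" % n, url)
    return m.group(1) if m else None


print(nth_segment(URL, 7))    # text
print(nth_segment(URL, 2))    # stackoverflow.com
print(nth_segment(URL, 20))   # None (the URL has only 8 slashes)
```

In HiveQL the equivalent would be to change the number inside {…} in the pattern string; the pattern is a plain string literal, so it could also be built with concat() if the position is not fixed.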