Asked by: anna  Asked: 11/14/2023  Last edited by: anna  Updated: 11/14/2023  Views: 41
BigQuery: Split String into Most Frequent Words
Q:
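I'm trying to find the most frequently occurring words in a BigQuery column (the product description column).
Is there a way to go further and find the words that most often appear after the word "knife" (in the product description column)?
I'm trying to isolate product descriptions that refer only to sharp, dangerous knives (excluding Halloween knives, knife blocks, knife trays, knife storage cases, etc.).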
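For the first part (the most frequent words across the whole column), a minimal sketch follows. It assumes a hypothetical table `my_dataset.products` holding the `pkg_desc` description column mentioned in the comments below; it lowercases each description, pulls out every run of word characters with `REGEXP_EXTRACT_ALL` (with no capturing group, the whole match is returned), and counts the results.

```sql
-- Hypothetical table name; pkg_desc is the description column named in the comments.
SELECT word, COUNT(*) AS freq
FROM `my_dataset.products`,
  -- One row per word: every run of letters/digits/underscores in the lowercased description
  UNNEST(REGEXP_EXTRACT_ALL(LOWER(pkg_desc), r'\w+')) AS word
GROUP BY word
ORDER BY freq DESC
LIMIT 50
```

Filler words such as "and" or "for" will likely top this list; a `NOT IN (...)` filter like the one in the answer can drop them.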
A:
-1 votes
Println
11/14/2023
#1
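Try the query below: just replace the sample string with your column name (selecting FROM your table), then add any keywords you want to exclude to the list in exclude_words.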
WITH descriptions AS (
  -- Sample value; replace with your column, e.g. SELECT LOWER(pkg_desc) AS txt FROM your_table
  SELECT LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Knives Tool Helps Repair and Restore Blades') AS txt
),
-- Word immediately before "knives"
before_knives_words AS (
  SELECT vals
  FROM descriptions, UNNEST(REGEXP_EXTRACT_ALL(txt, r'(\w+) knives')) AS vals
),
-- Word immediately after "knives"
after_knives_words AS (
  SELECT vals
  FROM descriptions, UNNEST(REGEXP_EXTRACT_ALL(txt, r'knives (\w+)')) AS vals
),
-- Word immediately before "knife"
before_knife_words AS (
  SELECT vals
  FROM descriptions, UNNEST(REGEXP_EXTRACT_ALL(txt, r'(\w+) knife')) AS vals
),
-- Word immediately after "knife"
after_knife_words AS (
  SELECT vals
  FROM descriptions, UNNEST(REGEXP_EXTRACT_ALL(txt, r'knife (\w+)')) AS vals
),
union_all AS (
  SELECT vals FROM before_knives_words
  UNION ALL
  SELECT vals FROM after_knives_words
  UNION ALL
  SELECT vals FROM before_knife_words
  UNION ALL
  SELECT vals FROM after_knife_words
),
exclude_words AS (
  -- Add any words you want to ignore to this list
  SELECT vals FROM union_all
  WHERE vals NOT IN ('chef', 'stage')
)
SELECT vals, COUNT(*) AS freq
FROM exclude_words
GROUP BY vals
ORDER BY freq DESC
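The same idea can run in a single pass over the real column. A minimal sketch, assuming a hypothetical table `my_dataset.products` with the `pkg_desc` column from the comments; it only counts the word directly after "knife" or "knives" (the second part of the question), using one RE2 pattern with a non-capturing group `(?:...)` so that `REGEXP_EXTRACT_ALL` still returns just the captured word.

```sql
-- Hypothetical table name; replace with your own.
WITH next_words AS (
  SELECT word
  FROM `my_dataset.products`,
    -- \bkni(?:fe|ves) matches "knife" or "knives" at a word start; (\w+) captures the next word
    UNNEST(REGEXP_EXTRACT_ALL(LOWER(pkg_desc), r'\bkni(?:fe|ves)\s+(\w+)')) AS word
)
SELECT word, COUNT(*) AS freq
FROM next_words
WHERE word NOT IN ('chef', 'stage')  -- words to ignore; extend as needed
GROUP BY word
ORDER BY freq DESC
```

Words separated from "knife" by punctuation (e.g. "knives, 3-stage") are not captured, because the pattern requires only whitespace in between.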
Comments
0 votes
anna
11/14/2023
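Thanks! The column is named pkg_desc. When I try the above with pkg_desc instead of "SHARPAL", I get the error "Unrecognized name". Do I need to include a FROM clause for the original table that contains pkg_desc?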
0 votes
Println
11/14/2023
@anna Yes, something like this: select REGEXP_EXTRACT_ALL(LOWER(pkg_desc), r'(\w+) knives') as words from table_name
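With a FROM clause in place, the same regex functions can also filter rows toward the goal stated in the question (keeping only sharp knives). A rough sketch, assuming the hypothetical table `my_dataset.products`; the exclusion pattern (Halloween, knife block/tray/storage/holder) is illustrative only and would need tuning against the frequency counts.

```sql
-- Hypothetical table name and exclusion terms; adjust both to your data.
SELECT pkg_desc
FROM `my_dataset.products`
-- Keep descriptions that mention "knife" or "knives" as a whole word...
WHERE REGEXP_CONTAINS(LOWER(pkg_desc), r'\bkni(?:fe|ves)\b')
  -- ...but drop any that mention Halloween or a knife accessory
  AND NOT REGEXP_CONTAINS(
        LOWER(pkg_desc),
        r'\bhalloween\b|\bkni(?:fe|ves)\s+(?:block|tray|storage|holder)s?\b'
      )
```

Note that a knife set sold together with a block would also be dropped by this pattern, so spot-check the excluded rows.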