BigQuery: Split String into Most Frequent Words

Asked by: anna  Asked: 11/14/2023  Last edited by: anna  Updated: 11/14/2023  Views: 41

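Q:

I'm trying to find the most frequent words in a BigQuery column (the product description column).

Is there a way to go a step further and find the words that most often follow the word "knife" in that same column?

I'm trying to isolate the product descriptions that are only for sharp, dangerous knives (excluding Halloween knives, knife blocks, knife trays, knife storage cases, etc.).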

https://docs.google.com/spreadsheets/d/1c_XLVA2gh7i3BFIsIyg3qAtcdXDY46QomFK6u-nB08E/edit#gid=350499651

string google-sheets google-bigquery frequency word-frequency


A:

-1 votes Println 11/14/2023 #1

Try the query below. Replace the sample string with your column name, then add any words you want to leave out to the list in exclude_words.

    -- Words immediately before "knives" in the sample description
    WITH before_knives AS (
      SELECT REGEXP_EXTRACT_ALL(LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Knives Tool Helps Repair and Restore Blades'), r'(\w+) knives') AS words
    ),
    before_knives_words AS (
      SELECT vals
      FROM before_knives, UNNEST(before_knives.words) AS vals
    ),
    -- Words immediately after "knives"
    after_knives AS (
      SELECT REGEXP_EXTRACT_ALL(LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Tool Helps Repair and Restore Blades'), r'knives (\w+)') AS words
    ),
    after_knives_words AS (
      SELECT vals
      FROM after_knives, UNNEST(after_knives.words) AS vals
    ),
    -- Words immediately before "knife"
    before_knife AS (
      SELECT REGEXP_EXTRACT_ALL(LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Knives Tool Helps Repair and Restore Blades'), r'(\w+) knife') AS words
    ),
    before_knife_words AS (
      SELECT vals
      FROM before_knife, UNNEST(before_knife.words) AS vals
    ),
    -- Words immediately after "knife"
    after_knife AS (
      SELECT REGEXP_EXTRACT_ALL(LOWER('SHARPAL 191H Pocket Kitchen Chef Knife Scissors Sharpener for Straight & Serrated Knives, 3-Stage Knife Sharpening Tool Helps Repair and Restore Blades'), r'knife (\w+)') AS words
    ),
    after_knife_words AS (
      SELECT vals
      FROM after_knife, UNNEST(after_knife.words) AS vals
    ),
    -- Combine all neighbouring words into one list
    union_all AS (
      SELECT * FROM before_knives_words
      UNION ALL
      SELECT * FROM after_knives_words
      UNION ALL
      SELECT * FROM before_knife_words
      UNION ALL
      SELECT * FROM after_knife_words
    ),
    -- Drop neighbouring words that are not of interest
    exclude_words AS (
      SELECT * FROM union_all
      WHERE vals NOT IN ('chef', 'stage')
    )
    -- Count each remaining word, most frequent first
    SELECT vals, COUNT(*) AS word_count
    FROM exclude_words
    GROUP BY vals
    ORDER BY word_count DESC
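
Once the counts show which neighbouring words mark products that are not actual knives, those words could be used to filter rows directly. The following is only a rough sketch under assumptions: the `products` CTE, its sample rows, and both exclusion word lists are hypothetical and would need to be replaced with your own table and with the words your counts actually surface.

```sql
-- Hypothetical sample rows; swap this CTE for your real table.
WITH products AS (
  SELECT pkg_desc
  FROM UNNEST([
    'Chef Knife 8 inch Stainless Steel',
    'Bamboo Knife Block Holder',
    'Halloween Knife Costume Prop',
    'Serrated Knives Set of 6'
  ]) AS pkg_desc
)
SELECT pkg_desc
FROM products
-- Keep only descriptions that mention "knife" or "knives" as a whole word.
WHERE REGEXP_CONTAINS(LOWER(pkg_desc), r'\bkni(?:fe|ves)\b')
  -- Assumed list of words that, right after "knife"/"knives", signal an accessory.
  AND NOT REGEXP_CONTAINS(LOWER(pkg_desc), r'\bkni(?:fe|ves)\s+(?:block|tray|holder|storage|sharpener)\b')
  -- Assumed list of words that, right before "knife"/"knives", signal a prop or toy.
  AND NOT REGEXP_CONTAINS(LOWER(pkg_desc), r'\b(?:halloween|toy|plastic)\s+kni(?:fe|ves)\b');
```

The `\b` word boundaries keep the pattern from matching inside longer words, which the plain `knife (\w+)` pattern does not guard against.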

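Comments

0 votes anna 11/14/2023
Thanks! The column name is pkg_desc. When I try the above with pkg_desc in place of the 'SHARPAL ...' string, I get an "Unrecognized name" error. Do I need to include a FROM clause that points at the original table containing pkg_desc?
0 votes Println 11/14/2023
Yes, something like this: SELECT REGEXP_EXTRACT_ALL(LOWER(pkg_desc), r'(\w+) knives') AS words FROM table_name @anna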
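
To make the FROM clause concrete, here is a minimal sketch of counting the words that directly follow "knife" or "knives" across every row of a table rather than one literal string. The `products` CTE and its rows are hypothetical stand-ins so the query runs on its own; in practice you would drop the CTE and select from your real table. Merging "knife" and "knives" into one pattern is a simplification of the answer's four separate CTEs, not the answerer's own method.

```sql
-- Hypothetical sample rows; replace this CTE with your own table.
WITH products AS (
  SELECT pkg_desc
  FROM UNNEST([
    'Chef Knife Set with Knife Block',
    'Halloween Knife Prop',
    'Pocket Knife Sharpener for Serrated Knives'
  ]) AS pkg_desc
),
following_words AS (
  -- One output row per word found right after "knife" or "knives".
  SELECT word
  FROM products,
    UNNEST(REGEXP_EXTRACT_ALL(LOWER(pkg_desc), r'\bkni(?:fe|ves)\s+(\w+)')) AS word
)
SELECT word, COUNT(*) AS occurrences
FROM following_words
GROUP BY word
ORDER BY occurrences DESC, word;
```

The "Unrecognized name" error goes away because pkg_desc is now resolved against a table in the FROM clause; the comma join with UNNEST then expands each row's array of matches into separate rows for counting.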