How to iterate through a list of strings (i.e. sentences) and check whether it contains words that start with a specific word?

Asked by: Milind  Asked: 11/17/2023  Last edited by: toyota SupraMilind  Updated: 11/17/2023  Views: 46

Q:

The task is to write a for loop that takes a list of documents (each document is a string) and returns a list of the words that start with a specific keyword.

Example (the result should be): ['python', 'python_one', 'python_two', 'python_three', 'python', 'python_two']

doc_list = ['welcome to python tutorial', 'you are welcome to python_one tutorial', 'python_two tutorial python_three tutorial', 'hello python', 'hello python_two world']

keyword = python

import re
doc_list = ['welcome to python tutorial', 'you are welcome to python_one tutorial',  'welcome python_two tutorial python_three tutorial', 'hello python', 'hello python_four world']

output = []
for line in doc_list:
    y = re.search(r' (python.*?) ', line)
    if y: output.append(y.group(1))
    
print(output)

The result I get is:

['python', 'python_one', 'python_two', 'python_four']

python_three is missing here because my code ignores multiple occurrences of python within each sentence string.
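
To illustrate the behaviour described above, here is a minimal sketch using the same pattern on single lines. It relies only on the standard re module; the variable names (line, first, all_matches, end_match) are illustrative and not part of the original post. It suggests two separate effects: re.search returns only the first match, and the trailing space in the pattern prevents a match at the very end of a string, which would explain why 'hello python' contributes nothing to the output.

```python
import re

line = 'welcome python_two tutorial python_three tutorial'

# re.search stops at the first match, so only one word is captured per line.
first = re.search(r' (python.*?) ', line)
print(first.group(1))  # python_two

# re.findall returns every non-overlapping match in the line.
all_matches = re.findall(r' (python.*?) ', line)
print(all_matches)  # ['python_two', 'python_three']

# The pattern requires a space after the word, so a word at the
# end of the string is not matched at all.
end_match = re.search(r' (python.*?) ', 'hello python')
print(end_match)  # None
```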

Tags: list python-re

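Comments

0 votes Codist 11/17/2023
str objects have a built-in startswith() method that you will find useful. You really don't need re for this.
0 votes Abdul Aziz Barkat 11/17/2023
Does this answer your question? How can I find all matches to a regular expression in Python? Also: How to find words that start with a specific character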

A:

0 votes Codist 11/17/2023 #1

Keep it simple. Iterate over doc_list, split each element (sentence) into its constituent words, then check each word with startswith().

doc_list = ['welcome to python tutorial', 'you are welcome to python_one tutorial',  'welcome python_two tutorial python_three tutorial', 'hello python', 'hello python_four world']
result = []
for sentence in doc_list:
    for word in sentence.split():
        if word.startswith("python"):
            result.append(word)
print(result)

Output:

['python', 'python_one', 'python_two', 'python_three', 'python', 'python_four']
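
The same split-and-startswith() approach could also be wrapped in a reusable function with the keyword as a parameter. This is only a sketch: the function name words_starting_with and its parameters are hypothetical, not part of the answer above.

```python
def words_starting_with(docs, keyword):
    """Return every whitespace-separated word in docs that starts with keyword."""
    return [
        word
        for sentence in docs          # outer loop: each document string
        for word in sentence.split()  # inner loop: each word in that string
        if word.startswith(keyword)
    ]


doc_list = [
    'welcome to python tutorial',
    'you are welcome to python_one tutorial',
    'welcome python_two tutorial python_three tutorial',
    'hello python',
    'hello python_four world',
]
result = words_starting_with(doc_list, "python")
print(result)
# ['python', 'python_one', 'python_two', 'python_three', 'python', 'python_four']
```

Note that split() separates on whitespace only, so a word with leading punctuation such as '(python' would not pass the startswith() check.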

0 votes Andrej Kesely 11/17/2023 #2

Another solution:

import re

doc_list = [
    "welcome to python tutorial",
    "you are welcome to python_one tutorial",
    "welcome python_two tutorial python_three tutorial",
    "hello python",
    "hello python_two world",
]

out = []
for doc in doc_list:
    out.extend(re.findall(r"\bpython\S*", doc))

print(out)

Prints:

[
  "python", 
  "python_one", 
  "python_two", 
  "python_three", 
  "python", 
  "python_two"
]
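
If the keyword is not hard-coded, the pattern above can be built from a variable. The sketch below assumes the keyword begins with a letter, digit, or underscore (so that \b behaves as a word boundary in front of it) and uses re.escape so that any regex metacharacters in the keyword are matched literally. The function name find_keyword_words and the sample docs are hypothetical and chosen only to show how the pattern treats punctuation.

```python
import re


def find_keyword_words(docs, keyword):
    # re.escape makes the keyword match literally, even if it
    # contains characters such as '.' or '+'.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\S*")
    out = []
    for doc in docs:
        # findall returns every non-overlapping match in the document.
        out.extend(pattern.findall(doc))
    return out


docs = ["hello python", "(python_two) and cpython", "python_three, done"]
found = find_keyword_words(docs, "python")
print(found)
# ['python', 'python_two)', 'python_three,']
```

Two things are visible here: \b lets the match begin right after '(' but not inside 'cpython', and \S* keeps going until the next whitespace, so trailing punctuation is included. Replacing \S* with \w* would stop at the first non-word character instead.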