How to set a word limit on the sentences of a paragraph in Python?

Asked by: waji Asked: 9/7/2022 Last edited by: Wiktor Stribiżew, waji Updated: 9/7/2022 Views: 45

Q:

I need to set a limit when appending to a list.

sent = 'Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming.'

I need each sentence to contain only 5 words, and each sentence appended to a list.

The output should be:

sent_list = ['Python is dynamically-typed and garbage-collected.', 'It supports multiple programming paradigms,', 'including structured (particularly procedural), object-oriented', 'and functional programming.']

Tags: python list split slice

A:

1 upvote user16004728 9/7/2022 #1

Try this:

words = sent.split(' ')
sent_list = [' '.join(words[n:n+5]) for n in range(0, len(words), 5)]
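
The slicing idea above can be wrapped in a small reusable function. This is only a sketch: the name chunk_words and the size parameter are hypothetical, not part of the answer. It also assumes that calling split() with no argument is acceptable, which collapses runs of whitespace instead of producing empty strings the way split(' ') does.

```python
def chunk_words(text, size=5):
    """Regroup the whitespace-separated words of `text` into chunks of `size` words."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    words = text.split()  # no argument: splits on any run of whitespace
    # Step through the word list `size` items at a time and re-join each slice.
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


sent = ('Python is dynamically-typed and garbage-collected. It supports multiple '
        'programming paradigms, including structured (particularly procedural), '
        'object-oriented and functional programming.')

for chunk in chunk_words(sent):
    print(chunk)
# Python is dynamically-typed and garbage-collected.
# It supports multiple programming paradigms,
# including structured (particularly procedural), object-oriented
# and functional programming.
```

The last chunk simply holds whatever words remain, so it may be shorter than size.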

0 upvotes uozcan12 9/7/2022 #2

sent = 'Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming.'
sent_list = ['Python is dynamically-typed and garbage-collected.', 
            'It supports multiple programming paradigms,', 
            'including structured (particularly procedural), object-oriented', 
            'and functional programming.']

new_list = []
inner_string = ""
sentence_list = sent.split(" ")
for idx, item in enumerate(sentence_list):
    if (idx+1)==1 or (idx+1)%5 != 0:
        if (idx+1) == len(sentence_list):
            inner_string += item
            new_list.append(inner_string)
        else:
            inner_string += item + " "
    elif (idx+1)!=1 and (idx+1) % 5 == 0 :
        inner_string += item
        new_list.append(inner_string)
        inner_string = ""
        
print(new_list)
print(new_list == sent_list)

Output:

['Python is dynamically-typed and garbage-collected.', 'It supports multiple programming paradigms,', 'including structured (particularly procedural), object-oriented', 'and functional programming.']
True
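
For comparison, the same loop-based approach can be written more compactly by collecting words in a temporary list and flushing it every few words. This is a sketch rather than the answerer's code: the names chunk_words_loop and size are hypothetical, and it is only checked here against the example sentence.

```python
def chunk_words_loop(text, size=5):
    """Build chunks of `size` words by accumulating words in a buffer."""
    chunks = []
    current = []  # words collected for the chunk being built
    for word in text.split(" "):
        current.append(word)
        if len(current) == size:
            # Buffer is full: emit it as one string and start a new one.
            chunks.append(" ".join(current))
            current = []
    if current:
        # Flush the leftover words that did not fill a whole chunk.
        chunks.append(" ".join(current))
    return chunks


sent = ('Python is dynamically-typed and garbage-collected. It supports multiple '
        'programming paradigms, including structured (particularly procedural), '
        'object-oriented and functional programming.')
print(chunk_words_loop(sent))
```

Joining the buffer with " ".join avoids having to track whether a trailing space is needed, which is what the index arithmetic in the longer version handles by hand.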

1 upvote ouroboros1 9/7/2022 #3

Perhaps a bit unorthodox:

import re

sent_list = [re.sub(r'\s$', '', x.group('pattern')) for x in
             re.finditer(r'(?P<pattern>([^\s]+\s){5}|.+$)', sent)]

['Python is dynamically-typed and garbage-collected.',
 'It supports multiple programming paradigms,',
 'including structured (particularly procedural), object-oriented',
 'and functional programming.']

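Explanation of '(?P<pattern>([^\s]+\s){5}|.+$)':

  • (?P<pattern> ... ): cosmetic; creates a named capture group.
  • ([^\s]+\s){5}: matches a sequence of one or more non-whitespace characters followed by a single whitespace character, repeated 5 times.
  • |.+$: once the first alternative can no longer match, simply take whatever is left up to the end of the string.

We use re.finditer to loop over all match objects and read each match with x.group('pattern'). Every match except the last one ends with an extra whitespace character; one way to get rid of it is to use re.sub.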
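
A parameterized variant of the regex idea might look like the sketch below. It is not the answer's exact pattern: it swaps in a non-capturing group, \S in place of [^\s], and rstrip() in place of re.sub, and the names chunk_words_regex and size are hypothetical.

```python
import re


def chunk_words_regex(text, size=5):
    """Split `text` into chunks of `size` words using a regular expression."""
    # (?:\S+\s+){size} : exactly `size` words, each followed by whitespace
    # \S.*$            : fallback that takes the remaining words at the end
    pattern = re.compile(r"(?:\S+\s+){%d}|\S.*$" % size, flags=re.DOTALL)
    # Full chunks carry trailing whitespace from the match, so strip it off.
    return [m.group(0).rstrip() for m in pattern.finditer(text)]


sent = ('Python is dynamically-typed and garbage-collected. It supports multiple '
        'programming paradigms, including structured (particularly procedural), '
        'object-oriented and functional programming.')
print(chunk_words_regex(sent))
```

Because a full chunk must end in whitespace, the final words of the string are always picked up by the fallback alternative, even when exactly size words remain.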