Remove non-alphanumeric characters within double quotes in a string - Python

Asked by: usr_lal123 Asked: 10/20/2023 Updated: 10/20/2023 Views: 142

Q:

My input text looks like this:

answer_result = I don’t think we’ll still be doing "prompt engineering" in five years "i.e." figuring out how to hack the prompt by " ," adding one magic word to the "" end that changes everything  else. " What will always matter is the "1" quality of ideas.

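I need to delete all the non-alphanumeric characters within double quotes. I need to keep "prompt engineering", "i.e." and "1". Everything else needs to be deleted.

My expected output: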

I don’t think we’ll still be doing "prompt engineering" in five years "i.e." figuring out how to hack the prompt by adding one magic word to the end that changes everything  else. What will always matter is the "1" quality of ideas.

I tried the following code to get the positions of the double quotes:

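import re
# Positions of every double quote in the input string.
double_quotes_locs = [m.start() for m in re.finditer('"', answer_result)]
to_be_deleted = []  # (start, end) index pairs of quoted spans to remove
single_quotes = []  # position of a trailing quote that has no partner
# Walk the quote positions two at a time, treating each pair as one span.
for s in range(0, len(double_quotes_locs), 2):
    try:
        if (double_quotes_locs[s+1] - double_quotes_locs[s]) <= 1:
            # Empty span: the two quotes are adjacent.
            to_be_deleted.append((double_quotes_locs[s], double_quotes_locs[s+1]))
            continue
        else:
            if re.match("^[A-Za-z0-9]+", answer_result[double_quotes_locs[s]+1:double_quotes_locs[s+1]]):
                # Span starts with an alphanumeric character: keep it.
                continue
            else:
                to_be_deleted.append((double_quotes_locs[s], double_quotes_locs[s+1]))
    except IndexError:
        # Odd number of quotes: the last one has no closing partner.
        single_quotes.append(double_quotes_locs[s])
        break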

Can anyone help me with how to proceed from here?

A completely new solution to this problem is welcome too.

Thanks

python-3.x regex

A:

0 votes Tranbi 10/20/2023 #1

You can use re.sub:

import re

answer_result = '''I don’t think we’ll still be doing "prompt engineering" in five years "i.e." figuring out how to hack the prompt by " ," adding one magic word to the "" end that changes everything  else. " What will always matter is the "1" quality of ideas.'''

res = re.sub(r'"[^A-Za-z0-9]*"', '', answer_result)

Output:

I don’t think we’ll still be doing "prompt engineering" in five years "i.e." figuring out how to hack the prompt by  adding one magic word to the  end that changes everything  else. " What will always matter is the "1" quality of ideas.

You can then use re.sub again to replace multiple consecutive spaces with a single space.
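
A minimal sketch of that second pass, assuming only plain space characters need collapsing (collapse_spaces is a hypothetical helper name, not part of the answer). Note that it would also collapse the double space in "everything  else", which the expected output in the question happens to keep.

```python
import re

def collapse_spaces(text: str) -> str:
    # Replace every run of two or more spaces with a single space.
    return re.sub(r" {2,}", " ", text)

print(collapse_spaces("hack the prompt by  adding one magic word to the  end"))
```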

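Comments

1 vote mandy8055 10/20/2023
Does it produce the expected output?
0 votes Tranbi 10/20/2023
@mandy8055 I think it answers the question "Remove non-alphanumeric characters within double quotes in a string".
0 votes usr_lal123 10/20/2023
I also need to remove the lone double quote.
0 votes usr_lal123 10/20/2023
Also, this solution only works for text enclosed in ''', which is not my case. My text will always be in ".
0 votes Tranbi 10/20/2023
It works as long as there are no single quotes in your string. As for your orphan quote, that is not something you can do with a regex. You could iterate over your string and use a stack to check whether your quotes are balanced... but you would have to find a criterion for identifying the orphan quote.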
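
One way the iteration suggested in the last comment might look, as a rough sketch only. Because the opening and closing delimiter are the same character, the "stack" reduces to looking ahead for the next quote. The orphan criterion is an assumption made for illustration, not something stated in the thread: a quote counts as orphaned if it has no partner, or if the text up to the next quote starts or ends with whitespace. The name strip_bad_quotes is hypothetical, and str.isalnum also accepts non-ASCII letters and digits.

```python
def strip_bad_quotes(text: str) -> str:
    """Keep word-like quoted spans; drop empty spans and orphan quotes."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != '"':
            out.append(ch)
            i += 1
            continue
        close = text.find('"', i + 1)
        if close == -1:
            # No closing quote anywhere: drop this quote as an orphan.
            i += 1
            continue
        inner = text[i + 1:close]
        if not any(c.isalnum() for c in inner):
            # Empty or punctuation-only span: drop quotes and contents.
            i = close + 1
        elif inner[0].isspace() or inner[-1].isspace():
            # Assumed criterion: a real quotation hugs its text, so this
            # quote is an orphan. Drop only the quote and keep scanning.
            i += 1
        else:
            # Legitimate quoted span: copy it through unchanged.
            out.append(text[i:close + 1])
            i = close + 1
    return "".join(out)

sample = 'say "hi there" and " ," or "" then. " Orphan before "1" end'
print(strip_bad_quotes(sample))
```

The removed spans leave doubled spaces behind, so a whitespace clean-up pass would still be needed afterwards.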