Perl Regex To Remove Commas Between Quotes?

Asked by: Duncan Asked: 7/1/2015 Updated: 7/1/2015 Views: 1948

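Q:

I am trying to remove the commas between double quotes in a string while leaving the other commas intact. (The quoted text is the display name of an email address, which sometimes contains stray commas.) The following "brute force" code works fine on my particular machine, but is there a more elegant way to do this, perhaps with a single regex? Duncan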

$string = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <[email protected]>,1,2';
print "Initial string = ", $string, "<br>\n";

# Extract stuff between the quotes
$string =~ /\"(.*?)\"/;

$name = $1;
print "name = ", $1, "<br>\n";
# Delete all commas between the quotes
$name =~ s/,//g;
print "name minus commas = ", $name, "<br>\n";
# Put the modified name back between the quotes
$string =~ s/\"(.*?)\"/\"$name\"/;
print "new string = ", $string, "<br>\n";
regex perl quotes

A:

3 votes Casimir et Hippolyte 7/1/2015 #1

You can use the following pattern:

$string =~ s/(?:\G(?!\A)|[^"]*")[^",]*\K(?:,|"(*SKIP)(*FAIL))//g;

Pattern details:

(?: # two possible beginnings:
    \G(?!\A) # contiguous to the previous match
  |          # OR
    [^"]*"   # all characters until an opening quote
)
[^",]*     #"# all that is not a quote or a comma
\K           # discard all previous characters from the match result
(?:          # two possible cases:
    ,        # a comma is found, so it will be replaced
  |          # OR
    "(*SKIP)(*FAIL) #"# when the closing quote is reached, make the pattern fail
                      # and force the regex engine to not retry previous positions.
)
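
To see the pattern above in action, here is a minimal sketch that wraps it in a function and runs it on a string shaped like the one in the question. The name strip_quoted_commas and the address user@example.com are placeholders chosen for this illustration, not part of the original answer. It assumes Perl 5.10 or later, since it relies on \K and the backtracking control verbs.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every comma that sits between double quotes.
# Needs Perl 5.10+ for \K and (*SKIP)(*FAIL).
sub strip_quoted_commas {
    my ($s) = @_;
    $s =~ s/(?:\G(?!\A)|[^"]*")[^",]*\K(?:,|"(*SKIP)(*FAIL))//g;
    return $s;
}

# Single quotes keep Perl from interpolating the @ in the placeholder address.
my $in = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <user@example.com>,1,2';
print strip_quoted_commas($in), "\n";
# prints: 06/14/2015,19:13:51,"Mrs Nkolika Nebedom" <user@example.com>,1,2
```

Commas outside the quotes are consumed by [^"]* before \K, so they are never part of the text that gets replaced.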

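If you use an older Perl version, \K and the backtracking control verbs may not be supported. In that case you can use this pattern with capture groups instead: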

$string =~ s/((?:\G(?!\A)|[^"]*")[^",]*)(?:,|("[^"]*(?:"|\z)))/$1$2/g;
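
One thing to watch with the capture-group form: when the comma branch matches, the second group does not take part, so $2 is undefined and interpolating it may emit an "uninitialized value" warning under use warnings. The sketch below is one possible way around that, using the e flag and an explicit defined check. The function name strip_quoted_commas_compat and the sample address are placeholders for this illustration, and the guard is an addition of this sketch rather than part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: same idea as the capture-group pattern, with no \K
# and no backtracking control verbs. The replacement is evaluated as code
# (e flag) so that an undefined $2 is never interpolated.
sub strip_quoted_commas_compat {
    my ($s) = @_;
    $s =~ s/((?:\G(?!\A)|[^"]*")[^",]*)(?:,|("[^"]*(?:"|\z)))/defined $2 ? "$1$2" : $1/ge;
    return $s;
}

my $in = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <user@example.com>,1,2';
print strip_quoted_commas_compat($in), "\n";
# prints: 06/14/2015,19:13:51,"Mrs Nkolika Nebedom" <user@example.com>,1,2
```

Here the second group swallows everything from the closing quote up to the next opening quote (or the end of the string), which plays the role that (*SKIP)(*FAIL) has in the first pattern.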

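Comments

0 votes Duncan 7/1/2015
I tried that but got the following error message: Quantifier follows nothing in regex; marked by <-- HERE in m/(?:\G(?!\A)|[^"]*")[^",]*K(?:,|"(* <-- HERE SKIP)(*F))
0 votes Casimir et Hippolyte 7/1/2015
@Duncan: If the backtracking control verbs are not supported, you can use: $string =~ s/(?:\G(?!\A)|[^"]*")[^",]*(?:\K,|("[^"]*(?:"|\z)))/$1/g;
1 vote ThisSuitIsBlackNot 7/1/2015
@Duncan \K wasn't added until Perl 5.10, so that won't work. You can use capture groups as an alternative.
1 vote Casimir et Hippolyte 10/28/2021
@JaredStill: Another way, using an evaluated replacement (with the e flag): s/".*?"/$_=$&;y|,||d;$_/ge where $& is the whole match and y///d (or tr///d) deletes the selected characters.
1 vote Casimir et Hippolyte 10/28/2021
@JaredStill: Or a shorter replacement, with a second substitution: s/".*?"/$&=~s|,||gr/ge (using the r flag, which returns the result of the substitution and leaves the read-only $& unchanged).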
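
The two one-liners from the comments above can be tried out with a sketch like the following. The function names strip_with_tr and strip_with_r are placeholders for this illustration. The local $_ line is an addition of this sketch, there because the first replacement assigns to $_; the r flag used in the second one requires Perl 5.14 or later.

```perl
use strict;
use warnings;

# Variant 1: copy the match into $_, delete its commas with y///d, return it.
sub strip_with_tr {
    my ($s) = @_;
    local $_;    # the replacement code below overwrites $_
    $s =~ s/".*?"/$_=$&;y|,||d;$_/ge;
    return $s;
}

# Variant 2: run a nested substitution on $& with the r flag, which returns
# the modified copy instead of trying to change $& itself (Perl 5.14+).
sub strip_with_r {
    my ($s) = @_;
    $s =~ s/".*?"/$&=~s|,||gr/ge;
    return $s;
}

my $in = 'a,"b,c",d,"e,,f",g';
print strip_with_tr($in), "\n";    # prints: a,"bc",d,"ef",g
print strip_with_r($in),  "\n";    # prints: a,"bc",d,"ef",g
```

Both variants match each quoted run as a whole and clean it inside the replacement, so they assume the quotes are balanced.
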
2 votes TLP 7/1/2015 #2

One way would be to use the nice module Text::ParseWords to isolate the specific field and perform a simple transliteration to get rid of the commas:

use strict;
use warnings;
use Text::ParseWords;

my $str = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <[email protected]>,1,2';
my @row = quotewords(',', 1, $str);
$row[2] =~ tr/,//d;
print join ",", @row;

Output:

06/14/2015,19:13:51,"Mrs Nkolika Nebedom" <[email protected]>,1,2

I am assuming that no commas can legitimately appear in your email field. Otherwise, some other replacement method is required.
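
If the quoted part is not always in the third field, one possible generalization is to apply the transliteration to every field. This relies on the observation that quotewords has already split on the unquoted commas, so any comma still left inside a field was protected by quotes. The sketch below is an illustration under that assumption, not part of the original answer. The function name strip_commas_in_fields and the sample address are placeholders.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical helper: split on unquoted commas (keeping the quotes),
# delete whatever commas remain inside each field, and join again.
sub strip_commas_in_fields {
    my ($line) = @_;
    my @row = quotewords(',', 1, $line);
    for my $field (@row) {
        next unless defined $field;    # skip fields the parser left undefined
        $field =~ tr/,//d;
    }
    return join ',', @row;
}

my $str = '06/14/2015,19:13:51,"Mrs, Nkoli,,,ka N,ebedo,,m" <user@example.com>,1,2';
print strip_commas_in_fields($str), "\n";
```

Note that Text::ParseWords also treats single quotes as quoting characters, so input with a stray apostrophe outside the double quotes may not parse as expected.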