Perl script to merge lines into a single line between two patterns and print the entire file

Asked by: kshitij kulshreshtha Asked: 10/23/2023 Last edited by: ddakshitij kulshreshtha Updated: 10/24/2023 Views: 120

Q:

I have a CSV file like the one below.

Input file:

Module Statements,Module,Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal,A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.,6.4.5.a),
Module Statements,,,A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement,6.4.5.b),
Module Statements,,,"The following objects shall have unique names within a module_def:
port_name
? scanRegister_name
? dataRegister_name
? one_hot_scan_group_name
? scanMux_name
? dataMux_name
? clockMux_name
? one_hot_data_group_name
? logicSignal_name
? alias_name",6.4.5.c),
Module Statements,,,"The following objects shall have unique names within a module_def:
instance_name
? scanInterface_name",6.4.5.d),
Module Statements,,,"An inputPort_name shall be one of the following
scanInPort_name
? shiftEnPort_name
? captureEnPort_name
? updateEnPort_name
? dataInPort_name
? selectPort_name
? resetPort_name
? tmsPort_name
? tckPort_name
? clockPort_name
? trstPort_name
? addressPort_name
? writeEnPort_name
? readEnPort_name
",6.4.5.e),
Module Statements,,,"An outputPort_name shall be one of the following:
? scanOutPort_name
? dataOutPort_name
? toShiftEnPort_name
? toUpdateEnPort_name
? toCaptureEnPort_name
? toSelectPort_name
? toResetPort_name
? toTckPort_name
? toTmsPort_name
? toClockPort_name
? toTrstPort_name
? toIRSelectPort_name",6.4.5.f),
Module Statements,,,Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.g),
Module Statements,,,Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.h),

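Search patterns for the output file: pattern 1 is "Module Statements", pattern 2 is "6."

This is the desired result, the output file we need:

Output file ####################################################################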

Module Statements,Module,Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal,A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.,6.4.5.a),
Module Statements,,,A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement,6.4.5.b),
Module Statements,,,"The following objects shall have unique names within a module_def: port_name ? scanRegister_name ? dataRegister_name ? one_hot_scan_group_name ? scanMux_name ? dataMux_name ? clockMux_name
? one_hot_data_group_name ? logicSignal_name ? alias_name",6.4.5.c),
Module Statements,,,"The following objects shall have unique names within a module_def: instance_name ? scanInterface_name",6.4.5.d),
Module Statements,,,"An inputPort_name shall be one of the following scanInPort_name ? shiftEnPort_name ? captureEnPort_name ? updateEnPort_name
? dataInPort_name ? selectPort_name ? resetPort_name ? tmsPort_name
? tckPort_name ? clockPort_name ? trstPort_name ? addressPort_name
? writeEnPort_name ? readEnPort_name ",6.4.5.e),
Module Statements,,,"An outputPort_name shall be one of the following:
? scanOutPort_name ? dataOutPort_name ? toShiftEnPort_name ?
toUpdateEnPort_name ? toCaptureEnPort_name ? toSelectPort_name
? toResetPort_name ? toTckPort_name ? toTmsPort_name ? toClockPort_name
? toTrstPort_name ? toIRSelectPort_name",6.4.5.f),
Module Statements,,,Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.g),
Module Statements,,,Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.h),

I wrote the Perl script below, but it does not give me the expected result.

Perl script

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
use feature 'say';

my @collect;
my $file = $ARGV[0] or die;
open(my $DATA, '<', $file) or die;

while (<$DATA>) {
    chomp;
    # If we're between our markers...
    if (/Module Statements/ .. /6.4*/) {
        # At the start marker, empty the array
        if (/Module Statements/) {
            @collect = ();
        # At the end marker, print the array
        } elsif (/6.4*/) {
            say join ' ', @collect;
        # Otherwise, push the line onto the array
        } else {
            push @collect, $_;
            foreach my $m (@collect) {
                print $m;
            }
        }
    # Otherwise, just print the line
    } else {
        say;
    }
}

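Thanks and regards, Kshitij Kulshreshtha

Tags: regex perl

Comments

0 upvotes sln 10/23/2023
There are two situations where a CSV field has to be quoted: the field spans more than one line, or the field contains the CSV delimiter. Both are choices the user makes when constructing the field. Fixed-width fields are rare and are mostly used for statistics. If the goal is to remove the quotes, you have to parse each field, remove the newlines (if required), and then check whether the field contains the current delimiter. If it does, you either change the delimiter to something else or keep the field quoted. I have software that does this at the click of a button.
0 upvotes Cary Swoveland 10/24/2023
The fourth line of the "output file" ("? one_hot_data_group_name ?...") is preceded by a line terminator. I assume that line terminator should not be there, in which case you should remove it. The same goes for every other line that does not begin with "Module Statements,". That is, I assume the output file should look as shown in all three answers so far.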

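Answers:

1 upvote Vincent F 10/23/2023 #1

Your data is, in the end, in CSV format. Since there is no straightforward regex solution for it, I suggest that you:

  1. open it as a CSV with 6 columns (input.csv),
  2. then remove the line feeds,
  3. write the result to a CSV (output.csv)

Here is code that does this. It uses this tiny regex to replace line feeds in either Windows or Unix format: \r?\n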

#!/usr/bin/perl

use strict;
use warnings;
use Text::CSV;

# Specify the CSV input file and separator
my $csv_input_file = "input.csv";
my $separator = ",";

# Specify the CSV output file
my $csv_output_file = "output.csv";


# Function to read CSV with 6 columns
sub read_csv {
    my ($file, $separator) = @_;
    my $csv = Text::CSV->new({
        sep_char => $separator,
        binary   => 1,
        eol      => "\n",
        auto_diag => 1,
        strict   => 1,  # Every row must have the same number of fields as the previous one
    });

    open my $fh, '<', $file or die "Could not open '$file': $!";
    my @rows;
    while (my $row = $csv->getline($fh)) {
        # Check if the row has exactly 6 columns
        if (@$row != 6) {
            die "Error: Row does not have 6 columns in file '$file': @$row";
        }
        push @rows, $row;
    }
    close $fh or die "Error closing '$file': $!";
    return \@rows;
}


# Function to remove line feeds
sub remove_line_feeds {
    my ($data) = @_;
    for my $row (@$data) {
        for my $field (@$row) {
            $field =~ s/\r?\n/ /g;
        }
    }
}

# Function to write CSV
sub write_csv {
    my ($file, $data, $separator) = @_;
    my $csv = Text::CSV->new({
        sep_char => $separator,
        binary   => 1,
        eol      => "\n",
    });

    open my $fh, '>', $file or die "Could not create '$file': $!";
    foreach my $row (@$data) {
        $csv->print($fh, $row);
    }
    close $fh or die "Error closing '$file': $!";
}

# Read the CSV data
my $csv_data = read_csv($csv_input_file, $separator);

# Remove line feeds from the CSV data
remove_line_feeds($csv_data);

# Write the modified CSV data to the output file
write_csv($csv_output_file, $csv_data, $separator);

print "Modified CSV data saved to '$csv_output_file'.\n";

output.csv:

"Module Statements",Module,"Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal","A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.",6.4.5.a),
"Module Statements",,,"A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement",6.4.5.b),
"Module Statements",,,"The following objects shall have unique names within a module_def: port_name ? scanRegister_name ? dataRegister_name ? one_hot_scan_group_name ? scanMux_name ? dataMux_name ? clockMux_name ? one_hot_data_group_name ? logicSignal_name ? alias_name",6.4.5.c),
"Module Statements",,,"The following objects shall have unique names within a module_def: instance_name ? scanInterface_name",6.4.5.d),
"Module Statements",,,"An inputPort_name shall be one of the following scanInPort_name ? shiftEnPort_name ? captureEnPort_name ? updateEnPort_name ? dataInPort_name ? selectPort_name ? resetPort_name ? tmsPort_name ? tckPort_name ? clockPort_name ? trstPort_name ? addressPort_name ? writeEnPort_name ? readEnPort_name ",6.4.5.e),
"Module Statements",,,"An outputPort_name shall be one of the following: ? scanOutPort_name ? dataOutPort_name ? toShiftEnPort_name ? toUpdateEnPort_name ? toCaptureEnPort_name ? toSelectPort_name ? toResetPort_name ? toTckPort_name ? toTmsPort_name ? toClockPort_name ? toTrstPort_name ? toIRSelectPort_name",6.4.5.f),
"Module Statements",,,"Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.",6.4.5.g),
"Module Statements",,,"Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.",6.4.5.h),

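I hope the extra " characters in output.csv are acceptable to you, as well as the format with each record on a single line.

1 upvote Cary Swoveland 10/23/2023 #2

You apparently want to remove every line terminator except those that precede "Module Statements,". To do that, replace each match of the following regular expression with a space.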

\R(?!Module Statements,)

Demo

As shown at the link, these substitutions produce the following string.

Module Statements,Module,Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal,A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.,6.4.5.a),
Module Statements,,,A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement,6.4.5.b),
Module Statements,,,"The following objects shall have unique names within a module_def: port_name ? scanRegister_name ? dataRegister_name ? one_hot_scan_group_name ? scanMux_name ? dataMux_name ? clockMux_name ? one_hot_data_group_name ? logicSignal_name ? alias_name",6.4.5.c),
Module Statements,,,"The following objects shall have unique names within a module_def: instance_name ? scanInterface_name",6.4.5.d),
Module Statements,,,"An inputPort_name shall be one of the following scanInPort_name ? shiftEnPort_name ? captureEnPort_name ? updateEnPort_name ? dataInPort_name ? selectPort_name ? resetPort_name ? tmsPort_name ? tckPort_name ? clockPort_name ? trstPort_name ? addressPort_name ? writeEnPort_name ? readEnPort_name ",6.4.5.e),
Module Statements,,,"An outputPort_name shall be one of the following: ? scanOutPort_name ? dataOutPort_name ? toShiftEnPort_name ? toUpdateEnPort_name ? toCaptureEnPort_name ? toSelectPort_name ? toResetPort_name ? toTckPort_name ? toTmsPort_name ? toClockPort_name ? toTrstPort_name ? toIRSelectPort_name",6.4.5.f),
Module Statements,,,Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.g),
Module Statements,,,Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.h),

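The regular expression reads: match a line terminator (\R), provided it is not followed by "Module Statements,". (?!...) is a negative lookahead: a condition that must be satisfied but that is not part of the returned match.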
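A minimal Perl sketch of this substitution, for illustration only. The function name join_records and the inline sample are hypothetical, and the extra |\z alternative inside the lookahead is an addition not present in the regex above: it is assumed here so that the final line terminator of the file is kept rather than turned into a trailing space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join physical lines into logical records: every line terminator that is
# NOT followed by the record prefix "Module Statements," is replaced by a
# single space. The |\z alternative (an assumption of this sketch) also
# preserves the terminator at the very end of the text.
sub join_records {
    my ($text) = @_;
    $text =~ s/\R(?!Module Statements,|\z)/ /g;
    return $text;
}

# Hypothetical inline sample standing in for the slurped file contents.
my $sample = join "\n",
    'Module Statements,,,"Unique names:',
    'instance_name',
    '? scanInterface_name",6.4.5.d),',
    'Module Statements,,,Single-line record,6.4.5.g),',
    '';

print join_records($sample);
```

For a real file, the whole contents would have to be read into one string first (for example with local $/ = undef before reading the handle), because the lookahead needs to see the beginning of the next line.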

Here is an explanation of the \R token in Perl.


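Each line of the resulting string ends with a comma. If the string is written to a file (in the normal way) and those line-ending commas produce unwanted empty fields when the file is read as CSV, you can strip them by replacing each match of ,(?=\R|\z) with an empty string ('').

(?=\R|\z) is a positive lookahead. It asserts that the matched comma must be followed by a line terminator, or (|) that the comma is at the end of the string (\z).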
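The comma-stripping step can be sketched in Perl as follows. The function name strip_trailing_commas and the sample string are hypothetical. The sketch assumes the lines have already been joined into one record per line; applied to the raw input, it would also remove a comma that happens to end a physical line inside a quoted multi-line field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove a comma only when it is immediately followed by a line terminator
# or sits at the very end of the string. The lookahead does not consume the
# terminator, so the line structure is left untouched.
sub strip_trailing_commas {
    my ($text) = @_;
    $text =~ s/,(?=\R|\z)//g;
    return $text;
}

# Hypothetical sample: three records, each ending in a trailing comma.
print strip_trailing_commas("a,b,\nc,d,\ne,f,"), "\n";
```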

Demo

1 upvote dawg 10/23/2023 #3

Ruby's CSV parser is useful here.

Try:

ruby -r csv -e 'puts CSV.generate(**{:force_quotes => true}){|csv| 
    CSV.parse($<.read).map{|sa| 
        sa.map{|e| e.nil? ? e : e.gsub(/\R+/,"")}}.each{|row| csv<<row}}' file 

Prints:

"Module Statements","Module","Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal","A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.","6.4.5.a)",""
"Module Statements","","","A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement","6.4.5.b)",""
"Module Statements","","","The following objects shall have unique names within a module_def:port_name? scanRegister_name? dataRegister_name? one_hot_scan_group_name? scanMux_name? dataMux_name? clockMux_name? one_hot_data_group_name? logicSignal_name? alias_name","6.4.5.c)",""
"Module Statements","","","The following objects shall have unique names within a module_def:instance_name? scanInterface_name","6.4.5.d)",""
"Module Statements","","","An inputPort_name shall be one of the followingscanInPort_name? shiftEnPort_name? captureEnPort_name? updateEnPort_name? dataInPort_name? selectPort_name? resetPort_name? tmsPort_name? tckPort_name? clockPort_name? trstPort_name? addressPort_name? writeEnPort_name? readEnPort_name","6.4.5.e)",""
"Module Statements","","","An outputPort_name shall be one of the following:? scanOutPort_name? dataOutPort_name? toShiftEnPort_name? toUpdateEnPort_name? toCaptureEnPort_name? toSelectPort_name? toResetPort_name? toTckPort_name? toTmsPort_name? toClockPort_name? toTrstPort_name? toIRSelectPort_name","6.4.5.f)",""
"Module Statements","","","Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.","6.4.5.g)",""
"Module Statements","","","Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.","6.4.5.h)",""

If you do not want quotes around every field, change the first line to:

puts CSV.generate{|csv| # the rest the same...

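If you want the \r\n replaced with a space (as in your example), change e.gsub(/\R+/,"") to e.gsub(/\R+/," ")

Comments

0 upvotes Cary Swoveland 10/24/2023
Regarding the last sentence: if you do not want spaces but do want to remove the commas at the ends of lines in the CSV output (which produce empty fields), you can replace e.gsub(/\R+/,"") with e.gsub(/\R+|,(?=\R+Module Statements,)/,""). If you do want spaces and also want to remove the commas, replace e.gsub(/\R+/,"") with e.gsub(/\R+|,(?=\R+Module Statements,)/) { |s| s == ',' ? '' : ' ' }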