Asked by: kshitij kulshreshtha | Asked: 10/23/2023 | Last edited by: ddakshitij kulshreshtha | Updated: 10/24/2023 | Views: 120
Perl script to merge lines into a single line between two patterns and print the entire file
Q:
I have a CSV file as shown below:
File:
Module Statements,Module,Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal,A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.,6.4.5.a),
Module Statements,,,A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement,6.4.5.b),
Module Statements,,,"The following objects shall have unique names within a module_def:
port_name
? scanRegister_name
? dataRegister_name
? one_hot_scan_group_name
? scanMux_name
? dataMux_name
? clockMux_name
? one_hot_data_group_name
? logicSignal_name
? alias_name",6.4.5.c),
Module Statements,,,"The following objects shall have unique names within a module_def:
instance_name
? scanInterface_name",6.4.5.d),
Module Statements,,,"An inputPort_name shall be one of the following
scanInPort_name
? shiftEnPort_name
? captureEnPort_name
? updateEnPort_name
? dataInPort_name
? selectPort_name
? resetPort_name
? tmsPort_name
? tckPort_name
? clockPort_name
? trstPort_name
? addressPort_name
? writeEnPort_name
? readEnPort_name
",6.4.5.e),
Module Statements,,,"An outputPort_name shall be one of the following:
? scanOutPort_name
? dataOutPort_name
? toShiftEnPort_name
? toUpdateEnPort_name
? toCaptureEnPort_name
? toSelectPort_name
? toResetPort_name
? toTckPort_name
? toTmsPort_name
? toClockPort_name
? toTrstPort_name
? toIRSelectPort_name",6.4.5.f),
Module Statements,,,Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.g),
Module Statements,,,Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.h),
Output file: search patterns. Pattern 1: Module Statements. Pattern 2: 6.
This is the required result, the output file we need:
Output file:
Module Statements,Module,Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal,A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.,6.4.5.a),
Module Statements,,,A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement,6.4.5.b),
Module Statements,,,"The following objects shall have unique names within a module_def: port_name ? scanRegister_name ? dataRegister_name ? one_hot_scan_group_name ? scanMux_name ? dataMux_name ? clockMux_name
? one_hot_data_group_name ? logicSignal_name ? alias_name",6.4.5.c),
Module Statements,,,"The following objects shall have unique names within a module_def: instance_name ? scanInterface_name",6.4.5.d),
Module Statements,,,"An inputPort_name shall be one of the following scanInPort_name ? shiftEnPort_name ? captureEnPort_name ? updateEnPort_name
? dataInPort_name ? selectPort_name ? resetPort_name ? tmsPort_name
? tckPort_name ? clockPort_name ? trstPort_name ? addressPort_name
? writeEnPort_name ? readEnPort_name ",6.4.5.e),
Module Statements,,,"An outputPort_name shall be one of the following:
? scanOutPort_name ? dataOutPort_name ? toShiftEnPort_name ?
toUpdateEnPort_name ? toCaptureEnPort_name ? toSelectPort_name
? toResetPort_name ? toTckPort_name ? toTmsPort_name ? toClockPort_name
? toTrstPort_name ? toIRSelectPort_name",6.4.5.f),
Module Statements,,,Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.g),
Module Statements,,,Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.h),
I wrote the Perl script below, but it does not give me the expected result.
Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use feature 'say';
my @collect;
my $file = $ARGV[0] or die;
open(my $DATA, '<', $file) or die;
while (<$DATA>) {
    chomp;
    # If we're between our markers...
    if (/Module Statements/ .. /6.4*/) {
        # At the start marker, empty the array
        if (/Module Statements/) {
            @collect = ();
        # At the end marker, print the array
        } elsif (/6.4*/) {
            say join ' ', @collect;
        # Otherwise, push the line onto the array
        } else {
            push @collect, $_;
            foreach my $m (@collect) {
                print $m;
            }
        }
    # Otherwise, just print the line
    } else {
        say;
    }
}
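Thanks and regards, Kshitij Kulshreshtha
A:
Your data is ultimately in CSV format, with some quoted fields that span several lines. Since there is no straightforward regex solution for that, I suggest you:
- open it as a CSV with 6 columns (input.csv),
- then remove the line feeds inside the fields,
- write the result to a CSV (output.csv).
Here is code that does this. It uses this tiny regex to replace line feeds in either Windows or Unix format: \r?\n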
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Specify the CSV input file and separator
my $csv_input_file = "input.csv";
my $separator = ",";

# Specify the CSV output file
my $csv_output_file = "output.csv";

# Function to read CSV with 6 columns
sub read_csv {
    my ($file, $separator) = @_;
    my $csv = Text::CSV->new({
        sep_char  => $separator,
        binary    => 1,    # Allow embedded line feeds in quoted fields
        eol       => "\n",
        auto_diag => 1,
        strict    => 1,    # Every row must have the same number of fields
    });
    open my $fh, '<', $file or die "Could not open '$file': $!";
    my @rows;
    while (my $row = $csv->getline($fh)) {
        # Check if the row has exactly 6 columns
        if (@$row != 6) {
            die "Error: Row does not have 6 columns in file '$file': @$row";
        }
        push @rows, $row;
    }
    close $fh or die "Error closing '$file': $!";
    return \@rows;
}

# Function to remove line feeds
sub remove_line_feeds {
    my ($data) = @_;
    for my $row (@$data) {
        for my $field (@$row) {
            # $field aliases the array element, so this edits the row in place
            $field =~ s/\r?\n/ /g;
        }
    }
}

# Function to write CSV
sub write_csv {
    my ($file, $data, $separator) = @_;
    my $csv = Text::CSV->new({
        sep_char => $separator,
        binary   => 1,
        eol      => "\n",
    });
    open my $fh, '>', $file or die "Could not create '$file': $!";
    foreach my $row (@$data) {
        $csv->print($fh, $row);
    }
    close $fh or die "Error closing '$file': $!";
}

# Read the CSV data
my $csv_data = read_csv($csv_input_file, $separator);
# Remove line feeds from the CSV data
remove_line_feeds($csv_data);
# Write the modified CSV data to the output file
write_csv($csv_output_file, $csv_data, $separator);
print "Modified CSV data saved to '$csv_output_file'.\n";
output.csv:
"Module Statements",Module,"Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal","A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.",6.4.5.a),
"Module Statements",,,"A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement",6.4.5.b),
"Module Statements",,,"The following objects shall have unique names within a module_def: port_name ? scanRegister_name ? dataRegister_name ? one_hot_scan_group_name ? scanMux_name ? dataMux_name ? clockMux_name ? one_hot_data_group_name ? logicSignal_name ? alias_name",6.4.5.c),
"Module Statements",,,"The following objects shall have unique names within a module_def: instance_name ? scanInterface_name",6.4.5.d),
"Module Statements",,,"An inputPort_name shall be one of the following scanInPort_name ? shiftEnPort_name ? captureEnPort_name ? updateEnPort_name ? dataInPort_name ? selectPort_name ? resetPort_name ? tmsPort_name ? tckPort_name ? clockPort_name ? trstPort_name ? addressPort_name ? writeEnPort_name ? readEnPort_name ",6.4.5.e),
"Module Statements",,,"An outputPort_name shall be one of the following: ? scanOutPort_name ? dataOutPort_name ? toShiftEnPort_name ? toUpdateEnPort_name ? toCaptureEnPort_name ? toSelectPort_name ? toResetPort_name ? toTckPort_name ? toTmsPort_name ? toClockPort_name ? toTrstPort_name ? toIRSelectPort_name",6.4.5.f),
"Module Statements",,,"Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.",6.4.5.g),
"Module Statements",,,"Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.",6.4.5.h),
I hope the extra " characters in output.csv are acceptable to you, as well as the format with one record per line.
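If Text::CSV is not installed, the same idea (a line feed inside a quoted field does not end the record) can be approximated in core Perl by counting double quotes. The following is only a minimal sketch, not the method used in the answer above: the name merge_quoted_lines is hypothetical, and it assumes that quotes appear only as field delimiters or as doubled "" escapes, so that a complete record always holds an even number of them.

```perl
use strict;
use warnings;

# Merge physical lines into logical CSV records by tracking quote parity:
# a record is treated as complete once it holds an even number of double
# quotes. An escaped quote ("") adds two, so it does not change the parity.
# merge_quoted_lines is a hypothetical helper name used for illustration.
sub merge_quoted_lines {
    my (@lines) = @_;
    my (@records, $pending);
    for my $line (@lines) {
        $line =~ s/\R\z//;                        # drop the line terminator
        $pending = defined $pending ? "$pending $line" : $line;
        my $quotes = () = $pending =~ /"/g;       # count double quotes so far
        if ($quotes % 2 == 0) {                   # balanced: record complete
            push @records, $pending;
            undef $pending;
        }
    }
    push @records, $pending if defined $pending;  # unterminated last record
    return @records;
}

my @records = merge_quoted_lines(
    qq{Module Statements,,,"Unique names:\n},
    qq{instance_name\n},
    qq{? scanInterface_name",6.4.5.d),\n},
    qq{Module Statements,,,Plain text,6.4.5.g),\n},
);
print "$_\n" for @records;
```

Unlike a real CSV parser, this sketch does not validate the number of columns, so it is best treated as a fallback for input whose quoting is known to be regular.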
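You seem to want to remove all line terminators except those that precede "Module Statements,". To do that, replace each match of the following regular expression with a space.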
\R(?!Module Statements,)
These replacements produce the following string.
Module Statements,Module,Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal,A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.,6.4.5.a),
Module Statements,,,A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement,6.4.5.b),
Module Statements,,,"The following objects shall have unique names within a module_def: port_name ? scanRegister_name ? dataRegister_name ? one_hot_scan_group_name ? scanMux_name ? dataMux_name ? clockMux_name ? one_hot_data_group_name ? logicSignal_name ? alias_name",6.4.5.c),
Module Statements,,,"The following objects shall have unique names within a module_def: instance_name ? scanInterface_name",6.4.5.d),
Module Statements,,,"An inputPort_name shall be one of the following scanInPort_name ? shiftEnPort_name ? captureEnPort_name ? updateEnPort_name ? dataInPort_name ? selectPort_name ? resetPort_name ? tmsPort_name ? tckPort_name ? clockPort_name ? trstPort_name ? addressPort_name ? writeEnPort_name ? readEnPort_name ",6.4.5.e),
Module Statements,,,"An outputPort_name shall be one of the following: ? scanOutPort_name ? dataOutPort_name ? toShiftEnPort_name ? toUpdateEnPort_name ? toCaptureEnPort_name ? toSelectPort_name ? toResetPort_name ? toTckPort_name ? toTmsPort_name ? toClockPort_name ? toTrstPort_name ? toIRSelectPort_name",6.4.5.f),
Module Statements,,,Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.g),
Module Statements,,,Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.,6.4.5.h),
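The regular expression reads: match a line terminator (\R), provided it is not followed by "Module Statements,". (?!...) is a negative lookahead: a condition that must be satisfied but is not part of the match that is returned.
In Perl, the token \R matches a generic line break: \r\n as a single unit, or any single vertical-whitespace character such as \n or \r.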
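In Perl, the substitution described above might be applied to the whole file content held in one string. This is a minimal sketch under stated assumptions: the helper name join_records is hypothetical, and the extra |\z alternative inside the lookahead is an addition made here (not part of the regex above) so that the final newline of the text is kept rather than turned into a trailing space.

```perl
use strict;
use warnings;

# Replace each line terminator with a space unless the next line starts a
# new record ("Module Statements,"). The |\z alternative is an assumption
# added in this sketch: it leaves the last newline of the text untouched.
# join_records is a hypothetical helper name used for illustration.
sub join_records {
    my ($text) = @_;
    $text =~ s/\R(?!Module Statements,|\z)/ /g;
    return $text;
}

my $sample = join "\n",
    'Module Statements,,,"Unique names:',
    'port_name',
    '? alias_name",6.4.5.c),',
    'Module Statements,,,Single line,6.4.5.g),',
    '';

print join_records($sample);
```

To run it on a file, the content would first have to be read into a single string, for example with local $/ set to undef. Note that a continuation line that itself begins with "Module Statements," would be mistaken for the start of a new record.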
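Each line of the resulting string ends with a comma. If the string is written to a file (in the normal way) and that file is later read as CSV, those line-ending commas represent empty trailing fields. If the empty fields are unwanted, the commas can be removed by replacing each match of ,(?=\R|\z) with the empty string ('').
(?=\R|\z) is a positive lookahead. It asserts that the matched comma is followed by a line terminator (\R) or (|) is at the end of the string (\z).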
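The comma removal can be sketched in Perl as follows. The helper name strip_trailing_commas is hypothetical, and the sketch assumes it is applied after the records have already been joined, so that every line-ending comma really is a record terminator and not part of a multi-line quoted field.

```perl
use strict;
use warnings;

# Remove the comma that ends each record so that a CSV reader does not see
# an empty trailing field. The lookahead (?=\R|\z) only inspects what
# follows the comma; the line terminator itself is left in place.
# strip_trailing_commas is a hypothetical helper name used for illustration.
sub strip_trailing_commas {
    my ($text) = @_;
    $text =~ s/,(?=\R|\z)//g;
    return $text;
}

my $joined = "Module Statements,,,text,6.4.5.g),\n"
           . "Module Statements,,,more,6.4.5.h),\n";
print strip_trailing_commas($joined);
```

Commas between fields, including the empty fields in ",,,", are not affected, because none of them is directly followed by a line terminator or the end of the string.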
Ruby's CSV parser is useful here.
Try:
ruby -r csv -e 'puts CSV.generate(**{:force_quotes => true}){|csv|
CSV.parse($<.read).map{|sa|
sa.map{|e| e.nil? ? e : e.gsub(/\R+/,"")}}.each{|row| csv<<row}}' file
Prints:
"Module Statements","Module","Collection of primitive building blocks and instances of other modules that comprise a network and or instrument. Two types handoff or internal","A redefinition of a module_def shall overwrite a previously defined module_def having the same module_name within the same nameSpace_name.","6.4.5.a)",""
"Module Statements","","","A module_def shall have at least one port_def unless it is the top module and contains an AccessLink statement","6.4.5.b)",""
"Module Statements","","","The following objects shall have unique names within a module_def:port_name? scanRegister_name? dataRegister_name? one_hot_scan_group_name? scanMux_name? dataMux_name? clockMux_name? one_hot_data_group_name? logicSignal_name? alias_name","6.4.5.c)",""
"Module Statements","","","The following objects shall have unique names within a module_def:instance_name? scanInterface_name","6.4.5.d)",""
"Module Statements","","","An inputPort_name shall be one of the followingscanInPort_name? shiftEnPort_name? captureEnPort_name? updateEnPort_name? dataInPort_name? selectPort_name? resetPort_name? tmsPort_name? tckPort_name? clockPort_name? trstPort_name? addressPort_name? writeEnPort_name? readEnPort_name","6.4.5.e)",""
"Module Statements","","","An outputPort_name shall be one of the following:? scanOutPort_name? dataOutPort_name? toShiftEnPort_name? toUpdateEnPort_name? toCaptureEnPort_name? toSelectPort_name? toResetPort_name? toTckPort_name? toTmsPort_name? toClockPort_name? toTrstPort_name? toIRSelectPort_name","6.4.5.f)",""
"Module Statements","","","Multiple inputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.","6.4.5.g)",""
"Module Statements","","","Multiple outputPort_name port functions that are of type vector_id may share the same SCALAR_ID as long as the indexes form a contiguous range and every index range is of the same either ascending or descending type.","6.4.5.h)",""
If you do not want every field wrapped in quotes, change the first line to:
puts CSV.generate{|csv| # the rest the same...
If you want the line terminators (\r\n) replaced with a space (as in your example), change
e.gsub(/\R+/,"")
to
e.gsub(/\R+/," ")
Comments:
e.gsub(/\R+/,"")
e.gsub(/\R+|,(?=\R+Module Statements,)/,"")
e.gsub(/\R+/,"")
e.gsub(/\R+|,(?=\R+Module Statements,)/) { |s| s == ',' ? '' : ' ' }
Comments:
"? one_hot_data_group_name ?..."
"Module Statements,"