Convert two input files into certain string rules with Python

Asked by: Parine  Asked: 9/27/2023  Last edited by: Edo Akse  Updated: 9/27/2023  Views: 112

Q:

Suppose there are two input files, as shown below.

input1.txt (consists only of hstep3_* entries)

hstep3_num00 = a5;
hstep3_num01 = 3b;
hstep3_num02 = 4f;
hstep3_num03 = 27;

input2.txt (the letters inside the braces are random strings, separated by ,)

some random strings that are not 'hstep' form
... 
match hstep1_num00 = {eau,t,nb,v,d}; // MATCH
match hstep1_num01 = {c,bul,kv,e}; // MATCH
... 
match hstep3_num00 = {u_ku,b,ntv,q}; // MATCH
match hstep3_num01 = {qq,rask,cb_p}; // MATCH
match hstep3_num02 = {c,a,ha,w,ykl}; // MATCH
match hstep3_num03 = {p,gu,enb_q_b,z,d}; // MATCH
...
some random strings that are not 'hstep' form

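What I want to do is take every left-hand side of the equations in input1.txt, find the corresponding braces in input2.txt, and pair those braces with the value from input1.txt.

So the final output.txt should look like this: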

{u_ku,b,ntv,q}     = a5;
{qq,rask,cb_p}     = 3b;
{c,a,ha,w,ykl}     = 4f;
{p,gu,enb_q_b,z,d} = 27;

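To do this in Python, I thought about using readlines and .split(). Also, since the number of characters inside the braces {} is not the same on every line, I think I need a regular expression to delimit what is inside them, but it does not work as I expected...
Does anyone have a solution or some guidance?

Any help would be much appreciated. Thanks!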

python-3.x split python-re readlines

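Comments

0 votes Muhammad Shamshad Aslam 9/27/2023
Your question is confusing. What is the criterion for a MATCH? Do you have an explanation for why {u_ku,b,ntv,q} matches a5?
0 votes Parine 9/27/2023
@MuhammadShamshadAslam The criterion itself and the value "a5" itself mean nothing here! I just want to pair every right-hand side in input2.txt with the value written for the same name in input1.txt.
0 votes treuss 9/27/2023
Use a regular expression and a dictionary. Show the code you have written and the error message or incorrect output you get.
0 votes MisterMiyagi 9/27/2023
It is not clear to me what exactly you are asking. Are you asking about reading lines from a file and discarding the noise? Are you asking about matching lines that share the same "match …hstep…" key? Are you asking about a failed attempt at using a regular expression? Something else? Please consider editing your question to focus on one specific problem.
0 votes Parine 9/27/2023
@MisterMiyagi I am asking about matching only hstep3_num00~03 and then taking the values from input1.txt. It is not about the strings 'match' and 'MATCH'; those "match" words can change in any way.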

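A:

2 votes mozway 9/27/2023 #1

You can do this in two passes with regular expressions. The first pass reads input2.txt with re.findall and builds a dictionary from the matches; the second pass goes through input1.txt line by line and performs the replacement with re.sub: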

import re

# Pass 1: build {name: braces} from every "match <name> = {...}; // MATCH" line
with open('input2.txt') as f2:
    dic = dict(re.findall(r'match ([^\s=]+) = ([^;]+); // MATCH', f2.read()))
# {'hstep1_num00': '{eau,t,nb,v,d}', 'hstep1_num01': '{c,bul,kv,e}',
#  'hstep3_num00': '{u_ku,b,ntv,q}', 'hstep3_num01': '{qq,rask,cb_p}',
#  'hstep3_num02': '{c,a,ha,w,ykl}', 'hstep3_num03': '{p,gu,enb_q_b,z,d}'}

# Pass 2: replace the leading name on each line of input1.txt with its braces
with open('input1.txt') as f1, open('output1.txt', 'w') as f_out:
    for line in f1:
        f_out.write(re.sub(r'^\S+', lambda m: dic.get(m.group(), ''), line))

Output file:

{u_ku,b,ntv,q} = a5;
{qq,rask,cb_p} = 3b;
{c,a,ha,w,ykl} = 4f;
{p,gu,enb_q_b,z,d} = 27;
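
To try the two passes without touching any files, here is a minimal sketch that applies the same regular expressions to in-memory strings. The function name `build_rules` and the sample strings are assumptions made for illustration and are not part of the answer. Unlike the code above, names missing from the dictionary are left unchanged instead of being replaced with an empty string.

```python
import re

def build_rules(input1_text, input2_text):
    """Replace each leading name in input1_text with its brace group from input2_text."""
    # Pass 1: map 'hstepX_numYY' -> '{...}' for every 'match ... // MATCH' line.
    mapping = dict(re.findall(r'match ([^\s=]+) = ([^;]+); // MATCH', input2_text))
    # Pass 2: swap the first word of every line for its mapped brace group.
    # Unknown names are kept as they are (a choice made for this sketch).
    return re.sub(r'^\S+', lambda m: mapping.get(m.group(), m.group()),
                  input1_text, flags=re.MULTILINE)

# Hypothetical sample data, shortened from the question.
input1_text = "hstep3_num00 = a5;\nhstep3_num01 = 3b;\n"
input2_text = (
    "some noise\n"
    "match hstep1_num00 = {eau,t,nb,v,d}; // MATCH\n"
    "match hstep3_num00 = {u_ku,b,ntv,q}; // MATCH\n"
    "match hstep3_num01 = {qq,rask,cb_p}; // MATCH\n"
)
print(build_rules(input1_text, input2_text), end="")
```

Working on strings makes the regexes easy to test in isolation before wiring them up to the real files.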


Alignment

If the strings need to be aligned, the approach above can be adapted.

Fixed width (or based on the maximum possible width):

import re

# same as previously
with open('input2.txt') as f2:
    dic = dict(re.findall(r'match ([^\s=]+) = ([^;]+); // MATCH', f2.read()))

WIDTH = max([len(v) for k,v in dic.items() if k.startswith('hstep3_')])

with open('input1.txt') as f1, open('output1.txt', 'w') as f_out:
    for line in f1:
        f_out.write(re.sub(r'^\S+', lambda m: dic.get(m.group(), '').ljust(WIDTH), line))

Dynamic width, based on the longest string:

import re

# same as previously
with open('input2.txt') as f2:
    dic = dict(re.findall(r'match ([^\s=]+) = ([^;]+); // MATCH', f2.read()))

with open('input1.txt') as f1:
    WIDTH = max(len(dic.get(line.split(maxsplit=1)[0], '')) for line in f1)

with open('input1.txt') as f1, open('output1.txt', 'w') as f_out:
    for line in f1:
        f_out.write(re.sub(r'^\S+', lambda m: dic.get(m.group(), '').ljust(WIDTH), line))

Output:

{u_ku,b,ntv,q}     = a5;
{qq,rask,cb_p}     = 3b;
{c,a,ha,w,ykl}     = 4f;
{p,gu,enb_q_b,z,d} = 27;
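
The padding step can also be checked on its own. A minimal sketch, assuming a small hypothetical list of (braces, value) pairs: `str.ljust` and the f-string format spec `:<{width}` both left-align the text in a column of the given width, so either can be used.

```python
# Hypothetical sample pairs for illustration: (brace group, value).
pairs = [("{u_ku,b,ntv,q}", "a5"), ("{p,gu,enb_q_b,z,d}", "27")]

# The longest brace group decides the column width.
width = max(len(braces) for braces, _ in pairs)

# Two equivalent ways to pad the left column to that width.
with_ljust = [f"{braces.ljust(width)} = {value};" for braces, value in pairs]
with_spec = [f"{braces:<{width}} = {value};" for braces, value in pairs]

for line in with_ljust:
    print(line)
```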

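Comments

0 votes mozway 9/27/2023
@DarkKnight What did I miss?
0 votes CtrlZ 9/27/2023
Variable spacing, so that everything lines up correctly.
0 votes mozway 9/27/2023
Is that a requirement, @DarkKnight? It is not stated explicitly, so what would the rule be? The longest string? A fixed maximum?
0 votes mozway 9/27/2023
Either way, it is easy to handle; I updated the answer.

1 vote Edo Akse 9/27/2023 #2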

The code below is not optimized; it is written to give the OP a better understanding of the steps involved.

# read input1 and turn into dict
input1 = {}
with open("input1.txt") as infile:
    for line in infile:
        # skip blank lines so the unpacking below cannot fail on them
        if not line.strip():
            continue
        key, value = line.split(" = ")
        # drop the trailing newline; it is added back when writing
        input1[key] = value.strip()

# read input 2 and store the maxlen value
input2 = []
maxlen = 0
with open("input2.txt") as infile:
    for line in infile:
        # only process lines that start with "match hstep3"
        if line.startswith("match hstep3"):
            key = line.split(" ")[1]
            value = line.split("= ")[1].split(";")[0]
            input2.append([key, value])
            # get the maxlength and store it for future use
            maxlen = max(maxlen, len(value))

# finally, produce the required output and write to file
with open("output.txt", "w") as outfile:
    for key, value in input2:
        # use an f-string to left-align the braces in a column of width maxlen
        outfile.write(f"{value:<{maxlen}} = {input1[key]}\n")

Contents of output.txt:

{u_ku,b,ntv,q}     = a5;
{qq,rask,cb_p}     = 3b;
{c,a,ha,w,ykl}     = 4f;
{p,gu,enb_q_b,z,d} = 27;
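
One case the walkthrough does not cover is a name that appears in input2.txt but not in input1.txt; indexing `input1[key]` would then raise `KeyError`. A minimal sketch, assuming such entries should simply be skipped (that policy is an assumption, since the question does not say what should happen), with hypothetical in-memory data standing in for the parsed files:

```python
# Hypothetical in-memory stand-ins for the parsed files.
input1 = {"hstep3_num00": "a5;", "hstep3_num01": "3b;"}
input2 = [
    ["hstep3_num00", "{u_ku,b,ntv,q}"],
    ["hstep3_num01", "{qq,rask,cb_p}"],
    ["hstep3_num09", "{x,y}"],  # no counterpart in input1
]

# Keep only the pairs whose key exists in input1.
matched = [(value, input1[key]) for key, value in input2 if key in input1]

# Compute the column width from the matched entries only.
maxlen = max((len(value) for value, _ in matched), default=0)

lines = [f"{value:<{maxlen}} = {rhs}" for value, rhs in matched]
print("\n".join(lines))
```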

1 vote Muhammad Shamshad Aslam 9/27/2023 #3

If your data is in the format you described, or close to it, this should work.

result_2_dict = {}
result_1_dict = {}

# file2.txt plays the role of input2.txt from the question
with open('file2.txt', 'r') as file:
    for line in file:
        parts = line.split('=')
        # keep only "<word> hstepN_numMM = {...}; ..." lines; noise lines
        # without an '=' would otherwise raise IndexError on parts[1]
        if len(parts) == 2 and "hstep" in parts[0]:
            key = parts[0].split()[-1]  # last word before '=', e.g. hstep3_num00
            result_2_dict[key] = parts[1].strip().split(" ")[0].strip(";")

# file1.txt plays the role of input1.txt from the question
with open('file1.txt', 'r') as file:
    for line in file:
        parts = line.split('=')
        if len(parts) == 2 and "hstep" in parts[0]:
            result_1_dict[parts[0].strip()] = parts[1].strip().strip(";")

# pair each brace group with the value stored under the same name
matches_values = {}

for key, value in result_2_dict.items():
    if key in result_1_dict:
        matches_values[value] = result_1_dict[key]

for key, value in matches_values.items():
    # re-append the ';' stripped above so the lines match the requested format
    print(f"{key} = {value};")
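
When pulling the key out of a line such as `match hstep3_num00 = ...`, keep in mind that `str.strip` takes a set of characters, not a prefix, so stripping a word like "match" can also remove the same characters from the end of the key. A minimal sketch with hypothetical sample strings; `str.removeprefix` requires Python 3.9 or later.

```python
# Hypothetical left-hand sides, as produced by splitting a line on '='.
lhs = "match hstep3_num00 "
tricky = "match hstep3_cat"  # key that ends in characters from "match"

# Works here only because a space stops the stripping on both sides.
ok = lhs.strip("match").strip()

# str.strip("match") removes any of the characters m, a, t, c, h from both ends,
# so the trailing "cat" of the key is eaten as well.
stripped = tricky.strip("match").strip()

# Taking the last whitespace-separated word does not depend on the leading word.
last_word = tricky.split()[-1]

# str.removeprefix (Python 3.9+) removes the exact prefix only.
no_prefix = tricky.removeprefix("match").strip()

print(ok, stripped, last_word, no_prefix)
```

Taking the last word before the `=` also keeps working if the leading "match" word changes, which the asker says may happen.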