How to split strings with delimiters that count data in Python

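Question:

This is my first time posting here, and I don't know how to solve this problem.

Basically, I want to split a big string on multiple delimiters. The delimiters are not commas, tabs, or anything like that. The string is delimited by four-character markers: the first two characters say what kind of data follows, and the last two say how long that data is, so it looks like this: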

PN1121 E 1st St0008New York0102NY020510003

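Here PN is the address and 11 is the number of characters it has; next comes 00, which means the city, and 8 is the number of characters it has; next comes 01, which means the state, with 2 characters; and so on.

My problem is finding a way to extract this data and separate it with commas, like this: 21 E 1st St, New York, NY 10003. On top of that, some strings will be missing some of the data, so they will look like this: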

PN1121 E 1st St0102NY020510003

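In those cases, the missing field should be replaced with a blank, like this: 21 E 1st St, NY, 10003

I have been struggling with this. Can anyone help me? I won't attach the string because it is too long, but the possible identifiers are: PN, 00, 01, ..., 40, 41.

I have already tried this in Python, but I won't lie: programming is not my strong suit, and I only know some very basic Python. Could you show me how to do it in Python?

Tags: python, string, split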

def delimite(string):
    count = 0  # number of delimiter characters read so far; 4 means a complete delimiter
    length = 0  # remaining length of the data item being read
    number = ""  # accumulates the characters of the current delimiter
    result = ""
    for letter in string:
        if length == 0:  # length == 0 means we are reading a delimiter
            count = count + 1
            number = number + letter  # store the delimiter character
            if count == 4:  # a complete 4-character delimiter has been read
                length = int(number[2:])  # the last two digits give the data length
                number = ""  # reset number and count for the next delimiter
                count = 0
        else:  # length != 0 means we are reading the data
            result = result + letter  # store the data one character at a time
            length = length - 1
            if length == 0:  # the current data item is complete, so add a ", "
                result = result + ", "
    result = result.rstrip(", ")  # use rstrip to remove the trailing ", "
    return result

def parse_data(input_string):
    data = []
    index = 0
    while index < len(input_string):
        length = int(input_string[index + 2:index + 4])
        index += 4
        current_data = input_string[index:index + length]
        data.append(current_data)
        index += length
    return data

input_str = "PN1121 E 1st St0008New York0102NY020510003"

parsed_data = parse_data(input_str)
print(parsed_data)

Output:

['21 E 1st St', 'New York', 'NY', '10003']
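The question also asks for a blank where a field is missing, which a plain list of values cannot show because it loses the identifiers. A minimal sketch of one way to handle that, assuming the identifiers always appear in a known order: keep each identifier next to its data and look every expected one up. The names parse_fields, to_line and EXPECTED_IDS are hypothetical, and the four identifiers listed are only the ones visible in the sample string, not the full PN, 00, ..., 41 set.

```python
def parse_fields(input_string):
    """Return a dict mapping each 2-character identifier to its data."""
    fields = {}
    index = 0
    while index < len(input_string):
        field_id = input_string[index:index + 2]
        length = int(input_string[index + 2:index + 4])
        index += 4
        fields[field_id] = input_string[index:index + length]
        index += length
    return fields


# Assumption: only the identifiers seen in the sample; extend with the real list.
EXPECTED_IDS = ("PN", "00", "01", "02")


def to_line(input_string, expected_ids=EXPECTED_IDS):
    fields = parse_fields(input_string)
    # A missing identifier becomes an empty string so the columns stay aligned.
    return ", ".join(fields.get(field_id, "") for field_id in expected_ids)


print(to_line("PN1121 E 1st St0008New York0102NY020510003"))
# 21 E 1st St, New York, NY, 10003
print(to_line("PN1121 E 1st St0102NY020510003"))
# 21 E 1st St, , NY, 10003
```

The second line keeps an empty slot where the city would be, so every output row has the same number of columns.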

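2 votes Serge Ballesta 11/15/2023 #3

When you face a problem that looks too hard, try to split it into smaller parts and then calmly solve them one at a time.

What do we have here? A number of strings made of sequences of id, length, data, in a well-defined but non-standard format, and you want to build a csv from all of them. The good news is that you know the list of possible ids.

Let's take it step by step:

Given a single data element, extract its type identifier, length, and data: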

 typid = s[0:2]        # type id is in the first 2 characters
 length = int(s[2:4])  # data length is in the next 2
 data = s[4:4+length]  # the data itself follows
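To make the three slices concrete, here is a small worked example applied to the sample string from the question. The variable rest is an addition for illustration; it only shows what remains once the first token has been consumed.

```python
s = "PN1121 E 1st St0008New York0102NY020510003"

typid = s[0:2]            # 'PN'
length = int(s[2:4])      # 11
data = s[4:4 + length]    # the 11 characters after the 4-character header
rest = s[4 + length:]     # everything after the first token

print(typid, length, repr(data))
# PN 11 '21 E 1st St'
print(rest)
# 0008New York0102NY020510003
```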

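Nothing harder than that...

Given a complete string, walk over its tokens:

Before going further, note that the code above extracts the first token from a complete string, and that the next token starts at position 4 + length. We can easily build a list of (typid, data) tuples by walking the string: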

 linedata = []            # initialize an empty list
 pos = 0                  # start at the beginning
 while pos < len(s):      # and step over the string
     typid = s[pos:pos+2]          # type id is in the first 2 characters
     length = int(s[pos+2:pos+4])  # data length is in the next 2
     data = s[pos+4:pos+4+length]  # get it
     linedata.append((typid, data))  # append what we have found to the list
     pos = pos + 4 + length        # and go to the beginning of the next token
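The loop can be wrapped in a function so it is easy to reuse on many lines. The sketch below does that and adds two sanity checks that are not in the answer: it raises ValueError on a header whose length part is not two digits, and on a token whose data is cut short. The name tokenize and the error messages are hypothetical.

```python
def tokenize(s):
    """Return a list of (typid, data) tuples found in s."""
    linedata = []
    pos = 0
    while pos < len(s):
        header = s[pos:pos + 4]
        # Added check: a header is 2 id characters followed by 2 digits.
        if len(header) < 4 or not header[2:].isdecimal():
            raise ValueError(f"bad token header {header!r} at position {pos}")
        typid = header[:2]
        length = int(header[2:])
        data = s[pos + 4:pos + 4 + length]
        # Added check: the string must contain as much data as announced.
        if len(data) < length:
            raise ValueError(f"token {typid!r} at position {pos} is truncated")
        linedata.append((typid, data))
        pos += 4 + length
    return linedata


print(tokenize("PN1121 E 1st St0102NY020510003"))
# [('PN', '21 E 1st St'), ('01', 'NY'), ('02', '10003')]
```

Failing loudly on a malformed line is a design choice; skipping or logging such lines may fit better depending on where the data comes from.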

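Build a csv file from all of that:

Python comes with batteries included. Once you have a sequence of key, value pairs, it is easy to convert it to a dict, and the csv module contains a DictWriter class for building a csv from dicts. Since you know the possible identifiers, you can ask it to write them as the csv header row. I will not give code for this part because you have not said where the data comes from, but it should not be too hard.

What you should learn from this:

When you face a complex problem, try to split it into smaller parts. Not only may you be able to use some of the parts on their own, but you can also ask more precise questions here and expect detailed and complete answers. But do not forget to give the general context...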
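For readers who want a starting point anyway, here is a rough sketch of the DictWriter step. It is not the answerer's code. It assumes the lines are already in memory as a list of strings, writes to an in-memory buffer instead of a real file, and uses a hypothetical FIELDNAMES list holding only the identifiers seen in the sample. The helper names tokenize and write_csv are also made up for this example.

```python
import csv
import io

# Assumption: only the identifiers seen in the sample; use the real list in practice.
FIELDNAMES = ["PN", "00", "01", "02"]


def tokenize(s):
    """Return a list of (typid, data) tuples found in s."""
    linedata = []
    pos = 0
    while pos < len(s):
        typid = s[pos:pos + 2]
        length = int(s[pos + 2:pos + 4])
        linedata.append((typid, s[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return linedata


def write_csv(lines, out):
    # restval="" writes an empty cell for any identifier missing from a line.
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, restval="")
    writer.writeheader()
    for line in lines:
        writer.writerow(dict(tokenize(line)))


lines = [
    "PN1121 E 1st St0008New York0102NY020510003",
    "PN1121 E 1st St0102NY020510003",
]
buffer = io.StringIO()
write_csv(lines, buffer)
print(buffer.getvalue())
```

With the default settings, DictWriter raises ValueError if a line contains an identifier that is not in FIELDNAMES, which helps catch an incomplete identifier list early. When writing to a real file, open it with newline="" as the csv module documentation recommends.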
