如何从数组中解析和过滤掉子字符串中的唯一名称?

How can I parse and filter out unique names in a substring from within an array?

提问人:nightmare637 提问时间:8/5/2022 最后编辑:nightmare637 更新时间:8/7/2022 访问量:79

问:

我有一个包含名称的字符串数组,例如以下内容(在 psuedocode 中):

string nameArray[7]
nameArray[0] = "JohnJacob"
nameArray[1] = "George"
nameArray[2] = "JacobBill"
nameArray[3] = "GeorgeBill"
nameArray[4] = "PatZach"
nameArray[5] = "BobJacob"
nameArray[6] = "MichaelScottChristine"

我想使这个数组没有出现不止一次的名称,甚至有一个子字符串。如果两个名字粘在一起也没关系;只要在数组的任何其他元素中都找不到任何子字符串。一个可能的结果可能如下所示:

nameArrayUpdated[0] = "John"
nameArrayUpdated[1] = "Jacob"
nameArrayUpdated[2] = "George"
nameArrayUpdated[3] = "Bill"
nameArrayUpdated[4] = "PatZach"
nameArrayUpdated[5] = "Bob"
nameArrayUpdated[6] = "MichaelScottChristine"

我最初的方法是遍历数组,并检查是否有任何元素是另一个元素的子字符串,然后删除该子字符串。像这样的东西(同样,伪代码):

for (i = 0; i < arrLength; i++)
{
   str tmp = nameArray[i];
   for (j = 0; j < arrLength; j++)
   {
      if (i != j)
      {
         if (nameArray[j].ContainsSubstring(tmp) == true)
            nameArray[j].RemoveSubstring(tmp)
      }

   }
}

但由于几个原因,这是有问题的。首先,它需要未知数量的迭代才能通过数组。例如,“JohnJacob”不会在第一次迭代时分离,而是在第二次迭代时分离。其次,像“JacobBill”和“BobJacob”这样的字符串本身就被认为是唯一的,而不是其他任何东西的子字符串......尽管雅各布包含在这两个字符串中。

有什么实用的方法可以区分出这些名称吗?

数组 与语言无关

评论

1赞 ALeXceL 8/5/2022
严格来说,这不是VBA代码。因此,最好的办法是将 [Arrays] 标签添加到问题中。
1赞 nightmare637 8/5/2022
非常感谢您的建议!我删除了 VBA 标记,并添加了数组和与语言无关的标记。感谢您的有用反馈!
0赞 FaneDuru 8/6/2022
我想到了一个VBA解决方案,但我希望您确认我理解您的问题的方式是正确的: 1.“名称”是以字母开头的字符串。2.应通过拆分上面定义的“名称”中显示的数组来获得更大的数组。3.如果在原始数组中找不到字符串,或者在较大的数组中只找到一次,则必须按原样使用(George,PatZach,MichaelScottChristine)。4. 最终数组由较大数组中的唯一名称组成,加上不拆分的名称,但也检查较大数组中没有重复项(“George”)。UpperCase
0赞 FaneDuru 8/6/2022
以上解释的理解正确吗?
0赞 markp-fuso 8/7/2022
您只对以大写字母开头的子字符串感兴趣,还是也考虑作为重复字符串(例如,和)?如果一个子字符串出现在多个数组元素中,你如何确定优先级(通过数组索引、整个数组元素的字母顺序、其他什么?)?如果数组元素的子字符串已被选择/使用(例如,)或重复(例如,'nameArrayUpdated[8] = “JohnJacob”)该怎么办......换句话说,数组元素的所有子字符串都出现在 3+ 数组元素中?obBillBobBillJacobBillynameArrayUpdated[7] = "JohnGeorge"

答:

0赞 FaneDuru 8/6/2022 #1

你没有回答我的澄清问题......

请复制 Excel 标准模块中的下一个代码并运行第一个 ()。该代码假设我在评论中表达的理解是正确的:SubextractNamesArr

Sub extractNamesArr()
  Dim arrN, arrFin, arr, mtch, strRepl As String, i As Long, j As Long, boolNot As Boolean
  arrN = Split("JohnJacob,George,JacobBill,GeorgeBill,PatZach,BobJacob,MichaelScottChristine", ",")
  
  arr = extractNB(arrN) 'create an array of twp dictionaries
                                        '1. above array elements (as keys) and all extracted names (as item) per each key
                                        '2. all unique extracted names, as dictionary keys
  Debug.Print Join(arr(0).Keys, "|") ' just to visually see the dictionary keys
  arrFin = arr(1).Keys               'place in an array the unique extracted names
  
  Dim key, k
  For Each key In arr(0).Keys        'iterate between the dictionary keys
        For i = 0 To UBound(arr(0)(key))
            For Each k In arr(0).Keys
                If key <> k Then
                    If InStr(1, k, arr(0)(key)(i), vbBinaryCompare) > 0 Then
                        boolNot = True: Exit For
                    End If
                End If
            Next k
            If boolNot Then Exit For
        Next i
        If Not boolNot Then 'none of the names belonging to the respective kay, NOT FOUND
            For j = 0 To UBound(arr(0)(key))
                mtch = Application.match(arr(0)(key)(j), arrFin, 0)
                If j = UBound(arr(0)(key)) Then
                      arrFin(mtch - 1) = key
                Else
                    strRepl = arrFin(mtch - 1) & "@#$%": arrFin(mtch - 1) = strRepl
                    arrFin = filter(arrFin, strRepl, False)
                End If
            Next j
        End If
        boolNot = False
  Next key

  Debug.Print Join(arrFin, "|") 'just to see the final returned array (unique names)
End Sub

Function extractNB(arr) As Variant
   Dim arrInt, k As Long, El, i As Long, frstEl As Long
   Dim dict As Object, dict1 As Object
   
   Set dict = CreateObject("Scripting.Dictionary")  'dictionary to keep all initial composed array elements (as key) and its split names (as item)
   Set dict1 = CreateObject("Scripting.Dictionary") 'dictionary to keep UNIQUE  extracted names
   
   frstEl = 1                     'first letter to build the first name string
   For Each El In arr             'iterate between the array elements
        ReDim arrInt(Len(El))     'ReDim the intermediary array to the string number of digits
        For i = 2 To Len(El)      'iterate between string characters, starting from the second one:
            If isUpper(Mid(El, i, 1)) Then                        'if the character is Upper case:
               arrInt(k) = Mid(El, frstEl, i - frstEl): k = k + 1 'place the Name inthe intermediary dictionary item array
               dict1(arrInt(k - 1)) = vbNullString                'load the dictionary to keep all UNIQUE extracted names
               frstEl = i          'adapt the first character number, to extract the next name
            End If
        Next i
        If frstEl < i - 1 Then     'extract the last name:
            arrInt(k) = Mid(El, frstEl, Len(El) - frstEl + 1): k = k + 1
            dict1(arrInt(k - 1)) = vbNullString                           'load the dictionary keeping all UNIQUE extracted names
        End If
        ReDim Preserve arrInt(k - 1)    'keep only the loaded array elements
        dict.Add El, arrInt             'create the dictionary key and its array item
        frstEl = 1: Erase arrInt: k = 0 'reinitialize the necessary variables
   Next

   extractNB = Array(dict, dict1)  'return an array containing the two dictionaries
End Function

Function isUpper(mstr As String) As Boolean 'Check if the  character is Upper case
    Select Case Asc(mstr)
            Case 65 To 90
                isUpper = True
            Case 97 To 122
                isUpper = False
            Case Else
                isUpper = False
    End Select
End Function

请在测试后发送一些反馈。如果有什么不清楚的地方,请随时要求澄清。

我尝试注释所有不容易理解的代码行......