正则表达式查找逗号分隔符不在引号中,也不在括号中

Regex find comma delimiters not in quotes AND not in parenthesis

提问人:Ben 提问时间:8/9/2016 更新时间:4/2/2021 访问量:768

问:

我的最终目标是开发一个函数来将 Access IIF() 语句转换为 T-SQL (2008) CASE WHEN 语句。我已经有一个 VBA 例程,可以找到 IIF 和匹配的右括号,即使它在引号之外。它是递归的,可以在不使用正则表达式的情况下查找嵌套的 IIF() 语句。当我缩小到 IIF 括号内的文本时,我需要确定分隔三个参数的两个逗号分隔符。当括号在引号内时,我遇到了麻烦。如何设置正则表达式以在处理表达式的其余部分之前忽略引号中的任何内容?

我正在尝试创建一个表达式组,该组将查找单引号内的任何内容和括号内的任何内容,然后排除与该组匹配的任何内容,并找到逗号。(如果我说得不正确,请原谅我,因为“捕获组”和“非捕获组”有时会给我带来与我期望相反的结果)。

请注意,此解决方案必须与 VBScript 正则表达式支持一起使用,这与 JavaScript 风格基本相同。

condition、true、false < -- 这是我的 IIF 解析函数在尝试拆分为 3 个部分之前返回的字符串。

到目前为止,这是我拼凑的表达方式:

,(?=([^']*'[^']*')*(?![^']*'))(?![^(]*[)])

这适用于此:

a=1, nz(b,0), Left('xy,z',2)

但这些线路更具挑战性。我找不到适用于所有这些的表达方式。

a=1, '1st)', '(2nd)' 
left(right(a,5),1)='b', '1st)', '(2nd)'
a=1, Left('a,bc',1) , 'xy,z'

这是我一直在研究的 Regex101:

https://regex101.com/r/qH0wD8/2

JavaScript 则表达 VBScript 引用括

评论

2赞 Wiktor Stribiżew 8/9/2016
这是一个骇人听闻的解决方法。您需要为此编写解析器。,(?=([^']*'[^']*')*(?![^']*'))(?![^(]*[)])
0赞 Ben 2/8/2018
我不确定我是否正走在一条黑暗的道路上。我很欣赏像您这样有名望的人的反馈。我最终编写了一个解析器来将嵌套的 IIF() 语句转换为 T-SQL CASE WHEN 语句。
0赞 Wiktor Stribiżew 2/8/2018
请在此处发布代码以供将来的访问者使用。
0赞 Ben 2/8/2018
我很乐意分享,但在哪里呢?注释只允许 600 个字符。我完成的函数IIF到CASE的转换函数是10k。它是用 VBA 编写的。
0赞 Wiktor Stribiżew 2/8/2018
:)作为答案发布,而不是作为评论。您可以回答自己的问题。

答:

0赞 Ben 2/8/2018 #1

答案是,我试图做的事情不适合正则表达式。相反,我通过编写一个处理嵌套括号、匹配引号和杂项逗号的解析器来解决这个问题。

我希望你能找到这个答案,因为你需要一个函数将 IIF() 语句转换为 CASE WHEN,而不是我复杂的正则表达式查询。此函数的主要用例是将 Access SQL 转换为 SQL Server 2008 及更早版本的 T-SQL。SQL Server 2012SQL Server 2012SQL Server 2012SQL Server 2012SQL Server 2012SQL Server 2012SQL Server 2012SQL Server 2012SQL Server 201您可以在任何 VBA 编辑器中使用此 VBA 函数。如果 Access 不方便,则可以使用 Excel。

下面是一个 VBA 函数,用于将 Access IIF() 语句转换为 T-SQL CASE WHEN 语句

Public Function ReplaceIIFwithCASE(ByVal strInput As String) As String
' Parse the passed string and find the first "IIF(" and parse it into
' a standard T-SQL CASE WHEN statement.  If a nested IIF is found,
' recurse and call this function again.
'
' Ben Sacherich - May 2016-Feb 2017
'
' This supports:
'   IIF() embedded anywhere in the input string.
'   Nested IIF() statements.
'   The input string containing multiple IIF() statements on the same level.
'   Strings between open and close IIF parenthesis that contains literal commas or commas for other functions.
'       Example:  IIF(a=1, nz(b,0) , Left('xy,z',2))
'
' Be aware:
'   This does not optimize the CASE statement in places where a nested IIF could
'     be written as a single CASE statement.
'   It will fail if text inside IIF() containes the pipe character |. If you
'     need to process statements with this character, modify this routine
'     to use another temporary character for |.
'
' Try these in the Immediate window:
'   ? ReplaceIIFwithCASE("IIF(a=1, nz(b,0) , Left('xy,z',2))")
'   ? ReplaceIIFwithCASE("IIf(x='one',IIf(Abs(Z)=1,2,3),'three')")
'   ? ReplaceIIFwithCASE("IIF(a=1,'1st)', '2nd)')")
'   ? ReplaceIIFwithCASE("SELECT Name, IIF(Gender='M', 'Boy', 'Girl') FROM Students")
'
' How this works:
'   Find "IIF(" in the passed string.  Return original string if not found.
'   Search for the matching closing parenthesis.
'   When the match is found, recurse and make sure an "IIF(" is not nested.
'   After recursing, replace the IIF with a CASE statement.
'       - Once I find the inner part of an IIF this will use the Split function
'         to delimit by commas "," into an array.
'       - Then it looks at each array element. If it contains an odd number of
'         single or double quote characters or different number of opening and
'         closing parenthesis, it combines the array element part with the next
'         part and tests again.
'       - When there are matched single/double quotes and equivalent number of
'         parenthesis it holds that part and appends the "|" character.  This
'         means that it has identified one of the 3 parameters that is passed
'         to the IIF function.
'       - Then it splits the string by the "|" character into three pieces
'         and builds the CASE statement.
'   Continue searching the passed string for another occurrence of "IIF(" (not nested).

    Dim lngFuncStart    As Long
    Dim lngPosition     As Long
    Dim intStack        As Integer
    Dim strFunction     As String
    Dim strChar         As String
    Dim strQuoteChar    As String
    Dim bolInQuotes     As Boolean
    Dim strSplit()      As String
    Dim ReturnValue     As String

    On Error GoTo ErrorHandler

    strFunction = "IIF("
    strQuoteChar = "'"      ' Define the style of quotes to look for and exclude.
    bolInQuotes = False     ' We are currently not inside quotes.

    lngFuncStart = InStr(1, strInput, strFunction, vbTextCompare)

    If lngFuncStart > 0 Then
        lngFuncStart = lngFuncStart + Len(strFunction)
        intStack = 1
        lngPosition = lngFuncStart

        Do While lngPosition <= Len(strInput)
        ' Use a WHILE loop instead of a FOR loop because the current and end positions will change inside the loop.

            strChar = Mid(strInput, lngPosition, 1)

            If strChar = strQuoteChar Then
                bolInQuotes = Not bolInQuotes
                ' Move on to the next character

            ElseIf bolInQuotes = False Then
                ' We are currently not inside quotes.

                Select Case strChar
                    Case ")"
                        ' Closing a group
                        intStack = intStack - 1
                    Case "("
                        ' Starting new group
                        intStack = intStack + 1
                End Select

                If intStack = 0 Then ' Found closing parenthesis.

                    ' See if there is a nested match.  ### Recursive ###
                    ReturnValue = ReplaceIIFwithCASE(Mid(strInput, lngFuncStart, lngPosition - lngFuncStart))

                    ' Begin parsing commas.
                    strSplit() = Split(ReturnValue, ",")

                    Dim strPart         As String
                    Dim strRebuilt      As String
                    Dim i               As Integer

                    strRebuilt = ""
                    If UBound(strSplit()) > 2 Then ' There are more than 2 commas.  Piece together the parts.

                        strPart = strSplit(0)
                        For i = 1 To UBound(strSplit)

                            If UBound(Split(strPart, "'")) Mod 2 = 0 _
                              And UBound(Split(strPart, """")) Mod 2 = 0 _
                              And UBound(Split(strPart, "(")) = UBound(Split(strPart, ")")) Then
                                ' Number of single quotes is Even or Zero (matched)
                                ' Number of double quotes is Even or Zero (matched)
                                ' Number of parenthesis is matched

                                ' Add the "|" symbol where the IIF should have commas.
                                strRebuilt = strRebuilt & "|" & strPart
                                strPart = strSplit(i)
                            Else
                                strPart = strPart & "," & strSplit(i)
                            End If
                        Next
                        ReturnValue = Mid(strRebuilt & "|" & strPart, 2)

                        strSplit() = Split(ReturnValue, "|")
                    End If

                    If UBound(strSplit) = 2 Then
                        ' IIF has 3 parameters and is the normal case.
                        '--- Replace the IIF statement with CASE WHEN ---
                        ' CASE statement help:  https://msdn.microsoft.com/en-us/library/ms181765.aspx
                        ReturnValue = "(CASE WHEN " & Trim(strSplit(0)) & " THEN " & Trim(strSplit(1)) & " ELSE " & Trim(strSplit(2)) & " END)"
                        'ReturnValue = "(CASE WHEN...)"
                        If Right(Mid(strInput, 1, lngFuncStart - Len(strFunction) - 1), 2) = vbCrLf Then
                            ' Don't bother to add a CrLf
                        Else
                            ' Add a CrLf before the CASE statement to make identification easier.
                            ' Comment this out if you don't want it added.
                            ReturnValue = vbCrLf & ReturnValue
                        End If
                        strInput = Mid(strInput, 1, lngFuncStart - Len(strFunction) - 1) & ReturnValue & Mid(strInput, lngPosition + 1)
                    Else
                        ' Something is wrong.  Return the original IIF statement.
                        ' Known issues:
                        '       Text inside IIF() contained pipe character |
                        '       Text contained unmatched parenthesis, maybe inside of a literal string like '1st)'
                        ReturnValue = "IIF(" & ReturnValue & ") /*### Unable to parse IIF() ###*/ "
                        strInput = Mid(strInput, 1, lngFuncStart - Len(strFunction) - 1) & ReturnValue & Mid(strInput, lngPosition + 1)

                    End If

                    '--- Check to see if there is another function call following the one just addressed. ---
                    lngFuncStart = InStr(lngFuncStart + Len(ReturnValue) - Len(strFunction), strInput, strFunction, vbTextCompare)
                    If lngFuncStart > 0 Then
                        ' Another IIF function call is at the same level as the one just processed.
                        lngFuncStart = lngFuncStart + Len(strFunction)
                        intStack = 1
                        lngPosition = lngFuncStart
                    Else
                        ReturnValue = strInput
                        Exit Do
                    End If

                End If
            End If
            lngPosition = lngPosition + 1
        Loop

    Else
        ' Function not found in passed string.
        ReturnValue = strInput
    End If

    ReplaceIIFwithCASE = ReturnValue

    Exit Function

ErrorHandler:

    MsgBox "Error #" & Err.Number & " - " & Err.Description & vbCrLf & "in procedure ReplaceIIFwithCASE()" _
        & vbCrLf & "Input:  " & strInput, vbExclamation, "Error"

End Function
0赞 mkayaalp 4/2/2021 #2

您可以使用递归匹配:

(?>(?>\([^()]*(?R)?[^()]*\))|(?>'[^']*')|(?>[^()' ,]+))+

允许重复外部匹配。在里面,有三个选项。第一个匹配平衡:(?>...)+()

(?>\([^()]*(?R)?[^()]*\)

第二个匹配单引号之间的任何内容:'...'

(?>'[^']*')

第三个匹配除 、 逗号或空格之外的任何内容:()'

(?>[^()' ,]+)