从大于 5 条记录的 csv 文件中提取数据

Pulling data from csv files larger than 5 records

提问人:JHen 提问时间:8/24/2023 最后编辑:JHen 更新时间:8/26/2023 访问量:72

问:

有关更多上下文,请参阅我的第一篇文章:第一篇文章

我需要将数据从 csv 文件提取到现有的 excel 文件中。我对自定义函数和整个 excel 的更深层次机制知之甚少,因此在 GPT 3.5 的帮助下,我开发了一个自定义函数,用于从具有 2 条记录的测试 csv 文件中提取数据。

以下是 GPT 给我的功能,根据我得到的反馈进行了一些调整:

Function GetCSVCellValueFromRecord(csvFilePath As String, recordIndex As Long, targetColumnName As String) As Variant
    Dim csvContent As String
    Dim lines() As String
    Dim headers() As String
    Dim columnIndex As Long
    Dim i As Long
    
    ' Read the entire CSV file into a string
    Open ThisWorkbook.Path & "\" & csvFilePath For Input As #1
    csvContent = Input$(LOF(1), 1)
    Close #1
    
    ' Split the CSV content into lines
    lines = Split(csvContent, vbCrLf)
    
    ' Get the headers from the first line
    headers = Split(lines(0), ",")
    
    ' Find the column index of the target data
    columnIndex = -1
    For i = LBound(headers) To UBound(headers)
        If Trim(headers(i)) = targetColumnName Then
            columnIndex = i
            Exit For
        End If
    Next i
    
    ' Return an error if the column name is not found
    If columnIndex = -1 Then
        GetCSVCellValueFromRecord = CVErr(xlErrValue)
        Exit Function
    End If
    
    ' Check if the requested record index is within bounds
    If recordIndex >= 1 And recordIndex <= UBound(lines) Then
        Dim fields() As String
        fields = Split(lines(UBound(lines) - recordIndex), ",")  ' Remove the subtraction of 1 here
        If UBound(fields) >= columnIndex Then
            GetCSVCellValueFromRecord = Trim(fields(columnIndex))
            Exit Function
        End If
    End If

    
    ' Return "N/A" if the record or data is not found
    GetCSVCellValueFromRecord = "N/A"
End Function

我有一个包含所需 csv 文件名称的单元格,我的函数调用引用此单元格,以便使用 csv 中的信息更新所有相关字段。

函数调用示例:如果找不到记录,则返回“N/A”。=GetCSVCellValueFromRecord(F1, 1, "Comment")

该函数适用于包含 5 条记录的小型 csv 文件,但对于不存在的第 6 条记录,它返回指定列的名称,然后继续正确显示不存在的第 7 条到第 20 条记录的“N/A”。我不确定是什么原因导致了这个错误。当测试一个有 10 条记录的文件时,该函数会完全失败,并且所有字段都返回不正确的数据,并且应该存在的字段返回 N/A。

再一次,我不确定这个问题。请指教

csv 文件将最新的记录作为第一行,出于我的目的,我需要将最早的记录视为第一行

来自工作 csv 的数据:

Date,Time,Service provider,Client name,Client phone,Comment,Service category
24/8/2023,01:00 PM - 01:55 PM,Tim Robinson,Severus Mitsu,18765894838,"5.Checked only 1 mirror before moving off
",Manual General License
24/8/2023,10:15 AM - 11:10 AM,Tim Robinson,Severus Mitsu,18765894838,"4.Started the vehicle properly
",Manual General License
24/8/2023,09:20 AM - 10:15 AM,Tim Robinson,Severus Mitsu,18765894838,"3.Failed to disengage parking brake
",Manual General License
24/8/2023,08:25 AM - 09:20 AM,Tim Robinson,Severus Mitsu,18765894838,"2.Failed to engage seatbelt
",Manual General License
24/8/2023,07:30 AM - 08:25 AM,Tim Robinson,Severus Mitsu,18765894838,"1.Successfully Opened door
",Manual General License

来自损坏的 csv 的数据:

Date,Tim e,Service,"Service provider","Client name","Client phone",Comment,"Service category"
26-08-2023,"01:00 PM - 01:55 PM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"10. Excessive Speeding","Manual General License"
26-08-2023,"10:15 AM - 11:10 AM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"9.Ran the red lights","Manual General License"
26-08-2023,"09:20 AM - 10:15 AM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"8.Mastered Parallel Parking","Manual General License"
26-08-2023,"08:25 AM - 09:20 AM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"7.botched attempt at parallel parking","Manual General License"
26-08-2023,"07:30 AM - 08:25 AM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"6. Making good progress","Manual General License"
24-08-2023,"01:00 PM - 01:55 PM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"5.Checked only 1 mirror before moving off
","Manual General License"
24-08-2023,"10:15 AM - 11:10 AM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"4.Started the vehicle properly
","Manual General License"
24-08-2023,"09:20 AM - 10:15 AM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"3.Failed to disengage parking brake
","Manual General License"
24-08-2023,"08:25 AM - 09:20 AM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"2.Failed to engage seatbelt
","Manual General License"
24-08-2023,"07:30 AM - 08:25 AM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"1.Success fully Opened door
","Manual General License"


Excel VBA 函数 CSV

评论

0赞 Tim Williams 8/24/2023
值得一提的是(来自您之前的帖子)“csv 文件将最近的记录作为第一行,出于我的目的,我需要将最早的记录视为第一行”
0赞 JHen 8/24/2023
我会把它添加到帖子中。谢谢
1赞 Tim Williams 8/24/2023
最好在记事本(而不是Excel)中打开每个CSV,然后从那里复制和粘贴内容,这样我们就可以确定格式。Excel 在打开 CSV 文件时执行操作。
0赞 JHen 8/24/2023
好的,我已经编辑了帖子
0赞 Tim Williams 8/24/2023
某些带引号的字段值包含换行符。仅当您的文件在行之间和字段之间只有这些内容时,才在 vbCrLf 和逗号上拆分:如果字段值可能包含其中任何一个,则不起作用。

答:

1赞 Tim Williams 8/24/2023 #1

某些带引号的字段值包含换行符。仅当您的文件在行之间和字段之间只有这些内容时,才有效拆分逗号:如果字段值可能包含其中任何一个,则不起作用。vbCrLf

为了更清楚起见,这将显示换行符在文件中的位置(vbCrLf/vbLf/vbCr 中的任何一个):

    ff = FreeFile ' Read the entire CSV file into a string...
    Open ThisWorkBook.path & "\" & csvFilePath For Input As ff
    csvContent = Input$(LOF(ff), ff)
    Close ff
    
    csvContent = Replace(csvContent, vbCrLf, "<CrLf>")
    csvContent = Replace(csvContent, vbCr, "<Cr>")
    csvContent = Replace(csvContent, vbLf, "<Lf>")
    csvContent = Replace(csvContent, "<CrLf>", "<CrLf>" & vbLf)
    Debug.Print csvContent

某些记录的引号内有 CrLf - 这些不是分隔线的新行,而是嵌入在字段数据中的换行符。

Date,Tim e,Service,"Service provider","Client name","Client phone",Comment,"Service category"<CrLf>
26-08-2023,"01:00 PM - 01:55 PM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"10. Excessive Speeding","Manual General License"<CrLf>
26-08-2023,"10:15 AM - 11:10 AM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"9.Ran the red lights","Manual General License"<CrLf>
26-08-2023,"09:20 AM - 10:15 AM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"8.Mastered Parallel Parking","Manual General License"<CrLf>
26-08-2023,"08:25 AM - 09:20 AM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"7.botched attempt at parallel parking","Manual General License"<CrLf>
26-08-2023,"07:30 AM - 08:25 AM","Nissan Frontier #CO 026","Harold  Richards","Severus Mitsu",+18765894838,"6. Making good progress","Manual General License"<CrLf>
24-08-2023,"01:00 PM - 01:55 PM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"5.Checked only 1 mirror before moving off<CrLf>
","Manual General License"<CrLf>
24-08-2023,"10:15 AM - 11:10 AM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"4.Started the vehicle properly<CrLf>
","Manual General License"<CrLf>
24-08-2023,"09:20 AM - 10:15 AM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"3.Failed to disengage parking brake<CrLf>
","Manual General License"<CrLf>
24-08-2023,"08:25 AM - 09:20 AM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"2.Failed to engage seatbelt<CrLf>
","Manual General License"<CrLf>
24-08-2023,"07:30 AM - 08:25 AM","L200 Mitsubishi #794 AL","Tim Robinson","Severus Mitsu",+18765894838,"1.Success fully Opened door<CrLf>
","Manual General License"<CrLf>

编辑:这成功地从“问题”文件中读取内容:

Sub Tester()
    Debug.Print GetCSVCellValueFromRecord("Test2.csv", 1, "Comment")
    '>> 1.Success fully Opened door
    Debug.Print GetCSVCellValueFromRecord("Test2.csv", 2, "Comment")
    '>> 2.Failed to engage seatbelt
    Debug.Print GetCSVCellValueFromRecord("Test2.csv", 1, "Commentx")
    '>> ?header?
    Debug.Print GetCSVCellValueFromRecord("Test2.csv", 4, "Comment")
    '>> 4.Started the vehicle properly
    Debug.Print GetCSVCellValueFromRecord("Test2.csv", 30, "Comment")
    '>> ?rec num?
End Sub


Function GetCSVCellValueFromRecord(csvFilePath As String, recordIndex As Long, targetColumnName As String) As Variant
    Dim numLines As Long, lines As Collection
    Dim headers() As String, fields() As String
    Dim columnIndex As Long, i As Long
   
    Set lines = CSVToLines(ThisWorkBook.path & "\" & csvFilePath)
    numLines = lines.Count
    If numLines = 0 Then
        GetCSVCellValueFromRecord = "?no data?"
        Exit Function
    End If
    
    ' Find the column index of the target data
    headers = lines(1) ' Get the headers from the first line
    columnIndex = -1
    For i = LBound(headers) To UBound(headers)
        If Trim(headers(i)) = targetColumnName Then
            columnIndex = i
            Exit For
        End If
    Next i
    
    ' check the column name was found
    If columnIndex = -1 Then
        GetCSVCellValueFromRecord = "?header?"
    Else
        ' record index is within bounds?
        If recordIndex >= 1 And recordIndex <= lines.Count - 1 Then
            fields = lines(numLines - (recordIndex - 1))
            If UBound(fields) >= columnIndex Then
                GetCSVCellValueFromRecord = Trim(fields(columnIndex))
                Exit Function
            End If
        Else
            GetCSVCellValueFromRecord = "?rec num?"
        End If
    End If
End Function

'Return the content from a CSV file as a Collection of
'  arrays, one array per line
'Account for quoted fields containing commas, quotes, or newlines
Function CSVToLines(csvFilePath As String) As Collection

    Dim csvContent As String, line As String, cNext As String
    Dim c, col As New Collection
    Dim i As Long, ff As Integer, length As Long, inQ As Boolean
    
    ff = FreeFile ' Read the entire CSV file into a string...
    Open csvFilePath For Input As ff
    csvContent = Input$(LOF(ff), ff)
    Close ff
    
    csvContent = Replace(csvContent, vbCrLf, vbLf) & " " 'normalize newlines and add a space
    length = Len(csvContent) - 1                         'skip the added space when looping...
    inQ = False                                          'not yet inside a quoted field
    i = 1
    Do While i < length
        c = Mid(csvContent, i, 1)
        cNext = Mid(csvContent, i + 1, 1)
        If c = """" Then       'already in quotes?
            If cNext = """" Then
                line = line & c     'was a doubled-up quote - add one quote
                i = i + 1           'skip the next character
            Else
                inQ = Not inQ       'switching in/out of a quoted field
            End If
        Else
            If Not inQ Then
                'Not in a quoted field - decide how to handle the character
                Select Case c
                    Case vbLf   'not inside quotes, so this is the end of a line...
                        col.Add Split(line, Chr(0))  'split fields on chr(0)
                        line = ""
                    Case ",": line = line & Chr(0) 'sub comma for chr(0) for later splitting
                    Case Else: line = line & c
                End Select
            Else
                'In a quoted field: add each character unless a vblf
                '  change behaviour to suit your needs....
                Select Case c
                    Case vbLf 'do nothing?
                    Case Else: line = line & c
                End Select
            End If
        End If
        i = i + 1
    Loop
    If Len(line) > 0 Then col.Add Split(line, Chr(0)) 'add any remaining line
    Set CSVToLines = col
End Function

评论

0赞 Tim Williams 8/25/2023
编辑是否解决了问题?
0赞 JHen 8/26/2023
你是我的朋友,绝对是我的祝福。这非常有效。感谢所有的帮助!
0赞 Tim Williams 8/26/2023
很高兴它很有用。