Asked by: Panka Bálint  Asked: 11/11/2023  Updated: 11/11/2023  Views: 59
VBA - Get text from scanned PDF and save it in Excel
Question:
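I have a very specific problem. I have code that extracts text from a PDF file and saves it in Excel. The problem is that it does not work on scanned PDF files, because the text cannot be read from them.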
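My code does the following:
- Opens the PDF
- Gets the page and highlights (selects) the text on it
- Saves that text in a Variant
- Loops through the Variant and writes out each word (that is what the basic version did; my version only writes down specific values)
- Closes the PDF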
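I would like it to work on scanned PDFs as well. I think the problem is that it cannot highlight the text, because a scanned PDF is more like a picture saved as a PDF than a PDF containing real text. Here is my code (I didn't write it myself; I found it on the internet):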
Public Function Get_VIN_From_CoC(PDF_File As String, OnWhichPage As Integer) As String
    'Reads the text of one PDF page as follows:
    '1. Open the PDF file
    '2. Get the requested page (AcquirePage is zero-based: 0 = first page)
    '3. Select every word on that page and read its text
    Dim AC_PD As Acrobat.AcroPDDoc           'the PDF document
    Dim AC_Hi As Acrobat.AcroHiliteList      'range of words to select
    Dim AC_PG As Acrobat.AcroPDPage          'the requested page
    Dim AC_PGTxt As Acrobat.AcroPDTextSelect 'text of the selected words
    Dim Ct_Page As Long                      'number of pages in the PDF file
    Dim j As Long, K As Long                 'loop counters
    Dim T_Str As String
    Dim Hld_Txt As Variant                   'page text split into lines
    Dim VIN As String

    Set AC_PD = New Acrobat.AcroPDDoc
    Set AC_Hi = New Acrobat.AcroHiliteList
    'select words 0 to 32767, i.e. effectively every word on the page
    AC_Hi.Add 0, 32767

    With AC_PD
        'open the PDF file; Open returns False if it fails
        If Not .Open(PDF_File) Then
            MsgBox "Cannot open PDF file '" & PDF_File & "'"
            GoTo h_end
        End If
        'get the number of pages; -1 means it could not be determined
        Ct_Page = .GetNumPages
        If Ct_Page = -1 Then
            MsgBox "Cannot determine the number of pages in PDF file '" & PDF_File & "'"
            .Close
            GoTo h_end
        End If
        'make sure the requested (zero-based) page exists
        If OnWhichPage < 0 Or OnWhichPage >= Ct_Page Then
            MsgBox "Page " & OnWhichPage & " does not exist in PDF file '" & PDF_File & "'"
            .Close
            GoTo h_end
        End If

        T_Str = ""
        'get the requested page
        Set AC_PG = .AcquirePage(OnWhichPage)
        'select all words on the page
        Set AC_PGTxt = AC_PG.CreateWordHilite(AC_Hi)
        'if the selection succeeded, concatenate all of its text into T_Str
        If Not AC_PGTxt Is Nothing Then
            With AC_PGTxt
                For j = 0 To .GetNumText - 1
                    T_Str = T_Str & .GetText(j)
                Next j
            End With
        End If

        'split the text into lines (Hld_Txt) and take the line that follows
        'the line ending in "(Kg) :"
        If T_Str <> "" Then
            Hld_Txt = Split(T_Str, vbCrLf)
            For K = 0 To UBound(Hld_Txt)
                T_Str = CStr(Hld_Txt(K))
                'prefix "=" with an apostrophe so Excel stores it as text, not a formula
                If Left(T_Str, 1) = "=" Then T_Str = "'" & T_Str
                MsgBox T_Str
                'the value is on the next line, so the last line cannot match
                '(Hld_Txt(K + 1) would be out of range)
                If K < UBound(Hld_Txt) Then
                    If Right(T_Str, 6) = "(Kg) :" Then VIN = CStr(Hld_Txt(K + 1))
                End If
            Next K
        Else
            'no text could be read from the page (e.g. it is only a scanned image)
            MsgBox "No text found on page " & OnWhichPage
        End If
        .Close
    End With

h_end:
    Set AC_PGTxt = Nothing
    Set AC_PG = Nothing
    Set AC_Hi = Nothing
    Set AC_PD = Nothing
    Get_VIN_From_CoC = VIN
End Function
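The lookup at the end of the function (take the line that follows a line ending in "(Kg) :") does not depend on Acrobat, so it can be tested on its own. A minimal sketch, assuming the page text has already been extracted into one string whose lines are separated by vbCrLf; `ValueAfterMarker` is a hypothetical helper name, not part of the Acrobat API:

```vba
Option Explicit

' Hypothetical helper: returns the line that follows the first line ending
' in Marker, or "" if no line matches or the only match is the last line.
Public Function ValueAfterMarker(ByVal PageText As String, ByVal Marker As String) As String
    Dim Lines() As String
    Dim K As Long

    ' Split("") yields an empty array (UBound = -1), so the loop is skipped
    Lines = Split(PageText, vbCrLf)

    ' Stop one line before the end so Lines(K + 1) is always a valid index
    For K = LBound(Lines) To UBound(Lines) - 1
        If Right$(Lines(K), Len(Marker)) = Marker Then
            ValueAfterMarker = Lines(K + 1)
            Exit Function
        End If
    Next K

    ValueAfterMarker = ""
End Function
```

Unlike the loop in the question, which keeps overwriting VIN and therefore ends up with the last match, this returns the first match. Inside `Get_VIN_From_CoC` it could be called on the full page text, before T_Str is reused in the loop, as `VIN = ValueAfterMarker(T_Str, "(Kg) :")`.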
Can you help me solve this problem?
Answer: none yet