Google Document AI Python Query Throws "ValueError: Unknown field for ProcessRequest: document_type"... base64 encoding throws another error

Asked by: Hack-R Asked: 6/28/2023 Last edited by: Hack-R Updated: 6/29/2023 Views: 303

Q:

I'm running the Python sample query against a Google Document AI OCR processor. The only difference between my query and this sample query:

process_document_sample(
  project_id="99999FAKE",
  location="us",
  processor_id="99999FAKE",
  file_path="/path/to/local/pdf"
)

is that I'm using a JPEG instead of a PDF. It throws:

ValueError: Unknown field for ProcessRequest: document_type

So, I figured I needed to add this:

# Import the base64 encoding library.
import base64

# Pass the image data to an encoding function.
def encode_image(image):
    with open(image, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
    return encoded_string

So, I tried:

# Function to encode the image as base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode("utf-8")
    return encoded_string

# File path to the JPEG image
image_path = "merged_images/fake.jpeg"

# Encode the image as base64
encoded_image = encode_image(image_path)

# Call the process_document_sample function
process_document_sample(
    project_id="99999FAKE",
    location="us",
    processor_id="99999FAKE",
    file_path=encoded_image
)

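Unfortunately, this results in:

OSError: [Errno 63] File name too long:

I think this is because base64 is only needed for JSON-based requests, but that still leaves me with the original error.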
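For context, base64 only comes into play when the request is sent as JSON over REST, where the file bytes have to travel as text. The following is a minimal sketch of building such a body with the standard library. `build_process_request_body` is a hypothetical helper name, and the `rawDocument`, `content`, and `mimeType` field names reflect my understanding of the REST reference, so check them against the current documentation.

```python
import base64
import json


def build_process_request_body(file_bytes: bytes, mime_type: str) -> str:
    """Return a JSON request body with the file content base64-encoded.

    Hypothetical helper for illustration; it is not part of any Google library.
    """
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return json.dumps({"rawDocument": {"content": encoded, "mimeType": mime_type}})


# Placeholder bytes standing in for the contents of a real JPEG file.
body = build_process_request_body(b"\xff\xd8\xff\xe0 fake jpeg bytes", "image/jpeg")
print(body[:60])
```

The Python client library takes the raw bytes directly, so no encoding step should be needed on that path.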

python google-cloud-platform cloud-document-ai

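0 upvotes Lara19 6/28/2023

How about changing the file name by using os.rename?

0 upvotes Hack-R 6/28/2023

@Lara19 Thanks, but the file name error only occurs because the full base64-encoded string is being treated as a file name.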
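To illustrate the point in this comment: when a base64 string is passed where a path is expected, open() treats the whole string as a file name. The sketch below uses placeholder bytes rather than a real image. `try_open_as_path` is a hypothetical helper name, and the exact errno depends on the operating system (63 on macOS, as in the question; typically 36 on Linux).

```python
import base64


def try_open_as_path(value: str):
    """Try to open `value` as a file path; return the OSError raised, or None."""
    try:
        with open(value, "rb"):
            return None
    except OSError as exc:
        return exc


# 3000 placeholder bytes encode to a 4000-character string with no path separators.
encoded = base64.b64encode(b"\x00" * 3000).decode("utf-8")
error = try_open_as_path(encoded)
print(len(encoded), type(error).__name__, error.errno if error else None)
```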

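0 upvotes Hack-R 6/28/2023

Update: I found a workaround using a different approach, but no direct answer.

0 upvotes kiran mathew 6/28/2023

Hi @Hack-R, I have posted an answer. If it solves your problem, please consider accepting and upvoting it. If not, let me know so that I can improve it.

0 upvotes Hack-R 6/29/2023

@kiranmathew Thanks. Since I've already worked around the issue, I can't test the solution right now, but I've upvoted it because it looks correct.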

A:

2 upvotes kiran mathew 6/28/2023 #1

Based on your requirement, you can consider the following code:

from google.api_core.client_options import ClientOptions
from google.cloud import documentai


PROJECT_ID = "my-project"
LOCATION = "us"  
PROCESSOR_ID = "processorid"  
FILE_PATH = "sample_image.jpeg"
MIME_TYPE = "image/jpeg"

docai_client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(api_endpoint=f"{LOCATION}-documentai.googleapis.com")
)

RESOURCE_NAME = docai_client.processor_path(PROJECT_ID, LOCATION, PROCESSOR_ID)
with open(FILE_PATH, "rb") as image:
    image_content = image.read()

raw_document = documentai.RawDocument(content=image_content, mime_type=MIME_TYPE)
request = documentai.ProcessRequest(name=RESOURCE_NAME, raw_document=raw_document)
result = docai_client.process_document(request=request)
document_object = result.document
print("\n Document processing complete.\n")
print(f"Text: {document_object.text}")

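Result:

In the example above, I passed a JPEG image as input with the MIME type image/jpeg and got the expected result. According to the Google Cloud Document AI documentation, the JPEG format is also supported. For more information, you can refer to this link.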

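0 upvotes Holt Skinner 6/29/2023

This should work fine. The problem is that you were sending the full base64-encoded string as the file path rather than as the rawDocument content. For more code samples for sending processing requests, see the documentation: cloud.google.com/document-ai/docs/send-request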
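As a rough sketch of the distinction made in this comment, the file path is used only to read the bytes, and those raw bytes (not a base64 string) go into the document content. `load_raw_document_fields` and `process_local_file` are hypothetical helper names, guessing the MIME type from the file extension is my own assumption rather than something the thread prescribes, and the client calls mirror the ones in the answer above.

```python
import mimetypes
from pathlib import Path


def load_raw_document_fields(file_path: str) -> dict:
    """Read a local file and return keyword arguments for documentai.RawDocument.

    Hypothetical helper; the MIME type is guessed from the file extension.
    """
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        raise ValueError(f"Could not guess a MIME type for {file_path!r}")
    # Raw bytes, no base64: the client library handles serialization itself.
    return {"content": Path(file_path).read_bytes(), "mime_type": mime_type}


def process_local_file(project_id: str, location: str, processor_id: str, file_path: str):
    """Send a local file to a processor, using the same calls as the answer above."""
    # Imported here so the helper above works without the client library installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )
    name = client.processor_path(project_id, location, processor_id)
    raw_document = documentai.RawDocument(**load_raw_document_fields(file_path))
    request = documentai.ProcessRequest(name=name, raw_document=raw_document)
    return client.process_document(request=request).document
```

With valid credentials, something like `process_local_file("my-project", "us", "processorid", "sample_image.jpeg").text` would be expected to return the extracted text.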
