Utilizing C SDK from Bun FFI fails but regular C script utilizing the SDK with the same logic succeeds

Asked by: CatDadCode · Asked: 11/11/2023 · Last edited by: CatDadCode · Updated: 11/16/2023 · Views: 120

Q:

This bounty has ended. Answers to this question were eligible for a +500 reputation bounty. The bounty grace period ends in 14 hours. CatDadCode wants to draw more attention to this question.

I have an example C script that uses the Microsoft Speech C SDK to successfully detect a keyword in an audio file. I'm a bit new to C, so it took me a while to produce this working example (especially because they don't document their C API, so I had to infer from some C++ docs and intuition).

Note: I know there is a JS version of their Speech SDK. The problem is that they haven't implemented on-device keyword detection in their JS SDK, and when I opened an issue about it they suggested I use the C/C++ SDK with a node wrapper.

#include "./speechsdk/include/c_api/speechapi_c.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHECK_RESULT(result, message) \
    if ((result) != SPX_NOERROR) { \
        fprintf(stderr, message ": %lu\n", (result)); \
        exit(-1); \
    }

int main() {
    SPXAUDIOCONFIGHANDLE audioConfig;
    AZACHR audioConfigResult = audio_config_create_audio_input_from_wav_file_name(
        &audioConfig, "./file.wav");
    CHECK_RESULT(audioConfigResult, "Failed to create audio config.");
    printf("Created audio config.\n");

    SPXKEYWORDHANDLE keywordModel;
    AZACHR keywordModelResult = keyword_recognition_model_create_from_file(
         "./keyword_models/hey_bumblebee.table", &keywordModel);
    CHECK_RESULT(keywordModelResult, "Failed to create keyword model.");
    printf("Created keyword model.\n");

    SPXRECOHANDLE recognizer;
    AZACHR recognizerResult = recognizer_create_keyword_recognizer_from_audio_config(
        &recognizer, audioConfig);
    CHECK_RESULT(recognizerResult, "Failed to create recognizer.");
    printf("Created recognizer.\n");

    SPXRESULTHANDLE resultHandle = NULL;
    AZACHR recognizeKeywordResult = recognizer_recognize_keyword_once(
         recognizer, keywordModel, &resultHandle);
    CHECK_RESULT(recognizeKeywordResult, "Failed to start recognition.");

    Result_Reason reason;
    AZACHR reasonResult = result_get_reason(resultHandle, &reason);
    CHECK_RESULT(reasonResult, "Failed to get result reason.");

    if (reason == ResultReason_RecognizedKeyword) {
        char textBuffer[256];
        AZACHR textResult =
             result_get_text(resultHandle, textBuffer, sizeof(textBuffer));
        CHECK_RESULT(textResult, "Failed to get recognized text.");
        printf("Recognized: \"%s\"\n", textBuffer);
    } else if (reason == ResultReason_NoMatch) {
        Result_NoMatchReason noMatchReason;
        AZACHR noMatchReasonResult =
             result_get_no_match_reason(resultHandle, &noMatchReason);
        CHECK_RESULT(noMatchReasonResult, "Failed to get no match reason.");
        printf("No match. Reason: %d.\n", noMatchReason);
    } else if (reason == ResultReason_Canceled) {
        Result_CancellationReason cancellationReason;
        Result_CancellationErrorCode cancelationCode;
        AZACHR canceledReasonResult =
             result_get_reason_canceled(resultHandle, &cancellationReason);
        CHECK_RESULT(canceledReasonResult, "Failed to get canceled reason.");
        AZACHR canceledCodeResult =
             result_get_canceled_error_code(resultHandle, &cancelationCode);
        CHECK_RESULT(canceledCodeResult, "Failed to get canceled error code.");
        printf("Canceled. Reason: %d. Code: %d.\n", cancellationReason,
                 cancelationCode);
    } else {
        printf("Unknown.\n");
    }
    return 0;
}

The SDK methods return a status number, which I check via the CHECK_RESULT macro to make sure it's 0 (aka SPX_NOERROR). The actual data I want from each call gets populated into pointers that I create and pass in. The C script appears to run fine and does accurately detect the keyword in the audio file (when it's present).

Now I'm trying to do the same thing, but using Bun's FFI capabilities. Here's the code:

import { dlopen, FFIType, CString, ptr } from "bun:ffi";

const cwd = process.cwd();

// Enum derived from the SDK: https://github.com/catdadcode/microsoft-speech-sdk/blob/main/include/c_api/speechapi_c_result.h#L11-L26
enum ResultReason {
    NoMatch = 0,
    Canceled = 1,
    RecognizingSpeech = 2,
    RecognizedSpeech = 3,
    RecognizingIntent = 4,
    RecognizedIntent = 5,
    TranslatingSpeech = 6,
    TranslatedSpeech = 7,
    SynthesizingAudio = 8,
    SynthesizingAudioComplete = 9,
    RecognizingKeyword = 10,
    RecognizedKeyword = 11,
    SynthesizingAudioStart = 12,
}

// Assuming the Microsoft Speech SDK shared library is available at a certain path
const speechSdkPath = `${cwd}/speechsdk/lib/x64/libMicrosoft.CognitiveServices.Speech.core.so`;

// Load the Speech SDK shared library
const speechSdk = dlopen(speechSdkPath, {
    audio_config_create_audio_input_from_wav_file_name: {
        args: [FFIType.cstring, FFIType.ptr],
        returns: FFIType.u64_fast,
    },
    keyword_recognition_model_create_from_file: {
        args: [FFIType.cstring, FFIType.ptr],
        returns: FFIType.u64_fast,
    },
    recognizer_create_keyword_recognizer_from_audio_config: {
        args: [FFIType.ptr, FFIType.ptr],
        returns: FFIType.u64_fast,
    },
    recognizer_recognize_keyword_once: {
        args: [FFIType.ptr, FFIType.ptr, FFIType.ptr],
        returns: FFIType.u64_fast,
    },
    result_get_reason: {
        args: [FFIType.ptr, FFIType.ptr],
        returns: FFIType.u64_fast,
    },
    result_get_text: {
        args: [FFIType.ptr, FFIType.ptr, FFIType.u32],
        returns: FFIType.u64_fast,
    },
});

const textEncoder = new TextEncoder();
const textDecoder = new TextDecoder();

const checkResult = (result: number | bigint) => {
    if (result !== 0) {
        throw new Error(`Error: ${result}`);
    }
};

// Replace these with the actual file paths
const audioFilePath = textEncoder.encode(`${cwd}/file.wav`);
const keywordModelFilePath = textEncoder.encode(
    `${cwd}/keyword_models/hey_bumblebee.table`
);

// Create audio config
console.log(
    `Creating audio config from file: ${textDecoder.decode(audioFilePath)}`
);
const audioConfig = ptr(new Uint8Array(8));
const audioConfigResult =
    speechSdk.symbols.audio_config_create_audio_input_from_wav_file_name(
        audioFilePath,
        audioConfig
    );
checkResult(audioConfigResult);
console.log("Created audio config.");

// Create keyword model
console.log(
    `Creating keyword model from file: ${textDecoder.decode(
        keywordModelFilePath
    )}`
);
const keywordModel = ptr(new Uint8Array(8));
const keywordModelResult =
    speechSdk.symbols.keyword_recognition_model_create_from_file(
        keywordModelFilePath,
        keywordModel
    );
checkResult(keywordModelResult);
console.log("Created keyword model.");

// Create recognizer
console.log("Creating recognizer...");
const recognizer = ptr(new Uint8Array(8));
const recognizerResult =
    speechSdk.symbols.recognizer_create_keyword_recognizer_from_audio_config(
        recognizer,
        audioConfig
    );
console.log();
checkResult(recognizerResult!);
console.log("Created recognizer.");

// // Start recognition
// console.log("Starting recognition...");
// const resultHandle = ptr(new Uint8Array(16));
// const recognizeKeywordResult =
//  speechSdk.symbols.recognizer_recognize_keyword_once(
//      recognizer,
//      keywordModel,
//      resultHandle
//  );
// checkResult(recognizeKeywordResult);
// console.log("Recognition finished.");
//
// // Get result reason
// console.log("Getting result reason...");
// const reason = ptr(new Uint8Array(8));
// const reasonResult = speechSdk.symbols.result_get_reason(resultHandle, reason);
// checkResult(reasonResult);
// console.log("Got result reason:", new CString(reason));
//
// // Check the reason and handle accordingly
// if (reason === ResultReason.RecognizedKeyword) {
//  // Assuming a buffer size, adjust as needed
//  const textBuffer = new Uint8Array(256);
//  const textResult = speechSdk.symbols.result_get_text(
//      resultHandle,
//      textBuffer,
//      textBuffer.length
//  );
//  checkResult(textResult);
//  console.log("Recognized:", new CString(ptr(textBuffer)));
// } else {
//  console.error("Recognition failed:", reason);
// }

You'll notice that half of it is commented out. That's because it was already failing before the commented-out portion, so I commented out the rest to keep things simple, but I'm including it for completeness. The code returned in recognizerResult ends up being a huge number that doesn't seem to correspond to any error code. The first two calls, which create audioConfig and keywordModel, appear to succeed and do give me zeros. I don't know if I'm messing up the symbol type mappings here or what.
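
As a side note, those status numbers are easier to look up in the SDK headers when printed in hex, and FFIType.u64_fast can hand back a bigint, so a slightly more defensive checkResult might look like this (a sketch of my own, not from the original post):

const checkResult = (result: number | bigint) => {
    // FFIType.u64_fast may return a bigint, so normalize before comparing.
    const code = BigInt(result);
    if (code !== 0n) {
        // Hex form is easier to match against the codes defined in the SDK headers.
        throw new Error(`SDK call failed: 0x${code.toString(16)}`);
    }
};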

Microsoft doesn't seem to have a repository anywhere with the SDK header files (they only offer a .zip download), so I put them in a repo I can link to, in case you want to reference the SDK itself while looking at this question: https://github.com/catdadcode/microsoft-speech-sdk

Oh, also, I build the C script with the following command:

gcc main.c -o main -I./speechsdk/include/c_api/ -L./speechsdk/lib/x64/ -lMicrosoft.CognitiveServices.Speech.core -Wl,-rpath=./speechsdk/lib/x64/ -g

Just in case it's relevant. I'm not sure any of it matters beyond passing the actual SDK .so library file to dlopen.

Any help here is hugely appreciated, as I've been banging my head against this for days 💚

javascript c typescript ffi bun

Comments

0 votes · CatDadCode · 11/11/2023
Dear close voters, this is the issue. I don't know how to slim it down any further. I'm hoping someone can see a problem with the translation from plain C to Bun's FFI and maybe spot what I'm doing wrong, or even help distill it down further, or help me find better ways to debug and get more information. Anything would be helpful.

A:

4 votes · CatDadCode · 11/14/2023 · #1

The problem here was twofold. First, strings provided to Bun's FFI need to be nul-terminated. This can be accomplished by concatenating a zeroed buffer:

const nul = Buffer.from([0])
const buf = Buffer.concat([Buffer.from("hello"), nul])

Or you can simply use the nul escape character:

const buf = Buffer.from("hello\0")

A hex zero is equivalent as well:

const buf = Buffer.from("hello\x00")
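
Since the question built its strings with TextEncoder, the same fix works in that style too; here's a minimal sketch (the encodeCString helper name is mine, not part of Bun or the SDK):

const encodeCString = (s: string) => new TextEncoder().encode(s + "\0");

// Every path handed to the SDK gets its terminator:
const audioFilePath = encodeCString(`${cwd}/file.wav`);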

The second issue was a simple misuse of Bun's FFI API when it comes to pointers. The value of a pointer in C is the address of the memory it points to, and you access that address in C simply by referencing the pointer variable. Getting the address of the pointer itself is as simple as prefixing the pointer variable with an ampersand (&).

In Bun, this is more involved. Getting the value of a pointer (the address the pointer points to) requires us to read bytes out of the pointer. Meanwhile, referencing the address of the pointer itself is as simple as passing the raw pointer in. In Bun, the Pointer type is just a number: the address of the pointer.

This means that passing a raw pointer variable in C (ptr) is like passing the pointer's value, i.e. the address of the data being pointed to, while passing a raw pointer variable in Bun (ptr) is like passing the address of the pointer itself (similar to passing with & in C).

In order to get the address a pointer points to in Bun, we have to read the bytes from the memory where the pointer variable is stored (read.ptr(ptr)). Below is a simple example of doing the same thing in C and in Bun:

C

SOMEHANDLE ptr;
someMethod(&ptr);
someOtherMethod(ptr);

Bun

import { dlopen, ptr, FFIType, read } from "bun:ffi";

const sdk = dlopen("path/to/sdk", {
  someMethod: { args: [FFIType.ptr], returns: FFIType.void },
  someOtherMethod: { args: [FFIType.ptr], returns: FFIType.void },
});

// Allocate an 8-byte slot for the handle and take its address.
const handle = ptr(new Uint8Array(8));
sdk.symbols.someMethod(handle);                // like someMethod(&ptr) in C
sdk.symbols.someOtherMethod(read.ptr(handle)); // like someOtherMethod(ptr) in C

The fact that things are flip-flopped (extra syntax to get the pointer's own address in C versus extra syntax to get the pointed-to address in Bun) is what really threw me for a loop. Hopefully this breakdown helps someone else stuck in a similar rut.
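
Putting both fixes together, the recognizer call that failed in the question would end up looking something like this (just a sketch, assuming the same dlopen table, audioConfig slot, and checkResult helper from the question):

// Out-param: pass the slot itself, like &recognizer in C.
const recognizer = ptr(new Uint8Array(8));
const recognizerResult =
    speechSdk.symbols.recognizer_create_keyword_recognizer_from_audio_config(
        recognizer,
        // In-param: read the handle's value out of its slot, like passing
        // plain audioConfig in C.
        read.ptr(audioConfig)
    );
checkResult(recognizerResult);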