How do I capture the first 3 paragraphs from the second chapter in an epub file?

Asked by: SirBT | Asked: 6/22/2023 | Modified: 6/22/2023 | Viewed: 58 times

Question:

How do I capture the first 3 paragraphs from the second chapter in an epub file?

I am using Node.js with the following modules: epub.js, multer, fs and epub-metadata. My app sends an epub file from the client side to the server side for parsing.

There is a folder named uploads in the same directory as server.js.

When I run my app, the server side successfully captures the book's Metadata using the following:

// Read the metadata parsed by the EPub instance
const title = book.metadata.title;
console.log('EPUB Metadata:', book.metadata);

Yields:

EPUB Metadata: {
  description: '¡Aprende a utilizar el storytelling para ganar cualquier concurso! Con este libro, descubre cómo destacar entre la multitud y ganar el primer lugar en cualquier competición. ¡Sé el ganador!',
  language: 'es',
  creator: 'Zuan Ásvarez',
  creatorFileAs: 'Zuan Ásvarez',
  title: '¡Sé el ganador! Utiliza el poder del storytelling para destacar en cualquier concurso y ganar el primer lugar.',
  UUID: '6A5EE928-BDA4-4246-BF14-56587BD6B885',
  subject: 'marketing',
  'calibre:timestamp': '2023-03-31T06:42:00.358647',
  cover: 'cover'
}

I am also able to successfully capture the Table of contents using:

// Find the table of contents
const tableOfContents = book.toc;
console.log('############ tableOfContents ##########: ', tableOfContents);

Yields:

############ tableOfContents ##########:  [
  {
    level: 0,
    order: 1,
    title: 'Capítulo 1: Introducción: ¿Por qué el storytelling es importante en los concursos?',
    href: 'tmp_0244159-4fb3f440-afd3-41f7-8ef2-ff4c6ff37440_W9DfRi.ch.fixed.fc.tidied.stylehacked.xfixed.sc_split_003.html#cap1',
    id: 'd2cb557e-f55f-4128-9abb-c266bdf5be9b'
  },
  {
    level: 0,
    order: 2,
    title: 'Capítulo 2: Identifica tu audiencia y los objetivos del concurso',
    href: 'tmp_0244159-4fb3f440-afd3-41f7-8ef2-ff4c6ff37440_W9DfRi.ch.fixed.fc.tidied.stylehacked.xfixed.sc_split_004.html#cap2',
    id: 'a4ac0fda-3bd5-45ab-b602-af38e603fb82'
  },
.
.
.
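A note on the TOC output above: each href points at a spine file plus a fragment (#cap1, #cap2), and the id values look like NCX navPoint ids rather than OPF manifest ids. A minimal sketch, assuming book.flow holds the spine items as { id, href, ... } objects (as in the epub npm package, which the new EPub(file, imagewebroot, chapterwebroot) constructor appears to come from) and that TOC and manifest hrefs share the same directory prefix, could map a TOC entry to its spine item by comparing hrefs with the fragment removed. The helper names stripFragment and findSpineItem, and the shortened hrefs and invented manifest ids in the sample data, are hypothetical.

```javascript
// Hypothetical helpers: resolve a TOC entry to the spine item that contains it.
// Assumption: spine items look like the epub package's book.flow entries ({ id, href, ... }).

// "split_004.html#cap2" -> "split_004.html"
function stripFragment(href) {
  const hashIndex = href.indexOf('#');
  return hashIndex === -1 ? href : href.slice(0, hashIndex);
}

// Returns the first spine item whose href equals the TOC href without its fragment, or null
function findSpineItem(flow, tocEntry) {
  const target = stripFragment(tocEntry.href);
  return flow.find((item) => stripFragment(item.href) === target) || null;
}

// Sample data shaped like the logged TOC (hrefs and ids shortened, manifest ids invented)
const flow = [
  { id: 'id_003', href: 'split_003.html', 'media-type': 'application/xhtml+xml' },
  { id: 'id_004', href: 'split_004.html', 'media-type': 'application/xhtml+xml' },
];
const toc = [
  { level: 0, order: 1, title: 'Capítulo 1', href: 'split_003.html#cap1', id: 'd2cb557e' },
  { level: 0, order: 2, title: 'Capítulo 2', href: 'split_004.html#cap2', id: 'a4ac0fda' },
];

const chapter2 = findSpineItem(flow, toc[1]);
console.log(chapter2 ? chapter2.id : 'not found'); // prints "id_004"
```

Matching on the file name alone is a deliberate simplification: if several TOC entries point into the same file they resolve to the same spine item, and the fragment would then be needed to find the exact heading inside it.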

What I am really interested in is the first three paragraphs of the second chapter.
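Once a chapter has been loaded, getChapter appears to hand it back as an HTML string, so the first three <p> elements still have to be picked out of it. A minimal, regex-based sketch, assuming fairly simple XHTML such as the Calibre output suggested by the calibre:timestamp metadata (no nested <p>, no paragraphs hidden in comments or CDATA); the function name firstParagraphs and the sample HTML are hypothetical:

```javascript
// Hypothetical helper: text of the first `count` non-empty <p> elements in an HTML string.
// Regex-based, so it assumes simple, well-formed XHTML.
function firstParagraphs(html, count = 3) {
  const paragraphs = [];
  // <p\b matches <p> and <p class="..."> but not <pre> or <param>;
  // [\s\S]*? lets a paragraph span line breaks and stops at the first </p>
  const pattern = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  let match;
  while (paragraphs.length < count && (match = pattern.exec(html)) !== null) {
    const text = match[1]
      .replace(/<[^>]+>/g, '') // drop inline tags such as <span> or <em>
      .replace(/\s+/g, ' ') // collapse whitespace and line breaks
      .trim();
    if (text) paragraphs.push(text); // empty spacer paragraphs do not count
  }
  return paragraphs;
}

const sample = `
  <h2 id="cap2">Capítulo 2</h2>
  <p class="calibre">Primer <em>párrafo</em>.</p>
  <p></p>
  <p>Segundo
     párrafo.</p>
  <pre>not a paragraph</pre>
  <p>Tercer párrafo.</p>
  <p>Cuarto párrafo.</p>`;

const firstThree = firstParagraphs(sample);
// firstThree is ['Primer párrafo.', 'Segundo párrafo.', 'Tercer párrafo.']
console.log(firstThree);
```

A regex is only a stopgap: HTML entities such as &nbsp; are left undecoded and unusual markup will confuse it, so an HTML parser such as cheerio would be the more robust choice for real books.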

Find the entire code below.

app.post('/captureBook', upload.single('file'), async (request, response) => {
  try {
    console.log('## You are in Capture book ###');
    // Access the file using request.file
    const file = request.file;

    if (!file) {
      throw new Error('No file uploaded');
    }

    // Adjust the file paths
    const uploadsFolderPath = path.join(__dirname, 'uploads');
    const epubFilePath = path.join(uploadsFolderPath, file.filename);
    const imagewebroot = '/images/'; // Adjust the prefix for image URLs as needed
    const chapterwebroot = '/chapters/'; // Adjust the prefix for chapter URLs as needed

    // Create an EPub instance
    const book = new EPub(epubFilePath, imagewebroot, chapterwebroot);

    // Runs once book.parse() has finished reading the EPUB
    book.on('end', async function () {
      try {
        // Retrieve the book title
        const title = book.metadata.title;
        console.log('EPUB Metadata:', book.metadata);

        // Extract metadata using epub-metadata
        const epubMetadata = await EpubMetadata(epubFilePath);
        console.log('Extracted Metadata:', epubMetadata);

        // Find the table of contents
        const tableOfContents = book.toc;
        console.log('############ tableOfContents ##########: ', tableOfContents);

        // Loop through the table of contents
        for (let item of tableOfContents) {
          console.log('########## Table of Contents Item:', item);

          // Retrieve the item content
          const itemContentPath = path.join(uploadsFolderPath, item.href); // Adjust the path to include the uploads folder
          console.log('Loading item content:', itemContentPath);

          // Wrap the book.getChapter function in a promise
          const getItemContent = () => {
            return new Promise((resolve, reject) => {
              book.getChapter(itemContentPath, function (error, itemContent) {
                if (error) {
                  console.error('Failed to retrieve item content:', error);
                  reject(error);
                } else {
                  console.log('Item content loaded:', itemContent);
                  resolve(itemContent);
                }
              });
            });
          };

          try {
            const itemContent = await getItemContent();
            // Rest of the code...
          } catch (error) {
            console.error(error);
            response.status(500).json({ error: 'Failed to retrieve item content' });
            return; // Return to avoid further execution
          }
        }

        // Respond with success
        response.status(200).json({ message: 'Book capture completed successfully' });
      } catch (error) {
        console.error(error);
        response.status(500).json({ error: 'Failed to process the file' });
      }
    });

    // Handle any parsing errors
    book.on('error', function (error) {
      console.error(error);
      response.status(500).json({ error: 'Failed to process the file' });
    });

    // Load the EPUB file
    book.parse();
  } catch (error) {
    console.error(error);
    response.status(500).json({ error: 'Failed to process the file' });
  }
});

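The part of the code that is supposed to loop through the table of contents and capture the paragraphs logs Failed to retrieve item content: Error: File not found, as shown below: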

########## Table of Contents Item: {
  level: 0,
  order: 1,
  title: 'Capítulo 1: Introducción: ¿Por qué el storytelling es importante en los concursos?',
  href: 'tmp_0244159-4fb3f440-afd3-41f7-8ef2-ff4c6ff37440_W9DfRi.ch.fixed.fc.tidied.stylehacked.xfixed.sc_split_003.html#cap1',
  id: 'd2cb557e-f55f-4128-9abb-c266bdf5be9b'
}
Loading item content: /home/sirbt/Desktop/epubAI/epubAI/uploads/tmp_0244159-4fb3f440-afd3-41f7-8ef2-ff4c6ff37440_W9DfRi.ch.fixed.fc.tidied.stylehacked.xfixed.sc_split_003.html#cap1
Failed to retrieve item content: Error: File not found
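The File not found message is consistent with how the epub npm package appears to implement getChapter(chapterId, callback): it looks chapterId up as a key of book.manifest and reports File not found when there is no such key, and it reads the chapter from inside the .epub archive rather than from the uploads folder, so a file-system path (with or without #cap1) would not be found either way. A minimal promise wrapper under that assumption, with an explicit manifest check so a bad id fails with a clearer message; getChapterAsync and the fakeBook stand-in are hypothetical:

```javascript
// Hypothetical promise wrapper around book.getChapter(chapterId, callback).
// Assumption (epub npm package): chapterId must be a key of book.manifest,
// i.e. an OPF manifest id, not a file-system path or a TOC href.
function getChapterAsync(book, chapterId) {
  return new Promise((resolve, reject) => {
    if (!book.manifest || !book.manifest[chapterId]) {
      // Fail early with a clearer message than a generic "File not found"
      reject(new Error(`"${chapterId}" is not a manifest id`));
      return;
    }
    book.getChapter(chapterId, (error, html) => {
      if (error) reject(error);
      else resolve(html);
    });
  });
}

// Hypothetical stand-in for a parsed EPub instance, just enough to exercise the wrapper
const fakeBook = {
  manifest: {
    id_004: { href: 'split_004.html', 'media-type': 'application/xhtml+xml' },
  },
  getChapter(id, callback) {
    // Simulate the asynchronous read from the archive
    setImmediate(() => callback(null, '<p>Contenido del capítulo 2</p>'));
  },
};

// prints <p>Contenido del capítulo 2</p>
getChapterAsync(fakeBook, 'id_004').then((html) => console.log(html));

// prints "uploads/split_004.html#cap2" is not a manifest id
getChapterAsync(fakeBook, 'uploads/split_004.html#cap2').catch((e) => console.log(e.message));
```

In the question's loop, the call would then take a manifest id (for instance the id of the matching entry in book.flow) instead of itemContentPath.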

How can I modify the code to make sure it works properly?

JavaScript epub.js

Comments

0 votes, Peter Thoeny, 6/22/2023
I am not familiar with epub. To debug, did you check whether the file in the /home/sirbt/Desktop/epubAI/epubAI/uploads/ directory matches? Maybe remove the #cap1 from the path?
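0 votes, SirBT, 6/22/2023
@PeterThoeny Hi, thanks for the prompt reply. The file definitely gets uploaded to the uploads folder. However, its name is made of random characters, e.g. "c1a09581f88ec378ac131e9d17d82cd4", which is very different from the name of the epub file itself. I can't remove the #cap1, because it comes from the output of console.log('########## Table of Contents Item:', item);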
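1 vote, Peter Thoeny, 6/24/2023
My point was to try removing the #cap1 before the book.getChapter() call, because it looks like it loads a temporary file from the file system rather than through a REST call. So change your path to: const itemContentPath = path.join(uploadsFolderPath, item.href).replace(/#\w+$/, '');. However, if item.id is the file name, try this: const itemContentPath = path.join(uploadsFolderPath, item.id);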

Answers: no answers yet