
Parse PDF to Extract Text with Node.js SDK

An API for parsing PDF documents and extracting text from server-side Node.js code.

How to parse PDF documents to extract text using the Cloud Node.js SDK

To parse PDF documents and extract text, we'll use the Aspose.PDF Cloud Node.js SDK. This Cloud SDK helps Node.js programmers develop cloud-based PDF creator, annotator, editor, converter, and parser apps through the Aspose.PDF REST API. Create an account at Aspose for Cloud and get your application information. Once you have the App SID and key, you are ready to use the Aspose.PDF Cloud Node.js SDK.
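
One possible way to keep the App SID and key out of source code is to read them from environment variables. The sketch below is not part of the SDK: the helper name `loadCredentials` and the variable names `APP_SID` and `APP_KEY` are assumptions chosen for illustration.

```javascript
// Hypothetical helper, not part of the Aspose.PDF Cloud SDK.
// Reads the App SID and key from an environment-like object and fails early if either is missing.
// The names APP_SID and APP_KEY are assumptions of this sketch.
function loadCredentials(env) {
    const appSid = env.APP_SID;
    const appKey = env.APP_KEY;
    if (!appSid || !appKey) {
        throw new Error("Set APP_SID and APP_KEY before calling the API.");
    }
    return { appSid, appKey };
}

// Usage: pass process.env in a real script; a literal object with placeholder values is used here.
const credentials = loadCredentials({ APP_SID: "my-sid", APP_KEY: "my-key" });
console.log("Credentials loaded for SID:", credentials.appSid);
```

Failing early with a clear message tends to be easier to debug than an authentication error returned later by the REST API.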

npm install command

    npm install asposepdfcloud --save

Steps to parse PDF and extract text using the Node.js SDK

With Aspose.PDF Cloud, developers can parse PDF documents and extract text in just a few lines of code.

  1. Provide your App SID and key
  2. Create a PdfApi object to connect to the Cloud API
  3. Upload your document file to cloud storage
  4. Parse the PDF document in cloud storage and extract its text boxes using the getDocumentTextBoxFields function
  5. Check the response and log the result
  6. Save the text box info to a local JSON file if needed
 

This sample code shows how to parse a PDF document and extract its text boxes


import fs from 'node:fs/promises';
import path from 'node:path';
import { PdfApi } from 'asposepdfcloud';

export {ParseExportTextBoxes};

// Assumption: the App SID and key are read from environment variables; supply your own values.
const APP_SID = process.env.APP_SID;
const APP_KEY = process.env.APP_KEY;

const pdfApi = new PdfApi(APP_SID, APP_KEY);

const ParseExportTextBoxes = {
    // Upload a local PDF into the given folder in cloud storage.
    async uploadDocument (documentName, localFolder, tempFolder) {
        const fileNamePath = path.join(localFolder, documentName);
        const fileData = await fs.readFile(fileNamePath);
        // Cloud storage paths use forward slashes regardless of the local OS.
        const storagePath = path.posix.join(tempFolder, documentName);
        await pdfApi.uploadFile(storagePath, fileData);
        console.log("File: '" + documentName + "' successfully uploaded.");
    },

    // Upload the document, read its TextBox fields, and save them to a local JSON file.
    async export(documentName, localFolder, remoteFolder) {
        await this.uploadDocument(documentName, localFolder, remoteFolder);

        const response = await pdfApi.getDocumentTextBoxFields(documentName, null, remoteFolder);

        if (response.body.code == 200) {
            console.log("ParseExportTextBoxes(): TextBox fields successfully extracted!");

            // Request each field by name; the list entry itself is what gets saved.
            const textBoxes = await Promise.all(
                response.body.fields.list.map(async (textbox) => {
                    await pdfApi.getTextBoxField(documentName, textbox.fullName, null, remoteFolder);
                    return textbox;
                })
            );

            // Serializing the whole array yields valid JSON in a stable order.
            const result = JSON.stringify(textBoxes, null, 2);

            const filePath = path.join(localFolder, "parsed_text_boxes_output.json");
            await fs.writeFile(filePath, result);
            console.log("Downloaded: " + filePath);
        }
        else {
            console.error("ParseExportTextBoxes(): Unexpected error!");
        }
    }
};
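
Once the text box list has been retrieved, it can be convenient to look fields up by name. The following is a minimal sketch under stated assumptions: `textBoxesToJson` is a hypothetical helper (not an SDK function), it relies only on the `fullName` property used in the sample above, and the field data shown is made up to stand in for `response.body.fields.list`.

```javascript
// Hypothetical helper, not part of the Aspose.PDF Cloud SDK.
// Turns a list of text box field objects into a JSON document keyed by fullName.
function textBoxesToJson(textBoxes) {
    const byName = {};
    for (const box of textBoxes) {
        // `fullName` is the only property this sketch relies on.
        byName[box.fullName] = box;
    }
    // JSON.stringify produces valid JSON, with no trailing commas to clean up.
    return JSON.stringify(byName, null, 2);
}

// Made-up field data standing in for response.body.fields.list.
const sampleTextBoxes = [
    { fullName: "firstName", value: "Ada" },
    { fullName: "lastName", value: "Lovelace" },
];
const sampleJson = textBoxesToJson(sampleTextBoxes);
console.log(Object.keys(JSON.parse(sampleJson)));
```

Keying by name assumes field names are unique within the document; if two fields share a name, the later one overwrites the earlier one in this sketch.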
 

Work with text parsing in PDF via Node.js SDK

After parsing a PDF document and extracting its text, you can modify the content of TextBox fields as needed. This preserves the position of the text in the document while saving time and reducing manual work. Parse PDF documents and extract text with the Aspose.PDF Cloud Node.js SDK.

With our Node.js SDK you can:

  • Add headers & footers to PDF documents in text or image format.
  • Add tables & text or image stamps to PDF documents.
  • Append multiple PDF documents to an existing file.
  • Work with PDF attachments, annotations, & form fields.
  • Apply encryption or decryption to PDF documents & set a password.
  • Delete all stamps & tables from a page or entire PDF document.
  • Delete a specific stamp or table from the PDF document by its ID.
  • Replace single or multiple instances of text on a PDF page or from the entire document.
  • Convert PDF documents to various other file formats.
  • Extract various elements of PDF files & optimize PDF documents.
  • Try out our free App to test the functionality.