HTML JPG PDF XML DOCX
  Product Family
PDF

Parse PDF for extraction Text by name in Python SDK

API for parsing PDF documents to extraction text by name using server-side Python API.

Get Started

How to parse PDF documents for extraction Text by name using Cloud Python SDK

For parse PDF documents to extraction Text by name via Cloud Python SDK , we’ll use Aspose.PDF Cloud Python SDK This Cloud SDK assists Python programmers in developing cloud-based PDF creator, annotator, editor, converter and parser apps using Python programming language via Aspose.PDF REST API. Simply create an account at Aspose for Cloud and get your application information. Once you have the App SID & key, you are ready to give the Aspose.PDF Cloud Python SDK. If the python package is hosted on Github, you can install directly from Github:

Installation from Github


     
    pip install git+https://github.com/aspose-pdf-cloud/aspose-pdf-cloud-python.git
     
     

Steps to parse PDF to extaction Text by name using Python SDK

Aspose.PDF Cloud developers can easily parse PDF documents for extraction Text by name. Developers need just a few lines of code.

  1. Create a new Configuration object with your Application Secret and Key
  2. Create an object to connect to the Cloud API
  3. Upload your document file
  4. Parse PDF documents for extraction Text by name in cloud storage using get_text_box_field function
  5. Checks the response and logs the result
  6. Download Text box info in JSON file locally if needet
 

This sample code shows parsing PDF document for extraction Text by name


import shutil
import json
import logging
from pathlib import Path
from asposepdfcloud import ApiClient, PdfApi
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")


class ParseExtractTextBox:
    """Class for extracting text box from PDF document using Aspose PDF Cloud API."""
    def __init__(self):
        self.pdf_api = PdfApi(ApiClient(APP_KEY, APP_SID)

    def upload_document(self, documentName: str, localFolder: str, remoteFolder: str):
        """Upload a PDF document to the Aspose Cloud server."""
        if self.pdf_api:
            file_path = localFolder / documentName
            try:
                if remoteFolder == None:
                    self.pdf_api.upload_file(documentName, str(file_path))
                else:
                    opts = { "folder": remoteFolder }
                    self.pdf_api.upload_file(remoteFolder + '/' + documentName, file_path)
                logging.info(f"File {documentName} uploaded successfully.")
            except Exception as e:
                logging.error(f"Failed to upload file: {e}")

   def Extract(self, documentName: str, texxtboxName: str, localFolder: Path, remoteFolder: Path):
        self.upload_document(documentName, remoteFolder)

        opts = {
            "folder": remoteFolder
        }
	response = self.pdf_api.get_text_box_field(documentName, textboxName, **opts)
        if response.code != 200:
            logging.error("ParseExtractTextBox(): Unexpected error!")
        else:
	    logging.info(f"ParseExtractTextBox(): TextBox field '{textboxName}' successfully extracted from the document '{documentName}'.")
            localJsonFile = str(Path.joinpath(localFolder, "text_box_objects.json"))
            with open(localJsonFile, "w", encoding="utf-8") as localFile:            
               jsText = json.dumps(response.field.__dict__, ensure_ascii=False, default=str, indent=4)
               localFile.write(jsText)
 

Work with the Text parsing in PDF via Python SDK

By parsing PDF documents for text extraction by name, you can modify the content of TextBox field as needed. This maintains the position of the text in the documents while saving time and reducing manual work. Parse PDF documents to extraction text by name with Aspose.PDF Cloud Python SDK.

With our Python SDK you can

  • Add PDF document’s header & footer in text or image format.
  • Add tables & text or image stamps to PDF documents.
  • Append multiple PDF documents to an existing file.
  • Work with PDF attachments, annotations, & form fields.
  • Apply encryption or decryption to PDF documents & set a password.
  • Delete all stamps & tables from a page or entire PDF document.
  • Delete a specific stamp or table from the PDF document by its ID.
  • Replace single or multiple instances of text on a PDF page or from the entire document.
  • Extensive support for converting PDF documents to various other file formats.
  • Extract various elements of PDF files & make PDF documents optimized.
  • You can try out our free App to test the functionality.

  •