Extract Tables from PDF with Python SDK
Extract tables from a PDF document using the Aspose.PDF Cloud Python SDK.
How to extract Tables from PDF via Cloud Python SDK
To extract tables from PDF, we’ll use Aspose.PDF Cloud Python SDK. This Cloud SDK helps Python programmers build cloud-based PDF creator, annotator, editor, and converter apps on top of the Aspose.PDF REST API. Simply create an account at Aspose for Cloud and get your application information. Once you have the App SID & key, you are ready to use the Aspose.PDF Cloud Python SDK. You can install the package directly from GitHub or with pip:
Installation from GitHub
pip install git+https://github.com/aspose-pdf-cloud/aspose-pdf-cloud-python.git
Installation with pip
pip install asposepdfcloud
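The App SID and key have to reach your code somehow. One option, sketched here, is a small JSON file with `id` and `key` fields, which is the layout the sample on this page reads. The helper name `load_credentials` is a hypothetical illustration and not part of the SDK.

```python
import json
from pathlib import Path
from typing import Tuple


def load_credentials(path: Path) -> Tuple[str, str]:
    """Return (app_key, app_sid) read from a JSON file with "key" and "id" fields.

    Hypothetical helper for illustration; the field names mirror the sample code.
    """
    with path.open("r", encoding="utf-8") as file:
        credentials = json.load(file)
    app_key, app_sid = credentials.get("key"), credentials.get("id")
    if not app_key or not app_sid:
        # Fail early instead of sending an unauthenticated request later
        raise ValueError(f"'key' or 'id' is missing in {path}")
    return app_key, app_sid
```

The returned pair is in the same order in which the sample passes the values to `ApiClient`. Keeping the file outside the project folder helps avoid committing secrets by accident.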
Steps to get Tables from PDF via Python SDK
Aspose.PDF Cloud developers can easily load & extract tables from PDF in just a few lines of code.
- Install Python SDK
- Upload a PDF document to the Aspose Cloud server
- Get all tables in the document
- Get a single table by its ID
Extract Tables from PDF using Python
import json
import logging
from pathlib import Path
from asposepdfcloud import ApiClient, PdfApi

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")


class Config:
    """Configuration parameters."""
    CREDENTIALS_FILE = Path(r"C:\Projects\ASPOSE\Pdf.Cloud\Credentials\credentials.json")
    LOCAL_FOLDER = Path(r"C:\Samples")
    PDF_DOCUMENT_NAME = "sample.pdf"
    LOCAL_RESULT_DOCUMENT_NAME = "output_sample.pdf"
    PAGE_NUMBER = 2
    TABLE_ID = "GE5TCOZSGAYCYNRQGUWDINZVFQ3DGMA"


class PdfTables:
    """Class for managing PDF tables using Aspose PDF Cloud API."""

    def __init__(self, credentials_file: Path = Config.CREDENTIALS_FILE):
        self.pdf_api = None
        self._init_api(credentials_file)

    def _init_api(self, credentials_file: Path):
        """Initialize the API client from the credentials file."""
        try:
            with credentials_file.open("r", encoding="utf-8") as file:
                credentials = json.load(file)
            api_key, app_id = credentials.get("key"), credentials.get("id")
            if not api_key or not app_id:
                raise ValueError("Missing API keys in the credentials file.")
            self.pdf_api = PdfApi(ApiClient(api_key, app_id))
        except (FileNotFoundError, json.JSONDecodeError, ValueError) as e:
            logging.error(f"_init_api(): Failed to load credentials: {e}")

    def upload_document(self):
        """Upload a PDF document to the Aspose Cloud server."""
        if self.pdf_api:
            file_path = Config.LOCAL_FOLDER / Config.PDF_DOCUMENT_NAME
            try:
                self.pdf_api.upload_file(Config.PDF_DOCUMENT_NAME, str(file_path))
                logging.info(f"upload_document(): File {Config.PDF_DOCUMENT_NAME} uploaded successfully.")
            except Exception as e:
                logging.error(f"upload_document(): Failed to upload file: {e}")

    def _show_tables_info(self, tables, prefix):
        """Log the ID, page number, and size of each table."""
        if tables:
            for table in tables:
                rows = table.row_list or []
                # The column count is taken from the first row, if there is one
                columns = len(rows[0].cell_list) if rows else 0
                logging.info(f"{prefix} => id: '{table.id}', page: '{table.page_num}', rows: '{len(rows)}', columns: '{columns}'")
        else:
            logging.error("_show_tables_info(): list of tables is empty!")

    def get_all_tables(self):
        """Get all tables found in the document."""
        if self.pdf_api:
            result_tabs = self.pdf_api.get_document_tables(Config.PDF_DOCUMENT_NAME)
            if result_tabs.code == 200 and result_tabs.tables:
                if not result_tabs.tables.list:
                    logging.error("get_all_tables(): the list of tables is null or empty!")
                    return []
                self._show_tables_info(result_tabs.tables.list, "All tables")
                return result_tabs.tables.list
            else:
                logging.error("get_all_tables(): Unexpected error - can't get tables!")

    def get_table_by_id(self):
        """Get a single table by the ID set in Config.TABLE_ID."""
        if self.pdf_api:
            result_tabs = self.pdf_api.get_table(Config.PDF_DOCUMENT_NAME, Config.TABLE_ID)
            if result_tabs.code == 200 and result_tabs.table:
                self._show_tables_info([result_tabs.table], "Table by Id")
                return result_tabs.table
            else:
                logging.error("get_table_by_id(): Unexpected error - can't get the table!")


if __name__ == "__main__":
    pdf_tables = PdfTables()
    pdf_tables.upload_document()
    pdf_tables.get_all_tables()
    pdf_tables.get_table_by_id()
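The sample logs only the size of each table. To get at the cell contents, one possible approach is to walk `row_list` and `cell_list` and join the text fragments of each cell. The sketch assumes each cell exposes a `text_rects` list whose items carry a `text` attribute; this is an assumption about the SDK's response models and should be checked against the installed version. The helper names `table_to_rows` and `rows_to_csv` are hypothetical, and the demo uses stand-in objects rather than a live API response.

```python
import csv
import io
from types import SimpleNamespace


def table_to_rows(table):
    """Convert a table object into a list of rows, each a list of cell strings."""
    rows = []
    for row in table.row_list or []:
        cells = []
        for cell in row.cell_list or []:
            # Assumption: a cell holds text fragments in `text_rects`, each with `.text`
            fragments = getattr(cell, "text_rects", None) or []
            cells.append(" ".join(f.text for f in fragments if f.text))
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def _cell(*texts):
    # Stand-in for an SDK cell object, used only for this demo
    return SimpleNamespace(text_rects=[SimpleNamespace(text=t) for t in texts])


demo_table = SimpleNamespace(row_list=[
    SimpleNamespace(cell_list=[_cell("Name"), _cell("Unit", "price")]),
    SimpleNamespace(cell_list=[_cell("Widget"), _cell("9.50")]),
])

print(rows_to_csv(table_to_rows(demo_table)))
```

With a live response, each item of the list returned by `get_all_tables()` could be passed to `table_to_rows` in the same way. Joining fragments with a space is a simplification; multi-line cells may need a different separator.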
Work with Tables in PDF
Tables present data in a structured way, making it easier for readers to understand and analyze information. They also make a document look more organized and professional. For numerical or comparative data, tables improve clarity by grouping related values in an easy-to-read format. Tables can also hold dynamically generated content, such as data from databases or analytics dashboards. Extract tables from PDF documents with Aspose.PDF Cloud Python SDK.
With our Python library you can:
- Combine PDF documents.
- Split PDF Files.
- Convert PDF to other formats, and vice versa.
- Manipulate Annotations.
- Work with Images in PDF, etc.
You can try out our free App to test the functionality online.