Extract Text from PDF using PHP

By: CodexWorld | In: PHP | Last Updated: May 31, 2024

PDF (Portable Document Format) files are commonly used to store text and image content for offline use. PDF files are sometimes also used to display text and graphics on a web page, usually by embedding them in the browser with a web viewer. When a PDF file is embedded this way, its text and graphics are not added to the HTML of the page. Because search engines do not see that content as part of the page, this can hurt SEO. To work around this issue, you can extract the text content from the PDF and include it on the web page.

The PDF Parser library is helpful for extracting elements from PDF files using PHP. This PHP library parses PDF files and extracts the text content from all the pages. Objects, headers, metadata, and text can all be parsed from the PDF file. This tutorial will show you how to extract text from PDF files using PHP.

In this example script, we will use the PDF Parser library to extract text from a PDF with PHP. We will also show how you can upload PDF files and extract their text on the fly.

Install PDF Parser Library

Run the following command to install the PDF Parser library using Composer.

composer require smalot/pdfparser

Note: You don’t need to download the PDF Parser library separately; all the required files are included in our source code. Download the source code if you want to install and use PDF Parser without Composer.

Include the autoloader to load the PDF Parser library and helper functions in the PHP script.

include 'vendor/autoload.php';

Extract Text from PDF

The following code snippet extracts all the text content from a PDF file using PHP.

Initialize and load the PDF Parser library.
Specify the source PDF file from which the text content will be retrieved.
Parse the PDF file using the parseFile() method of the PDF Parser class.
Extract text from the PDF using the getText() method of the document object returned by parseFile().

// Initialize and load PDF Parser library 
$parser = new \Smalot\PdfParser\Parser(); 
 
// Source PDF file to extract text 
$file = 'path-to-file/Brochure.pdf'; 
 
// Parse pdf file using Parser library 
$pdf = $parser->parseFile($file); 
 
// Extract text from PDF 
$textContent = $pdf->getText();
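
The snippet above assumes that the file exists and parses cleanly. As a minimal sketch, the same steps could be wrapped in a helper that checks the path first and catches parse failures. The function name extractPdfText() is hypothetical (it is not part of the library), and the autoloader location is assumed to be vendor/autoload.php next to the script.

```php
<?php
// Load the Composer autoloader if present (this path is an assumption; adjust to your project)
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

/**
 * Hypothetical helper: returns the text of a PDF file, or null on failure.
 */
function extractPdfText(string $file): ?string
{
    // Bail out early if the file is missing or unreadable
    if (!is_file($file) || !is_readable($file)) {
        return null;
    }

    try {
        $parser = new \Smalot\PdfParser\Parser();
        $pdf = $parser->parseFile($file);
        return $pdf->getText();
    } catch (\Throwable $e) {
        // Parsing failed (for example, a damaged or encrypted file)
        // or the library is not loaded
        return null;
    }
}

$textContent = extractPdfText('path-to-file/Brochure.pdf');
if ($textContent === null) {
    echo 'Could not extract text from the PDF file.';
} else {
    echo $textContent;
}
```

Returning null instead of letting the exception bubble up keeps the calling code simple; if you need the failure reason, you could log the exception message inside the catch block.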

Upload PDF File and Extract Text

This example code snippet shows you the step-by-step process to upload PDF files and extract the text using PHP.

PDF File Upload Form:
Define the HTML elements for the file upload form.

<form action="submit.php" method="post" enctype="multipart/form-data">
    <div class="form-input">
        <label for="pdf_file">PDF File</label>
        <input type="file" name="pdf_file" id="pdf_file" accept=".pdf,application/pdf" required>
    </div>
    <input type="submit" name="submit" class="btn" value="Extract Text">
</form>

On form submission, the selected file is submitted to the server-side script for further processing.

Server-side Script (submit.php) to Extract Text from Uploaded PDF:
The following code handles the submitted file and extracts text from the PDF.

Retrieve the file name using $_FILES in PHP.
Get the file extension using the pathinfo() function with the PATHINFO_EXTENSION flag.
Validate the file to check whether it has a PDF extension.
Retrieve the temporary file path using tmp_name in $_FILES.
Parse the uploaded PDF file and extract the text content using the PDF Parser library.
Escape the text with htmlspecialchars(), then insert a line break (<br />) before each new line (\n) using the nl2br() function in PHP.

$pdfText = ''; 
$statusMsg = ''; 
if(isset($_POST['submit'])){ 
    // If file is selected and uploaded without errors 
    if(!empty($_FILES["pdf_file"]["name"]) && $_FILES["pdf_file"]["error"] === UPLOAD_ERR_OK){ 
        // Get the file extension (lowercased so that .PDF is also accepted) 
        $fileName = basename($_FILES["pdf_file"]["name"]); 
        $fileType = strtolower(pathinfo($fileName, PATHINFO_EXTENSION)); 
         
        // Allow certain file formats 
        $allowTypes = array('pdf'); 
        if(in_array($fileType, $allowTypes)){ 
            // Include autoloader file 
            include 'vendor/autoload.php'; 
             
            try{ 
                // Initialize and load PDF Parser library 
                $parser = new \Smalot\PdfParser\Parser(); 
                 
                // Source PDF file to extract text 
                $file = $_FILES["pdf_file"]["tmp_name"]; 
                 
                // Parse pdf file using Parser library 
                $pdf = $parser->parseFile($file); 
                 
                // Extract text from PDF 
                $text = $pdf->getText(); 
                 
                // Escape HTML special characters, then add line breaks 
                $pdfText = nl2br(htmlspecialchars($text)); 
            }catch(\Exception $e){ 
                // The parser may throw an exception if the file cannot be parsed 
                $statusMsg = '<p>Sorry, the PDF file could not be parsed.</p>'; 
            } 
        }else{ 
            $statusMsg = '<p>Sorry, only PDF files are allowed.</p>'; 
        } 
    }else{ 
        $statusMsg = '<p>Please select a PDF file to extract text.</p>'; 
    } 
} 
 
// Display status message and text content 
echo $statusMsg; 
echo $pdfText;
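
The script above validates the upload by file extension only, and an extension can be renamed freely. As a rough sketch of an additional check, the helper below looks for the "%PDF-" signature near the start of the file. The function name looksLikePdf() is hypothetical, and the 1024-byte window is an assumption chosen to tolerate a little leading data before the signature.

```php
<?php
/**
 * Hypothetical helper: checks whether a file looks like a PDF
 * by searching for the "%PDF-" signature near the start of the file.
 */
function looksLikePdf(string $path): bool
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }

    // Read only the first 1024 bytes; the signature normally sits at the very beginning
    $head = file_get_contents($path, false, null, 0, 1024);

    return $head !== false && strpos($head, '%PDF-') !== false;
}

// Usage with an uploaded file (the field name pdf_file matches the upload form)
if (isset($_FILES['pdf_file']) && looksLikePdf($_FILES['pdf_file']['tmp_name'])) {
    echo 'The uploaded file looks like a PDF.';
}
```

This check does not prove that the file is a valid PDF; it only filters out files that are clearly something else before they reach the parser.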

Advanced Usage of PDF Parser Library

You can use this PDF parser library for various needs. Here are some advanced uses to further customize the PDF parser and text output.

Extract the text of a specific page from PDF:

// extract the text of a specific page (in this case the first page) 
$text = $pdf->getPages()[0]->getText();
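
Accessing an index that does not exist in the array returned by getPages() raises a warning and then a fatal error when getText() is called on null. A minimal sketch of a guarded lookup is shown below; getPageText() is a hypothetical helper, and it takes a 1-based page number for readability.

```php
<?php
/**
 * Hypothetical helper: returns the text of one page, or null if the page does not exist.
 * $pages is the array returned by $pdf->getPages(); $pageNumber is 1-based.
 */
function getPageText(array $pages, int $pageNumber): ?string
{
    $index = $pageNumber - 1; // array indexes start at 0
    if (!isset($pages[$index])) {
        return null;
    }

    return $pages[$index]->getText();
}

// Usage, assuming $pdf was created with parseFile() earlier in the script
if (isset($pdf)) {
    $secondPage = getPageText($pdf->getPages(), 2);
    echo $secondPage ?? 'This PDF has no second page.';
}
```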

Extract the text of a limited number of pages from PDF:

// extract text of a limited number of pages. here, it will only use the first two pages. 
$text = $pdf->getText(2);

Extract metadata from PDF:

$metaData = $pdf->getDetails();

The getDetails() method returns the metadata as an array, for example:

Array
(
    [Title] => Brochure
    [Producer] => Skia/PDF m94 Google Docs Renderer
    [Pages] => 2
    ...
)
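
Which keys appear in the metadata array depends on the PDF, so reading a key directly can trigger an undefined index warning. The sketch below reads a value with a fallback. getDetail() is a hypothetical helper, and the sample array simply reuses the values from the output above in place of a real $pdf->getDetails() call.

```php
<?php
/**
 * Hypothetical helper: reads one metadata value with a fallback.
 * A value may be a flat array of strings, in which case it is joined into one string.
 */
function getDetail(array $details, string $key, string $default = ''): string
{
    if (!isset($details[$key])) {
        return $default;
    }

    $value = $details[$key];

    return is_array($value) ? implode(', ', $value) : (string) $value;
}

// Sample data; in practice use $metaData = $pdf->getDetails();
$metaData = array(
    'Title'    => 'Brochure',
    'Producer' => 'Skia/PDF m94 Google Docs Renderer',
    'Pages'    => 2,
);

echo getDetail($metaData, 'Title', 'Untitled') . PHP_EOL;  // prints "Brochure"
echo getDetail($metaData, 'Author', 'Unknown') . PHP_EOL;  // prints "Unknown"
```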


2 Comments

Gholland Said...

Can it read text from pdf tables?

July 25, 2024 at 5:27 PM
Naeem Said...

Thanks for Extract Text from PDF using PHP
My question is can we save this extracted text into database ?

June 23, 2023 at 10:54 AM
